    1. eLife Assessment

      Koch et al. describe a valuable novel methodology, SynSAC, to synchronise cells to analyse meiosis I or meiosis II or mitotic metaphase in budding yeast. The authors present convincing data to validate abscisic acid-induced dimerisation to induce a synthetic spindle assembly checkpoint (SAC) arrest that will be of particular importance to analyse meiosis II. The authors use their approach to determine the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II that will be of interest to the broader meiosis research community.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system, but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II and for comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      Significance:

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that it induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can also be used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner. Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not place more emphasis on this improved system. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggests that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports, as previously demonstrated in human cells, that Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data-independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association to kinetochores.

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      Significance:

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

    5. Author response:

      General Statements

      We are delighted that all reviewers found our manuscript to be a technical advance by providing a much sought after method to arrest budding yeast cells in metaphase of mitosis or both meiotic metaphases. The reviewers also valued our use of this system to make new discoveries in two areas. First, we provided evidence that the spindle checkpoint is intrinsically weaker in meiosis I and showed that this is due to PP1 phosphatase. Second, we determined how the composition and phosphorylation of the kinetochore changes during meiosis, providing key insights into kinetochore function and providing a rich dataset for future studies.

      The reviewers also made some extremely helpful suggestions to improve our manuscript, which we will now implement:

      (1) Improvements to the discussion throughout the manuscript. The reviewers recommended that we focus our discussion on the novel findings of the manuscript and drew out some key points of interest that deserve more attention. We fully agree with this and we will address this in a revised version.

      (2) We will add a new supplemental figure to help interpret the mass spectrometry data, to address Reviewer #3, point 4.

      (3) We are currently performing an additional control experiment to address minor point 1 from Reviewer #3. Our experiment to confirm that SynSAC relies on endogenous checkpoint proteins was missing, for comparison, the cell cycle profile of cells in which SynSAC was not induced. We will add this control to our full revision.

      (4) In our full revision we will also include representative images of spindle morphology, as requested by Reviewer #1, point 2.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division. Overall, I have only a few minor suggestions.

      We appreciate the reviewer’s support of our study.

      (1) In wild-type, Pds1 levels are high during MI and AI, but low in MII. Can the authors comment on this? In line 217, what is meant by "slightly attenuated"? Can the authors comment on how anaphase occurs in the presence of high Pds1? There is even a low but significant level in MII.

      The higher levels of Pds1 in meiosis I compared to meiosis II have been observed previously using immunofluorescence and live imaging[1–3]. Although the reasons are not completely clear, we speculate that there is insufficient time between the two divisions to re-accumulate Pds1 prior to separase re-activation.

      We agree “slightly attenuated” was confusing and we have re-worded this sentence to read “Addition of ABA at the time of prophase release resulted in Pds1 (securin) stabilisation throughout the time course, consistent with delays in both metaphase I and II”.

      We do not believe that either anaphase I or II occurs in the presence of high Pds1. Western blotting represents the amount of Pds1 in the population of cells at a given time point. The time between meiosis I and II is very short even when cells are treated with ABA. For example, in Figure 2B, spindle morphology counts show that the anaphase I peak is around 40% at its maximum (105 min), while around 40% of cells are in either metaphase I or metaphase II and will be Pds1 positive. In contrast, due to the greater efficiency of the arrest in meiosis II, anaphase II hardly occurs at all in these conditions, since anaphase II spindles (and the second nuclear division) are observed at very low frequency (maximum 10%) from 165 minutes onwards. Instead, metaphase II spindles partially or fully break down, without undergoing anaphase extension. Taking Pds1 levels from the western blot and the spindle data together leads to the conclusion that, at the end of the time course, these cells are biochemically in metaphase II but unable to maintain a robust spindle. Spindle collapse is also observed in other situations where meiotic exit fails, and potentially reflects an uncoupling of the cell cycle from the programme governing gamete differentiation[3–5]. We will explain this point in a revised version while referring to representative images that provide evidence for this, as also requested by the reviewer below.

      (2) The figures with data characterizing the system are mostly graphs showing time courses of MI and MII. There is no cytology, which is a little surprising since the stage is determined by spindle morphology. It would help to see sample sizes (i.e., in the figure legends) and also representative images. It would also be nice to see images comparing the same stage in the SynSAC cells versus normal cells. Are there any differences in the morphology of the spindles or chromosomes when in the SynSAC system?

      This is an excellent suggestion and will also help clarify the point above. We will provide images of cells at the different stages. For each timepoint, 100 cells were scored. We have already included this information in the figure legends.

      (3) A possible criticism of this system could be that the SAC signal promoting arrest is not coming from the kinetochore. Are there any possible consequences of this? In vertebrate cells, the RZZ complex streams off the kinetochore. Yeast don't have RZZ, but this is an example of something that is SAC dependent and happens at the kinetochore. Can the authors discuss possible limitations such as this? Does the inhibition of the APC affect the native kinetochores? This could be good or bad. A bad possibility is that the cell is behaving as if it is in MII, but the kinetochores have made their microtubule attachments and behave as if in anaphase.

      In our view, the fact that the SynSAC signal does not come from kinetochores is a major advantage, as this allows the study of the kinetochore in an unperturbed state. It is also important to note that the canonical checkpoint components are all still present in the SynSAC strains, and perturbations in kinetochore-microtubule interactions would be expected to mount a kinetochore-driven checkpoint response as normal. Indeed, it would be interesting in future work to understand how disrupting kinetochore-microtubule attachments alters kinetochore composition (presumably checkpoint proteins will be recruited) and phosphorylation, but this is beyond the scope of this work. In terms of the state at which we are arresting cells, this is a true metaphase because cohesion has not been lost but kinetochore-microtubule attachments have been established. This is evident from the enrichment of microtubule regulators, but not checkpoint proteins, in the kinetochore purifications from metaphase I and II. While this state is expected to occur only transiently in yeast, since the establishment of proper kinetochore-microtubule attachments triggers anaphase onset, the ability to capture this properly bioriented state will be extremely informative for future studies. We appreciate the reviewer’s insight in highlighting these interesting discussion points, which we will include in a revised version.

      Reviewer #1 (Significance):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

      We appreciate the reviewer’s enthusiasm for our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II and for comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      I have only a couple of minor comments:

      (1) I would add the Suppl Figure 1A to main Figure 1A. What is really exciting here is the arrest in metaphase II, so I don't understand why the authors characterize metaphase I in the main figure, but not metaphase II. But this is only a suggestion.

      This is a good suggestion; we will do this in our full revision.

      (2) Line 197, the authors state: “...SynSAC induced a more pronounced delay in metaphase II than in metaphase I”. However, in lines 229 and 240 the authors talk about a "longer delay in metaphase I compared to metaphase II"... this seems to be a mix-up.

      Thank you for pointing this out; this is indeed a typo and we have corrected it.

      (3) The authors describe striking differences for both protein abundance and phosphorylation for key kinetochore associated proteins. I found one very interesting protein that seems to be very abundant and phosphorylated in metaphase I but not metaphase II, namely Sgo1. Do the authors think that Sgo1 is not required in metaphase II anymore? (Top hit in suppl Fig 8D).

      This is indeed an interesting observation, which we plan to investigate as part of another study in the future. Data from mouse oocytes indicate that shugoshin-dependent cohesin deprotection is already absent in meiosis II[6], though whether this is also true in yeast is not known. Furthermore, this does not rule out other functions of Sgo1 in meiosis II (for example, promoting biorientation). We will include this point in the discussion.

      Reviewer #2 (Significance):

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that it induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can also be used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner.

      We are grateful to the reviewer for their support.

      Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      For many purposes the enrichment and extended time for sample collection are sufficient, as we demonstrate here. However, as pointed out by the reviewer below, the system can be improved by use of the 4A-RASA mutations to provide a stronger arrest (see our response below). We did not experiment with higher ABA concentrations or repeated addition, since the very robust arrest achieved with the 4A-RASA mutant made this unnecessary.

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not place more emphasis on this improved system. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      We agree that the 4A-RASA mutant is the best tool to use for the arrest, and going forward this will be our approach. We collected the proteomics data and the data on the SynSAC mutant variants concurrently, so we did not know about the improved arrest at the time the proteomics experiment was done. Because a very good arrest was already achieved with the unmutated SynSAC construct, we could not justify repeating the proteomics experiment, which is a large amount of work and uses significant resources. However, we will highlight the potential of the 4A-RASA mutant more prominently in our full revision.

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggests that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports, as previously demonstrated in human cells, that Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      We agree these are intriguing findings that highlight key differences in the wiring of the spindle checkpoint in meiosis and mitosis, as well as potential for future studies. However, currently we can only speculate as to the underlying cause. The effect of the RASA mutation in mitosis is unexpected and unexplained. However, the fact that the 4A-RASA mutation causes a stronger delay in meiosis I compared to mitosis can be explained by a greater prominence of PP1 phosphatase in meiosis. Indeed, our data (Figure 4A) show that the PP1 phosphatase Glc7 and its regulatory subunit Fin1 are highly enriched on kinetochores at all meiotic stages compared to mitosis.

      We agree that the improved growth of the RVAF mutant is intriguing and points to a role of Aurora B-mediated phosphorylation, though previous work has not supported such a role [7].

      We will include a discussion of these important points in a revised version.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data-independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association to kinetochores.

      While we agree with the reviewer that, at first glance, normalising to the no-tag control appears to be the most appropriate normalisation, in practice there is very low background signal in the no-tag sample, which means that any random fluctuations have a big impact on the final fold change used for normalisation. This approach therefore introduces artefacts into the data rather than improving normalisation.

      To provide reassurance that our kinetochore immunoprecipitations are specific, and that the background (no-tag) signal is indeed very low, we will provide a new supplemental figure showing volcano plots comparing kinetochore purifications at each stage with their corresponding no-tag control.

      It is also important to note that our experiment looks at relative changes of the same protein over time, which we expect to be relatively small in the whole cell lysate. We previously documented proteins that change in abundance in whole cell lysates throughout meiosis[8]. In that study, we found that relatively few proteins significantly change in abundance.

      Our aim in the current study was to understand how the relative composition of the kinetochore changes and, for this, we believe that a direct comparison to Dsn1, a central kinetochore protein which we immunoprecipitated, is the most appropriate normalisation.
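      The normalisation argument in this response can be illustrated with a small numerical sketch. This is not the authors' analysis pipeline: the function names, the protein label "ProteinX" and every intensity value below are hypothetical, chosen only to show why dividing by a near-zero background is unstable while scaling to the bait protein is not.

```python
# Illustrative sketch only: all intensities are made-up arbitrary units,
# and "ProteinX" is a hypothetical kinetochore protein, not data from the study.

def scale_to_bait(intensities, bait="Dsn1"):
    """Express each protein's intensity relative to the bait in the same sample."""
    bait_value = intensities[bait]
    return {protein: value / bait_value for protein, value in intensities.items()}


def fold_over_control(tagged, untagged):
    """Ratio of tagged-sample intensity to no-tag control intensity, per protein."""
    return {protein: tagged[protein] / untagged[protein] for protein in tagged}


# One tagged immunoprecipitation (hypothetical values).
tagged = {"Dsn1": 1000.0, "ProteinX": 500.0}

# Two hypothetical no-tag controls whose very low background differs
# only by a few arbitrary units.
untagged_rep1 = {"Dsn1": 2.0, "ProteinX": 1.0}
untagged_rep2 = {"Dsn1": 2.0, "ProteinX": 4.0}

# Scaling to the bait does not involve the background at all.
print(scale_to_bait(tagged)["ProteinX"])                     # 0.5

# Dividing by the background turns a 3-unit fluctuation into a 4-fold swing.
print(fold_over_control(tagged, untagged_rep1)["ProteinX"])  # 500.0
print(fold_over_control(tagged, untagged_rep2)["ProteinX"])  # 125.0
```

      In this toy case the bait-scaled value is identical regardless of which control is used, whereas the fold change over the no-tag control differs four-fold between two controls that are both essentially background.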

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      We strongly agree with this point and we will re-frame the discussion to focus on the novel findings, as also raised by the other reviewers.

      Finally, minor concerns are:

      (1) Meiotic progression in SynSAC strains lacking Mad1, Mad2 or Mad3 is severely affected (Fig. 1D and Supp. Fig. 1), making it difficult to assess whether, as the authors state, the metaphase delays depend on the canonical SAC cascade. In addition, as a general note, graphs displaying meiotic time courses could be improved for clarity (e.g., thinner data lines, addition of axis gridlines and external tick marks, etc.).

      We will generate the data to include a checkpoint mutant +/- ABA for direct comparison. We will take steps to improve the clarity of presentation of the meiotic timecourse graphs, though our experience is that uncluttered graphs make it easier to compare trends.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore do not plan to do this experiment.
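
      The sensitivity argument can be made concrete with a short calculation. This is a sketch under two assumptions that are not in the response: each chromosome mis-segregates independently, and with the same hypothetical probability per meiosis. Only the figure of 16 chromosomes comes from the text above.

```python
def p_any_missegregation(p_per_chromosome, n_chromosomes=16):
    """Probability that at least one of n chromosomes mis-segregates,
    assuming independent events with equal per-chromosome probability."""
    return 1.0 - (1.0 - p_per_chromosome) ** n_chromosomes


p = 0.01  # hypothetical per-chromosome mis-segregation probability

# Following one GFP-labelled chromosome detects only its own errors.
single_chromosome = p_any_missegregation(p, n_chromosomes=1)

# Spore viability reports on an error in any of the 16 chromosomes.
whole_genome = p_any_missegregation(p)

print(f"one labelled chromosome: {single_chromosome:.4f}")
print(f"any of 16 chromosomes:   {whole_genome:.4f}")
```

      Under these assumptions a defect seen in about 1% of meioses with a single labelled chromosome would affect roughly 15% of meioses when all chromosomes are considered.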

      (3) It is surprising that, although SAC activity is proposed to be weaker in metaphase I, the levels of CPC/SAC proteins seem to be higher at this stage of meiosis than in metaphase II or mitotic metaphase (Fig. 4A-B).

      We agree that this is surprising, and we will point this out in the revised discussion. We speculate that the challenge of biorienting homologs, which are held together by chiasmata rather than back-to-back kinetochores, results in a greater requirement for error correction in meiosis I. Interestingly, the data with the RASA mutant also point to increased PP1 activity in meiosis I, and we additionally observed increased levels of PP1 (Glc7 and Fin1) on meiotic kinetochores, consistent with the idea that cycles of error correction and silencing are elevated in meiosis I.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (5) Several typographical errors should be corrected (e.g., "Knetochores" in Fig. 4 legend, "250uM ABA" in Supp. Fig. 1 legend, etc.)

      Thank you for pointing these out, they have been corrected.

      Reviewer #3 (Significance):

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

      Description of the revisions that have already been incorporated in the transferred manuscript

      We have only corrected minor typos as detailed above.

      Description of analyses that authors prefer not to carry out

      The revisions we plan are detailed above. There are just two revisions we believe are either unnecessary or beyond the scope, both minor concerns of Reviewer #3. For clarity we have reproduced them below, along with our justification. In the latter case, the reviewer also acknowledged that further work in this direction is beyond the scope of the current study.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore do not plan to do this experiment.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (1) Salah, S.M., and Nasmyth, K. (2000). Destruction of the securin Pds1p occurs at the onset of anaphase during both meiotic divisions in yeast. Chromosoma 109, 27–34.

      (2) Matos, J., Lipp, J.J., Bogdanova, A., Guillot, S., Okaz, E., Junqueira, M., Shevchenko, A., and Zachariae, W. (2008). Dbf4-dependent CDC7 kinase links DNA replication to the segregation of homologous chromosomes in meiosis I. Cell 135, 662–678.

      (3) Marston, A.L., Lee, B.H., and Amon, A. (2003). The Cdc14 phosphatase and the FEAR network control meiotic spindle disassembly and chromosome segregation. Developmental Cell 4, 711–726. https://doi.org/10.1016/S1534-5807(03)00130-8.

      (4) Attner, M.A., and Amon, A. (2012). Control of the mitotic exit network during meiosis. Molecular Biology of the Cell 23, 3122–3132. https://doi.org/10.1091/mbc.E12-03-0235.

      (5) Pablo-Hernando, M.E., Arnaiz-Pita, Y., Nakanishi, H., Dawson, D., del Rey, F., Neiman, A.M., and de Aldana, C.R.V. (2007). Cdc15 Is Required for Spore Morphogenesis Independently of Cdc14 in Saccharomyces cerevisiae. Genetics 177, 281–293. https://doi.org/10.1534/genetics.107.076133.

      (6) El Jailani, S., Cladière, D., Nikalayevich, E., Touati, S.A., Chesnokova, V., Melmed, S., Buffin, E., and Wassmann, K. (2025). Eliminating separase inhibition reveals absence of robust cohesin protection in oocyte metaphase II. EMBO J 44, 5187–5214. https://doi.org/10.1038/s44318-025-00522-0.

      (7) Rosenberg, J.S., Cross, F.R., and Funabiki, H. (2011). KNL1/Spc105 Recruits PP1 to Silence the Spindle Assembly Checkpoint. Current Biology 21, 942–947. https://doi.org/10.1016/j.cub.2011.04.011.

      (8) Koch, L.B., Spanos, C., Kelly, V., Ly, T., and Marston, A.L. (2024). Rewiring of the phosphoproteome executes two meiotic divisions in budding yeast. EMBO J 43, 1351–1383. https://doi.org/10.1038/s44318-024-00059-8.

    1. eLife Assessment

      This work offers important insights into the protein CHD4's function in chromatin remodeling and gene regulation in embryonic stem cells, supported by extensive biochemical, genomic, and imaging data. The use of an inducible degron system allows precise functional analysis, and the datasets generated represent a key resource for the field. The revised study offers compelling evidence and makes a significant contribution to understanding CHD4's role in epigenetic regulation. This work will be of interest to the epigenetics and stem cell biology fields.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed an elegant investigation to clarify the roles of CHD4 in chromatin accessibility and transcription regulation. In addition to the common mechanisms of action through nucleosome repositioning and opening of transcriptionally active regions, the authors considered here a new angle of CHD4 action through modulating the off rate of transcription factor binding. Their suggested scenario is that the action of CHD4 is context-dependent and is different for highly-active regions vs low-accessibility regions.

      Strengths:

      This is a very well-written paper that will be of interest to researchers working in this field. The authors performed large work with different types of NGS experiments and the corresponding computational analyses. The combination of biophysical measurements of the off-rate of protein-DNA binding with NGS experiments is particularly commendable.

      Comments on revised version:

      The authors have addressed all my points.

    3. Reviewer #2 (Public review):

      This study leverages acute protein degradation of CHD4 to define its role in chromatin and gene regulation. Previous studies have relied on KO and/or RNA interference of this essential protein and as such are hampered by adaptation, cell population heterogeneity, cell proliferation and indirect effects. The authors have established an AID2-based method to rapidly deplete the dMi-2 remodeller to circumvent these problems. CHD4 is gone within an hour, well before any effects on cell cycle or cell viability can manifest. This represents an important technical advance that, for the first time, allows a comprehensive analysis of the immediate and direct effect of CHD4 loss of function on chromatin structure and gene regulation.

      Rapid CHD4 degradation is combined with ATAC-seq, CUT&RUN, (nascent) RNA-seq and single molecule microscopy to comprehensively characterise the impact on chromatin accessibility, histone modification, transcription and transcription factor (NANOG, SOX2, KLF4) binding in mouse ES cells.

      The data support the previously developed model that high levels of CHD4/NuRD maintain a degree of nucleosome density to limit TF binding at open regulatory regions (e.g. enhancers). The authors propose that CHD4 activity at these sites is an important prerequisite for enhancers to respond to novel signals that require an expanded or new set of TFs to bind.

      What I find even more exciting and entirely novel is the finding that CHD4 removes TFs from regions of limited accessibility to repress cryptic enhancers and to suppress spurious transcription. These regions are characterised by low CHD4 binding and have so far never been thoroughly analysed. The authors correctly point out that the general assumption that chromatin regulators act on regions where they seem to be concentrated (i.e. have high ChIP-seq signals) runs the risk of overlooking important functions elsewhere. This insight is highly relevant beyond the CHD4 field and will prompt other chromatin researchers to look into low level binding sites of chromatin regulators.

      The biochemical and genomic data presented in this study are of high quality (I cannot judge the single molecule microscopy experiments due to my lack of expertise). This is an important and timely study that is of great interest to the chromatin field.

      Comments on revised version:

      All my comments below have been addressed in the revised version of the manuscript.

      The revised manuscript provides a significant advance of our understanding of how the nucleosome remodeler CHD4 exerts its function. In particular, the findings suggest an intriguing role of CHD4 in TF removal at genomic regions where only low levels of CHD4 can be detected. In the future, it will be interesting to see if this activity is shared by other ATP-dependent nucleosome remodelers.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript an inducible degron approach is taken to investigate the function of the CHD4 chromatin remodelling complex. The cell lines and approaches used are well thought out and the data appear to be of high quality. They show that loss of CHD4 results in rapid changes to chromatin accessibility at thousands of sites. At the majority of locations where changes are detected, chromatin accessibility is decreased and these sites are strongly bound by CHD4 prior to activation of the degron and so likely represent primary sites of action. Somewhat surprisingly while chromatin accessibility is reduced at these sites transcription factor occupancy is little changed. Following CHD4 degradation occupancy of the key pluripotency transcription factors NANOG and SOX2 increases at many locations genome wide and at many of these sites chromatin accessibility increases. These represent important new insights into the function of CHD4 complexes.

      Strengths:

      The experimental approach is well suited to providing insight into a complex regulator such as CHD4. The data generated to characterise how cells respond to loss of CHD4 is of high quality. The study reveals major changes in transcription factor occupancy following CHD4 depletion.

      Weaknesses:

      The main weakness can be summarised as relating to the fact that the authors favour the interpretation that all rapid changes following CHD4 degradation occur as a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially very low (e.g., sites where accessibility is gained, in comparison to sites where chromatin accessibility is lost). The revised discussion acknowledges that rapid indirect effects cannot be excluded.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      (1) It might be good to further discuss potential molecular mechanisms for increasing the TF off rate (what happens at the mechanistic level). 

      This is now expanded in the Discussion

      (2) To improve readability, it would be good to make consistent font sizes on all figures to make sure that the smallest font sizes are readable. 

      We have normalised figure text as much as is feasible.

      (3) upDARs and downDARs - these abbreviations are defined in the figure legend but not in the main text. 

      We have removed references to these terms from the text and included a definition in the figure legend. 

      (4) Figure 3B - the on-figure legend is a bit unclear; the text legend does not mention the meaning of "DEG". 

      We have removed this panel as it was confusing and did not demonstrate any robust conclusion. 

      (5) The values of apparent dissociation rates shown in Figure 5 are a bit different from values previously reported in literature (e.g., see Okamoto et al., 2023, PMC10505915). Perhaps the authors could comment on this. Also, it would be helpful to add the actual equation that was used for the curve fitting to determine these values to the Methods section. 

      We have included an explanation of the curve fitting equation in the Methods as suggested.

      The apparent dissociation rate observed is a sum of multiple rates of decay – true dissociation rate (k<sub>off</sub>), signal loss caused by photobleaching k<sub>pb</sub>, and signal loss caused by defocusing/tracking error (k<sub>tl</sub>).

      k<sub>off</sub><sup>app</sup> = k<sub>off</sub> + k<sub>pb</sub> + k<sub>tl</sub>

      We are making conclusions about relative changes in k<sub>off</sub><sup>app</sup> upon CHD4 depletion, not about the absolute magnitude of true k<sub>off</sub> or TF residence times. Our conclusions extend to true k<sub>off</sub> on the assumption that k<sub>pb</sub> and k<sub>tl</sub> are equal across all samples imaged, owing to identical experimental conditions and analysis. k<sub>pb</sub> and k<sub>tl</sub> vary hugely across experimental set-ups, especially with different laser powers, so other k<sub>off</sub> or k<sub>off</sub><sup>app</sup> values reported in the literature would be expected to differ from ours. Time-lapse experiments or independent determination of k<sub>pb</sub> (and k<sub>tl</sub>) would be required to make any statements about absolute values of k<sub>off</sub>.
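
      The reasoning about relative changes can be sketched numerically. All rate constants below are hypothetical and chosen only for illustration, and the single-exponential survival curve is an assumed form, not necessarily the fitting equation used in the Methods. The sketch shows that when the photobleaching and tracking-loss terms are shared between samples, the difference in apparent rates equals the difference in true dissociation rates, even though neither absolute value is recovered.

```python
import math


def apparent_off_rate(k_off, k_pb, k_tl):
    """Apparent dissociation rate as the sum of true dissociation,
    photobleaching and tracking-loss rates (all per second)."""
    return k_off + k_pb + k_tl


def surviving_fraction(k_app, t):
    """Fraction of bound molecules still tracked after time t,
    assuming single-exponential decay (an illustrative assumption)."""
    return math.exp(-k_app * t)


# Hypothetical rates; k_pb and k_tl are assumed identical across samples.
K_PB, K_TL = 0.20, 0.05
k_off_control, k_off_depleted = 0.10, 0.06

control = apparent_off_rate(k_off_control, K_PB, K_TL)
depleted = apparent_off_rate(k_off_depleted, K_PB, K_TL)

# The shared terms cancel in the difference, not in the ratio.
delta_app = control - depleted
print(f"apparent rates: {control:.2f} vs {depleted:.2f} per second")
print(f"difference in apparent rates: {delta_app:.2f}")
print(f"difference in true rates:     {k_off_control - k_off_depleted:.2f}")
print(f"tracked after 5 s: {surviving_fraction(control, 5):.3f} vs "
      f"{surviving_fraction(depleted, 5):.3f}")
```

      Because the shared terms inflate both apparent rates equally, the direction and absolute size of the change are preserved, while the ratio of apparent rates understates the ratio of true rates.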

      (6) Regarding the discussion about the functionality of low-affinity sites/low accessibility regions, the authors may wish to mention the recent debates on this (https://www.nature.com/articles/s41586-025-08916-0; https://www.biorxiv.org/content/10.1101/2025.10.12.681120v1). 

      We have now included a discussion of this point and referenced both papers.

      (7) It may be worth expanding figure legends a bit, because the definitions of some of the terms mentioned on the figures are not very easy to find in the text. 

      We have endeavoured to define all relevant terms in the figure legends. 

      Reviewer #2 (Public review): 

      (1) Figure 2 shows heat maps of RNA-seq results following a time course of CHD4 depletion (0, 1, 2 hours...). Usually, the red/blue colour scale is used to visualise differential expression (fold-difference). Here, genes are coloured in red or blue even at the 0-hour time point. This confused me initially until I discovered that instead of fold-difference, a z-score is plotted. I do not quite understand what it means when a gene that is coloured blue at the 0-hour time point changes to red at a later time point. Does this always represent an upregulation? I think this figure requires a better explanation. 

      The heatmap displays z-scores, meaning expression for each gene has been centred and scaled across the entire time course. As a result, time zero is not a true baseline, it simply shows whether the gene’s expression at that moment is above or below its own mean. A transition from blue to red therefore indicates that the gene increases relative to its overall average, which typically corresponds to upregulation, but it doesn’t directly represent fold-change from the 0-hour time point. We have now included a brief explanation of this in the figure legend to make this point clear.  
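
      The row-wise scaling described above can be sketched as follows. The expression values are hypothetical, and the use of the sample standard deviation is an assumption, since the response does not state which estimator was used. The example shows why the 0-hour value already carries a colour: it is compared with the gene's own mean across the time course, not with a baseline.

```python
from statistics import mean, stdev


def row_z_scores(expression):
    """Centre and scale one gene's expression across the time course."""
    mu = mean(expression)
    sd = stdev(expression)  # sample SD: an assumption for this sketch
    if sd == 0:
        return [0.0 for _ in expression]  # flat gene: no variation to scale
    return [(x - mu) / sd for x in expression]


# Hypothetical expression at 0, 1, 2, 3 and 4 hours of depletion.
rising = [10.0, 12.0, 20.0, 30.0, 28.0]
falling = [30.0, 28.0, 20.0, 12.0, 10.0]

z_rising = row_z_scores(rising)
z_falling = row_z_scores(falling)

print("rising gene: ", [round(z, 2) for z in z_rising])
print("falling gene:", [round(z, 2) for z in z_falling])
```

      The rising gene starts below its own mean (negative z-score, blue) and ends above it (red), whereas the falling gene starts above its mean, so neither colour at time zero represents a fold change.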

      (2) Figure 5D: NANOG, SOX2 binding at the KLF4 locus. The authors state that the enhancers 68, 57, and 55 show a gain in NANOG and SOX2 enrichment "from 30 minutes of CHD4 depletion". This is not obvious to me from looking at the figure. I can see an increase in signal from "WT" (I am assuming this corresponds to the 0 hours time point) to "30m", but then the signals seem to go down again towards the 4h time point. Can this be quantified? Can the authors discuss why TF binding seems to increase only temporarily (if this is the case)? 

      We have edited the text to more accurately reflect what is going on in the screen shot. We have also replaced “WT” with “0” as this more accurately reflects the status of these cells. 

      (3) There is no real discussion of HOW CHD4/NuRD counteracts TF binding (i.e. by what molecular mechanism). I understand that the data does not really inform us on this. Still, I believe it would be worthwhile for the authors to discuss some ideas, e.g., local nucleosome sliding vs. a direct (ATP-dependent?) action on the TF itself. 

      We now include more speculation on this point in the Discussion.

      Reviewer #3 (Public review): 

      The main weakness can be summarised as relating to the fact that authors interpret all rapid changes following CHD4 degradation as being a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially low. 

      We acknowledge that we cannot definitively say any effect is a direct consequence of CHD4 depletion and have mitigated statements in the Results and Discussion. 

      Reviewing Editor Comments: 

      I am pleased to say all three experts had very complementary and complimentary comments on your paper - congratulations. Reviewer 3 does suggest toning down a few interpretations, which I suggest would help focus the manuscript on its greater strengths. I encourage a quick revision to this point, which will not go back to reviewers, before you request a version of record. I would also like to take this opportunity to thank all three reviewers for excellent feedback on this paper. 

      As advised we have mitigated the points raised by the reviewers. 

      Reviewer #2 (Recommendations for the authors): 

      p9, top: The sentence starting with "Genes increasing in expression after four hours...." is very difficult to understand and should be rephrased or broken up. 

      We agree. This has been completely re-written. 

      Reviewer #3 (Recommendations for the authors): 

      Sites of increased chromatin accessibility emerge more slowly than sites of lost chromatin accessibility. Figure 1D, a little increase in accessibility at 30min, but a more noticeable decrease at 30min. The sites of increased accessibility also have lower absolute accessibility than observed at locations where accessibility is lost. This raises the possibility that the sites of increased accessibility represent rapid but indirect changes occurring following loss of CHD4. Consistent with this, enrichment for CHD4 and MBD3 by CUT and TAG is far higher at sites of decreased accessibility. The low level of CHD4 occupancy observed at sites where accessibility increases may not be relevant to the reason these sites are affected. Such small enrichments can be observed when aligning to other genomic features. The authors interpret their findings as indicating that low occupancy of CHD4 exerts a long-lasting repressive effect at these locations. This is one possible explanation; however, an alternative is that these effects are indirect, perhaps driven by the very large increase in TF binding that is observed following CHD4 degradation and which appears to occur at many locations regardless of whether CHD4 is present. 

      The reviewer is right to point out that we don’t know what is direct and what is indirect. All we know is that changes happen very rapidly upon CHD4 depletion. The changes in standard ATAC-seq signal appear greater at the sites showing decreased accessibility than those increasing, however the starting points are very different: a small increase from very low accessibility will likely be a higher fold change than a more visible decrease from very high accessibility (Fig. 1D). In contrast, Figure 6 shows a more visible increase in Tn5 integrations at sites increasing in accessibility at 30 minutes than the change in sites decreasing in accessibility at 30 minutes. We therefore disagree that the sites increasing in accessibility are more likely to be indirect targets. In further support of this, there is a rapid increase in MNase resistance at these sites upon MBD3 reintroduction (Fig. 6I), possibly indicating a direct impact of NuRD on these sites. 

      Substantial changes in Nanog and SOX2 binding are observed across the time course. These changes are very large, with 43k or 78k additional sites detected. How is this possible? Does the amount of these TF's present in cells change? The argument that transient occupancy of CHD4 acts to prevent TF's binding to what is likely to be many 100's of thousands of sites (if the data for Nanog and SOX2 are representative of other transcription factors such as KLF4) seems unlikely. 

      The large number of different sites identified gaining TF binding is likely to be a reflection of the number of cells being analysed: within the 10<sup>5</sup>-10<sup>6</sup> cells used for a Cut&Run experiment we detect many sites gaining TF binding. In individual cells we agree it would be unlikely for that many sites to become bound at the same time. We detect no changes in the amounts of Nanog or Sox2 in our cells across the 4-hour CHD4 depletion time course. However, we maintain that low frequency interactions of CHD4 with a site can counteract low frequency TF binding and prevent it from stimulating opening of a cryptic enhancer. 

      While increased TF binding is observed at sites of gained accessibility, the changes in TF occupancy at the lost sites do not progress continuously across the time course. In addition, the changes in occupancy are small in comparison to those observed at the gained sites. The text comments on an increase in SOX2 and Nanog occupancy at 30 min, but there is either no change or a loss by 4 hours. It's difficult to know what to conclude from this. 

      At sites losing accessibility the enrichment of both Nanog and Sox2 increases at 30 minutes. We suspect this is due to the loss of CHD4’s TF-removal activity. Thereafter the two TFs show different trends: Nanog enrichment then decreases again, probably due to the decrease in accessibility at these sites. Sox2, by contrast, does not change very much, possibly due to its higher pioneering ability. It is true that the amounts of change are very small here; however, Cut&Run was performed in triplicate and the summary graphs are plotted with standard error of the mean (which is often too small to see), demonstrating that the detected changes are highly significant. (We neglected to refer to the SEM in our figure legends: this has now been corrected.) At sites where CHD4 maintains chromatin compaction, the amount of transcription factor binding goes from zero or nearly zero to some finite number, hence the fold change is very large. In contrast, the changes at sites losing accessibility start from high enrichment so fold changes are much smaller. 
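
      The dependence of fold change on the starting level can be illustrated with a short calculation. The enrichment values and the pseudocount of 1 are hypothetical choices for this sketch and do not come from the Cut&Run data.

```python
import math


def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of after to before, with a pseudocount so that a
    starting value of zero does not cause division by zero."""
    return math.log2((after + pseudocount) / (before + pseudocount))


# Hypothetical enrichment values in arbitrary units.
gained_site = log2_fold_change(before=0.0, after=15.0)    # near-zero start
lost_site = log2_fold_change(before=200.0, after=150.0)   # high start

print(f"gained site: absolute change +15, log2FC {gained_site:+.2f}")
print(f"lost site:   absolute change -50, log2FC {lost_site:+.2f}")
```

      In this toy example the site starting near zero shows a sixteenfold change from an absolute change of 15 units, while the site starting from high enrichment changes by 50 units yet by less than a factor of 1.5.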

      Changes in the diffusive motion of tagged TF's are measured. The data is presented as an average of measurements of individual TF's. What might be anticipated is that subpopulations of TF's would exhibit distinct behaviours. At many locations, occupancy of these TF's are presumably unchanged. At 1 hour, many new sites are occupied, and this would represent a subpopulation with high residence. A small population of TF's would be subject to distinct effects at the sites where accessibility reduces at the one-hour time point. The analysis presented fails to distinguish populations of TF's exhibiting altered mobility consistent with the proportion of the TF's showing altered binding. 

      We agree that there are likely subpopulations of TFs exhibiting distinct binding behaviours, and our modality of imaging captures this, but to distinguish subpopulations within this would require a lot more data.

      However, there is no reason to believe that the TF binding at the new sites being occupied at 1 hr would have a difference in residence time to those sites already stably bound by TFs in the wildtype, i.e. that they would exhibit a different limitation to their residence time once bound compared to those sites. We do capture more stably bound trajectories per cell, but that’s not what we’re reporting on - it’s the dissociation rate of those that have already bound in a stable manner at sites where TF occupancy is detected also by ChIP.

      The analysis of transcription shown in Figure 2 indicates that high-quality data has been obtained, showing progressive changes to transcription. The linkage of the differentially expressed genes to chromatin changes shown in Figure 3 is difficult to interpret. The curves showing the distance distribution for increased or decreased DARs are quite similar for up- and down-regulated genes. The frequency density for gained sites is slightly higher, but not as much higher as would be expected, given these sites are c. 6-fold more abundant than the sites with lost accessibility. The data presented do not provide a compelling link between the CHD4-induced chromatin changes and changes to transcription; the authors should consider revising to accommodate this. It is possible that much of the transcriptional response even at early time points is indirect. This is not unprecedented. For example, degradation of SOX2, a transcriptional activator, results in both repression and activation of similar numbers of genes https://pmc.ncbi.nlm.nih.gov/articles/PMC10577566/ 

      We agree that these figures do not provide a compelling link between the observed chromatin changes and gene expression changes. That 50K increased sites are, on average, located farther away from misregulated genes than are the 8K decreasing sites highlights that this is rarely going to be a case of direct derepression of a silenced gene, but rather distal sites could act as enhancers to spuriously activate transcription. This would certainly be a rare event, but could explain the low-level transcriptional noise seen in NuRD mutants. We have edited the wording to make this clearer.

      The model presented in Figure 7 includes distinct roles at sites that become more or less accessible following inactivation of CHD4. This is perplexing as it implies that the same enzymes perform opposing functions at some of the different sites where they are bound. 

      Our point is that it does the same thing at both kinds of sites, but the nature of the sites means that the consequences of CHD4 activity will be different. We have tried to make this clear in the text. 

      At active sites, it is clear that CHD4 is bound prior to activation of the degron and that chromatin accessibility is reduced following depletion. Changes in TF occupancy are complex, perhaps reflecting slow diffusion from less accessible chromatin and a global increase in the abundance of some pluripotency transcription factors such as SOX2 and Nanog that are competent for DNA binding. The link between sites of reduced accessibility and transcription is less clear. 

      At the inactive sites, the increase in accessibility could be driven by transcription factor binding. There is very little CHD4 present at these sites prior to activation of the degron, and TF binding may induce chromatin opening, which could be considered a rapid but indirect effect of the CHD4 degron. The link to transcription is not clear from the data presented, but it would be anticipated that in some cases it would drive activation. 

      We acknowledge these points and have indicated this possibility in the Results and the Discussion.

      No analysis is performed to identify binding sequences enriched at the locations of decreased accessibility. This could potentially define transcription factors involved in CHD4 recruitment or that cause CHD4 to function differently in different contexts. 

      HOMER analyses failed to provide any unique insights. The sites going down are highly accessible in ES cells: they have TF binding sites that one would expect in ES cells. The increasing sites show an enrichment for G-rich sequences, which reflects the binding preference of CHD4.

    1. eLife Assessment

      This valuable study presents Altair-LSFM, a well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and reduced cost. The approach provides compelling evidence of its strengths, including the use of custom-machined baseplates, detailed assembly instructions, and demonstrated live-cell imaging capabilities. This manuscript will be of interest to microscopists and potentially biologists seeking accessible LSFM tools.

    2. Reviewer #1 (Public review):

      Summary:

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths:

      The article includes extensive supplementary material that complements the information in the main article.

      Live imaging has been incorporated, as requested, increasing the value of the paper.

      Weaknesses:

      None

    3. Reviewer #2 (Public review):

      Summary:

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source light-sheet microscope that may be relatively easy to align and construct due to a custom-designed mounting plate. The authors developed this microscope to fill a perceived gap: current open-source systems are primarily designed for large specimens and lack sub-cellular resolution, or achieve high resolution but are difficult to construct and are unstable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems and address this need with Altair.

      Strengths:

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances as well as the optical path design. They simplify the excitation optics over Lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-µm field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy: lateral resolution is set by the numerical aperture of the detection objective and proper sampling of the image field onto the detector, and axial resolution depends on the thickness of the light sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their light sheet and point-spread function and by visualizing sub-cellular structures in mammalian cells including microtubules, actin filaments, nuclei, and the Golgi apparatus.
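
      As an illustration of the resolution principle described above, the following minimal Python sketch estimates the diffraction-limited lateral resolution from the detection NA and checks whether a given camera pixel satisfies Nyquist sampling. The factor 0.61 is the textbook Rayleigh criterion. The emission wavelength (520 nm), detection NA (1.1), camera pixel size (6.5 µm), and the two magnifications are assumed values chosen for illustration; they are not taken from the manuscript.

```python
def lateral_resolution_nm(wavelength_nm: float, detection_na: float) -> float:
    """Rayleigh-criterion lateral resolution: 0.61 * wavelength / NA."""
    return 0.61 * wavelength_nm / detection_na


def sample_plane_pixel_nm(camera_pixel_um: float, magnification: float) -> float:
    """Size of one camera pixel projected back into the sample plane, in nm."""
    return camera_pixel_um * 1000.0 / magnification


def is_nyquist_sampled(camera_pixel_um: float, magnification: float,
                       resolution_nm: float) -> bool:
    """Nyquist sampling requires at least two pixels per resolvable distance."""
    return sample_plane_pixel_nm(camera_pixel_um, magnification) <= resolution_nm / 2.0


# Assumed, illustrative values (not from the manuscript).
resolution = lateral_resolution_nm(wavelength_nm=520.0, detection_na=1.1)
for magnification in (25.0, 50.0):  # hypothetical total magnifications
    ok = is_nyquist_sampled(6.5, magnification, resolution)
    print(f"{magnification:.0f}x: Nyquist satisfied = {ok}")
```

      The sketch only covers the raw, diffraction-limited case; values reported after deconvolution, such as those quoted in this review, can be smaller than this estimate.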

      Weaknesses:

      There is still a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) One of the major limitations of the first generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is now discussed in the manuscript but remains a limitation in the currently implemented design.

      (2) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. In the revised manuscript the authors now implement temperature control, but ideal live cell imaging conditions that would include gas and humidity control are not implemented. While, as the authors note, other microscopes that lack full environmental control have achieved widespread adoption, in my view this still limits the use cases of this microscope. There is no discussion on how this limitation of environmental control may be overcome in future iterations.

      (3) While the microscope is well designed and completely open source it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion it is unlikely to be implemented by a lab that has zero prior experience with custom optics and cannot hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested even if they can afford it. Claims on how easy it may be to align the system for a "Novice" in supplementary table 5 appear to be unsubstantiated and should be removed unless a Novice was indeed able to assemble and validate the system in 2 weeks. It seems that these numbers were just arbitrarily proposed in the current version without any testing. In our experience it's hard to predict how long an alignment will take for a novice.

      (4) There is no quantification on field uniformity and the tunability of the light sheet parameters (FOV, thickness, PSF, uniformity). There is no quantification on how much improvement is offered by the resonant and how its operation may alter the light-sheet power, uniformity and the measured PSF.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging.

      The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance.

      The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting and high-NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness, over a 266-micron-wide field of view, pushing the axial resolution of the system beyond the regular diffraction-limited tradeoffs of light-sheet fluorescence microscopy.

      Compelling validation using fluorescent beads, multicolor cellular imaging, and dual-color live-cell imaging highlights the system's performance. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system, and also provides estimated costs and a detailed description of the expertise needed.

      Strengths:

      - Strong and accessible technical innovation.

      With an elegant combination of beam shaping and optical modelling, the authors provide a high resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of thin light-sheet and small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.
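
      The classical tradeoff mentioned above can be illustrated with the standard paraxial Gaussian-beam relations: the usable propagation length (confocal parameter, twice the Rayleigh range) grows with the square of the beam waist, so thinner sheets stay thin over shorter distances. The sketch below is a generic textbook calculation, not the authors' optical model. The wavelength (488 nm) and refractive index (1.33, water) are assumed values, and the paraxial approximation is only approximate for very tight foci.

```python
import math


def fwhm_to_waist_um(fwhm_um: float) -> float:
    """Convert an intensity FWHM to the 1/e^2 waist radius w0 of a Gaussian beam."""
    return fwhm_um / math.sqrt(2.0 * math.log(2.0))


def confocal_parameter_um(fwhm_um: float, wavelength_um: float,
                          refractive_index: float) -> float:
    """Confocal parameter 2 * z_R, with z_R = pi * w0^2 * n / wavelength (vacuum)."""
    w0 = fwhm_to_waist_um(fwhm_um)
    rayleigh_range = math.pi * w0 ** 2 * refractive_index / wavelength_um
    return 2.0 * rayleigh_range


# Assumed, illustrative values: 488 nm excitation in water (n = 1.33).
for thickness_um in (0.4, 0.8, 1.6):
    length_um = confocal_parameter_um(thickness_um, 0.488, 1.33)
    print(f"sheet FWHM {thickness_um:.1f} um -> confocal parameter {length_um:.2f} um")
```

      Doubling the sheet thickness quadruples the confocal parameter in this model, which is the scaling behind the tradeoff between a thin sheet and a large field of view.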

      - Impeccable optical performance and ease of sample mounting.

      The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity. At the same time, the authors claim to achieve similar lateral and axial resolution to lattice light-sheet microscopy (although without a direct comparison; see the "Weaknesses" section). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system by imaging sub-cellular structures in mammalian cells.

      - Transparency and comprehensiveness of documentation and resources.

      A very detailed protocol documents the setup, the optical modeling, and the total cost.

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. At this stage, I believe the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      Comments on revisions:

      I appreciate the detail and care with which the authors answered all my concerns, both the bigger ones (lack of live cell imaging demonstration) and the smaller ones (about data storage, costs, expertise needed, and so on). The manuscript has been greatly improved, and I have no other comments to make.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study presents Altair-LSFM, a solid and well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and cost reduction. While the approach offers strengths such as the use of custom-machined baseplates and detailed assembly instructions, its overall impact is limited by the lack of live-cell imaging capabilities and the absence of a clear, quantitative comparison to existing LSFM platforms. As such, although technically competent, the broader utility and uptake of this system by the community may be limited.

      We thank the editors and reviewers for their thoughtful evaluation of our work and for recognizing the technical strengths of the Altair-LSFM platform, including the custom-machined baseplates and detailed documentation provided to promote accessibility and reproducibility. Below, we provide point-by-point responses to each referee comment. In the process, we have significantly revised the manuscript to include live-cell imaging data and a quantitative evaluation of imaging speed. We now more explicitly describe the different variants of lattice light-sheet microscopy, highlighting differences in their illumination flexibility and image acquisition modes, and clarify how Altair-LSFM compares to each. We further discuss challenges associated with the 5 mm coverslip and propose practical strategies to overcome them. Additionally, we outline cost-reduction opportunities, explain the rationale behind key equipment selections, and provide guidance for implementing environmental control. Altogether, we believe these additions have strengthened the manuscript and clarified both the capabilities and limitations of Altair-LSFM.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths: 

      (1) The article includes extensive supplementary material that complements the information in the main article.

      (2) However, in some sections, the information provided is somewhat superficial.

      We thank the reviewer for their thoughtful assessment and for recognizing the strengths of our manuscript, including the extensive supplementary material. Our goal was to make the supplemental content as comprehensive and useful as possible. In addition to the materials provided with the manuscript, our intention is for the online documentation (available at thedeanlab.github.io/altair) to serve as a living resource that evolves in response to user feedback. We would therefore greatly appreciate the reviewer’s guidance on which sections were perceived as superficial so that we can expand them to better support readers and builders of the system.

      Weaknesses:

      (1) Although a comparison is made with other light-sheet microscopy systems, the presented system does not represent a significant advance over existing systems. It uses high numerical aperture objectives and Gaussian beams, achieving resolution close to theoretical after deconvolution. The main advantage of the presented system is its ease of construction, thanks to the design of a perforated base plate.

      We appreciate the reviewer’s assessment and the opportunity to clarify our intent. Our primary goal was not to introduce new optical functionality beyond that of existing high-performance light-sheet systems, but rather to substantially reduce the barrier to entry for non-specialist laboratories. Many open-source implementations, such as OpenSPIM, OpenSPIN, and Benchtop mesoSPIM, similarly focused on accessibility and reproducibility rather than introducing new optical modalities, yet have had a measurable impact on the field by enabling broader community participation. Altair-LSFM follows this tradition, providing sub-cellular resolution performance comparable to advanced systems like LLSM, while emphasizing reproducibility, ease of construction through a precision-machined baseplate, and comprehensive documentation to facilitate dissemination and adoption.

      (2) Using similar objectives (Nikon 25x and Thorlabs 20x), the results obtained are similar to those of the LLSM system (using a Gaussian beam without laser modulation). However, the article does not mention the difficulties of mounting the sample in the implemented configuration.

      We appreciate the reviewer’s comment and agree that there are practical challenges associated with handling 5 mm diameter coverslips in this configuration. In the revised manuscript, we now explicitly describe these challenges and provide practical solutions. Specifically, we highlight the use of a custom-machined coverslip holder designed to simplify mounting and handling, and we direct readers to an alternative configuration using the Zeiss W Plan-Apochromat 20×/1.0 objective, which eliminates the need for small coverslips altogether.

      (3) The authors present a low-cost, open-source system. Although they provide open source code for the software (navigate), the use of proprietary electronics (ASI, NI, etc.) makes the system relatively expensive. Its low cost is not justified.

      We appreciate the reviewer’s perspective and understand the concern regarding the use of proprietary control hardware such as the ASI Tiger Controller and NI data acquisition cards. Our decision to use these components was intentional: relying on a unified, professionally supported and maintained platform minimizes complexity associated with sourcing, configuring, and integrating hardware from multiple vendors, thereby reducing non-financial barriers to entry for non-specialist users.

      Importantly, these components are not the primary cost driver of Altair-LSFM (they represent roughly 18% of the total system cost). Nonetheless, for users for whom the price is prohibitive, we also outline several viable cost-reduction options in the revised manuscript (e.g., substituting manual stages, omitting the filter wheel, or using industrial CMOS cameras), while discussing the trade-offs these substitutions introduce in performance and usability. These considerations are now summarized in Supplementary Note 1, which provides a transparent rationale for our design and cost decisions.

      Finally, we note that even with these professional-grade components, Altair-LSFM remains substantially less expensive than commercial systems offering comparable optical performance, such as LLSM implementations from Zeiss or 3i.

      (4) The fibroblast images provided are of exceptional quality. However, these are fixed samples. The system lacks the necessary elements for monitoring cells in vivo, such as temperature or pH control.

      We thank the reviewer for their positive comment regarding the quality of our data. As noted, the current manuscript focuses on validating the optical performance and resolution of the system using fixed specimens to ensure reproducibility and stability.

      We fully agree on the importance of environmental control for live-cell imaging. In the revised manuscript, we now describe in detail how temperature regulation can be achieved using a custom-designed heated sample chamber, accompanied by detailed assembly instructions on our GitHub repository and summarized in Supplementary Note 2. For pH stabilization in systems lacking a 5% CO₂ atmosphere, we recommend supplementing the imaging medium with 10–25 mM HEPES buffer. Additionally, we include new live-cell imaging data demonstrating that Altair-LSFM supports in vitro time-lapse imaging of dynamic cellular processes under controlled temperature conditions.
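
      As a worked example of the HEPES recommendation above, the following minimal sketch applies the standard dilution relation C1·V1 = C2·V2 to compute how much concentrated stock to add for a target final concentration. The 1 M stock concentration and the 50 mL medium volume are assumptions made for illustration; the response does not specify either.

```python
def stock_volume_ml(final_mm: float, final_volume_ml: float,
                    stock_mm: float = 1000.0) -> float:
    """Volume of stock (mL) needed so that the final solution reaches final_mm.

    Uses C1 * V1 = C2 * V2. The default stock of 1000 mM (1 M) is an assumption.
    """
    if final_mm <= 0 or final_volume_ml <= 0 or stock_mm <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_mm > stock_mm:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_mm * final_volume_ml / stock_mm


# Hypothetical 50 mL of imaging medium at the two ends of the 10-25 mM range.
for target_mm in (10.0, 25.0):
    volume = stock_volume_ml(target_mm, final_volume_ml=50.0)
    print(f"{target_mm:.0f} mM HEPES in 50 mL: add {volume:.2f} mL of 1 M stock")
```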

      Reviewer #2 (Public review): 

      Summary: 

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source microscope, that is relatively easy to align and construct and achieves sub-cellular resolution. The authors developed this microscope to fill a perceived need that current open-source systems are primarily designed for large specimens and lack sub-cellular resolution or are difficult to construct and align, and are not stable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems. 

      We thank the reviewer for their thoughtful summary. We agree that existing open-source systems primarily emphasize imaging of large specimens, whereas commercial systems that achieve sub-cellular resolution remain costly and complex. Our aim with Altair-LSFM was to bridge this gap—providing LLSM-level performance in a substantially more accessible and reproducible format. By combining high-NA optics with a precision-machined baseplate and open-source documentation, Altair offers a practical, high-resolution solution that can be readily adopted by non-specialist laboratories.

      Strengths: 

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances, as well as the optical path design. They simplify the excitation optics over Lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-µm field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy: lateral resolution is set by the numerical aperture of the detection objective and proper sampling of the image field onto the detector, and axial resolution depends on the thickness of the light sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their light sheet and point-spread function and by visualizing sub-cellular structures in mammalian cells, including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      We thank the reviewer for their thoughtful and generous assessment of our work. We are pleased that the manuscript’s emphasis on fundamental optical principles, design rationale, and practical implementation was clearly conveyed. We agree that Altair’s modular and accessible architecture provides a strong foundation for future variants tailored to specific experimental needs. To facilitate this, we have made all Zemax simulations, CAD files, and build documentation openly available on our GitHub repository, enabling users to adapt and extend the system for diverse imaging applications.

      Weaknesses:

      There is a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) The authors claim that commercial lattice light-sheet microscopes (LLSM) are "complex, expensive, and alignment intensive". I believe this sentence applies to the open-source version of LLSM, which was made available for wide dissemination. Since then, a commercial solution has been provided by 3i, which is now being used in multiple cores and labs but does require routine alignments. However, Zeiss has also released a commercial turn-key system, which, while expensive, is stable, and the complexity does not interfere with the experience of the user. In general, though, statements on ease of use and stability might be considered anecdotal and may not belong in a scientific article if they are unreferenced or unsupported by data.

      We thank the reviewer for this thoughtful and constructive comment. We have revised the manuscript to more clearly distinguish between the original open-source implementation of LLSM and subsequent commercial versions by 3i and ZEISS. The revised Introduction and Discussion now explicitly note that while open-source and early implementations of LLSM can require expert alignment and maintenance, commercial systems—particularly the ZEISS Lattice Lightsheet 7—are designed for automated operation and stable, turn-key use, albeit at higher cost and with limited modifiability. We have also moderated earlier language regarding usability and stability to avoid anecdotal phrasing.

      We also now provide a more objective proxy for system complexity: the number of optical elements that require precise alignment during assembly and maintenance thereafter. The original open-source LLSM setup includes approximately 29 optical components that must each be carefully positioned laterally, angularly, and coaxially along the optical path. In contrast, the first-generation Altair-LSFM system contains only nine such elements. By this metric, Altair-LSFM is considerably simpler to assemble and align, supporting our overarching goal of making high-resolution light-sheet imaging more accessible to non-specialist laboratories.

      (2) One of the major limitations of the first generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem, and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature, which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is not discussed.

      We thank the reviewer for this helpful comment. We agree that the use of 5 mm diameter coverslips, while enabling high-NA imaging in the current Altair-LSFM configuration, may pose a practical limitation for some users. We now discuss this more explicitly in the revised manuscript. Specifically, we note that replacing the detection objective provides a straightforward solution to this constraint. For example, as demonstrated by Moore et al. (Lab Chip, 2021), pairing the Zeiss W Plan-Apochromat 20×/1.0 detection objective with the Thorlabs TL20X-MPL illumination objective allows imaging beyond the physical surfaces of both objectives, eliminating the need for small-format coverslips. In the revised text, we propose this modification as an accessible path toward greater compatibility with conventional sample mounting formats. We also note in the Discussion that Oblique Plane Microscopy (OPM) inherently avoids such nonstandard mounting requirements and, owing to its single-objective architecture, is fully compatible with standard environmental chambers.

      (3) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design, the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. It is unclear how this would be implemented with the current sample chamber. This limitation would severely limit use cases for cell biologists, for which this microscope is designed. There is no discussion on this limitation or how it may be overcome in future iterations.

      We thank the reviewer for this important observation and agree that environmental control is critical for live-cell imaging applications. It is worth noting that the original open-source LLSM design, as well as the commercial version developed by 3i, provided temperature regulation but did not include integrated control of CO₂ or humidity. Despite this limitation, these systems have been widely adopted and have generated significant biological insights. We also acknowledge that both OPM and the ZEISS implementation of LLSM offer clear advantages in this respect, providing compatibility with standard commercial environmental chambers that support full regulation of temperature, CO₂, and humidity.

      In the revised manuscript, we expand our discussion of environmental control in Supplementary Note 2, where we describe the Altair-LSFM chamber design in more detail and discuss its current implementation of temperature regulation and HEPES-based pH stabilization. Additionally, the Discussion now explicitly notes that OPM avoids the challenges associated with non-standard sample mounting and is inherently compatible with conventional environmental enclosures.

      (4) The authors' comparison to LLSM is constrained to the "square" lattice, which, as they point out, is the most used optical lattice (though this also might be considered anecdotal). The LLSM original design, however, goes far beyond the square lattice, including hexagonal lattices, the ability to do structured illumination, and greater flexibility in general in terms of light-sheet tuning for different experimental needs, as well as not being limited to just sample scanning. Thus, the Altair-LSFM cannot compare to the original LLSM in terms of versatility, even if comparisons to the resolution provided by the square lattice are fair.

      We agree that the original LLSM design offers substantially greater flexibility than what is reflected in our initial comparison, including the ability to generate multiple lattice geometries (e.g., square and hexagonal), operate in structured illumination mode, and acquire volumes using both sample-scanning and light-sheet-scanning strategies. To address this, we now include Supplementary Note 3 that provides a detailed overview of the illumination modes and imaging flexibility afforded by the original LLSM implementation, and how these capabilities compare to both the commercial ZEISS Lattice Lightsheet 7 and our Altair-LSFM system. In addition, we have revised the discussion to explicitly acknowledge that the original LLSM could operate in alternative scan strategies beyond sample scanning, providing greater context for readers and ensuring a more balanced comparison.

      (5) There is no demonstration of the system's live-imaging capabilities or temporal resolution, which is the main advantage of existing light-sheet systems.

      In the revised manuscript, we now include a demonstration of live-cell imaging to directly validate Altair-LSFM’s suitability for dynamic biological applications. We also explicitly discuss the temporal resolution of the system in the main text (see Optoelectronic Design of Altair-LSFM), where we detail both software- and hardware-related limitations. Specifically, we evaluate the maximum imaging speed achievable with Altair-LSFM in conjunction with our open-source control software, navigate.

      For simplicity and reduced optoelectronic complexity, the current implementation powers the piezo through the ASI Tiger Controller, which modestly reduces its bandwidth. Nonetheless, for a 100 µm stroke typical of light-sheet imaging, we achieved sufficient performance to support volumetric imaging at most biologically relevant timescales. These results, along with additional discussion of the design trade-offs and performance considerations, are now included in the revised manuscript and expanded upon in the supplementary material.
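
      To make the notion of volumetric imaging speed concrete, the rough sketch below estimates the time per volume for a 100 µm stroke from the z-step size and the time spent per plane. It is a back-of-the-envelope model, not a measurement of Altair-LSFM: the step size (0.5 µm), exposure time (10 ms), and per-plane overhead (5 ms, covering piezo settling and readout) are all hypothetical values.

```python
def planes_per_volume(stroke_um: float, step_um: float) -> int:
    """Number of image planes needed to cover the stroke, including both end planes."""
    return int(round(stroke_um / step_um)) + 1


def volume_period_s(n_planes: int, exposure_ms: float, overhead_ms: float) -> float:
    """Time to acquire one volume if every plane costs exposure plus fixed overhead."""
    return n_planes * (exposure_ms + overhead_ms) / 1000.0


# Hypothetical acquisition settings for a 100 um stroke.
n_planes = planes_per_volume(stroke_um=100.0, step_um=0.5)
period = volume_period_s(n_planes, exposure_ms=10.0, overhead_ms=5.0)
print(f"{n_planes} planes, {period:.2f} s per volume, {1.0 / period:.2f} volumes/s")
```

      In this simple model the per-plane overhead sets the floor on volume time once exposures become short, which is why actuator bandwidth matters for fast volumetric imaging.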

      While the microscope is well designed and completely open source, it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion, it is unlikely to be implemented by a lab that has zero prior experience with custom optics and cannot hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested, even if they can afford it. The authors indicate they will offer "workshops," but this does not necessarily remove the barrier to entry, or lower it as significantly as the authors describe.

      We appreciate the reviewer’s perspective and agree that building any high-performance custom microscope—Altair-LSFM included—requires a basic understanding of (or willingness to learn) optics, electronics, and instrumentation. Such a barrier exists for all open-source microscopes, and our goal is not to eliminate this requirement entirely but to substantially reduce the technical and logistical challenges that typically accompany the construction of custom light-sheet systems.

      Importantly, no machining experience or in-house fabrication capabilities are required. Users can simply submit the provided CAD design files and specifications directly to commercial vendors for fabrication. We have made this process as straightforward as possible by supplying detailed build instructions, recommended materials, and vendor-ready files through our GitHub repository. Our dissemination strategy draws inspiration from other successful open-source projects such as mesoSPIM, which has seen widespread adoption—over 30 implementations worldwide—through a similar model of exhaustive documentation, open-source software, and community support via user meetings and workshops.

      We also recognize that documentation alone cannot fully replace hands-on experience. To further lower barriers to adoption, we are actively working with commercial vendors to streamline procurement and assembly, and Altair-LSFM is supported by a Biomedical Technology Development and Dissemination (BTDD) grant that provides resources for hosting workshops, offering real-time community support, and developing supplementary training materials.

      In the revised manuscript, we now expand the Discussion to explicitly acknowledge these implementation considerations and to outline our ongoing efforts to support a broad and diverse user base, ensuring that laboratories with varying levels of technical expertise can successfully adopt and maintain the Altair-LSFM platform.

      There is a claim that this design is easily adaptable. However, the requirement of custom-machined baseplates and in silico optimization of the optical path basically means that each new instrument is a new design, even if the Navigate software can be used. It is unclear how Altair-LSFM demonstrates a modular design that reduces the time from conception to optimization compared to previous implementations.

      We thank the reviewer for this insightful comment and agree that our original language regarding adaptability may have overstated the degree to which Altair-LSFM can be modified without prior experience. It was not our intention to imply that the system can be easily redesigned by users with limited technical background. Meaningful adaptations of the optical or mechanical design do require expertise in optical layout, optomechanical design, and alignment.

      That said, for laboratories with such expertise, we aim to facilitate modifications by providing comprehensive resources—including detailed Zemax simulations, complete CAD models, and alignment documentation. These materials are intended to reduce the development burden for expert users seeking to tailor the system to specific experimental requirements, without necessitating a complete re-optimization of the optical path from first principles.

      In the revised manuscript, we clarify this point and temper our language regarding adaptability to better reflect the realistic scope of customization. Specifically, we now state in the Discussion: “For expert users who wish to tailor the instrument, we also provide all Zemax illumination-path simulations and CAD files, along with step-by-step optimization protocols, enabling modification and re-optimization of the optical system as needed.” This revision ensures that readers clearly understand that Altair-LSFM is designed for reproducibility and straightforward assembly in its default configuration, while still offering the flexibility for modification by experienced users.

      Reviewer #3 (Public review):

      Summary: 

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging. The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance. The authors demonstrate lateral and axial resolutions of ~235 nm and ~350 nm after deconvolution, enabling imaging of sub-diffraction structures in mammalian cells. The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting, and high-NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness over a 266-micron-wide field of view, pushing the axial resolution of the system beyond the regular diffraction-limited trade-offs of light-sheet fluorescence microscopy. Compelling validation using fluorescent beads and multicolor cellular imaging highlights the system's performance and accessibility. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system.

      We thank the reviewer for their thoughtful and positive assessment of our work. We appreciate their recognition of Altair-LSFM’s design and performance, including its ability to achieve high-resolution imaging throughout a 266-micron field of view. While Altair-LSFM approaches the practical limits of diffraction-limited performance, it does not exceed the fundamental diffraction limit; rather, it achieves near-theoretical resolution through careful optical optimization, beam shaping, and alignment. We are grateful for the reviewer’s acknowledgment of the accessibility and comprehensive documentation that make this system broadly implementable.

      Strengths:

      (1) Strong and accessible technical innovation: With an elegant combination of beam shaping and optical modelling, the authors provide a high-resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of a thin light-sheet and a small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      (2) Impeccable optical performance and ease of mounting of samples: The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity.

      At the same time, the authors claim to achieve similar lateral and axial resolution to lattice light-sheet microscopy (although without a direct comparison; see the "Weaknesses" section below). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system by imaging sub-cellular structures in mammalian cells.

      (3) Transparency and comprehensiveness of documentation and resources: A very detailed protocol provides detailed documentation about the setup, the optical modeling, and the total cost.

      We thank the reviewer for their thoughtful and encouraging comments. We are pleased that the technical innovation, optical performance, and accessibility of Altair-LSFM were recognized. Our goal from the outset was to develop a diffraction-limited, high-resolution light-sheet system that balances optical performance with reproducibility and ease of implementation. We are also pleased that the use of precision-machined baseplates was recognized as a practical and effective strategy for achieving performance while maintaining ease of assembly.

      Weaknesses: 

      (1) Limited quantitative comparisons: Although some qualitative comparison with previously published systems (diSPIM, lattice light-sheet) is provided throughout the manuscript, some side-by-side comparison would be of great benefit for the manuscript, even in the form of a theoretical simulation. While having a direct imaging comparison would be ideal, it's understandable that this goes beyond the interest of the paper; however, a table referencing image quality parameters (taken from the literature), such as signal-to-noise ratio, light-sheet thickness, and resolutions, would really enhance the features of the setup presented. Moreover, based also on the necessity for optical simplification, an additional comment on the importance/difference of dual-objective/single-objective light-sheet systems could really benefit the discussion.

      In the revised manuscript, we have significantly expanded our discussion of different light-sheet systems to provide clearer quantitative and conceptual context for Altair-LSFM. These comparisons are based on values reported in the literature, as we do not have access to many of these instruments (e.g., DaXi, diSPIM, or commercial and open-source variants of LLSM), and a direct experimental comparison is beyond the scope of this work.

      We note that while quantitative parameters such as signal-to-noise ratio are important, they are highly sample-dependent and strongly influenced by imaging conditions, including fluorophore brightness, camera characteristics, and filter bandpass selection. For this reason, we limited our comparison to more general image-quality metrics—such as light-sheet thickness, resolution, and field of view—that can be reliably compared across systems.

      Finally, per the reviewer’s recommendation, we have added additional discussion clarifying the differences between dual-objective and single-objective light-sheet architectures, outlining their respective strengths, limitations, and suitability for different experimental contexts.

      (2) Limitation to a fixed sample: In the manuscript, there is no mention of incubation temperature, CO₂ regulation, humidity control, or possible integration of commercial environmental control systems. This is a major limitation for an imaging technique that owes its popularity to fast, volumetric, live-cell imaging of biological samples.

      We fully agree that environmental control is critical for live-cell imaging applications. In the revised manuscript, we now describe the design and implementation of a temperature-regulated sample chamber in Supplementary Note 2, which maintains stable imaging conditions through the use of integrated heating elements and thermocouples. This approach enables precise temperature control while minimizing thermal gradients and optical drift. For pH stabilization, we recommend the use of 10–25 mM HEPES in place of CO₂ regulation, consistent with established practice for most light-sheet systems, including the initial variant of LLSM. Although full humidity and CO₂ control are not readily implemented in dual-objective configurations, we note that single-objective designs such as OPM are inherently compatible with commercial environmental chambers and avoid these constraints. Together, these additions clarify how environmental control can be achieved within Altair-LSFM and situate its capabilities within the broader LSFM design space.

      (3) System cost and data storage cost: While the system presented has the advantage of being open-source, it remains relatively expensive (considering the 150k without laser source and optical table, for example). The manuscript could benefit from a more direct comparison of the performance/cost ratio of existing systems, considering academic settings with budgets that most of the time would not allow for expensive architectures. Moreover, it would also be beneficial to discuss the adaptability of the system in case a 30k objective is not feasible. Will this system work with different optics (with the obvious limitations that come with a lower-NA objective)? This could be an interesting point of discussion: the adaptability of the system to lower budgets or more cost-effective choices, depending on the needs.

      We agree that cost considerations are critical for adoption in academic environments. We would also like to clarify that the quoted $150k includes the optical table and laser source. In the revised manuscript, Supplementary Note 1 now includes an expanded discussion of cost–performance trade-offs and potential paths for cost reduction.

      Last, not much is said about the need for data storage. Light-sheet microscopy's bottleneck is the creation of increasingly large datasets, and it could be beneficial to discuss more about the storage needs and the quantity of data generated.

      In the revised manuscript, we now include Supplementary Note 4, which provides a high-level discussion of data storage needs, approximate costs, and practical strategies for managing large datasets generated by light-sheet microscopy. This section offers general guidance, including file-format recommendations and cost considerations, but we note that actual costs will vary by institution and contractual agreements.
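
      To give a rough sense of scale for this storage discussion, the following minimal sketch estimates the uncompressed data rate and per-volume size of a camera-based acquisition. All inputs (sensor size, bit depth, frame rate, plane and channel counts) are illustrative assumptions, not values taken from the manuscript or Supplementary Note 4, and the function names are hypothetical.

```python
def raw_data_rate_mb_per_s(width_px, height_px, bytes_per_pixel, frames_per_s):
    """Uncompressed camera data rate in megabytes (1e6 bytes) per second."""
    return width_px * height_px * bytes_per_pixel * frames_per_s / 1e6


def volume_size_gb(width_px, height_px, bytes_per_pixel, n_planes, n_channels=1):
    """Uncompressed size of one multichannel z-stack in gigabytes (1e9 bytes)."""
    return width_px * height_px * bytes_per_pixel * n_planes * n_channels / 1e9


if __name__ == "__main__":
    # Assumed example: 2048 x 2048 sensor, 16-bit pixels, 100 frames per second.
    rate = raw_data_rate_mb_per_s(2048, 2048, 2, 100)
    # Assumed example: 500 planes per volume, 2 color channels.
    size = volume_size_gb(2048, 2048, 2, 500, n_channels=2)
    print(f"raw data rate: {rate:.1f} MB/s")
    print(f"one volume:    {size:.2f} GB")
```

      Estimates of this kind ignore compression, cropping, and metadata overhead, so they are best read as upper bounds for planning purposes.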

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. While some aspects (comparative benchmarking and validation, the limitation to fixed samples) would benefit from further development, the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) A picture, or full CAD design of the complete instrument, should be included as a main figure.

      A complete CAD rendering of the microscope is now provided in Supplementary Figure 4.

      (2) There is no quantitative comparison of the effects of the tilting resonant galvo; only a cartoon, a figure should be included.

      The cartoon was intended purely as an educational illustration to conceptually explain the role of the tilting resonant galvo in shaping and homogenizing the light sheet. To clarify this intent, we have revised both the figure legend and corresponding text in the main manuscript. For readers seeking quantitative comparisons, we now reference the original study that provides a detailed analysis of this optical approach, as well as a review on the subject.

      (3) Description of L4 is missing in the Figure 1 caption.

      Thank you for catching this omission. We have corrected it.

      (4) Please crop and enlarge the beam profiles in Figures 1c and 3a so that the profile can be appreciated. The PSFs in Figure 3c-e should similarly be enlarged and presented using a dynamic range/LUT such that any aberrations can be appreciated.

      In Figure 1c, our goal was to qualitatively illustrate the uniformity of the light-sheet across the full field of view, while Figure 1d provided the corresponding quantitative cross-section. To improve clarity, we have added an additional figure panel offering a higher-magnification, localized view of the light-sheet profile. For Figure 3c–e, we have enlarged the PSF images and adjusted the display range to better convey the underlying signal and allow subtle aberrations to be appreciated.

      (5) It is unclear why LLSM is being used as the gold standard, since in its current commercial form, available from Zeiss, it is a turn-key system designed for core facilities. The original LLSM is also a versatile instrument that provides much more than the square lattice for illumination, including structured illumination, hexagonal lattices, live-cell imaging, wide-field illumination, different scan modes, etc. These additional features are not even mentioned when compared to the Altair-LSFM. If a comparison is to be provided, it should be fair and balanced. Furthermore, as outlined in the public review, anecdotal statements on "most used", "difficult to align", or "unstable" should not be provided without data.

      In the revised manuscript, we have carefully removed anecdotal statements and, where appropriate, replaced them with quantitative or verifiable information. For instance, we now explicitly report that the square lattice was used in 16 of the 20 figure subpanels in the original LLSM publication, and we include a proxy for optical complexity based on the number of optical elements requiring alignment in each system.

      We also now clearly distinguish between the original LLSM design—which supports multiple illumination and scanning modes—and its subsequent commercial variants, including the ZEISS Lattice Lightsheet 7, which prioritizes stability and ease of use over configurational flexibility (see Supplementary Note 3).

      (6) The authors should recognize that implementing custom optics, no matter how well designed, is a big barrier to cross for most cell biology labs.

      We fully understand and now acknowledge in the main text that implementing custom optics can present a significant barrier, particularly for laboratories without prior experience in optical system assembly. However, similar challenges were encountered during the adoption of other open-source microscopy platforms, such as mesoSPIM and OpenSPIM, both of which have nonetheless achieved widespread implementation. Their success has largely been driven by exhaustive documentation, strong community support, and standardized design principles—approaches we have also prioritized in Altair-LSFM. We have therefore made all CAD files, alignment guides, and detailed build documentation publicly available and continue to develop instructional materials and community resources to further reduce the barrier to adoption.

      (7) Statements on "hands on workshops", though laudable, may not be appropriate to include in a scientific publication without some documentation of the influence they have had on implementing the microscope.

      We understand the concern. Our intention in mentioning hands-on workshops was to convey that the dissemination effort is supported by an NIH Biomedical Technology Development and Dissemination grant, which includes dedicated channels for outreach and community engagement. Nonetheless, we agree that such statements are not appropriate without formal documentation of their impact, and we have therefore removed this text from the revised manuscript.

      (8) It is claimed that the microscope is "reliable" in the discussion, but with no proof, long-term stability should be assessed and included.

      Although our experience with Altair-LSFM has been that it remains well-aligned over time, especially in comparison with other light-sheet systems we have worked on over the last 11 years, we acknowledge that this assessment is anecdotal. As such, we have omitted this claim from the revised manuscript.

      (9) Due to the reliance on anecdotal statements and comparisons without proof to other systems, this paper at times reads like a brochure rather than a scientific publication. The authors should consider editing their manuscript accordingly to focus on the technical and quantifiable aspects of their work.

      We agree with the reviewer’s assessment and have revised the manuscript to remove anecdotal comparisons and subjective language. Where possible, we now provide quantitative metrics or verifiable data to support our statements.

      Reviewer #3 (Recommendations for the authors):

      Other minor points that could improve the manuscript (although some of these points are explained in the huge supplementary manual): 

      (1) The authors explain their design thoroughly, and they chose a sample-scanning method. I think that a brief discussion of the advantages and disadvantages of such a method over, for example, a laser-scanning system (with fixed sample) in the main text will be highly beneficial for the users.

      In the revised manuscript, we now include a brief discussion in the main text outlining the advantages and limitations of a sample-scanning approach relative to a light-sheet–scanning system. Specifically, we note that for thin, adherent specimens, sample scanning minimizes the optical path length through the sample, allowing the use of more tightly focused illumination beams that improve axial resolution. We also include a new supplementary figure illustrating how this configuration reduces the propagation length of the illumination light sheet, thereby enhancing axial resolution.

      (2) The authors justify selecting a 0.6 NA illumination objective over alternatives (e.g., Special Optics), but the manuscript would benefit from a more quantitative trade-off analysis (beam waist, working distance, sample compatibility) with other possibilities. In the same context, a comparison of the performance of this system with the new and upcoming single-objective light-sheet methods (including those based on optical refocusing, e.g., DaXi) would strengthen the manuscript.

      In the revised manuscript, we now provide a quantitative trade-off analysis of the illumination objectives in Supplementary Note 1, including comparisons of beam waist, working distance, and sample compatibility. This section also presents calculated point spread functions for both the 0.6 NA and 0.67 NA objectives, outlining the performance trade-offs that informed our design choice. In addition, Supplementary Note 3 now includes a broader comparison of Altair-LSFM with other light-sheet modalities, including diSPIM, ASLM, and OPM, to further contextualize the system’s capabilities within the evolving light-sheet microscopy landscape.

      (3) The modularity of the system is implied in the context of the manuscript, but not fully explained. The authors should specify more clearly, for example, whether cameras could be easily changed, objectives could be easily swapped, or light-sheet thickness could be tuned by changing the cylindrical lens, and how users might adapt the system for different samples (e.g., embryos, cleared tissue, live imaging). They should also discuss any constraints or compatibility issues for these implementations.

      Altair-LSFM was explicitly designed and optimized for imaging live adherent cells, where sample scanning and short light-sheet propagation lengths provide optimal axial resolution (Supplementary Note 3). While the same platform could be used for superficial imaging in embryos, systems implementing multiview illumination and detection schemes are better suited for such specimens. Similarly, cleared tissue imaging typically requires specialized solvent-compatible objectives and approaches such as ASLM that maximize the field of view. We have now added text to the Design Principles section that explicitly states this.

      Altair-LSFM offers varying levels of modularity depending on the user’s level of expertise. For entry-level users, the illumination numerical aperture—and therefore the light-sheet thickness and propagation length—can be readily adjusted by tuning the rectangular aperture conjugate to the back pupil of the illumination objective, as described in the Design Principles section. For mid-level users, alternative configurations of Altair-LSFM, including different detection objectives, stages, filter wheels, or cameras, can be readily implemented (Supplementary Note 1). Importantly, navigate natively supports a broad range of hardware devices, and new components can be easily integrated through its modular interface. For expert users, all Zemax simulations, CAD models, and step-by-step optimization protocols are openly provided, enabling complete re-optimization of the optical design to meet specific experimental requirements.

      (4) Resolution measurements before and after deconvolution are central to the performance claim, but the deconvolution method (PetaKit5D) is only briefly mentioned in the main text, it's not referenced, and has to be clarified in more detail, coherently with the precision of the supplementary information. More specifically, PetaKit5D should be referenced in the main text, the details of the deconvolution parameters discussed in the Methods section, and the computational requirements should also be mentioned. 

      In the revised manuscript, we now provide a dedicated description of the deconvolution process in the Methods section, including the specific parameters and algorithms used. We have also explicitly referenced PetaKit5D in the main text to ensure proper attribution and clarity. Additionally, we note the computational requirements associated with this analysis in the same section for completeness.

      (5) Image post-processing is not fully explained in the main text. Although the system is based on sample scanning, the main text does not mention deskewing, which is an integral part of the post-processing needed to obtain a "straight" 3D stack. Since other systems implement such a post-processing algorithm (for example, single-objective architectures), it would be beneficial to have some discussion of this, and also a brief comparison to other systems, in the main text or the Methods section.

      In the revised manuscript, we now explicitly describe both deskewing (shearing) and deconvolution procedures in the Alignment and Characterization section of the main text and direct readers to the Methods section. We also briefly explain why the data must be sheared to correct for the angled sample-scanning geometry of LLSM and Altair-LSFM, as well as of both sample-scanning and laser-scanning variants of OPM.
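
      As a generic illustration of what deskewing (shearing) does, the sketch below offsets each plane of a z-stack laterally in proportion to its scan position. It is not the authors' pipeline or the PetaKit5D implementation: the geometry (shift per plane equal to the scan step times the cosine of an assumed angle, divided by the pixel size), the axis convention, the nearest-pixel rounding, and the function name are all assumptions made for clarity. Production pipelines typically use array libraries with sub-pixel interpolation.

```python
import math


def deskew(stack, scan_step_um, pixel_size_um, angle_deg, fill=0):
    """Shear a z-stack so that each plane is offset along x by its scan position.

    stack is indexed as stack[plane][row][column]. The lateral offset per
    plane is assumed to be scan_step_um * cos(angle) / pixel_size_um, rounded
    to whole pixels (no interpolation).
    """
    shift_per_plane = scan_step_um * math.cos(math.radians(angle_deg)) / pixel_size_um
    if shift_per_plane < 0:
        raise ValueError("this sketch assumes a non-negative shift per plane")
    max_shift = round(shift_per_plane * (len(stack) - 1))
    sheared = []
    for k, plane in enumerate(stack):
        shift = round(k * shift_per_plane)
        # Pad on the left by the plane's shift and on the right by the remainder,
        # so every output row has the same width.
        sheared.append(
            [[fill] * shift + list(row) + [fill] * (max_shift - shift) for row in plane]
        )
    return sheared


if __name__ == "__main__":
    # Three 1 x 2 planes, one-pixel shift per plane (assumed toy geometry).
    toy = [[[1, 1]], [[1, 1]], [[1, 1]]]
    for plane in deskew(toy, scan_step_um=1.0, pixel_size_um=1.0, angle_deg=0.0):
        print(plane)
```

      The output stack is wider than the input because the sheared planes no longer overlap completely; the padded regions hold the fill value.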

      (6) A brief discussion on comparative costs with other systems (LLSM, diSPIM, etc.) could be helpful for non-imaging expert researchers who could try to implement such an optical architecture in their lab.

      Unfortunately, the exact costs of commercial systems such as LLSM or diSPIM are typically not publicly available, as they depend on institutional agreements and vendor-specific quotations. Nonetheless, we now provide approximate cost estimates in Supplementary Note 1 to help readers and prospective users gauge the expected scale of investment relative to other advanced light-sheet microscopy systems.

      (7) The "navigate" control software is provided, but a brief discussion on its advantages compared to an already open-access system, such as Micro-Manager, could be useful for the users.

      In the revised manuscript, we now include Supplementary Note 5 that discusses the advantages and disadvantages of different open-source microscope control platforms, including navigate and Micro-Manager. In brief, navigate was designed to provide turnkey support for multiple light-sheet architectures, with pre-configured acquisition routines optimized for Altair-LSFM, integrated data management with support for multiple file formats (TIFF, HDF5, N5, and Zarr), and full interoperability with OME-compliant workflows. By contrast, while Micro-Manager offers a broader library of hardware drivers, it typically requires manual configuration and custom scripting for advanced light-sheet imaging workflows.

      (8) The cost and parts are well documented, but the time and expertise required are not crystal clear. Adding a simple time estimate (perhaps in the Supplement Section) of assembly/alignment/installation/validation and first imaging will be very beneficial for users. Also, what level of expertise is assumed (prior optics experience, for example) to be needed to install a system like this? This can help non-optics-expert users to better understand what kind of adventure they are putting themselves through.

      We thank the reviewer for this helpful suggestion. To address this, we have added Supplementary Table S5, which provides approximate time estimates for assembly, alignment, validation, and first imaging based on the user’s prior experience with optical systems. The table distinguishes between novice (no prior experience), moderate (some experience using but not assembling optical systems), and expert (experienced in building and aligning optical systems) users. This addition is intended to give prospective builders a realistic sense of the time commitment and level of expertise required to assemble and validate Altair-LSFM.

      Minor things in the main text:

      (1) Line 109: The cost is considered "excluding the laser source". But then in the table of costs, you mention L4cc as a "multicolor laser source", for 25 K. Can you explain this better? Are the costs correct with or without the laser source? 

      We acknowledge that the statement in line 109 was incorrect—the quoted ~$150k system cost does include the laser source (L4cc, listed at $25k in the cost table). We have corrected this in the revised manuscript.

      (2) Line 113: You say "lateral resolution", but then you state a 3D resolution (230 nm x 230 nm x 370 nm). This needs to be fixed.

      Thank you, we have corrected this.

      (3) Line 138: Is the light-sheet uniformity proven also with a fluorescent dye? This could be beneficial for the main text, showing the performance of the instrument in a fluorescent environment.

      The light-sheet profiles shown in the manuscript were acquired using fluorescein to visualize the beam. We have revised the main text and figure legends to clearly state this.

      (4) Line 149: This is one of the most important features of the system, defying the usual tradeoff between light-sheet thickness and field of view, with a regular Gaussian beam. I would clarify more specifically how you achieve this because this really is the most powerful takeaway of the paper.

      We thank the reviewer for this key observation. The ability of Altair-LSFM to maintain a thin light sheet across a large field of view arises from diffraction effects inherent to high NA illumination. Specifically, diffraction elongates the PSF along the beam’s propagation direction, effectively extending the region over which the light sheet remains sufficiently thin for high-resolution imaging. This phenomenon, which has been the subject of active discussion within the light-sheet microscopy community, allows Altair-LSFM to partially overcome the conventional trade-off between light-sheet thickness and propagation length. We now clarify this point in the main text and provide a more detailed discussion in Supplementary Note 3, which is explicitly referenced in the discussion of the revised manuscript.
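
      For readers unfamiliar with the conventional trade-off referred to here, the following sketch evaluates the textbook paraxial Gaussian-beam relation between beam waist and Rayleigh length, z_R = pi * n * w0^2 / lambda_0. It is offered only as background: the paraxial formula does not capture the high-NA diffraction effects described in the response, it is not the authors' model, and the wavelength, refractive index, and sheet thicknesses are assumed values.

```python
import math


def waist_from_fwhm_um(fwhm_um):
    """1/e^2 intensity radius w0 of a Gaussian profile with the given FWHM."""
    return fwhm_um / math.sqrt(2.0 * math.log(2.0))


def rayleigh_length_um(waist_um, wavelength_um, refractive_index=1.33):
    """Paraxial Rayleigh length z_R = pi * n * w0^2 / lambda_0."""
    return math.pi * refractive_index * waist_um ** 2 / wavelength_um


if __name__ == "__main__":
    wavelength_um = 0.488  # assumed vacuum wavelength
    for fwhm_um in (0.4, 0.8, 1.6):  # assumed sheet thicknesses
        w0 = waist_from_fwhm_um(fwhm_um)
        z_r = rayleigh_length_um(w0, wavelength_um)
        print(f"FWHM {fwhm_um:.1f} um -> w0 {w0:.2f} um, 2*z_R {2 * z_r:.1f} um")
```

      Because the Rayleigh length scales with the square of the waist, halving the sheet thickness shortens the paraxial propagation length fourfold, which is the trade-off that the reviewer and authors are discussing.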

      (5) Line 171: You talk about repeatable assembly...have you tried many different baseplates? Otherwise, this is a complicated statement, since this is a proof-of-concept paper. 

      We thank the reviewer for this comment. We have not yet validated the design across multiple independently assembled baseplates and therefore agree that our previous statement regarding repeatable assembly was premature. To avoid overstating the current level of validation, we have removed this statement from the revised manuscript.

      (6) Line 187: same as above. You mention "long-term stability". For how long did you try this? This should be specified in numbers (days, weeks, months, years?) Otherwise, it is a complicated statement to make, since this is a proof-of-concept paper.

      We also agree that referencing long-term stability without quantitative backing is inappropriate, and have removed this statement from the revised manuscript.

      (7) Line 198: "rapid z-stack acquisition". How rapid? Also, what is the limitation of the galvo-scanning in terms of the imaging speed of the system? This should be noted in the methods section.

      In the revised manuscript, we now clarify these points in the Optoelectronic Design section. Specifically, we explicitly note that the resonant galvo used for shadow reduction operates at 4 kHz, ensuring that it is not rate-limiting for any imaging mode. In the same section, we also evaluate the maximum acquisition speeds achievable using navigate and report the theoretical bandwidth of the sample-scanning piezo, which together define the practical limits of volumetric acquisition speed for Altair-LSFM.
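
      The statement that a 4 kHz resonant galvo is not rate-limiting can be sanity-checked with simple arithmetic, sketched below. Only the 4 kHz figure comes from the response; the exposure time, plane count, per-plane overhead, and function names are assumptions for illustration and do not represent measured Altair-LSFM performance.

```python
def galvo_periods_per_exposure(galvo_hz, exposure_s):
    """Number of full galvo oscillation periods completed during one exposure."""
    return galvo_hz * exposure_s


def volumes_per_second(n_planes, exposure_s, overhead_s=0.0):
    """Volume rate when each plane costs one exposure plus a fixed overhead."""
    return 1.0 / (n_planes * (exposure_s + overhead_s))


if __name__ == "__main__":
    exposure_s = 0.010  # assumed 10 ms camera exposure
    periods = galvo_periods_per_exposure(4000, exposure_s)
    rate = volumes_per_second(n_planes=100, exposure_s=exposure_s, overhead_s=0.002)
    print(f"galvo periods per exposure: {periods:.0f}")
    print(f"volume rate: {rate:.2f} volumes/s")
```

      As long as many galvo periods fit within a single exposure, the pivoting averages out within each frame, and the volume rate is set by the camera exposure, the per-plane overhead, and the stage rather than by the galvo.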

      (8) Line 234: PetaKit5D is discussed in the additional documentation, but should be referenced here, as well.

      We now reference and cite PetaKit5D.

      (9) Line 256: "values are on par with LLSM", but no values are provided. Some details should also be provided in the main text.

      In the revised manuscript, we now provide the lateral and axial resolution values originally reported for LLSM in the main text to facilitate direct comparison with Altair-LSFM. Additionally, Supplementary Note 3 now includes an expanded discussion on the nuances of resolution measurement and reporting in light-sheet microscopy.

      Figures:

      (1) Figure 1 could be combined with Figure 3. They're both discussing the validation of the system (theoretically and with simulations), and they could be together in different panels of the same figure. The experimental light-sheet seems to be shown in a transmission mode. Showing a pattern in a fluorescent dye could also be beneficial for the paper.

      In Figure 1, our goal was to guide readers through the design process—illustrating how the detection objective’s NA sets the system’s resolution, which defines the required pixel size for Nyquist sampling and, in turn, the field of view. We then use Figure 1b–c to show how the illumination beam was designed and simulated to achieve that field of view. In contrast, Figure 3 presents the experimental validation of the illumination system. To avoid confusion, we now clarify in the text that the light sheet shown in Figure 3 was visualized in a fluorescein solution and imaged in transmission mode. While we agree that Figures 1 and 3 both serve to validate the system, we prefer to keep them as separate figures to maintain focus within each panel. We believe this organization better supports the narrative structure and allows readers to digest the theoretical and experimental validations independently.
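
      The design chain described here (detection NA sets the resolution, the resolution sets the Nyquist pixel size, and the pixel size together with the sensor width sets the field of view) can be sketched as follows. The Abbe-style estimate lambda / (2 NA), the wavelength, the NA, and the 2048-pixel sensor width are illustrative assumptions chosen so that the arithmetic lands near the 266-micron field of view quoted in the reviews; they are not taken from the manuscript, and the function names are hypothetical.

```python
def abbe_lateral_resolution_nm(wavelength_nm, numerical_aperture):
    """Abbe-style lateral resolution estimate, lambda / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


def nyquist_pixel_size_nm(resolution_nm):
    """Largest sample-space pixel size giving two samples per resolution element."""
    return resolution_nm / 2.0


def field_of_view_um(pixel_size_nm, n_pixels):
    """Field of view along one sensor axis, in micrometers."""
    return pixel_size_nm * n_pixels / 1000.0


if __name__ == "__main__":
    resolution = abbe_lateral_resolution_nm(520.0, 1.0)  # assumed emission and NA
    pixel = nyquist_pixel_size_nm(resolution)
    fov = field_of_view_um(pixel, 2048)  # assumed sensor width in pixels
    print(f"resolution {resolution:.0f} nm, pixel {pixel:.0f} nm, FOV {fov:.2f} um")
```

      Other resolution criteria (for example, Rayleigh) would change the numbers slightly but not the logic of the chain.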

      (2) Figure 3: Panels d and e show the same thing. Why would you expect that xz and yz profiles should be different? Is this due to the orientation of the objectives towards the sample?

      In Figure 3, we present the PSF from all three orthogonal views, as this provides the most transparent assessment of PSF quality—certain aberration modes can be obscured when only select perspectives are shown. In principle, the XZ and YZ projections should be equivalent in a well-aligned system. However, as seen in the XZ projection, a small degree of coma is present that is not evident in the YZ view. We now explicitly note this observation in the revised figure caption to clarify the difference between these panels.

      (3) Figure 4's single boxes lack a scale bar, and some of the Supplementary Figures (e.g. Figure 5) lack detailed axis labels or scale bars. Also, in the detailed documentation, some figures are referred to as Figure 5. Figure 7 or, for example, figure 6. Figure 8, and this makes the cross-references very complicated to follow.

      In the revised manuscript, we have corrected these issues. All figures and supplementary figures now include appropriate scale bars, axis labels, and consistent formatting. We have also carefully reviewed and standardized all cross-references throughout the main text and supplementary documentation to ensure that figure numbering is accurate and easy to follow.

    1. eLife Assessment

      In this study, the authors investigate the role of ZMAT3, a p53 target gene, in tumor suppression and RNA splicing regulation. Using quantitative proteomics, the authors uncover that ZMAT3 knockout leads to upregulation of HKDC1, a gene linked to mitochondrial respiration, and that ZMAT3 suppresses HKDC1 expression by inhibiting c-JUN-mediated transcription. This set of convincing evidence reveals a fundamental mechanism by which ZMAT3 contributes to p53-driven tumor suppression by regulating mitochondrial respiration.

    2. Reviewer #1 (Public review):

      Summary:

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells, and showed markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration. The data are novel, compelling and very interesting.

      Comments on revisions:

      The authors have done a thorough job addressing my comments. This manuscript is quite strong and will be highly cited for its novelty and rigor.

    3. Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.

      Comments on revisions:

      The authors have mostly addressed the concerns raised previously by this reviewer. The lack of functional assays made the reported findings mostly mechanistic with no clear biological context.

      The present manuscript is certainly improved compared to the previous version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:  

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration.

      Strengths:

      The authors use multiple orthogonal approaches to test the majority of their findings.  The authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration.  

      Weaknesses:

      Some indication as to whether other c-JUN target genes are also regulated by ZMAT3 would improve the broad relevance of the authors' findings.  

      We thank the reviewer for the kind words and the thoughtful suggestion. As recommended, to identify additional c-JUN targets potentially regulated by ZMAT3, we intersected the genes upregulated upon ZMAT3 knockout (from our RNA-seq data) with the ChIP-Atlas dataset for human c-JUN and cross-referenced these with c-JUN peaks from three ENCODE cell lines. From this analysis, we selected for further analysis the top 4 candidate genes - LAMA2, VSNL1, SAMD3, and IL6R (Figure 5-figure supplement 2A-D). Like HKDC1, these genes were upregulated in ZMAT3-KO cells, and this upregulation was abolished upon siRNA-mediated JUN knockdown in ZMAT3-KO cells (Figure 5-figure supplement 2E). Moreover, by ChIP-qPCR we observed increased JUN binding to the JUN peak for these genes in ZMAT3-KO cells as compared to the ZMAT3-WT (Figure 5-figure supplement 2F). As described on page 11 of the revised manuscript, these results suggest that the ZMAT3/JUN axis negatively regulates HKDC1 expression and additional c-JUN target genes.

      Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.  

      Strengths:

      Mechanistically, ZMAT3 suppresses HKDC1 transcription by sequestering JUN and preventing its binding to the HKDC1 promoter, resulting in reduced HKDC1 expression. Conversely, p53 mutation leads to ZMAT3 downregulation and HKDC1 overexpression, thereby promoting increased mitochondrial respiration and proliferation. This mechanism is novel; however, the authors should address several points.  

      Weaknesses:

      The authors conduct mechanistic experiments (e.g., transcript and protein quantification, luciferase assays) to demonstrate regulatory interactions between p53, ZMAT3, JUN, and HKDC1. These findings should be supported with functional assays, such as proliferation, apoptosis, or mitochondrial respiration analyses.  

      We thank the reviewer for appreciating our work and for this valuable suggestion. The reviewer rightly pointed out that supporting the regulatory interactions between p53, ZMAT3, JUN and HKDC1 with functional assays such as proliferation, apoptosis and mitochondrial respiration analyses would strengthen our mechanistic data. During the revision of our manuscript, we attempted to address this point by performing simultaneous knockdown of these proteins; however, we observed substantial toxicity under these conditions, making the functional assays technically unfeasible. This outcome was not unexpected, as knockdown of JUN or HKDC1 individually results in growth defects. We therefore focused our efforts on addressing the recommendations for the authors.

      Reviewer #3 (Public review):

      Summary:  

      In their manuscript, Kumar et al. investigate the mechanisms underlying the tumor suppressive function of the RNA binding protein ZMAT3, a previously described tumor suppressor in the p53 pathway. To this end, they use RNA-sequencing and proteomics to characterize changes in ZMAT3-deficient cells, leading them to identify the hexokinase HKDC1 as upregulated with ZMAT3 deficiency first in colorectal cancer cells, then in other cell types of both mouse and human origin. This increase in HKDC1 is associated with increased mitochondrial respiration. As ZMAT3 has been reported as an RNA-binding and DNA-binding protein, the authors investigated this via PAR-CLIP and ChIP-seq but did not observe ZMAT3 binding to HKDC1 pre-mRNA or DNA. Thus, to better understand how ZMAT3 regulates HKDC1, the authors used quantitative proteomics to identify ZMAT3-interacting proteins. They identified the transcription factor JUN as a ZMAT3-interacting protein and showed that JUN promotes the increased HKDC1 RNA expression seen with ZMAT3 inactivation. They propose that ZMAT3 inhibits JUN-mediated transcriptional induction of HKDC1 as a mechanism of tumor suppression. This work uncovers novel aspects of the p53 tumor suppressor pathway.

      Strengths:

      This novel work sheds light on one of the most well-established yet understudied p53 target genes, ZMAT3, and how it contributes to p53's tumor suppressive functions. Overall, this story establishes a p53-ZMAT3-HKDC1 tumor suppressive axis, which has been strongly substantiated using a variety of orthogonal approaches, in different cell lines and with different data sets.  

      Weaknesses:

      While the role of p53 and ZMAT3 in repressing HKDC1 is well substantiated, there is a gap in understanding how ZMAT3 acts to repress JUN-driven activation of the HKDC1 locus. How does ZMAT3 inhibit JUN binding to HKDC1? Can targeted ChIP experiments or RIP experiments be used to make a more definitive model? Can ZMAT3 mutants help to understand the mechanisms? Future work can further establish the mechanisms underlying how ZMAT3 represses JUN activity.  

      We thank the reviewer for the kind words and the invaluable suggestion. The reviewer has an excellent point regarding how ZMAT3 inhibits JUN binding to the HKDC1 locus. Our new data included in the revised manuscript show that the ZMAT3-JUN interaction is lost in the presence of DNase or RNase, indicating that the interaction requires both DNA and RNA. This result suggests that ZMAT3 and JUN form an RNA-dependent, chromatin-associated complex. Although not directly investigated in our study, this finding is consistent with emerging evidence that RBPs can function as chromatin-associated cofactors in transcription. For example, functional interplay between transcription factor YY1 and the RNA binding protein RBM25 co-regulates a broad set of genes, where RBM25 appears to engage promoters first and then recruit YY1, with RNA proposed to guide target recognition. We have discussed this possibility in the discussion section of the revised manuscript (page 13). We agree that future work using ZMAT3 mutants and targeted ChIP or RIP assays will be valuable to delineate the precise mechanism by which ZMAT3 inhibits JUN binding to its target genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. HKDC1 is emerging as an important player in human cancer. Importantly, the authors show both acute (gene silencing) and chronic (CRISPR KO) approaches to silence ZMAT3, and they do this in several cell lines. Notably, they show that ZMAT3 silencing leads to impaired mitochondrial respiration, in a manner that is rescued by silencing of HKDC1. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells, and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter (intron 1), and altered mitochondrial respiration. The findings are compelling, and the authors use multiple orthogonal approaches to test most findings. And the authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration. As such, enthusiasm is high for this manuscript.

      Addressing the following question would improve the manuscript. 

      It is not clear how many (other) c-JUN target genes might be impacted by ZMAT3; other important c-JUN targets in cancer include GLS1, WEE1, SREBP1, GLUT1, and CD36, so there could be a global impact on metabolism in ZMAT3 KO cells. Can the authors perform qPCR on these targets in ZMAT3 WT and KO cells and see if these target genes are differentially expressed? 

      We thank the reviewer for this thoughtful suggestion. As recommended, we examined the expression of key c-JUN target genes GLS1 (also known as GLS), WEE1, SREBP1, GLUT1, and CD36 in ZMAT3-WT and ZMAT3-KO cells. We first analyzed publicly available JUN ChIP-Seq data from three ENCODE cell lines, which revealed JUN binding peaks near or upstream of exon 1 for GLS1/GLS, SREBP1, and SLC2A1/GLUT1, but not for WEE1 or CD36 (Appendix 1, panels A-E). Based on these results, we performed RT-qPCR for GLS1/GLS, SREBP1 and SLC2A1 in ZMAT3-WT and ZMAT3-KO cells, with or without JUN knockdown. GLS mRNA was significantly reduced upon JUN knockdown in both ZMAT3-WT cells and ZMAT3-KO cells, but it was not upregulated upon loss of ZMAT3, indicating that GLS is a JUN target gene, but it is not regulated by ZMAT3. In contrast, SREBF1 or SLC2A1 expression remained unchanged upon ZMAT3 loss or JUN knockdown (Appendix 1 panels F-H). These data suggest that the ZMAT3/JUN axis does not regulate the expression of these genes.

      To identify additional c-JUN targets potentially regulated by ZMAT3, we intersected the genes upregulated upon ZMAT3 knockout (from our RNA-seq data) with the ChIP-Atlas dataset for human c-JUN and cross-referenced these with c-JUN peaks from three ENCODE cell lines. From this analysis, we selected for further analysis the top 4 candidate genes - LAMA2, VSNL1, SAMD3, and IL6R (Figure 5-figure supplement 2A-D). Like HKDC1, these genes were upregulated in ZMAT3-KO cells, and this upregulation was abolished upon siRNA-mediated JUN knockdown in ZMAT3-KO cells (Figure 5-figure supplement 2E). Moreover, by ChIP-qPCR we observed increased JUN binding to the JUN peak for these genes in ZMAT3-KO cells as compared to the ZMAT3-WT (Figure 5-figure supplement 2F). As described on page 11 of the revised manuscript, these results suggest that the ZMAT3/JUN axis negatively regulates HKDC1 expression and additional c-JUN target genes.
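
      The intersection strategy described above can be sketched as a simple set operation. This is an illustration under stated assumptions, not the pipeline used for the manuscript: the function name, the `min_lines` parameter (how many ENCODE cell lines must show a peak) and the placeholder gene labels are all hypothetical, and the response does not specify whether a peak was required in one or in all three cell lines.

```python
from typing import Dict, Iterable, List, Set


def candidate_targets(
    upregulated: Iterable[str],
    chip_atlas_targets: Iterable[str],
    encode_peaks_by_line: Dict[str, Set[str]],
    min_lines: int = 1,  # assumption: the required number of supporting cell lines is not stated
) -> List[str]:
    """Genes upregulated on knockout that are also bound by the factor of interest.

    A gene is kept if it appears in the ChIP-Atlas target list and has a peak
    in at least `min_lines` of the supplied ENCODE cell lines.
    """
    candidates = set(upregulated) & set(chip_atlas_targets)
    supported = []
    for gene in candidates:
        n_lines = sum(gene in genes for genes in encode_peaks_by_line.values())
        if n_lines >= min_lines:
            supported.append(gene)
    return sorted(supported)


if __name__ == "__main__":
    # Placeholder labels only; these are not real results.
    up = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
    atlas = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
    encode = {
        "line_1": {"GENE_A", "GENE_B"},
        "line_2": {"GENE_A"},
        "line_3": {"GENE_A", "GENE_D"},
    }
    print(candidate_targets(up, atlas, encode, min_lines=1))
    print(candidate_targets(up, atlas, encode, min_lines=3))
```

      Raising `min_lines` trades sensitivity for confidence that the binding site is shared across cell types.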

      Minor concerns: 

      (1) Line 150: observed a modest. 

      (2) Line 159: Figure 2G appears to be inaccurately cited. 

      (3) Line 191: assays to measure. 

      We thank the reviewer for pointing these out. These minor concerns have been addressed in the text.  

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 1E: Can the authors clarify what the numbers on the left side of the chart represent? Do they refer to the scale?

      The numbers on the Y-axis represent -log10(p-value), where higher values correspond to more significant changes. For visualization purposes, the significant changes are shown in red.
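
      For readers unfamiliar with this axis transform, a minimal sketch follows. The 0.05 significance cutoff and the "grey" color for non-significant points are assumptions made for illustration; the response only states that significant changes are shown in red.

```python
import math


def neg_log10(p_value: float) -> float:
    """Transform a p-value to the -log10 scale used on a volcano-plot Y-axis."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p_value)


def point_color(p_value: float, alpha: float = 0.05) -> str:
    """Color a point red if significant; alpha and the fallback color are assumed."""
    return "red" if p_value < alpha else "grey"


if __name__ == "__main__":
    for p in (0.5, 0.05, 0.001):
        print(p, round(neg_log10(p), 3), point_color(p))
```

      On this scale a p-value of 0.01 plots at 2 and a p-value of 0.001 plots at 3, so each unit step up the axis is a tenfold smaller p-value.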

      (2) Page 5, line 123: The sentence "As expected, ZMAT3 mRNA levels were decreased in the ZMAT3-KO cells" is redundant, as this information was already mentioned on page 4, line 103.  

      We thank the reviewer for noticing this redundancy. The repeated sentence has been removed in the revised manuscript.  

      (3) Page 5: The authors state: "Transcriptome-wide, upon loss of ZMAT3, 606 genes were significantly up-regulated (adj. p < 0.05 and 1.5-fold change) and 552 were down-regulated, with a median fold change of 1.76 and 0.55 for the up- and down-regulated genes, respectively." Later, on page 6, they write: "Comparison of the RNA-seq data from ZMAT3-WT vs. ZMAT3-KO and CTRL siRNA vs. ZMAT3 siRNA-transfected HCT116 cells indicated that 1023 genes were commonly up-regulated, and 1042 were commonly down-regulated upon ZMAT3 loss (Figure S2C and D)." Why is the number of deregulated transcripts higher in the ZMAT3-WT vs. ZMAT3-KO comparison than in the CTRL siRNA vs. ZMAT3 siRNA comparison? Are the authors using less stringent criteria in the second analysis? This point should be clarified.

      We thank the reviewer for highlighting this point. The reviewer is correct that less stringent criteria were used in the second analysis. On page 5, we applied stringent thresholds (adjusted p-value < 0.05 and 1.5-fold change) to identify high-confidence transcriptome-wide changes upon ZMAT3 loss. In contrast, for the comparison of both RNA-seq datasets (ZMAT3-WT vs. KO and siCTRL vs. siZMAT3), we included genes that were consistently up- or downregulated, without applying a fold change threshold, focusing instead on significantly altered genes (adjusted p < 0.05) in both datasets. This allowed us to capture broader and more reproducible transcriptomic changes that occur upon ZMAT3 depletion, including modest but significant changes upon transient ZMAT3 knockdown with siRNAs. We have now clarified this distinction on page 6 of the revised manuscript.
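
      The difference between the two sets of criteria can be made concrete with a short sketch. It assumes each dataset is a mapping from gene to a (log2 fold change, adjusted p-value) pair; the function names and the toy values are hypothetical and are not taken from our data.

```python
import math
from typing import Dict, Set, Tuple

# gene -> (log2 fold change, adjusted p-value); this layout is an assumption
Results = Dict[str, Tuple[float, float]]


def stringent_hits(
    res: Results, padj_cutoff: float = 0.05, fold_cutoff: float = 1.5
) -> Tuple[Set[str], Set[str]]:
    """Up/down genes passing both the adjusted p-value and the fold-change threshold."""
    log2_cut = math.log2(fold_cutoff)
    up = {g for g, (lfc, padj) in res.items() if padj < padj_cutoff and lfc >= log2_cut}
    down = {g for g, (lfc, padj) in res.items() if padj < padj_cutoff and lfc <= -log2_cut}
    return up, down


def shared_hits(
    res_a: Results, res_b: Results, padj_cutoff: float = 0.05
) -> Tuple[Set[str], Set[str]]:
    """Genes significant in both datasets with the same direction, no fold-change cutoff."""

    def significant(res: Results, sign: int) -> Set[str]:
        return {g for g, (lfc, padj) in res.items() if padj < padj_cutoff and lfc * sign > 0}

    up = significant(res_a, 1) & significant(res_b, 1)
    down = significant(res_a, -1) & significant(res_b, -1)
    return up, down


if __name__ == "__main__":
    # Toy values for illustration only.
    knockout = {"GENE_A": (1.0, 0.01), "GENE_B": (0.3, 0.01), "GENE_C": (-1.2, 0.001)}
    knockdown = {"GENE_A": (0.4, 0.02), "GENE_B": (0.2, 0.03), "GENE_C": (-0.3, 0.04)}
    print(stringent_hits(knockout))
    print(shared_hits(knockout, knockdown))
```

      In the toy example, GENE_B fails the stringent filter because its fold change is below 1.5, yet it is counted as commonly up-regulated because it is significant and concordant in both datasets. This is how the second analysis can return more genes than the first.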

      (4) Figures 2B and 2E: The authors should provide quantification of HKDC1 protein levels normalized to a loading control. In addition, they should assess HKDC1 protein abundance upon ZMAT3 interference in SW1222 and HCEC-1CT cells, not just in HepG2 and HCT116 cells.

      We thank the reviewer for this suggestion. We have now quantified all immunoblots presented throughout the manuscript, including those shown in Figures 2B and 2E, and all other figures containing protein analyses. Band intensities were quantified using ImageJ densitometry and normalized to GAPDH as the loading control. In addition, as suggested, we examined HKDC1 protein levels following ZMAT3 knockdown in two additional cell lines, SW1222 and HCEC-1CT. Consistent with our observations in HepG2 and HCT116 cells, ZMAT3 depletion led to increased HKDC1 protein levels in both SW1222 and HCEC-1CT cells. These new data are now included in Figure 2-figure supplement 1F and G. We have updated the Results section, figure legends, and figures to reflect these additions.
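
      The normalization step can be summarized in a few lines. This sketch assumes that raw band intensities have already been exported from ImageJ as plain numbers and that the first lane is the control; the function name and the example intensities are hypothetical.

```python
from typing import List, Sequence


def normalize_to_loading_control(
    target: Sequence[float], loading: Sequence[float], control_lane: int = 0
) -> List[float]:
    """Divide each target band by its loading-control band, then scale to the control lane.

    `target` and `loading` hold raw densitometry values, one entry per lane.
    The returned values are fold changes relative to `control_lane`.
    """
    if len(target) != len(loading):
        raise ValueError("target and loading must have one value per lane")
    if any(value <= 0 for value in loading):
        raise ValueError("loading-control intensities must be positive")
    ratios = [t / l for t, l in zip(target, loading)]
    reference = ratios[control_lane]
    if reference == 0:
        raise ValueError("control lane has zero target signal")
    return [ratio / reference for ratio in ratios]


if __name__ == "__main__":
    # Hypothetical intensities: lane 0 = control, lane 1 = knockdown.
    print(normalize_to_loading_control(target=[100.0, 300.0], loading=[50.0, 75.0]))
```

      Dividing by the loading control first corrects for unequal loading between lanes before the comparison to the control lane is made.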

      (5) Figure 3A: It is unclear which gene was knocked out in the "KO cells." The authors should clearly specify this.

      We thank the reviewer for pointing this out. We have now updated Figure 3A.

      (6) Figure 3D: The result appears counterintuitive in comparison to Figure 3E. Why does HKDC1 knockdown reduce cell confluency more in ZMAT3 KO cells than in control (ZMAT3 wild-type) cells? The authors should explain this discrepancy more clearly.

      We thank the reviewer for this insightful comment. As shown in Figure 3D and 3E, knockdown of HKDC1 resulted in a greater decrease in proliferation in ZMAT3-KO cells than in ZMAT3-WT cells. This observation was indeed unexpected, given that HKDC1 acts downstream of ZMAT3. One possible explanation is that elevated HKDC1 expression in ZMAT3-KO cells increases their reliance on HKDC1 for sustaining proliferation, and that HKDC1 may also participate in additional pathways in ZMAT3-KO cells. Consequently, transient knockdown of HKDC1 in ZMAT3-KO cells would have a more pronounced effect on proliferation due to their increased dependency on HKDC1 activity. In contrast, ZMAT3-WT cells, which express lower levels of HKDC1, are less dependent on its function and therefore less sensitive to its depletion. We have now clarified this point on page 8 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):  

      (1) Why do the authors start their analysis by knocking out the p53 response element in Zmat3? That should be clarified. In addition, since clones were picked after CRISPR KO of Zmat3, were experiments done to confirm that p53 signaling was not disrupted?

      We thank the reviewer for this thoughtful question. We began our study by targeting the p53 response element (p53RE) in the ZMAT3 locus because the basal expression of ZMAT3 is regulated by p53 (Muys, Bruna R. et al., Genes & Development, 2021). Deleting the p53RE therefore allowed us to markedly reduce ZMAT3 expression without disrupting the entire ZMAT3 locus. We have clarified this rationale on page 4 of the revised manuscript. To ensure that p53 signaling was not affected by this modification, we verified that canonical p53 targets such as p21 were equivalently induced in both ZMAT3-WT and ZMAT3-KO cells following Nutlin treatment and that p53 induction was unchanged (Figure 4F and Figure 1-figure supplement 1A).

      (2) Throughout the text, many immunoblots are used to validate the knockouts and knockdowns used, but some clarification is needed. In Figure S1A, the Zmat3-WT sample seems to have significantly more p53 than the Zmat3 KO sample. Does Zmat3 KO compromise p53 levels in other experiments? It would be good to understand if Zmat3 affects p53 function by affecting its levels. Also, the p21 blot is overloaded.

      We thank the reviewer for this helpful observation. To determine whether ZMAT3 knockout affects p53 function by affecting its levels, we repeated the experiment three independent times. Western blots from these biological replicates, together with protein quantification, are now included in Appendix-2 and Figure 1-figure supplement 1A. These data show no significant differences in p53 or p21 induction between ZMAT3-WT and ZMAT3-KO cells following Nutlin treatment. In the revised manuscript, we have replaced the blot in Figure 1-figure supplement 1A with a more representative image from one of these replicate experiments.

      In Figure 2E, HKDC1 protein levels are not shown for the SW1222 and HCEC-1CT cell lines, 

      We thank the reviewer for this suggestion. HKDC1 protein levels in SW1222 and HCEC-1CT cells following ZMAT3 knockdown are now included as Figure 2-figure supplement 1F and 1G, together with the corresponding quantification.

      and Zmat3 does not appear as its characteristic two bands on the blot. What does this signify?

      We thank the reviewer for this observation. Endogenous ZMAT3 typically appears as two closely migrating bands on immunoblots. As shown in Figure 4D and Appendix 2A and 2B, these two bands are observed at the expected molecular weight following Nutlin treatment and are specific to ZMAT3, as they are markedly reduced in ZMAT3-KO cells. In contrast, only a single ZMAT3 band is visible in Figure 2E. This likely reflects limited resolution of the two bands in some blots rather than a biological difference.   

      (3) Why does HKDC1 knockdown only have an effect on metabolic phenotypes when ZMAT3 is gone? In Figure 3A, there does not seem to be a decrease in hexokinase activity in the siCTRL + siHKDC1 condition compared to siCTRL alone. Also, in Figure 3A, does phosphorylation activity of HKDC1 necessarily reflect glucose uptake, as stated? Additionally, in Figure 3C, there is no effect on mitochondrial respiration with siHKDC1, even though recent studies have shown a significant effect of HKDC1 on this.

      We thank the reviewer for raising these important questions. As noted, HKDC1 knockdown alone in wild-type cells (siCTRL + siHKDC1) does not significantly reduce hexokinase activity (Figure 3A). This likely reflects the low basal expression of HKDC1 in these cells. Thus, the metabolic phenotype may only become apparent when HKDC1 expression exceeds a functional threshold, as observed in ZMAT3-KO cells where HKDC1 is upregulated.

      Regarding the glucose uptake assay, HKDC1 itself is not phosphorylated; rather, it phosphorylates a non-catabolizable glucose analog, 2-deoxyglucose (2-DG) upon cellular uptake. According to the manufacturer’s protocol, intracellular 2-DG is phosphorylated by hexokinases to 2-deoxyglucose-6-phosphate (2-DG6P), which cannot be further metabolized and therefore accumulates. The accumulated 2-DG6P is quantified using a luminescence-based readout. This assay is widely used as a surrogate for glucose uptake because it reflects both glucose import and phosphorylation — the first step of glycolytic flux. As for the lack of change in mitochondrial respiration (Figure 3C), we acknowledge that some studies have reported mitochondrial roles for HKDC1 under basal conditions; however, such effects may be cell type-specific.

      (4) The emphasis on glycolysis signatures is confusing, as in the end, glycolysis does not seem to be affected by ZMAT3 status, but mitochondrial respiration is affected. Can the text be clarified to address this? It is also difficult to understand the role of oxygen consumption rate (OCR) in ZMAT3 phenotypes, as it does not fully track with proliferation. For example, ZMAT3 KD has the highest OCR, and the other conditions have similar OCRs but different proliferative rates in Figure 3D. Also, the colors used in Figure 3 to denote different genotypes change between B/C and D, which is confusing.

      We thank the reviewer for pointing out the inconsistency in the colors of the graph in Figure 2, which we have now corrected. Our data indicate that ZMAT3 regulates mitochondrial respiration without significantly affecting glycolysis. It is possible that mitochondria in ZMAT3-KO cells are oxidizing more substrates that are not produced by glycolysis. Additional work will be required to fully determine these mechanisms. We have clarified this on page 8 of the revised manuscript.

      (5) The lack of ZMAT3 binding to RNAs in PAR-CLIP is not proof that it does not do so. A more targeted approach should be used, using individual RIP assays. The authors should also analyze the splicing of HKDC1, which could be affected by ZMAT3.

      As suggested, we performed ZMAT3 RNA IP experiments (RIP) using doxycycline-inducible HCT116-ZMAT3-FLAG cells. However, we did not observe significant enrichment of HKDC1 mRNA in the ZMAT3 IPs (Figure 5-figure supplement 1A), consistent with previously published ZMAT3 RIP-seq data (Bersani et al, Oncotarget, 2016). These findings further support the notion that ZMAT3 does not directly bind to HKDC1 mRNA in these cells. Accordingly, we have modified the text on page 10 of the revised manuscript.

      In addition, as suggested by the reviewer, we analyzed changes in splicing of HKDC1 pre-mRNA using rMATS in HCT116 cells by comparing our previously published RNA-seq data from siCTRL and siZMAT3-transfected HCT116 cells (Muys et al, Genes Dev, 2021). We focused on splicing events with an FDR < 0.05 and |delta PSI| > 0.1 (representing at least a 10% change in splicing). The splicing analysis (data not shown) did not reveal any significant alterations in HKDC1 pre-mRNA splicing upon ZMAT3 knockdown. The corresponding text has been updated on page 10 of the revised manuscript.
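
      The event filter described above can be sketched as follows. This is an illustration, not the script used for the analysis: it assumes each splicing event has been loaded as a dictionary, and the key names `geneSymbol`, `FDR` and `IncLevelDifference` are assumptions about the layout of the rMATS output table that should be checked against the actual file header. The example rows are placeholders, not results.

```python
from typing import Dict, Iterable, List, Optional

Event = Dict[str, object]


def significant_events(
    events: Iterable[Event],
    gene: Optional[str] = None,
    fdr_cutoff: float = 0.05,
    dpsi_cutoff: float = 0.1,
) -> List[Event]:
    """Keep events with FDR below the cutoff and |delta PSI| above the cutoff.

    If `gene` is given, only events annotated to that gene symbol are considered.
    Key names are assumed; adjust them to match the real table.
    """
    hits = []
    for event in events:
        if gene is not None and event["geneSymbol"] != gene:
            continue
        fdr = float(event["FDR"])
        delta_psi = float(event["IncLevelDifference"])
        if fdr < fdr_cutoff and abs(delta_psi) > dpsi_cutoff:
            hits.append(event)
    return hits


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    rows = [
        {"geneSymbol": "GENE_X", "FDR": "0.001", "IncLevelDifference": "-0.25"},
        {"geneSymbol": "GENE_X", "FDR": "0.2", "IncLevelDifference": "0.40"},
        {"geneSymbol": "GENE_Y", "FDR": "0.01", "IncLevelDifference": "0.05"},
    ]
    print(significant_events(rows))
    print(significant_events(rows, gene="GENE_Y"))
```

      Taking the absolute value of delta PSI ensures that both increased and decreased inclusion are captured by the same cutoff.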

      (6) The authors say that they examine JUN binding at the HKDC1 promoter several times, but they focus on intron 1 in Figure 5. They should revise the text accordingly, and they should also show JUN ChIP data traces for the whole HKDC1 locus in Figure 5C.

      We thank the reviewer for this helpful suggestion. As recommended, we have revised the text throughout the manuscript and replaced HKDC1 promoter with HKDC1 intron 1 DNA to accurately reflect our analysis, and Figure 5 now shows the JUN ChIP-seq signal across the entire HKDC1 locus.

      (7) In the ZMAT3 and JUN interaction assays, were these tested in the presence of DNAse or RNAse to determine if nucleic acids mediate the interaction?

      We thank the reviewer for this valuable suggestion. To test whether nucleic acids mediate the ZMAT3-JUN interaction, we performed ZMAT3 immunoprecipitations (IPs) in the presence or absence of DNase and RNase from doxycycline-inducible ZMAT3-FLAG expressing HCT116 cells. The ZMAT3-JUN interaction was lost upon treatment with either DNase or RNase, indicating that the interaction is mediated by nucleic acids. These data have been added to the revised manuscript (Figure 5-figure supplement 1D and page 11).

    1. eLife Assessment

      This important study provides the first putative evidence that alteration of the Hox code in neck lateral plate mesoderm is sufficient to induce ectopic development of forelimb buds at neck level. The authors use both gain-of-function (GOF) and loss-of-function (LOF) approaches in chick embryos to test the roles of Hox paralogy group (PG) 4-7 genes in limb development. The GOF data provide strong evidence that overexpression of Hox PG6/7 genes is sufficient to induce forelimb buds at neck level. However, the experiments using dominant negative constructs lack some key controls that are needed to demonstrate the specificity of the LOF effect, rendering the work as a whole incomplete.

    2. Reviewer #2 (Public review):

      In the original review of this manuscript, I noted that this study provides the first evidence that alteration of the Hox code in neck lateral plate mesoderm is sufficient for ectopic forelimb budding. Their finding that ectopic expression of Hoxa6 or Hoxa7 induces wing budding at neck level, a demonstration of sufficiency, is of major significance. The experiments used to test the necessity of specific Hox genes for limb budding involved overexpression of dominant negative constructs, and there were questions about whether the controls were well designed. The reviewers made several suggestions for additional experiments that would address their concerns. In their responses to those comments, the authors indicated that they would conduct those experiments, and they acknowledged the requests for further discussion of a few points.

      In the revised version of the manuscript, the authors have provided additional RNA-seq data in Table 3, which lists 221 genes that are shared between the Hoxa6-induced limb bud and normal wing bud but not the neck. This shows that the ectopic limb bud has a limb-like character. The authors also expanded the discussion of their results in the context of previous work on the mouse. These changes have improved the paper.

      The authors elected not to conduct the co-transfection experiments that were suggested to test the ability of Hoxa4/a5 to block the limb-inducing ability of Hoxa6/a7. They also chose not to conduct the additional control experiments that were suggested for the dominant negative studies. The authors' justification for not conducting these experiments is provided in the responses to reviewers.

      The paper is improved over the previous version, but the conclusions, particularly regarding the dominant negative experiments, would have been strengthened by the additional experiments that were recommended by the reviewers. Under the current publishing model for eLife, it is the authors' prerogative to decide whether to revise in accordance with the reviewers' suggestions. Therefore, it seems to me that this version of the manuscript is the definitive version that the authors want to publish, and that eLife should publish it together with the reviewers' comments and the authors' responses.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Weaknesses:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      We thank the reviewer for emphasizing the importance of appropriate controls for the dominant-negative experiments. Dominant-negative Hox constructs have been successfully and widely used in previous studies, supporting the reliability of this approach. In our experiments, electroporation of the dominant-negative constructs into the limb field produced clear and reproducible effects when compared with both unoperated embryos and embryos electroporated with a GFP control construct. The GFP construct serves as an appropriate control, as it accounts for any effects of electroporation or exogenous protein expression without altering Hox gene function. We therefore conclude that the observed phenotypes specifically reflect dominant-negative Hox activity rather than procedural artifacts.

      The absence of overt limb phenotypes in PG4–PG7 mouse mutants likely reflects both functional redundancy among Hox paralogs and the difficulty of detecting subtle limb-specific effects in bilateral, systemically affected embryos. In contrast, the chick embryo system allows unilateral gene manipulation, providing an internal control and greater sensitivity for detecting weak or localized effects that may be masked in whole-animal mouse mutants.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      We thank the reviewer for this thoughtful suggestion. We fully agree that functional redundancy among Hox paralogs is an important consideration. However, Hox gene interactions are highly context-dependent and not strictly additive. Simultaneous interference with multiple Hox groups often leads to complex or compensatory effects that are difficult to interpret mechanistically, particularly when using dominant-negative constructs that may affect overlapping transcriptional networks.

      Our current experimental design, which targets individual paralog groups, allows us to attribute observed phenotypes to specific Hox activities and to interpret the results more precisely. Moreover, as shown in previous studies, simultaneous knockdown of multiple Hox genes does not necessarily produce stronger phenotypes. For these reasons, we believe that the present single dominant-negative experiments are the most informative and are sufficient for addressing the specific questions in this study.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      We thank the reviewer for this insightful suggestion. However, because of the extensive functional redundancy and regulatory interdependence within the Hox network, simultaneous inhibition of Hox4 and Hox5 is unlikely to produce a simple or interpretable outcome. Previous studies have shown that combinatorial Hox manipulations can trigger compensatory changes in other Hox genes, often obscuring rather than clarifying specific relationships.

      In our study, the proposed permissive role of Hox4/5 is supported by the spatial and temporal patterns of Hox expression and by the phenotypic effects observed upon individual dominant-negative perturbations. These data together suggest that Hox4/5 establish a forelimb-competent domain, on which Hox6/7 subsequently act to promote limb outgrowth. We therefore believe that the current evidence sufficiently supports this model without necessitating the additional combined experiment, which may not provide clear mechanistic insight due to redundancy effects.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      We thank the reviewer for this important comment. We agree that Tbx5 expression alone may not be sufficient to define forelimb identity. However, in our experiments, the induced bulge displays several additional characteristics consistent with early limb identity (at the pre-AER stage). First, the Tbx5 expression we observe corresponds to the stage when the limb field is already specified, not the earlier broad mesodermal phase described in other systems. Second, the induced domain also expresses Lmx1, a marker of dorsal limb mesenchyme, further supporting its limb-specific nature. Third, our RNA sequencing analysis reveals upregulation of multiple genes associated with early limb development pathways, providing molecular evidence for limb-type identity rather than non-specific mesodermal expansion. Taken together, these results strongly indicate that the induced bulge represents a forelimb-like structure rather than a generic mesodermal thickening.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

      We thank the reviewer for this helpful suggestion. We have analyzed the cartilage structures of the operated embryos. No skeletal elements were detected within the ectopic wing bud in the neck region. Furthermore, we did not observe any significant structural changes in the wing skeleton following loss-of-function (dnHox) experiments. These observations indicate that the ectopic bulges do not progress to form skeletal elements, consistent with their identity as early limb-like outgrowths rather than fully developed limbs.

      Reviewer #2 (Public review):

      Weaknesses

      (1) By contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      We thank the reviewer for raising this important point regarding the specificity of the dominant-negative constructs. The dnHox constructs used in this study were generated by truncating the C-terminal region of each Hox protein, a strategy that removes the homeodomain and has been demonstrated to act as a specific dominant-negative by interfering with the corresponding Hox function without broadly affecting unrelated Hox genes. This approach has been successfully validated and used in previous work (Moreau et al., Curr. Biol. 2019), where similar constructs effectively and specifically inhibited Hox activity in the chick embryo.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here).

      We thank the reviewer for this insightful suggestion. We agree that, in principle, co-electroporation of dnHox4/5 with Hox6/7 could test the hierarchical relationship between these genes. However, due to the extensive redundancy and regulatory interdependence among Hox genes, simultaneous manipulation of multiple genes often leads to compensatory effects or complex outcomes that are difficult to interpret mechanistically. As discussed in our response to Point 3 of Reviewer 1, inhibition of only one or two Hox4/5 paralogs is unlikely to completely abolish the permissive function of this group.

      Our current data — showing that Hox6/7 gain-of-function can induce ectopic limb-like outgrowths, while dnHox4/5 and dnHox6/7 lead to reduced limb formation — already provide strong evidence for both the necessity and sufficiency of these Hox activities in forelimb positioning. We therefore believe that the existing experiments adequately support our proposed model without the need for additional combinatorial manipulations.

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      We thank the reviewer for this constructive suggestion. In response, we have added a table (Table 3) listing the genes expressed in both the native limb/wing bud and the Hoxa6-induced wing bud, as identified from our RNA-Seq dataset. This table provides the underlying data for the Venn diagram, heatmap, and GO analysis presented in Figure 3. We agree that including these data improves transparency and helps readers better appreciate the molecular similarity between the induced and native limb buds.

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

      We thank the reviewer for this important point. We have addressed this issue in our response to Reviewer 1, Point 1, and have expanded the relevant discussion in the manuscript. Briefly, we believe that the apparent discrepancy between chick and mouse results arises from both the high degree of functional redundancy among Hox paralogs and the limitations of detecting subtle limb-specific effects in systemic mouse mutants, where both sides of the embryo are equally affected. In contrast, the chick system allows unilateral gene manipulation, providing an internal control and greatly enhancing sensitivity to detect weak or localized effects. Thus, the chick embryo model can reveal subtle Hox-dependent limb-induction activities that are masked in conventional mouse knockout approaches.

    1. eLife Assessment

      This study reports useful information on the mechanisms by which a high-fat diet induces arrhythmias in the model organism Drosophila. Specifically, the authors propose that adipokinetic hormone (Akh) secretion is increased with this diet, and through binding of Akh to its receptor on cardiac neurons, arrhythmia is induced. The authors have revised their manuscript, but in some areas the evidence remains incomplete; the authors say that future studies will be directed at closing these gaps. Nonetheless, the data presented will be helpful to those who wish to extend the research to a more complex model system, such as the mouse.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript submission by Zhao et al. entitled "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila", the authors assert that cardiac arrhythmias in Drosophila on a high-fat diet are due in part to activation of adipokinetic hormone (Akh) signaling. A high-fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on a high-fat diet. Elimination of one of the two AkhR-expressing cardiac neurons results in arrhythmia similar to that induced by a high-fat diet.

      Strengths:

      The authors propose a novel mechanism for high fat diet induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

    3. Reviewer #3 (Public review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia, employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila, Akh, are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and that both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutant could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs may allow for specific manipulation of ACNs.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript submission by Zhao et al. entitled "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila", the authors assert that cardiac arrhythmias in Drosophila on a high-fat diet are due in part to activation of adipokinetic hormone (Akh) signaling. A high-fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on a high-fat diet. Elimination of one of the two AkhR-expressing cardiac neurons results in arrhythmia similar to that induced by a high-fat diet.

      Strengths:

      The authors propose a novel mechanism for high fat diet induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

      Comments on revisions:

      The authors have addressed my other concerns. The only outstanding issue is in regard to the following comment:

      The authors state that "HFD led to increased heartbeat and an irregular rhythm." In representative examples shown, HFD resulted in pauses, slower heart rate, and increased irregularity in rhythm but not consistently increased heart rate (Figures 1B, 3A, and 4C). Based on the cited work by Ocorr et al (https://doi.org/10.1073/pnas.0609278104), Drosophila heart rate is highly variable with periods of fast and slow rates, which the authors attributed to neuronal and hormonal inputs. Ocorr et al then describe the use of "semi-intact" flies to remove autonomic input to normalize heart rate. Were semi-intact flies used? If not, how was heart rate variability controlled? And how was heart rate "increase" quantified in high fat diet compared to normal fat diet? Lastly, how does one measure "arrhythmia" when there is so much heart rate variability in normal intact flies?

      The authors state that 8 sec time windows were selected at the discretion of the imager for analysis. I don't know how to avoid bias unless the person acquiring the imaging is blinded to the condition and the analysis is also done blind. Can you comment on whether data acquisition and analysis were done in a blinded fashion? If not, this should be stated as a limitation of the study.

      Drosophila heart rate is highly variable. During the recording, we were biased toward choosing a time window when the heartbeat was fairly stable. This is a limitation of the study, which we mention in the revised version. We chose to use intact rather than “semi-intact” flies with the intention of avoiding damage to the cardiac neurons.

      Reviewer #3 (Public review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia, employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila, Akh, are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and that both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      We thank the reviewer for the positive comments. We believe that additional signaling pathways are active in the AkhR neurons and regulate rhythmic heartbeat. We are currently searching for the molecules and pathways that act on the AkhR cardiac neurons to regulate the heartbeat. Thus, AkhR neuron ablation should have a more profound effect than loss of AkhR alone. Loss of AkhR is not equivalent to AkhR neuron ablation.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutant could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs may allow for specific manipulation of ACNs.

      We thank the reviewer for suggesting these detailed experiments, and we believe that addressing these points would consolidate the results. As AkhR-Gal4 is also expressed in the fat body, we set out to build a more specific driver. We planned to use the split-Gal4 system (Luan et al. 2006. PMID: 17088209). The combination of pan-neuronal Elav-Gal4.DBD and AkhR-p65.AD should yield an AkhR neuron-specific driver. We selected 2580 bp of AkhR upstream DNA and cloned it into the pBPp65ADZpUw plasmid (Addgene plasmid: #26234). After two rounds of injection, however, we were not able to recover a transgenic line.

      We used GCaMP to record the calcium signal in the AkhR neurons. AkhR-Gal4>GCaMP shows extremely high levels of fluorescence in the cardiac neurons under normal conditions.

      We are screening Gal4 drivers, trying to find a line that is specific to the cardiac neurons and has a lower level of driver activity.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

      We quantified the AkhR neuron ablation and found that about 69% (n=28) of AkhR-Gal4>rpr flies showed a single ACN. It is more challenging to quantify other AkhR-expressing cells, as they are widely distributed. We tried adding more copies of UAS-rpr or AkhR-Gal4, which caused developmental defects (pupal lethality). Thus, as mentioned above, we are trying to find a more specific driver for targeting the cardiac neurons.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors refer to the 'crop' as the functional equivalent of the human stomach. Considering the difference in their primary functions, this cannot be justified.

      In Drosophila, the crop functions analogously to the stomach in vertebrates. It is a foregut storage and preliminary processing organ that regulates food passage into the midgut. It is more than a simple reservoir: the crop engages in enzymatic mixing, neural control, and active motility.

      Lines 163 and 166: APCs are not neurons.

      Akh-producing cells (APCs) in Drosophila are neuroendocrine cells, residing in the corpora cardiaca (CC). While they produce and secrete the hormone AKH (akin to glucagon), they are not brain interneurons per se. APCs share many neuronal features (vesicular release, axon-like projections) and receive neural inputs, effectively functioning as a peripheral endocrine center.

    1. eLife Assessment

      This study presents valuable findings by demonstrating that specific GPCR subtypes induce distinct extracellular vesicle miRNA signatures, highlighting a potential novel mechanism for intercellular communication with implications for receptor pharmacology within the field. The evidence is solid; however, more experiments are needed to determine whether the distinct extracellular vesicle miRNA signatures result from GPCR-dependent miRNA expression or GPCR-dependent incorporation of miRNAs into extracellular vesicles.

    2. Reviewer #1 (Public review):

      Summary:

      GPCRs affect the EV-miRNA cargoes

      Strengths:

      Novel idea of GPCRs-mediated control of EV loading of miRNAs

      Weaknesses:

      The incomplete findings fail to connect to, or show evidence of, any physiological parameters that are directly related to the observed changes. Mechanistic detail is completely lacking.

      Comments on revisions:

      The revised version of the manuscript falls short of the required standard by lacking additional experiments. Some of the conditions for acceptability could have been met only through clarifying uncertainties via further experiments, which, unfortunately, have not been conducted.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      No significant change in EV quantity or size following GPCR activation.

      Each GPCR triggered a distinct EV miRNA expression profile.

      miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Comments on revisions:

      All the comments have been taken into account. I wish the authors success in their future research.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore a novel concept: GPCR-mediated regulation of miRNA release via extracellular vesicles (EVs). They perform an EV miRNA cargo profiling approach to investigate how specific GPCR activations influence the selective secretion of particular miRNAs. Given that GPCRs are highly diverse and orchestrate multiple cellular pathways - either independently or collectively - to regulate gene expression and cellular functions under various conditions, it is logical to expect alterations in gene and miRNA expression within target cells.

      Strengths:

      The novel idea of GPCRs-mediated control of EV loading of miRNAs.

      Weaknesses:

      The incomplete findings fail to connect to, or show evidence of, any physiological parameters that are directly related to the observed changes. Mechanistic detail is lacking.

      We appreciate the reviewer's acknowledgment of the novelty of this study. We agree with the reviewer that further mechanistic insights would strengthen the manuscript. The mechanisms by which miRNAs are sorted into EVs remain poorly understood. Various factors, including RNA-binding proteins, sequence motifs, and cellular location, can influence this sorting process (Garcia-Martin et al., 2022; Liu & Halushka, 2025; Villarroya-Beltri et al., 2013; Yoon et al., 2015). Ago2, a key component of the RNA-induced silencing complex, binds to miRNAs and facilitates miRNA sorting. Ago2 has been found in EVs and can be regulated by cellular signaling pathways. For instance, McKenzie et al. demonstrated that KRAS-dependent activation of MEK-ERK can phosphorylate Ago2 protein, thereby regulating the sorting of specific miRNAs into EVs (McKenzie et al., 2016). In differentiated PC12 cells, Gαq activation leads to the formation of Ago2-associated granules, which selectively sequester unique transcripts (Jackson et al., 2022). Investigating the effects of GPCRs, G proteins, and GPCR signaling on Ago2 expression, location, and phosphorylation states could provide valuable insights into how GPCRs regulate specific miRNAs within EVs. We have expanded on these potential mechanisms and future research in the discussion section (pages 16-17).

      The manuscript falls short of providing a comprehensive understanding. Identifying changes in cellular and EV-associated miRNAs without elucidating their physiological significance or underlying regulatory mechanisms limits the study's impact. Without demonstrating whether these miRNA alterations have functional consequences, the findings alone are insufficient. The findings may be suitable for more specialized journals.

      Thank you for the feedback. We acknowledge that validating the target genes of the top candidate miRNAs is an important next step. In response to the reviewer's concerns, we have expanded the discussion of future research in the manuscript (pages 19-20). Although this initial study is primarily descriptive, it establishes a novel conceptual link between GPCR signaling and EV-mediated communication.

      Furthermore, a critical analysis of the relationship between cellular miRNA levels and EV miRNA cargo is essential. Specifically, comparing the intracellular and EV-associated miRNA pools could reveal whether specific miRNAs are preferentially exported, a behavior that should be inversely related to their cellular abundance if export serves a beneficial function by reducing intracellular levels. This comparison is vital to strengthen the biological relevance of the findings and support the proposed regulatory mechanisms by GPCRs.

      We appreciate the valuable suggestions from the reviewer. EV and cellular miRNAs may exhibit distinct profiles, as miRNAs can be selectively sorted into or excluded from EVs (Pultar et al., 2024; Teng et al., 2017; Zubkova et al., 2021). Investigating the difference between cellular miRNA levels and EV miRNA cargo would provide insight into the mechanism of miRNA sorting and the functions of miRNAs in the recipient cells. The expression of cellular miRNAs is a highly dynamic process. To compare expression levels accurately, EV and cellular miRNAs should be profiled simultaneously. However, because this was an exploratory study, we could not measure the cellular miRNAs without repeating the entire experiment.
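
      The comparison discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an analysis from the study: it assumes that EV and cellular miRNA abundances were measured in parallel and are expressed as positive linear-scale values, and the function name `export_enrichment` and the miRNA labels are hypothetical. Each miRNA's share of the EV pool is divided by its share of the cellular pool, so a positive log2 score suggests preferential export and a negative score suggests retention.

```python
import math


def export_enrichment(ev, cell):
    """Return log2(EV share / cellular share) for each miRNA.

    `ev` and `cell` map miRNA names to linear-scale abundances.
    Only miRNAs with a positive value in both pools are scored.
    """
    ev_total = sum(ev.values())
    cell_total = sum(cell.values())
    scores = {}
    for name in ev.keys() & cell.keys():
        if ev[name] <= 0 or cell[name] <= 0:
            continue  # a ratio is undefined without signal in both pools
        ev_share = ev[name] / ev_total
        cell_share = cell[name] / cell_total
        scores[name] = math.log2(ev_share / cell_share)
    return scores


# Toy values for illustration only; these are not measurements.
toy_ev = {"miR-A": 30.0, "miR-B": 10.0}
toy_cell = {"miR-A": 10.0, "miR-B": 30.0}
toy_scores = export_enrichment(toy_ev, toy_cell)
```

      Using shares rather than raw abundances keeps the score independent of the total RNA recovered from each compartment, although it does not replace normalization appropriate to the profiling platform.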

      Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      (1) Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      (2) Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      (3) Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      (4) Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      (1) No significant change in EV quantity or size following GPCR activation.

      (2) Each GPCR triggered a distinct EV miRNA expression profile.

      (3) miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      (4) miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      (1) Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      (2) Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      (3) Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      (1) Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      We are encouraged that the reviewer recognized the novelty, methodological rigor, and significance of our work. We recognize the limitations of our current model system and emphasize the need to test additional GPCR families and cell lines in future studies, as detailed in the discussion section (page 19, second paragraph).

      (2) Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      We appreciate the feedback. We recognize the importance of validating the function of the top candidate miRNAs in the recipient cells, and this will be included in future studies (pages 19-20).

      (3) EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Thank you for the comment. EV isolation and purification are major challenges in EV research. Current isolation techniques are often ineffective at separating vesicles produced by different biogenesis pathways. Furthermore, the lack of specific markers to differentiate EV subtypes adds to this complexity. We recognize that the presence of various subpopulations can complicate the interpretation of EV cargos. In our study, we used a combined approach of ultrafiltration followed by size-exclusion chromatography to achieve a balance between EV purity and yield. We adhere to the MISEV2023 (Minimal Information for Studies of Extracellular Vesicles 2023) guidelines by reporting detailed isolation methods, assessing both positive and negative protein markers, and characterizing EVs by electron microscopy to confirm vesicle structure, as well as by nanoparticle tracking analysis to verify particle size distribution (Welsh et al., 2024). Following these guidelines helps ensure the quality of our study and makes our findings easier to compare with other studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Suggestions for Future Research:

      (1) Functionally validate top candidate miRNAs in recipient cells.

      We acknowledge that validating the target genes of the top candidate miRNAs is a crucial next step. In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (pages 19-20).

      (2) Investigate other GPCR families and repeat in primary or disease-relevant cell lines.

      The inclusion of different GPCRs and cell lines is suggested in the discussion as an area for further investigation (page 19).

      (3) Apply similar approaches in in vivo models or patient samples to assess clinical relevance.

      In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (pages 19-20).

      References

      Garcia-Martin, R., Wang, G., Brandão, B. B., Zanotto, T. M., Shah, S., Kumar Patel, S., Schilling, B., & Kahn, C. R. (2022). MicroRNA sequence codes for small extracellular vesicle release and cellular retention. Nature, 601(7893), 446-451. https://doi.org/10.1038/s41586-021-04234-3

      Jackson, L., Rennie, M., Poussaint, A., & Scarlata, S. (2022). Activation of Gαq sequesters specific transcripts into Ago2 particles. Sci Rep, 12(1), 8758. https://doi.org/10.1038/s41598-022-12737-w

      Liu, X.-M., & Halushka, M. K. (2025). Beyond the Bubble: A Debate on microRNA Sorting Into Extracellular Vesicles. Laboratory Investigation, 105(2), 102206. https://doi.org/10.1016/j.labinv.2024.102206

      McKenzie, A. J., Hoshino, D., Hong, N. H., Cha, D. J., Franklin, J. L., Coffey, R. J., Patton, J. G., & Weaver, A. M. (2016). KRAS-MEK Signaling Controls Ago2 Sorting into Exosomes. Cell Rep, 15(5), 978-987. https://doi.org/10.1016/j.celrep.2016.03.085

      Pultar, M., Oesterreicher, J., Hartmann, J., Weigl, M., Diendorfer, A., Schimek, K., Schädl, B., Heuser, T., Brandstetter, M., Grillari, J., Sykacek, P., Hackl, M., & Holnthoner, W. (2024). Analysis of extracellular vesicle microRNA profiles reveals distinct blood and lymphatic endothelial cell origins. J Extracell Biol, 3(1), e134. https://doi.org/10.1002/jex2.134

      Teng, Y., Ren, Y., Hu, X., Mu, J., Samykutty, A., Zhuang, X., Deng, Z., Kumar, A., Zhang, L., Merchant, M. L., Yan, J., Miller, D. M., & Zhang, H.-G. (2017). MVP-mediated exosomal sorting of miR-193a promotes colon cancer progression. Nature Communications, 8(1), 14448. https://doi.org/10.1038/ncomms14448  

      Villarroya-Beltri, C., Gutiérrez-Vázquez, C., Sánchez-Cabo, F., Pérez-Hernández, D., Vázquez, J., Martin-Cofreces, N., Martinez-Herrera, D. J., Pascual-Montano, A., Mittelbrunn, M., & Sánchez-Madrid, F. (2013). Sumoylated hnRNPA2B1 controls the sorting of miRNAs into exosomes through binding to specific motifs. Nat Commun, 4, 2980. https://doi.org/10.1038/ncomms3980

      Welsh, J. A., Goberdhan, D. C. I., O'Driscoll, L., Buzas, E. I., Blenkiron, C., Bussolati, B., Cai, H., Di Vizio, D., Driedonks, T. A. P., Erdbrügger, U., Falcon-Perez, J. M., Fu, Q. L., Hill, A. F., Lenassi, M., Lim, S. K., Mahoney, M. G., Mohanty, S., Möller, A., Nieuwland, R., ... Witwer, K. W. (2024). Minimal information for studies of extracellular vesicles (MISEV2023): From basic to advanced approaches. J Extracell Vesicles, 13(2), e12404. https://doi.org/10.1002/jev2.12404

      Yoon, J. H., Jo, M. H., White, E. J., De, S., Hafner, M., Zucconi, B. E., Abdelmohsen, K., Martindale, J. L., Yang, X., Wood, W. H., 3rd, Shin, Y. M., Song, J. J., Tuschl, T., Becker, K. G., Wilson, G. M., Hohng, S., & Gorospe, M. (2015). AUF1 promotes let-7b loading on Argonaute 2. Genes Dev, 29(15), 1599-1604. https://doi.org/10.1101/gad.263749.115  

      Zubkova, E., Evtushenko, E., Beloglazova, I., Osmak, G., Koshkin, P., Moschenko, A., Menshikov, M., & Parfyonova, Y. (2021). Analysis of MicroRNA Profile Alterations in Extracellular Vesicles From Mesenchymal Stromal Cells Overexpressing Stem Cell Factor. Front Cell Dev Biol, 9, 754025. https://doi.org/10.3389/fcell.2021.754025

    1. eLife Assessment

      This fundamental study is part of an impressive, large-scale effort to assess the reproducibility of published findings in the field of Drosophila immunity. In a companion article, the authors analyze 400 papers published between 1959 and 2011, and assess how many of the claims in these papers have been tested in subsequent publications. In this article, the authors report the results of validation experiments to assess a subset of the claims that, according to the literature, have not been corroborated. While the evidence reported for some of these validation studies is convincing, it remains incomplete for others.

    2. Reviewer #1 (Public review):

      Summary:

      This work revisits a substantial part of the published literature in the field of Drosophila innate immunity from 1959 to 2011. The strategy has been to restrict the analysis to some 400 articles and then to extract from each a main claim, two to four major claims, and up to four minor claims, totaling some 2000 claims overall. The consistency of these claims with the current state of the art has been evaluated and reported on a dedicated website known as ReproSci, in the text, and in the 28 Supplements, which report direct or indirect experimental verification (e.g., using novel null mutants unavailable at the time) of a selected set of claims made in several articles. Of note, this review is mostly limited to the manuscript and its associated supplements and does not fully cover the ReproSci website.

      Strengths:

      One major strength of this article is that it tackles the issue of reproducibility/consistency on a large scale. Indeed, while many investigators have some serious doubts about some results found in the literature, few have the courage, or the means and time, to seriously challenge studies, especially if published by leaders in the field. The Discussion adequately states the major limitations of the ReproSci approach, which should be kept in mind by the reader to form their own opinion.

      This study also allows investigators not familiar with the field to have a clearer understanding of the questions at stake and to derive a more coherent global picture that allows them to better frame their own scientific questions. Besides a thorough and up-to-date knowledge of the literature used to assess the consistency of the claims with our current knowledge, a merit of this study is the undertaking of independent experiments to address some puzzling findings and the evidence presented is often convincing, albeit one should keep in mind the inherent limitations as several parameters are difficult to control, especially in the field of infections, as underlined by the authors themselves. Importantly, some work of the lead author has also been re-evaluated (Supplements S2-S4). Thus, while utmost caution should be exerted, and often is, in challenging claims, even if the challenge eventually proves to be not grounded, it is valuable to point out potential controversial issues to the scientific community.

      While this is not a point of this review, it should be acknowledged that the possibility to post comments on the ReproSci website will allow further readjustments by the community in the appreciation of the literature and also of the ReproSci assessments themselves and of its complementary additional experiments.

      Weaknesses:

      Challenging the results from articles is, by its very nature, a highly sensitive issue, and utmost care should be taken when challenging claims. While the authors generally acknowledge the limitations of their approach in the main text and Supplements, there are a few instances where their challenges remain questionable and should be reassessed. This is certainly the case for Supplement S18, for which the ReproSci authors make a claim for a point that was not made in the publication under scrutiny. The authors of that study (Ramet et al., Immunity, 2001) never claimed that scavenger receptor SR-CI is a phagocytosis receptor, but that it is required for optimal binding of S2 cells to bacteria. Westlake et al. here have tested for a role of this scavenger receptor in phagocytosis, which had not been tested by Ramet et al. Thus, even though the ReproSci study brings additional knowledge to our understanding of the function of SR-CI by directly testing its involvement in phagocytosis by larval hemocytes, it did not address the major point of the Ramet et al. study, SR-CI binding to bacteria, and thus inappropriately concludes in Supplement S18 that "Contrary to (Ramet et al., 2001, Saleh et al., 2006), we find that SR-CI is unlikely to be a major Drosophila phagocytic receptor for bacteria in vivo." It follows that the results of Ramet et al. cannot be challenged by ReproSci as it did not address this program. Of note, Saleh et al. (2006) also mistakenly stated that SR-CI impaired phagocytosis in S2 cells and could be used as a positive control to monitor phagocytosis in S2 cells. Their assay appears to have actually not monitored phagocytosis but the association of FITC-labeled bacteria to S2 cells by FACS, as they did not mention quenching the fluorescence of bacteria associated with the surface with Trypan blue.

      The inference method to assess the consistency of results with current knowledge also has limitations that should be better acknowledged. At times, the argument is made that the gene under scrutiny may not be expressed at the right time according to large-scale data or that the gene product was not detected in the hemolymph by a mass-spectrometry approach. While being in theory strong arguments, some genes, for instance, those encoding proteases at the apex of proteolytic activation cascades, need not necessarily be strongly expressed and might be released by a few cells. In addition, we are often lacking relevant information on the expression of genes of interest upon specific immune challenges such as infections with such and such pathogens.

      As regards mass spectrometry, there is always the issue of sensitivity that limits the force of the argument. Our understanding of melanization remains currently limited, and methods are lacking to accurately measure the killing activity associated with the triggering of the proPO activation cascade. In this study, the authors monitor only the blackening reaction of the wound site based on a semi-quantitative measurement. They are not attempting to use other assays, such as monitoring the cleavage of proPOs into active POs or measuring PO enzymatic activity. These techniques are sometimes difficult to implement, and they suffer at times from variability. Thus, caution should be exerted when drawing conclusions from just monitoring the melanization of wounds.

      Likewise, the study of phagocytosis is limited by several factors. As most studies in the field focus on adults, the potential role of phagocytosis in controlling Gram-negative bacterial infections is often masked by the efficiency of the strong systemic IMD-dependent immune response mediated by AMPs (Hanson et al., eLife, 2019). This problem can be bypassed in rare instances of intestinal infections by Gram-negative bacteria such as Serratia marcescens (Nehme et al., PLoS Pathogens, 2007) or Pseudomonas aeruginosa (Limmer et al., PNAS, 2011), which escape from the digestive tract into the hemocoel without triggering, at least initially, the systemic immune response. It is technically feasible to monitor bacterial uptake in adults by injecting fluorescently labeled bacteria and subsequently quenching the signal from non-ingested bacteria. Nonetheless, many investigators prefer to resort to ex vivo assays starting from hemocytes collected from third-instar wandering larvae, as they are easier to collect and then to analyze, e.g., by FACS. However, it should be pointed out that these hemocytes have been strongly exposed to a peak of ecdysone, which may alter their properties. As for S2 cells, it is thus not clear whether third-instar larval hemocytes faithfully reproduce the situation in adults. The phagocytic assays are often performed with killed bacteria. Evidence with live microorganisms is better, especially with pathogens. Assays with live bacteria, however, require an antibody used in a differential permeabilization protocol. Furthermore, the killing method alters the surface of the microorganisms, a key property for phagocytic uptake. Bacterial surface changes are minimal when microorganisms are killed by X-ray or UV light. These limitations should be kept in mind when proceeding to inference analysis of the consistency of claims. Eater illustrates this point well. Westlake et al. state that: "[...] subsequent studies showed that a null mutation of eater does not impact phagocytosis". The authors refer here to Bretscher et al., Biology Open, 2015, in which binding to heat-killed E. coli was assessed in an ex vivo assay in third-instar larvae. In contrast, Chung and Kocks (JBC, 2011) tested whether the recombinant extracellular N-terminal ligand-binding domain was able to bind to bacteria. They found that this domain binds to live Gram-positive bacteria but not to live Gram-negative bacteria. For the latter, killing bacteria with ethanol or heating, but not by formaldehyde treatment, allowed binding. More importantly, Chung and Kocks documented a complex picture in which AMPs may be needed to permeabilize the Gram-negative bacterial cell wall, which would then allow access of at least the recombinant secreted Eater extracellular domain to peptidoglycan or peptidoglycan-associated molecules. Thus, the systemic Imd-dependent immune response would be required in vivo to allow Eater-dependent uptake of Gram-negative bacteria by adult hemocytes. In ex vivo assays, any AMPs may be diluted too much to effectively attack the bacterial membrane. A prediction is then that there should be an altered phagocytosis of Gram-negative bacteria in IMD-pathway mutants, e.g., an imd null mutant but not the hypomorphic imd[1] allele. This could easily be tested by ReproSci using the adult phagocytosis assay used by Kocks et al., Cell, 2005. At the very least, the part on the role of Eater in phagocytosis should take the Chung & Kocks study into account, and the conclusions should be modulated.

      Another point is that some mutant phenotypes may be highly sensitive to the genetic background, for instance, even after isogenization in two different backgrounds. In the framework of a Reproducibility project, there might be no other option for such cases than direct reproduction of the experiment as relying solely on inference may not be reliable enough.

      With respect to the experimental part, some minor weaknesses have been noted. The authors rely on survival to infection experiments, but often do not show any control experiments with mock-challenged or noninfected mutant fly lines. In some cases, monitoring the microbial burden would have strengthened the evidence. For long survival experiments, a check on the health status of the lines (viral microbiota, Wolbachia) would have been welcome. Also, the experimental validation of reagents, RNAi lines, or KO lines is not documented in all cases.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present an ambitious and large-scale reproducibility analysis of 400 articles on Drosophila immunity published before 2011. They extract major and minor claims from each article, assess their verifiability through literature comparison and, when possible, through targeted experimental re-testing, and synthesize their findings in an openly accessible online database. The goal is to provide clarity to the community regarding claims that have been contradicted, incompletely supported, or insufficiently followed up in the literature, and to foster broader community participation in evaluating historical findings. The manuscript summarizes the major insights emerging from this systematic effort.

      Strengths:

      (1) Novelty and community value: This work represents a rare example of a systematic, transparent, and community-facing reproducibility project in a specific research domain. The creation of a dedicated public platform for disseminating and discussing these assessments is particularly innovative.

      (2) Breadth and depth: The authors analyze an impressive number of publications spanning multiple decades, and they couple literature-based assessments with new experimental data where follow-up is missing.

      (3) Clarity of purpose: The manuscript carefully distinguishes between assessing evidential support for claims and judging the scientific merit of historical work. This helps frame the project as constructive rather than punitive.

      (4) Metascientific relevance: The analysis identifies methodological and contextual factors that commonly underlie irreproducible claims, providing a useful guide for future study design and interpretation.

      (5) Transparency: Supplementary datasets and the public website provide an exceptional degree of openness, which should facilitate community engagement and further refinement.

      Weaknesses:

      (1) Subjectivity in selection: Despite the authors' efforts, the choice of which papers and claims to highlight cannot be entirely objective. This is an inherent limitation of any retrospective curation effort, but it remains important to acknowledge explicitly.

      (2) Emphasis on irreproducible claims: The manuscript focuses primarily on claims that are challenged or found to be weakly supported. While understandable from the perspective of novelty, this emphasis may risk overshadowing the value of claims that are well supported and reproducible.

      (3) Framing and language: Certain passages could benefit from more neutral phrasing and avoidance of binary terms such as "correct" or "incorrect," in keeping with the open-ended and iterative nature of scientific progress.

      (4) Community interaction with the dataset: While the website is an excellent resource, the manuscript could further clarify how the community is expected to contribute, challenge, or refine the annotations, especially given the large volume of supplementary data.

      (5) Minor inconsistency: The manuscript states that papers from 1959-2011 were included, but the Methods section mentions a range beginning in 1940. This should be aligned for clarity.

      Impact and significance:

      This contribution is likely to have a meaningful impact on both the Drosophila immunity community and the broader scientific ecosystem. It highlights methodological pitfalls, encourages transparent post-publication evaluation, and offers a reusable framework that other fields could adopt. The work also has pedagogical value for early-career researchers entering the field, who often struggle to navigate contradictory or outdated claims. By centralizing and contextualizing these discussions, the manuscript should help accelerate more robust and reproducible research.

    4. Reviewer #3 (Public review):

      Summary:

      In this ambitious study, the authors set out to analyse the validity of a number of claims, both minor and major, from 400 published articles within the field of Drosophila immunity that were published before 2011. The authors were able to determine initially if claims were supported by comparing them to other published literature in the field and, if required, by experimentally testing 'unchallenged' claims that had not been followed up in subsequent published literature. Using this approach, the authors identified a number of claims that had contradictory evidence using new methods or taking into account developments within the field post-initial publication. They put their findings on a publicly available website designed to enable the research community to assess published work within the field with greater clarity.

      Strengths:

      The work presented is rigorous and methodical, the data presentation is high quality, and importantly, the data presented support the conclusions. The discussion is balanced, and the study is written considerately and respectfully, highlighting that the aim of the study is not to assign merit to individual scientists or publications but rather to improve clarity for scientists across the field. The approach carried out by the researchers focuses on testing the validity of the claims made in the original papers rather than testing whether the original experimental methods produced reproducible results. This is an important point since there are many reasons why the original interpretation of data may have understandably led to the claims made. These potential explanations for irreproducible data or conclusions are discussed in detail by the authors for each claim investigated.

      The authors have generated an accompanying website, which provides a valuable tool for the Drosophila Immunity research community that can be used to fact-check key claims and encourages community engagement. This will achieve one important goal of this study - to prevent time loss for scientists who base their research on claims that are irreproducible. The authors rightly point out that it is impossible (and indeed undesirable) to avoid publication of irreproducible results within a field since science is 'an exploratory process where progress is made by constant course correction'. This study is, however, an important piece of work that will make that course correction more efficient.

      Weaknesses:

      I have little to recommend for the improvement of this manuscript. As outlined in my comments above, I am very supportive of this manuscript and think it is a bold and ambitious body of work that is important for the Drosophila immunity field and beyond.

    5. Reviewer #4 (Public review):

      This is an important paper that can do much to set an example for thoughtful and rigorous evaluation of a discipline-wide body of literature. The compiled website of publications in Drosophila immunity is by itself a valuable contribution to the field. There is much to praise in this work, especially including the extensive and careful evaluation of the published literature. However, there are also cautions.

      One notable concern is that the validation experiments are generally done at low sample sizes and low replication rates, and often lack statistical analysis. This is slippery ground for declaring a published study to be untrue. Since the conclusions reported here are nearly all negative, it is essential that the experiments be performed with adequate power to detect the originally described effects. At a minimum, they should be performed with the same sample size and replication structure as the originally reported studies.
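
      To make the power requirement concrete, a rough sample-size estimate can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes a two-group comparison of a continuous readout, a standardized effect size (Cohen's d) taken from the originally reported effect, a two-sided test, and a normal approximation. The function name `n_per_group` and the default thresholds are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2,
    where d is the standardized difference between group means.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# A medium effect (d = 0.5) needs far more samples than a large one (d = 1.0).
medium = n_per_group(0.5)
large = n_per_group(1.0)
```

      The approximation slightly underestimates the sample size that an exact t-test calculation would give for small groups, and survival or count data would call for a different calculation, so the result is best read as a lower bound.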

      The first section of Results should be an overview of the general accuracy of the literature. Of all claims made in the 400 evaluated papers, what proportion fell into each category of "verified", "unchallenged", "challenged", "mixed", or "partially verified"? This summary overview would provide a valuable assessment of the field as a whole. A detailed dispute of individual highlighted claims could follow the summary overview.

      Section headings are phrased as declarative statements, "Gene X is not involved in process Y", which is more definitive phrasing than we typically use in scientific research. It implies proving a negative, which is difficult and rare, and the evidence provided in the present manuscript generally does not reach that threshold. A more common phrasing would be "We find no evidence that gene X contributes to process Y". A good model for this more qualified phrasing is the "We conclude that while Caspar might affect the Imd pathway in certain tissue-specific contexts, it is unlikely to act as a generic negative regulator of the Imd pathway," concluding the section on the role of Caspar. I am sure the authors feel that the softer, more qualified phrasing would undermine their article's goal of cleansing the literature of inaccuracies, but the hard declarative 'never' statements are difficult to justify unless every validation experiment is done with a high degree of rigor under a variety of experimental conditions. This caveat is acknowledged in the 3rd paragraph of the Discussion, but it is not reflected in the writing of the Results. The caveat should also appear in the Introduction.

      The article is clear that "Claims were assessed as verified, unchallenged, challenged, mixed, or partially verified," but the project is called "reproducibility project" in the 7th line of the abstract, and the website is "ReproSci". The fourth line of the abstract and the introduction call some published research "irreproducible". Most of the present manuscript does not describe reproduction or replication. It describes validation, or independent experimental tests for consistency. Published work is considered validated if subsequent studies using distinct approaches yielded consistent results. For work that the authors consider suspicious, or that has not been subsequently tested, the new experiments provided here do not necessarily recreate the published experiment. Instead, the published result is evaluated with experiments that use different tools or methods, again testing for consistency of results. This is an important form of validation, but it is not reproduction, and it should not be referred to as such. I strongly suggest that variations of the words "reproducible" or "replication" be removed from the manuscript and replaced with "validation". This will be more scientifically accurate and will have the additional benefit of reducing the emotional charge that can be associated with declaring published research to be irreproducible.

      The manuscript includes an explanatory passage in the Results section, "Our project focuses on assessing the strength of the claims themselves (inferential/indirect reproducibility) rather than testing whether the original methods produce repeatable results (results/direct reproducibility). Thus, our conclusions do not directly challenge the initial results leading to a claim, but rather the general applicability of the claim itself." Rather than first appearing in Results, this statement should appear prominently in the abstract and introduction because it is a core element of the premise of the study. This can be combined with the content of the present Disclaimer section into a single paragraph in the Introduction instead of appearing in two redundant passages. I would again encourage the authors to substitute the word validation for reproduction, which would eliminate the need for the invented distinction between indirect versus direct reproduction. It is notable that the authors have chosen to title the relevant Methods section "Experimental Validation" and not "Replication".

      Experimental data "from various laboratories" in the last paragraph of the Introduction and the first paragraph of the Results are ambiguous. Since these new experiments are part of the central core of the manuscript, the specific laboratories contributing them should be named in the two paragraphs. If experiments are being contributed by all authors on the manuscript, it would suffice to say "the authors' laboratories". The attribution to "various labs" appears to be contradicted by the Discussion paragraph 2, which states "the host laboratory has expertise in" antibacterial and antifungal defense, implying a single lab. The claim of expertise by the lead author's laboratory is unnecessary and can be deleted if the Lemaitre lab is the ultimate source of all validation experiments.

      The passage on the controversial role of Duox in the gut is balanced and scholarly, and stands out for its discussion of multiple alternative lines of evidence in the published literature and supplement. This passage may benefit from follow-up research by multiple groups on the original claims, which is not available for other claims, but the tone of the Duox section can be a model for the other sections.

      Comments on other sections and supplements:

      I understand the desire to explain how original results may have been obtained when they are not substantiated by subsequent experiments. However, statements such as "The initial results may have been obtained due to residual impurities in preparations of recombinant GNBP1" and "Non-replicable results on the roles of Spirit, Sphinx and Spheroide in Toll pathway activation may be due to off-target effects common to first-generation RNAi tools" are speculation. No experimental data are presented to support these assertions, so these statements and others like them (currently at the end of most "insights" sections) should not appear in Results. I recognize that the authors are trying to soften their criticism of prior studies by providing explanations for how errors may have occurred innocently. If they wish to do so, the speculative hypotheses should appear in the Discussion.

      The statement in Results that "The initial claim concerning wntD may be explained by a genetic background effect independent of wntD" similarly appears to be a speculation based on the reading of the main text Results. However, the Discussion clarifies that "Here, we obtained the same results as the authors of the claim when using the same mutant lines, but the result does not stand when using an independent mutant of the same gene, indicating the result was likely due to genetic background." That additional explanation in the Discussion greatly increases reader confidence in the Result and should be explained with reference to S5 in the Results. Such complete explanations should be provided everywhere possible without requiring the reader to check the Supplement in each instance.

      In some cases, such as "The results of the initial papers are likely due to the use of ubiquitous overexpression of PGRP-LE, resulting in melanization due to overactivation of the Imd pathway and resulting tissue damage", the claim to explain the original finding would be easy to test. The authors should perform those tests where they can, if they wish to retain the statements in the manuscript. Similarly, the claim "The published data are most consistent with a scenario in which RNAi generated off-target knockdown of a protein related to retinophilin/undertaker, while Undertaker itself is unlikely to have a role in phagocytosis" would be stronger if the authors searched the Drosophila genome for a plausible homolog that might have been impacted by the RNAi construct, and then put forth an argument as to why the off-target gene is more likely to have generated the original phenotype than the nominally targeted gene. There is a brief mention in S19 that junctophilin is the authors' preferred off-target candidate, but no evidence or rationale is presented to support that assertion. If the original RNAi line is still available, it would be easy enough to test whether junctophilin is knocked down as an off-target, and ideally then to use an independent knockdown of junctophilin to recapitulate the original phenotype. Otherwise, the off-target knockdown hypothesis is idle speculation.

      A good model is the passage on extracellular DNA, which states, "experiments performed for ReproSci using the original DNAse IIlo hypomorph show that elevated Diptericin expression in the hypomorph is eliminated by outcrossing of chromosome II, and does not occur in an independent DNAse II null mutant, indicating that this effect is due to genetic background (Supplementary S11)." In this case, the authors have performed a clear experiment that explains the original finding, and inclusion of that explanation is warranted. Similar background replacement experiments in other validations are equally compelling.

      The statement "Analysis of several fly stocks expected to carry the PGRP-SDdS3 mutation used in the initial study revealed the presence of a wild-type copy PGRP-SD, suggesting that either the stock used in this study did not carry the expected mutation, or that the mutation was lost by contamination prior to sharing the stock with other labs" provides a documentable explanation of a potential error in the original two manuscripts, but the subsequent "analysis of several fly stocks" needs citations to published literature or explanation in the supplement. It is unclear from this passage how the wildtype allele in the purportedly mutant stocks could have led to the misattribution of function to PGRP-SD, so that should be explained more clearly in the manuscript.

      The originally claimed anorexia of the Gr28b mutation is explained as having been "likely obtained due to comparison to a wild-type line with unusually high feeding rates". This claim would be stronger if the wildtype line in question were named and data showing a high rate of feeding were presented in the supplement or cited from published literature. Otherwise, this appears to be speculation.

      In the section "The Toll immune pathway is not negatively regulated by wntD", FlyAtlas is cited as evidence that wntD is not expressed in adult flies. However, the FlyAtlas data is not adequately sensitive to make this claim conclusively. If the present authors wish to state that wntD is not expressed in adults, they should do a thorough test themselves and report it in the Supplement.

      Alternatively, the statement "data from FlyAtlas show that wntD is only expressed at the embryonic stage and not at the adult stage at which the experiments were performed by (Gordon et al., 2005a)" could be rephrased to something like "data from FlyAtlas show strong expression of wntD in the embryo but not the adult" and it should be followed by a direct statement that adult expression was also found to be near-undetectable by qPCR in supplement S5. That data is currently "not shown" in the supplement, but it should be shown because this is a central result that is being used to refute the original claim. This manuscript passage should also describe the expression data described in Gordon et al. (2005), for contrast, which was an experimental demonstration of expression in the embryo and a claim "RT-PCR was used to confirm expression of endogenous wntD RNA in adults (data not shown)."

      Inclusion of the section on croquemort is curious because it seems to be focused exclusively on clearance of apoptotic cells in the embryo, not on anything related to immunity. The subsection is titled "Croquemort is not a phagocytic engulfment receptor for apoptotic cells or bacteria", but the text passage contains no mention of phagocytosis of bacteria, and phagocytosis of bacteria is not tested in the S17 supplement. I would suggest deleting this passage entirely if there is not going to be any discussion of the immune-related phenotypes.

      The claim "Toll is not activated by overexpression of GNBP3 or Grass: Experiments performed for ReproSci find that contrary to previous reports, overexpression of GNBP3 (Gottar et al., 2006) or Grass (El Chamy et al., 2008) in the absence of immune challenge does not effectively activate Toll signaling (Supplementaries S6, S7)" is overly strongly stated unless the authors can directly repeat the original published studies with identical experimental conditions. In the absence of that, the claim in the present manuscript needs to be softened to "we find no evidence that..." or something similar. The definitive claim "does not" presumes that the current experiments are more accurate or correct than the published ones, but no explanation is provided as to why that should be the case. In the absence of a clear and compelling argument as to why the current experiment is more accurate, it appears that there is one study (the original) that obtained a certain result and a second study (the present one) that did not. This can be reported as an inconsistency, but the second experiment does not prove that the first was an error. The same comment applies to the refutation of the roles for Edin and IRC. Even though the current experiments are done in the context of a broader validation study, this does not automatically make them more correct. The present work should adhere to the same standards of reporting that we expect in any other piece of science.

      The statement "Furthermore, evidence from multiple papers suggests that this result, and other instances where mutations have been found to specifically eliminate Defensin expression, is likely due to segregating polymorphisms within Defensin that disrupt primer binding in some genetic backgrounds and lead to a false negative result (Supplementary S20)" should include citations to the multiple papers being referenced. This passage would benefit from a brief summary of the logic presented in S20 regarding the various means of quantifying Defensin expression.

      In S22 Results, the statement "For general characterization of the IrcMB11278 mutant, including developmental and motor defects and survival to septic injury, see additional information on the ReproSci website" is not acceptable. All necessary information associated with the paper needs to be included in the Supplement. There cannot be supporting data relegated to an independent website with no guaranteed stability or version control. The same comment applies to "Our results show that eiger flies do not have reduced feeding compared to appropriate controls (See ReproSci website)" in S25.

      Supplement S21 appears to show a difference between the wildtype and hemese mutants in parasitoid encapsulation, which would support the original finding. However, the validation experiment is performed at a small sample size and is not replicated, so there can be no statistical analysis. There is no reported quantification of lamellocytes or total hemocytes. The validation experiment does not support the conclusion that the original study should be refuted. The S21 evaluation of hemese must either be performed rigorously or removed from the Supplement and the main text.

      In S22, the second sentence of the passage "Due to the fact that IrcMB11278 flies always survived at least 24h prior to death after becoming stuck to the substrate by their wings, we do not attribute the increased mortality in Ecc15-fed IrcMB11278 flies primarily to pathogen ingestion, but rather to locomotor defects. The difference in survival between sucrose-fed and Ecc15-fed IrcMB11278 flies may be explained by the increased viscosity of the Ecc15-containing substrate compared to the sucrose-containing substrate" is quite strange. The first sentence is plausible and a reasonable interpretation of the observations. But to then conclude that the difference between the bacterial treatment versus the control is more plausibly due to substrate viscosity than direct action of the bacteria on the fly is surprising. If the authors wish to put forward that interpretation, they need to test substrate viscosity and demonstrate that fly mortality correlates with viscosity. Otherwise, they must conclude that the validation experiment is consistent with the original study.

      In S27, the visualization of eiger expression using a GFP reporter is very non-standard as a quantitative assay. The correct assay is qPCR, as is performed in other validation experiments, and which can easily be done on dissected fat body for a tissue-specific analysis. S27 Figure 1 should be replaced with a proper experiment and quantitative analysis. In S27 Figure 2, the authors should add a panel showing that eiger is successfully knocked down with each driver>construct combination. This is important because the data being reported show no effect of knockdown; it is therefore imperative to show that the knockdown is actually occurring. The same comment applies everywhere there is an RNAi to demonstrate a lack of effect.

      The Drosomycin expression data in S3 Figure 2A look extremely noisy and are presented without error bars or statistical analysis. The S4 claim that sphinx and spheroid are not regulators of the Toll pathway because quantitative expression levels of these genes do not correlate with Toll target expression levels is an extremely weak inference. The RNAi did not work in S4, so no conclusion should be inferred from those experiments. Although the original claims in dispute may be errors in both cases, the validation data used to refute the original claims must be rigorous and of an acceptable scientific standard.

      In S6 Figure 1, it is inappropriate to plot n=2 data points as a histogram with mean and standard errors. If there are fewer than four independent points, all points should be plotted as a dot plot. This comment applies to many qPCR figures throughout the supplement. In S7 Figure 1, "one representative experiment" out of two performed is shown. This strongly suggests that the two replicates are noisy, and a cynical reader might suspect that the authors are trying to hide the variance. This also applies to S5 Fig 3. Particularly in the context of a validation study, it is imperative to present all data clearly and objectively, especially when these are the specific data that are being used to refute the claim.

      Other comments:

      In S26, the authors suggest that much of the observed melanization arises from excessive tissue damage associated with abdominal injection contrasted to the lesser damage associated with thoracic injection. I believe there may be a methodological difference here. The Methods of S27 are not entirely clear, but it appears that the validation experiment was done with a pinprick, whereas the original Mabary and Schneider study was done with injection via a pulled capillary. My lab group (and I personally) have extensive experience with both techniques. In our hands, pinpricks to the abdomen do indeed cause substantial injury, and the physically less pliable thorax is more robust to pinpricks. However, capillary injections to the abdomen do virtually no tissue damage - very probably less than thoracic injections - and result in substantially higher survivals of infection even than thoracic injections. Thus, the present manuscript may infer substantial tissue damage in the original study because they are employing a different technique.

    1. eLife Assessment

      This important study builds on previous work from the same authors to present a conceptually distinct workflow for cryo-EM reconstruction that uses 2D template matching to enable high-resolution structure determination of small (sub-50 kDa) protein targets. The paper describes how density for small-molecule ligands bound to such targets can be reconstructed without these ligands being present in the template. However, the evidence described for the claim that this technique "significantly" improves the alignment of the reconstruction of small complexes is incomplete. The authors could better evaluate the effects of model bias on the reconstructed densities.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes an application of the high-resolution cryo-EM 2D template matching technique to sub-50kDa complexes. The paper describes how density for ligands can be reconstructed without having to process cryo-EM data through the conventional single particle analysis pipelines.

      Strengths:

      This paper contributes additional data (alongside other papers by the same authors) to convey the message that high-resolution 2D template matching is a powerful alternative for cryo-EM structure determination. The described application to ligand density reconstruction, without the need for extensive refinements, will be of interest to the pharmaceutical industry, where often multiple structures of the same protein in complex with different ligands are solved as part of their drug development pipelines. Improved insights into which particles contribute to the best ligand density are also highly valuable and transferable to other applications of the same technique.

      Weaknesses:

      Although the convenient visualisation of small molecules bound to protein targets of a known structure would be relevant for the pharmaceutical industry, the evidence described for the claim that this technique "significantly" improves alignment of reconstruction of small complexes is incomplete. The authors are encouraged to better evaluate the effects of model bias on the reconstructed densities in a revised paper.

    3. Reviewer #2 (Public review):

      In this manuscript, Zhang et al describe a method for cryo-EM reconstruction of small (sub-50kDa) complexes using 2D template matching. This presents an alternative, complementary path for high-resolution structure determination when there is a prior atomic model for alignment. Importantly, regions of the atomic model can be deleted to avoid bias in reconstructing the structure of these regions, serving as an important mechanism of validation.

      The manuscript focuses its analysis on a recently published dataset of the 40kDa kinase complex deposited to EMPIAR. The original processing workflow produced a medium resolution structure of the kinase (GSFSC ~4.3A, though features of the map indicate ~6-7A resolution); at this resolution, the binding pocket and ligand were not resolved in the original published map. With 2DTM, the authors produce a much higher resolution structure, showing clear density for the ATP binding pocket and the bound ATP molecule. With careful curation of the particle images using statistically derived 2DTM p-values, a high-resolution 2DTM structure was reconstructed from just 8k particles (2.6A non-gold standard FSC; ligand Q-score of 0.6), in contrast to the 74k particles from the original publication. This aligns with recent trends that fewer, higher-quality particles can produce a higher-quality structure. The authors perform a detailed analysis of some of the design choices of the method (e.g., p-value cutoff for particle filtering; how large a region of the template to delete).

      Overall, the workflow is a conceptually elegant alternative to the traditional bottom-up reconstruction pipeline. The authors demonstrate that the p-values from 2DTM correlations provide a principled way to filter/curate which particle images to extract, and the results are impressive. There are only a few minor recommendations that I could make for improvement.

    4. Reviewer #3 (Public review):

      Summary:

      Due to the low SNR of cryo-EM micrographs necessitated by radiation damage, determining the structure of proteins smaller than 50 kDa is exceedingly challenging, such that only a handful have been solved to date. This work aims to improve the reconstruction of small proteins in single-particle cryo-EM by using high-resolution 2D template matching, an algorithm previously used to locate and align macromolecules in situ, to align and reconstruct small proteins. This approach uses an existing macromolecular structure, either experimentally determined or predicted by AlphaFold, to simulate a noise-free 3D reference and generates whitened projections, crucially including high-spatial-frequency information, to align particles by the orientation with maximal cross-correlation. They demonstrate the success of this approach by generating a 3D reconstruction from an existing dataset of a 41.3 kDa protein kinase that had previously evaded attempts at high-resolution structure determination. To alleviate concerns that this is purely from template bias, they demonstrate clear density at two regions that were not present in the template: 6 residues in an alpha helix and an ATP in the ligand binding pocket. The latter is particularly important for its implications in determining structures of ligand-bound proteins for drug discovery. Additionally, the authors provide an update to the classic calculation in Henderson 1995 to predict the minimum molecular mass of a protein that can be solved by single-particle cryo-EM.

      Strengths:

      I am in no doubt that this technique can be used to gain valuable insights into the structures of small proteins, and this is an important advancement for the field. The ability to determine the structure of ligands in a binding site is particularly important, and this paper provides a method of doing that which outperforms traditional single-particle cryo-EM processing workflows.

      The claim that using high-spatial frequency information is essential for aligning small proteins is a valuable insight. A recent pre-print published at a similar time to this manuscript used high-resolution information in standard ab-initio reconstruction to generate a high-resolution reconstruction from the same dataset, supporting the claims made in the manuscript.

      The theoretical section in the appendix is also sound. It uses the same logic as Henderson, but applies more up-to-date knowledge, such as incorporating dose-weighting and altering the cross-correlation-based noise estimation. This update is valuable for understanding factors preventing us from reaching the theoretical limit.

      Weaknesses:

      Given that this technique creates template bias, only parts of the reconstruction not in the template can be trusted, unlike standard single-particle processing, where the independent half-maps from separate, ab initio templates are used to generate a 3D reconstruction. Although, in principle, one could perform the search many times such that every residue has been omitted in at least one search, this will be extremely computationally intensive and was not demonstrated in this manuscript. It is therefore currently only realistically applicable when only a small portion of the sub-50 kDa protein is of interest.

      The applicability of this technique to more than a single target was also not demonstrated, and there are concerns that it may not work effectively in many cases. The authors note in the results that "the ATP density was consistently recovered more robustly than nearby residues" and speculate that this may be because misalignments disproportionately blur peripheral residues. Since the region of interest in a structure is not necessarily in the center, this may need further investigation. The implications of this statement may also be unclear to the reader. For example, can this issue be minimized by having the region of interest centered in the simulated volume?

      In Figure 3, the authors demonstrate that it is not solely improved particle filtering and a noise-free reference that improve alignment, but that the high spatial frequency information is important. This information is very valuable since it can be applied to other, more standard methods. However, this key figure is not as clear or convincing as it could be. The FSC curves are possibly misleading, since the reduced resolution could be explained by reduced template bias when auto-refining with a map initially low-pass filtered to 10 Å. Moreover, although the helix reconstruction does look slightly better using the 2DTM angles, the improvement in density for ATP in the binding pocket is not clear. A qualitative argument that is clear in only one out of two cases is not as convincing as a quantitative metric across more examples.

    1. eLife Assessment

      This work identifies a novel, conserved link between glycolysis and sulfur metabolism that governs fungal morphogenesis and virulence. The compelling evidence, integrating multiple approaches, provides an important conceptual advance. A future mechanistic dissection of how sulfur metabolites interface with known pathways is encouraged.

    2. Reviewer #1 (Public review):

      Summary:

      Fungal survival and pathogenicity rely on the ability to undergo reversible morphological transitions, which are often linked to nutrient availability. In this study, the authors uncover a conserved connection between glycolytic activity and sulfur amino acid biosynthesis that drives morphogenesis in two fungal model systems. By disentangling this process from canonical cAMP signaling, the authors identify a new metabolic axis that integrates central carbon metabolism with developmental plasticity and virulence.

      Strengths:

      The study integrates different experimental approaches, including genetic, biochemical, transcriptomic, and morphological analyses, and convincingly demonstrates that perturbations in glycolysis alter sulfur metabolic pathways and thus impact pseudohyphal and hyphal differentiation. Overall, this work offers new and important insights into how metabolic fluxes are intertwined with fungal developmental programs and therefore opens new perspectives to investigate morphological transitioning in fungi.

      Importantly, in the revised version the authors now substantiate the transcriptomic findings by RT-qPCR analyses in the pfk1ΔΔ and adh1ΔΔ strains, demonstrating that genetic disruption of glycolytic flux generally mirrors the effects of 2-deoxyglucose treatment. The manuscript's discussion has also been strengthened by explicitly addressing why cysteine and methionine differ in their ability to rescue filamentation in S. cerevisiae versus C. albicans, highlighting species-specific differences in sulfur uptake and transsulfuration pathways.

      Overall, this revised manuscript provides compelling evidence for a previously unrecognized coupling between glycolysis and sulfur metabolism that shapes fungal morphogenesis and virulence. It opens new perspectives on metabolic control of fungal development and raises interesting mechanistic questions for future work.

      Comments on revisions:

      The authors have incorporated all of my suggested changes and addressed all raised concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript investigates the interplay between glycolysis and sulfur metabolism in regulating fungal morphogenesis and virulence. Using both Saccharomyces cerevisiae and Candida albicans, the authors demonstrate that glycolytic flux is essential for morphogenesis under nitrogen-limiting conditions, acting independently of the established cAMP-PKA pathway. Transcriptomic and genetic analyses reveal that glycolysis influences the de novo biosynthesis of sulfur-containing amino acids, specifically cysteine and methionine. Notably, supplementation with sulfur sources restores morphogenetic and virulence defects in glycolysis-deficient mutants, thereby linking core carbon metabolism with sulfur assimilation and fungal pathogenicity.

      Strengths:

      The work identifies a previously uncharacterized link between glycolysis and sulfur metabolism in fungi, bridging metabolic and morphogenetic regulation, which is an important conceptual advance for understanding fungal pathogenicity. Demonstrating that cysteine supplementation rescues virulence defects in an animal model connects basic metabolism to infection outcomes, which adds biomedical importance.

      Comments on revisions:

      The authors have sufficiently addressed my concern and provided a clear justification for their proposed model, including the limitations of performing the mechanistic assays at this stage. I am satisfied with the response and have no further comments.

    4. Reviewer #3 (Public review):

      This study investigates the connection between glycolysis and the biosynthesis of sulfur-containing amino acids in controlling fungal morphogenesis, using Saccharomyces cerevisiae and C. albicans as model organisms. The authors identify a conserved metabolic axis that integrates glycolysis with cysteine/methionine biosynthetic pathways to influence morphological transitions. This work broadens the current understanding of fungal morphogenesis, which has largely focused on gene regulatory networks and cAMP-dependent signaling pathways, by emphasizing the contribution of metabolic control mechanisms.

      Strengths:

      The delineation of how glycolytic flux regulates fungal morphogenesis through a cAMP-independent mechanism is an advancement. The coupling of glycolysis with the de novo biosynthesis of sulfur-containing amino acids, a requirement for morphogenesis, introduces a novel and unexpected layer of regulation.

      Demonstrating this mechanism in both S. cerevisiae and C. albicans strengthens the argument for its evolutionary conservation and biological importance.

      The ability to rescue the morphogenesis defect through supplementation of sulfur-containing amino acids provides a functional validation.

      Weaknesses:

      cAMP addition rescued the pseudohyphal differentiation defect exhibited by the ΔΔgpa2 strain. More clarity is needed on how this mechanism is distinct from the metabolic control: whether cAMP acts in parallel with or downstream of sulfur-containing amino acid biosynthesis has to be characterized. Supplementation of cysteine and methionine bypasses glycolytic regulation; the link between these amino acids and their role in fungal morphogenesis is not completely characterized.

      The demonstrated link between glycolysis and sulfur amino acid biosynthesis, along with its implications for virulence in C. albicans, is important for understanding fungal adaptation, as mentioned in the article; however, the downstream effects of Met4 activation were not fully characterized. How do cysteine and methionine rescue morphogenesis? The authors' response Figure 1 shows that there are no significant transcriptional changes in the expression of cAMP-PKA pathway-associated genes, which alone could not completely explain the role of gpa2 in morphogenesis, because exogenous cAMP can restore pseudohyphal differentiation in the ΔΔgpa2 background (Revised Fig. 1L). This implies that gpa2's function in morphogenesis involves an additional, possibly metabolic or post-transcriptional, layer of regulation, and its connection to sulfur-containing amino acids remains to be elucidated.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fungal survival and pathogenicity rely on the ability to undergo reversible morphological transitions, which are often linked to nutrient availability. In this study, the authors uncover a conserved connection between glycolytic activity and sulfur amino acid biosynthesis that drives morphogenesis in two fungal model systems. By disentangling this process from canonical cAMP signaling, the authors identify a new metabolic axis that integrates central carbon metabolism with developmental plasticity and virulence.

      Strengths:

      The study integrates different experimental approaches, including genetic, biochemical, transcriptomic, and morphological analyses, and convincingly demonstrates that perturbations in glycolysis alter sulfur metabolic pathways and thus impact pseudohyphal and hyphal differentiation. Overall, this work offers new and important insights into how metabolic fluxes are intertwined with fungal developmental programs and therefore opens new perspectives to investigate morphological transitioning in fungi.

      We thank the reviewer for finding this study to be of importance and for appreciating our multipronged approach to substantiate our finding that perturbations in glycolysis alter sulfur metabolism and thus impact pseudohyphal and hyphal differentiation in fungi.

      Weaknesses:

      A few aspects could be improved to strengthen the conclusions. Firstly, the striking transcriptomic changes observed upon 2DG treatment should be analyzed in S. cerevisiae adh1 and pfk1 deletion strains, for instance, through qPCR or western blot analyses of sulfur metabolism genes, to confirm that observed changes in 2DG conditions mirror those seen in genetic mutants. Secondly, differences between methionine and cysteine in their ability to rescue the mutant phenotype in both species are not mentioned, nor discussed in more detail. This is especially important as there seem to be differences between S. cerevisiae and C. albicans, which might point to subtle but specific metabolic adaptations.

      The authors are also encouraged to refine several figure elements for clarity and comparability (e.g., harmonized axes in bar plots), condense the discussion to emphasize the conceptual advances over a summary of the results, and shorten figure legends.

      We are grateful for this valuable and constructive feedback, and we agree with the reviewer on the necessity of performing RT-qPCR analysis of sulfur metabolism genes in the ∆∆pfk1 and ∆∆adh1 strains of S. cerevisiae to validate our RNA-Seq results obtained with 2DG. We have performed this experiment, and our results show that several genes involved in the de novo biosynthesis of sulfur-containing amino acids are downregulated in both the ∆∆pfk1 and ∆∆adh1 strains, corroborating the downregulation of sulfur metabolism genes in the 2DG-treated samples. These new data are now included in the revised manuscript as Supplementary Figure 2C.

      Furthermore, we acknowledge the reviewer’s point regarding the significance of comparing the ability of methionine and cysteine to rescue the filamentation defects exhibited by the mutants in S. cerevisiae and C. albicans. The observed differences between S. cerevisiae and C. albicans likely highlight species-specific metabolic adaptations within the sulfur assimilation pathway. While both yeasts employ the transsulfuration pathway to interconvert these sulfur-containing amino acids, the precise regulatory points, including the specific enzymes, their compartmentalization, and transcriptional control, are not identical. For instance, differences in the feedback inhibition mechanisms or the expression levels of key transsulfuration enzymes between S. cerevisiae and C. albicans could explain the variations in the phenotypic rescue experiments (Chebaro et al., 2017; Lombardi et al., 2024; Rouillon et al., 2000; Shrivastava et al., 2021; Thomas and Surdin-Kerjan, 1997). Furthermore, species-specific differences in amino acid transport systems (permeases) add another layer of complexity. S. cerevisiae primarily uses multiple, low-affinity permeases for cysteine transport (Gap1, Bap2, Bap3, Tat1, Tat2, Agp1, Gnp1, Yct1), while relying on a limited set of high-affinity transporters (like Mup1) for methionine transport, with the added complexity that its methionine transporters can also transport cysteine (Düring-Olsen et al., 1999; Huang et al., 2017; Kosugi et al., 2001; Menant et al., 2006). In contrast, C. albicans utilizes high-affinity transporters for the uptake of both amino acids, employing Cyn1 specifically for cysteine and Mup1 for methionine, indicating a greater reliance on dedicated transport mechanisms for these sulfur-containing molecules in the pathogenic yeast (Schrevens et al., 2018; Yadav and Bachhawat, 2011). A combination of these factors could explain the differences in the ability of cysteine and methionine to rescue filamentation in S. cerevisiae and C. albicans.

      Finally, we have enhanced the quantitative rigor and clarity of the data presentation in the revised manuscript by implementing Y-axis uniformity across all relevant bar graphs to facilitate a more robust and direct comparative analysis. We have also condensed the discussion to emphasize the conceptual advances and have shortened the figure legends as per the reviewer's suggestions.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the interplay between glycolysis and sulfur metabolism in regulating fungal morphogenesis and virulence. Using both Saccharomyces cerevisiae and Candida albicans, the authors demonstrate that glycolytic flux is essential for morphogenesis under nitrogen-limiting conditions, acting independently of the established cAMP-PKA pathway. Transcriptomic and genetic analyses reveal that glycolysis influences the de novo biosynthesis of sulfur-containing amino acids, specifically cysteine and methionine. Notably, supplementation with sulfur sources restores morphogenetic and virulence defects in glycolysis-deficient mutants, thereby linking core carbon metabolism with sulfur assimilation and fungal pathogenicity.

      Strengths:

      The work identifies a previously uncharacterized link between glycolysis and sulfur metabolism in fungi, bridging metabolic and morphogenetic regulation, which is an important conceptual advance for understanding fungal pathogenicity. Demonstrating that cysteine supplementation rescues virulence defects in animal models connects basic metabolism to infection outcomes, which adds biomedical importance.

      We would like to thank the reviewer for the positive comments on our work. We are pleased that they recognize the novel metabolic link between glycolysis and sulfur metabolism as a key conceptual advance in fungal morphogenesis. 

      Weaknesses:

      The proposed model that glycolytic flux modulates Met30 activity post-translationally remains speculative. While data support Met4 stabilization in met30 deletion strains, the mechanism of Met30 modulation by glycolysis is not demonstrated.

      We thank the reviewer for this valuable feedback. The activity of the SCF<sup>Met30</sup> E3 ubiquitin ligase, mediated by the F-box protein Met30, is dynamically regulated through both proteolytic degradation and its dissociation from the SCF complex, to coordinate sulfur metabolism and cell cycle progression (Smothers et al., 2000; Yen et al., 2005). Our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This observation is consistent with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via the dissociation of Met30 from the Skp1 subunit of the SCF complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Mechanistic validation of this hypothesis, particularly the assessment of Met30 dissociation from the SCF<sup>Met30</sup> complex via immunoprecipitation (IP), is technically challenging. Because these experiments would require isolating cells from colonies undergoing pseudohyphal differentiation on solid media (pseudohyphal differentiation does not occur in nitrogen-limiting liquid media (Gancedo, 2001; Gimeno et al., 1992)), current cell yields (OD<sub>600</sub>≈1 from ≈80-100 colonies) are far below the number of cells needed to obtain the total protein concentration required for standard pull-down assays (OD<sub>600</sub>≈600-800 is required to achieve 1-2 mg/ml of total protein, which is the standard requirement for pull-down protocols in S. cerevisiae (Lauinger et al., 2024)).
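      The scale of this yield gap can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that cell yield scales linearly with the number of colonies harvested and that both OD<sub>600</sub> figures quoted above refer to the same kind of total-yield measurement; the function name and its parameters are hypothetical and not part of the authors' protocol.

```python
def colonies_needed(target_od, od_per_batch=1.0, colonies_per_batch=(80, 100)):
    """Estimate the (low, high) number of colonies needed to reach target_od.

    Assumes yield scales linearly with colony count (an illustrative
    assumption): one batch of 80-100 colonies gives OD600 of about 1.
    """
    low, high = colonies_per_batch
    batches = target_od / od_per_batch  # number of 80-100 colony batches required
    return (round(batches * low), round(batches * high))


# Requirement quoted in the response: OD600 of about 600-800 for a pull-down.
for target in (600, 800):
    low, high = colonies_needed(target)
    print(f"OD600 {target}: roughly {low} to {high} colonies")
```

      Under these assumptions, a single pull-down would need tens of thousands of colonies, which is consistent with the authors' description of the experiment as technically challenging.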

      Given that the primary objective of our study is to establish the novel regulatory link between glycolysis and sulfur metabolism in the context of fungal morphogenesis, we would like to explore these crucial mechanistic details, in depth, in a subsequent study.

      Reviewer #3 (Public review):

      This study investigates the connection between glycolysis and the biosynthesis of sulfur-containing amino acids in controlling fungal morphogenesis, using Saccharomyces cerevisiae and C. albicans as model organisms. The authors identify a conserved metabolic axis that integrates glycolysis with cysteine/methionine biosynthetic pathways to influence morphological transitions. This work broadens the current understanding of fungal morphogenesis, which has largely focused on gene regulatory networks and cAMP-dependent signaling pathways, by emphasizing the contribution of metabolic control mechanisms. However, despite the novel conceptual framework, the study provides limited mechanistic characterization of how the sulfur metabolism and glycolysis blockade directly drive morphological outcomes. In particular, the rationale for selecting specific gene deletions, such as Met32 (and not Met4), or the Met30 deletion used to probe this pathway, is not clearly explained, making it difficult to assess whether these targets comprehensively represent the metabolic nodes proposed to be critical. Further supportive data and experimental validation would strengthen the claims on connections between glycolysis, sulfur amino acid metabolism, and virulence.

      Strengths:

      (1) The delineation of how glycolytic flux regulates fungal morphogenesis through a cAMP-independent mechanism is a significant advancement. The coupling of glycolysis with the de novo biosynthesis of sulfur-containing amino acids, a requirement for morphogenesis, introduces a novel and unexpected layer of regulation.

      (2) Demonstrating this mechanism in both S. cerevisiae and C. albicans strengthens the argument for its evolutionary conservation and biological importance.

      (3) The ability to rescue the morphogenesis defect through exogenous supplementation of sulfur-containing amino acids provides functional validation.

      (4) The findings from the murine Pfk1-deficient model underscore the clinical significance of metabolic pathways in fungal infections.

      We are grateful for this comprehensive and insightful summary of our work. We deeply appreciate the reviewer's recognition of the key conceptual breakthroughs regarding the metabolic regulation of fungal morphogenesis and the clinical relevance of our findings.

      Weaknesses:

      (1) While the link between glycolysis and sulfur amino acid biosynthesis is established via transcriptomic and proteomic analysis, the specific regulation connecting these pathways via Met30 remains to be elucidated. For example, what are the expression and protein levels of Met30 in the initial analysis from Figure 2? How specific is this effect on Met30 in anaerobic versus aerobic glycolysis, especially when the pentose phosphate pathway is involved in the growth of the cells when glycolysis is perturbed?

      We are grateful for the insightful feedback provided by the reviewer. S. cerevisiae is a Crabtree-positive organism that primarily uses anaerobic glycolysis to metabolize glucose under glucose-replete conditions (Barford and Hall, 1979; De Deken, 1966), and our pseudohyphal differentiation assays are performed in glucose-rich conditions (Gimeno et al., 1992). Furthermore, perturbation of glycolysis is known to induce compensatory upregulation of the pentose phosphate pathway (PPP) (Ralser et al., 2007), and we have also observed upregulation of the gene encoding transketolase-1 (Tkl1), a key enzyme in the PPP, in our RNA-seq data. Importantly, our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This aligns with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via Met30 dissociation from the Skp1 subunit of the complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Further experiments are required to delineate the specific role of the pentose phosphate pathway in this proposed regulation of Met30 activity under glycolysis perturbation, and this will be explored in our subsequent study.

      (2) Including detailed metabolite profiling could have strengthened the metabolic connection and provided additional insights into intermediate flux changes, i.e., measuring levels of metabolites to check if cysteine or methionine levels are influenced intracellularly. Also, it is expected to see how Met30 deletion could affect cell growth. Data on Met30 deletion and its effect on growth are not included, especially given that a viable heterozygous Met30 strain has been established. Measuring the cysteine or methionine levels using metabolomic analysis would further strengthen the claims in every section.

      We are grateful to the reviewer for this constructive feedback. To address the potential impact of met30 deletion on cell growth, we have included new data (Suppl. Fig. 4A) demonstrating that the deletion of a single copy of met30 in diploid S. cerevisiae does not compromise overall cell growth under nitrogen-limiting conditions, as the ∆met30 strain grows similarly to the wild-type strain.

      Our pseudohyphal/hyphal differentiation assays show that the defects induced by glycolytic perturbation are fully rescued by the exogenous supplementation of the sulfur-containing amino acids cysteine or methionine. Since these data demonstrate that the primary metabolic limitation caused by the perturbation of glycolysis, which leads to filamentation defects, is sulfur metabolism, we posit that performing comprehensive metabolic profiling would primarily reconfirm these results. We believe that our in vitro and in vivo sulfur add-back experiments sufficiently substantiate the novel regulatory metabolic link between glycolysis and sulfur metabolism.

      (3) In comparison with the previous bioRxiv (doi: https://doi.org/10.1101/2025.05.14.654021) of this article in May 2025 to the recent bioRxiv of this article (doi: https://doi.org/10.1101/2025.05.14.654021), there have been some changes, and Met30 deletion has been recently included, and the chemical perturbation of glycolysis has been added as new data. Although the changes incorporated in the recent version of the article improved the illustration of the hypothesis in Figure 6, which connects glycolysis to Sulfur metabolism, the gene expression and protein levels of all genes involved in the illustrated hypothesis are not consistently shown. For example, in some cases, the Met4 expression is not shown (Figure 4), and the Met30 expression is not shown during profiling (gene expression or protein levels) throughout the manuscript. Lack of consistency in profiling the same set of key genes makes understanding more complicated.

      We thank the reviewer for this feedback, which helps us to clarify the scope of our transcriptomic analysis. Our decision to focus our RT-qPCR experiments on downstream targets, while excluding met4 and met30 from the RT-qPCR analysis, is based on their known regulatory mechanisms. Met4 activity is predominantly regulated by post-translational ubiquitination by the SCF<sup>Met30</sup> complex followed by its degradation (Rouillon et al., 2000; Shrivastava et al., 2021; Smothers et al., 2000), while Met30 activity is primarily regulated by its auto-degradation or its dissociation from the SCF<sup>Met30</sup> complex (Lauinger et al., 2024; Smothers et al., 2000; Yen et al., 2005). Consistent with this, our RNA-Seq results indicate that neither met4 nor met30 transcripts are differentially expressed in response to 2DG addition. For all our RT-qPCR analyses in S. cerevisiae and C. albicans, we have consistently used the same set of sulfur metabolism genes: met32, met3, met5, met10 and met17. Our protein expression analysis of Met30 in S. cerevisiae (Fig. 3J) confirms that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism.

      (4) The demonstrated link between glycolysis and sulfur amino acid biosynthesis, along with its implications for virulence in C. albicans, is important for understanding fungal adaptation, as mentioned in the article; however, the Met4 activation was not fully characterized, nor were the data presented when virulence was assessed in Figure 4. Why is Met4 not included in Figure 4D and I? Especially, according to Figure 6, Met4 activation is crucial and guides the differences between glycolysis-active and inactive conditions.

      We thank the reviewer for their input. As the Met4 transcription factor in C. albicans is primarily regulated post-translationally through its degradation and inactivation by the SCF<sup>Met30</sup> E3 ubiquitin ligase complex (Shrivastava et al., 2021), we opted to monitor the transcriptional status of downstream targets of Met4 (i.e., genes directly regulated by Met4), as these are the genes that exhibit the most direct and functionally relevant transcriptional changes in response to the altered Met4 levels.

      (5) Similarly, the rationale behind selecting Met32 for characterizing sulfur metabolism is unclear. Deletion of Met32 resulted in a significant reduction in pseudohyphal differentiation; why is this attributed only to Met32? What happens if Met4 is deleted? It is not justified why Met32, rather than Met4, was chosen. Figure 6 clearly hypothesizes that Met4 activation is the key to the mechanism.

      We sincerely thank the reviewer for this insightful query regarding our selection of met32 for our gene deletion experiments. The choice of the ∆∆met32 strain was motivated by its unique phenotypic properties within the pathway for de novo biosynthesis of sulfur-containing amino acids. While deletions of most of the genes that encode proteins involved in the de novo biosynthesis of sulfur-containing amino acids result in auxotrophy for methionine or cysteine, the ∆∆met32 strain does not exhibit this phenotype (Blaiseau et al., 1997). This key distinction is attributed to the functional redundancy provided by the paralogous gene, met31 (Blaiseau et al., 1997). Crucially, given that the deletion of the central transcriptional regulator, met4, results in cysteine/methionine auxotrophy, the use of the ∆∆met32 strain provides an essential, viable experimental model for investigating the role of sulfur metabolism during pseudohyphal differentiation in S. cerevisiae.

      (6) The comparative RT-qPCR in Figure 5 did not account for sulfur metabolism genes, whereas it was focused only on virulence and hyphal differentiation. Is there data to support the levels of sulfur metabolism genes?

      We thank the reviewer for this feedback. We wish to respectfully clarify that the data pertaining to expression of sulfur metabolism genes in the presence of 2DG or in the ∆∆pfk1 strain in C. albicans are already included and discussed within the manuscript. These results can be found in Figure 4, panels D and I, respectively.

      (7) To validate the proposed interlink between sulfur metabolism and virulence, it is recommended that the gene sets (illustrated in Figure 6) be consistently included across all comparative data throughout the manuscript. Excluding sulfur metabolism genes in Figure 5 prevents the experiment from demonstrating the coordinated role of glycolysis perturbation → sulfur metabolism → virulence. The same is true for other comparisons, where the lack of data on Met30, Met4, etc., makes it hard to connect the hypothesis. It is also recommended to check the gene expression of other genes related to the cAMP pathway and report them to confirm the cAMP-independent mechanism. For example, gpa2 deletion was used to confirm the effects of cAMP supplementation, but the expression of this gene was not assessed in the RNA-seq analysis in Figure 2. It would be beneficial to show the expression of cAMP-related genes to completely confirm that they do not play a role in the claims in Figure 2.

      We thank the reviewer for this valuable feedback. The transcriptional analysis of the sulfur metabolism genes in the presence of 2DG and the ∆∆pfk1 strain is shown in Figures 4D and 4I.

      Our RNA-seq analysis (Author response image 1) confirms that there is no significant transcriptional change in the expression of cAMP-PKA pathway-associated genes in 2DG-treated cells compared to the untreated control cells (thresholds: Log2 fold change ≥ 1 for upregulated genes and Log2 fold change ≤ -1 for downregulated genes), reinforcing our conclusion that the glycolytic regulation of fungal morphogenesis is mediated through a cAMP-PKA pathway-independent mechanism.

      Author response image 1.
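      The fold-change criterion stated in the response can be expressed as a short sketch. The gene labels and values below are hypothetical placeholders, not data from the study, and the sketch applies only the Log2 fold change cut-offs quoted above; any additional statistical filter (for example an adjusted p-value) that the authors may have used is not described in the response and is therefore not modeled here.

```python
def classify(log2_fc, threshold=1.0):
    """Classify a gene by its log2 fold change (treated vs. untreated).

    Uses the cut-offs quoted in the response: >= 1 is upregulated,
    <= -1 is downregulated, anything in between is unchanged.
    """
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical placeholder values for illustration only.
example_log2_fc = {"gene_a": 0.3, "gene_b": -0.4, "gene_c": 1.6, "gene_d": -1.2}

for gene, value in example_log2_fc.items():
    print(gene, value, classify(value))
```

      A gene set whose members all fall in the "unchanged" class under these cut-offs would correspond to the situation the authors describe for the cAMP-PKA pathway-associated genes.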

      (8) Although the NAC supplementation study is included in the new version of the article compared to the previous version in BioRxiv (May 2025), the link to sulfur metabolism is not well characterized in Figure 5 and their related datasets. The main focus of the manuscript is to delineate the role of sulfur metabolism; hence, it is anticipated that Figure 5 will include sulfur-related metabolic genes and their links to pfk1 deletion, using RT-PCR measurements as shown for the virulence genes.

      We thank the reviewer for this question. The relevant data are indeed present within the current submission. We respectfully direct the reviewer's attention to Figure 4, panels D and I, where the data pertaining to expression of sulfur metabolism genes in the presence of 2DG or in the ∆∆pfk1 strain in C. albicans can be found.

      (9) The manuscript would benefit from more information added to the introduction section and literature supports for some of the findings reported earlier, including the role of (i) cAMP-PKA and MAPK pathways, (ii) what is known in the literature that reports about the treatment with 2DG (role of Snf1, HXT1, and HXT3), as well as how gpa2 is involved. Some sentences in the manuscripts are repetitive; it would be beneficial to add more relevant sections to the introduction and discussion to clarify the rationale for gene choices.

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 107: As morphological transitions are indeed a conserved phenomenon across fungal species, hosts & environmental niches, the authors could refer to a few more here (infection structures like appressoria; fruiting bodies, etc.).

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      Line 119/120: That's a bit misleading in my opinion. Gpr1 acts as a key sensor of external carbon, while Ras proteins control the cAMP pathway as intracellular sensory proteins. That should be stated more clearly. cAMP is the output and not the sensor.

      We appreciate the reviewer's detailed attention to this signaling network. We have revised the manuscript to precisely reflect this established signaling hierarchy for maximum clarity.

      (2) Line 180: ..differentiation

      We thank the reviewer for this valuable feedback. We have incorporated this change in our revised manuscript.

      (3) Figure 1 panels C & F. The authors should provide the same scale for all experiments. Otherwise, the interpretation can be difficult. The same applies to the different bar plots in Figure 4. Have the authors quantified pseudohyphal differentiation in the cAMP add-back assays? I agree that the chosen images look convincing, but they don't reflect quantitative analyses.

      We thank the reviewer for detailed and constructive feedback. We have changed the Y-axis and made it more uniform to improve the clarity of our data presentation in the revised manuscript.

      We have also incorporated the quantitative analysis of the cAMP add-back assays in S. cerevisiae, in Figure 2 Panel L.

      (4) Line 367/68: "cysteine or methionine was able to completely rescue". Here, the authors should phrase their wording more carefully. Figure 3C shows the complete rescue of the phenotype qualitatively, but Figure 3D clearly shows that there are differences between the supplementation of cysteine and methionine, with the latter not fully restoring the phenotype.

      We sincerely appreciate the reviewer's meticulous attention to the data interpretation. We fully agree that the initial phrasing in lines 367/368 requires adjustment, as Figure 3D establishes a quantitative difference in the efficiency of phenotypic rescue between cysteine and methionine supplementation. We have revised the text to articulate this difference.

      (5) Line 568: Here, apparently, the ability to rescue the differentiation phenotype is reversed compared to the experiment with S. cerevisiae. Cysteine only results in ~20% hyphal cells, while methionine restores to wild-type-like hyphal formation. Can the authors comment on where these differences might originate from? Is there a difference in the uptake of cysteine vs. methionine in the two species or consumption rates?

      We thank the reviewer for their detailed and constructive feedback. We believe this phenotypic difference may be due to the distinct metabolic prioritization of sulfur amino acids in C. albicans. Methionine is a known trigger for hyphal differentiation in C. albicans and serves as the immediate precursor for the universal methyl donor, S-adenosylmethionine (SAM) (Kraidlova et al., 2016; Schrevens et al., 2018). The morphological transition to hyphae involves a complex regulatory cascade which requires high rates of methylation, and this requires a rapid and direct conversion of methionine into SAM (Kraidlova et al., 2016; Schrevens et al., 2018). Cysteine, however, must first be converted into methionine via the transsulfuration pathway to produce SAM, making it metabolically less efficient for these processes.

      Reviewer #2 (Recommendations for the authors):

      The study's comprehensive experimental approach, integrating pharmacological inhibition, genetic manipulation, transcriptomics, and an animal infection model, provides strong evidence for a conserved mechanism, though some aspects need further clarification.

      Major Comments:

      (1) While the data suggest that glycolysis affects Met30 activity post-translationally, the underlying mechanism remains speculative. The authors should perform co-immunoprecipitation or ubiquitination assays to confirm whether glycolytic perturbation alters Met30-SCF complex interactions or Met4 ubiquitination levels.

      We thank the reviewer for this valuable feedback. The activity of the SCF<sup>Met30</sup> E3 ubiquitin ligase, mediated by the F-box protein Met30, is dynamically regulated through both proteolytic degradation and its dissociation from the SCF complex, to coordinate sulfur metabolism and cell cycle progression (Smothers et al., 2000; Yen et al., 2005). Our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This observation is consistent with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via the dissociation of Met30 from the Skp1 subunit of the SCF complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Mechanistic validation of this hypothesis, particularly the assessment of Met30 dissociation from the SCF<sup>Met30</sup> complex via immunoprecipitation (IP), is technically challenging. Because these experiments would require isolating cells from colonies undergoing pseudohyphal differentiation on solid media (pseudohyphal differentiation does not occur in nitrogen-limiting liquid media (Gancedo, 2001; Gimeno et al., 1992)), current cell yields (OD<sub>600</sub>≈1 from ≈80-100 colonies) are far below the number of cells needed to obtain the total protein concentration required for standard pull-down assays (OD<sub>600</sub>≈600-800 is required to achieve 1-2 mg/ml of total protein, which is the standard requirement for pull-down protocols in S. cerevisiae (Lauinger et al., 2024)).

      Given that the primary objective of our study is to establish the novel regulatory link between glycolysis and sulfur metabolism in the context of fungal morphogenesis, we would like to explore these crucial mechanistic details, in depth, in a subsequent study.

      (2) 2DG can exert pleiotropic effects unrelated to glycolytic inhibition (e.g., ER stress, autophagy induction). The authors are encouraged to perform complementary metabolic flux analyses, such as quantification of glycolytic intermediates or ATP levels, to confirm specific glycolytic inhibition.

      We appreciate the reviewer's concern regarding the potential pleiotropic effects of 2DG. While we acknowledge that 2DG may induce secondary cellular stress, we are confident that the observed phenotypes are robustly attributed to glycolytic inhibition based on our complementary genetic evidence. Specifically, the deletion strains ∆∆pfk1 and ∆∆adh1, which genetically perturb distinct steps in glycolysis, recapitulate the phenotypic results observed with 2DG treatment. Given this strong congruence between chemical inhibition and specific genetic deletions of key glycolytic enzymes, we are confident that our observed phenotypes are predominantly driven by the perturbation of the glycolytic pathway by 2DG.

      (3) The differential rescue effects (cysteine-only in inhibitor assays vs. both cysteine and methionine in genetic mutants) require further explanation. The authors should discuss potential differences in metabolic interconversion or amino acid transport that may account for this observation.

      We thank the reviewer for their valuable feedback. One explanation for the observed differential rescue effects of cysteine and methionine may be the distinct amino acid transport systems used by S. cerevisiae to transport these amino acids. S. cerevisiae primarily uses multiple, low-affinity permeases (Gap1, Bap2, Bap3, Tat1, Tat2, Agp1, Gnp1, Yct1) for cysteine transport, while relying on a limited set of high-affinity transporters (like Mup1) for methionine transport, with the added complexity that its methionine transporters can also transport cysteine (Düring-Olsen et al., 1999; Huang et al., 2017; Kosugi et al., 2001; Menant et al., 2006). Hence, it is likely that cysteine uptake occurs at a higher efficiency in S. cerevisiae than methionine uptake. Therefore, to achieve a comparable functional rescue by exogenous supplementation of methionine, it would be necessary to use a higher concentration of methionine. When we performed our rescue experiments using higher concentrations of methionine, we did not see any rescue of pseudohyphal differentiation in the presence of 2DG, and in fact we noticed that, at higher concentrations of methionine, the wild-type strain failed to undergo pseudohyphal differentiation even in the absence of 2DG. This is likely because increasing the methionine concentration raises the overall nitrogen content of the medium, thereby making the medium less nitrogen-starved. This presents a major experimental constraint, as pseudohyphal differentiation is strictly dependent on nitrogen limitation, and the elevated nitrogen resulting from the higher methionine concentration can inhibit pseudohyphal differentiation.

      (4) NAC may influence host redox balance or immune responses. The discussion should consider whether the observed virulence rescue could partly result from host-directed effects.

      We thank the reviewer for this valuable feedback. We acknowledge the role of NAC in host-directed immune responses. It is important to note that, in the context of certain bacterial pathogens, NAC has been reported to augment cellular respiration, subsequently increasing Reactive Oxygen Species (ROS) generation, which contributes to pathogen clearance (Shee et al., 2022). Interestingly, in our study, NAC supplementation to the mice was given prior to the infection and maintained continuously throughout the duration of the experiment. This continuous supply of NAC likely contributes to the rescue of virulence defects exhibited by the ∆∆pfk1 strain (Fig. 5I and J). Essentially, NAC likely allows the mutant to fully activate its essential virulence strategies (including morphological switching) to cause a successful infection in the host. As per the reviewer's suggestion, this has been included in the discussion section of the manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the comments related to improving the manuscript have been provided in the public review. Here are some specifics for the authors to consider:

      (1) It is important to clarify the rationale for choosing specific gene deletions over other key genes (e.g., Met32 and Met30) and explain why Met4 was not included, given its proposed central role in Figure 6.

      We sincerely thank the reviewer for this insightful query regarding our selection of met32 for our gene deletion experiments. The choice of the ∆∆met32 strain was strategically motivated by its unique phenotypic properties within the pathway for de novo biosynthesis of sulfur-containing amino acids. While deletions of most of the genes that encode proteins involved in the de novo biosynthesis of sulfur-containing amino acids result in auxotrophy for methionine or cysteine, the ∆∆met32 strain does not exhibit this phenotype (Blaiseau et al., 1997). This key distinction is attributed to the functional redundancy provided by the paralogous gene, met31 (Blaiseau et al., 1997). Crucially, given that the deletion of the central transcriptional regulator, met4, results in cysteine/methionine auxotrophy, the use of the ∆∆met32 strain provides an essential, viable experimental model for investigating the role of sulfur metabolism during pseudohyphal differentiation in S. cerevisiae.

      (2) Comparison of consistent gene and protein expression data (Met30, Met4, Met32) across all relevant figures and analyses would strengthen the mechanistic connection in a better way. Some data that might help connect the sections is not included; please see the public review for more details.

      We thank the reviewer for this valuable input, which helps us to clarify the scope of our transcriptomic analysis. Our decision to focus our RT-qPCR experiments on downstream targets, while excluding Met4 and Met30 from the RT-qPCR analysis, is based on their known regulatory mechanisms. Met4 activity is predominantly regulated by post-translational ubiquitination by the SCFMet30 complex followed by its degradation (Rouillon et al., 2000; Shrivastava et al., 2021; Smothers et al., 2000), while Met30 activity is primarily regulated by its auto-degradation or its dissociation from the SCFMet30 complex (Lauinger et al., 2024; Smothers et al., 2000; Yen et al., 2005). Consistent with this, our RNA-Seq results indicate that neither met4 nor met30 transcripts are differentially expressed in response to 2DG addition. For all our RT-qPCR analyses in S. cerevisiae and C. albicans, we have consistently used the same set of sulfur metabolism genes: met32, met3, met5, met10 and met17. Our data on protein expression analysis of Met30 in S. cerevisiae (Fig. 3J) confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCFMet30 proteasomal degradation as the dominant regulatory mechanism.

      (3) Suggested to include metabolomic profiling (cysteine, methionine, and intermediate metabolites) to substantiate the proposed metabolic flux between glycolysis and sulfur metabolism.

      We thank the reviewer for this valuable input. Our pseudohyphal/hyphal differentiation assays show that the defects induced by glycolytic perturbation are fully rescued by the exogenous supplementation of sulfur-containing amino acids, cysteine or methionine. Since these data conclusively demonstrate that the primary metabolic limitation caused by the perturbation of glycolysis, which leads to filamentation defects, is sulfur metabolism, we posit that performing comprehensive metabolic profiling would primarily reconfirm the aforesaid results. We believe that our in vitro and in vivo sulfur add-back experiments sufficiently substantiate the novel regulatory metabolic link between glycolysis and sulfur metabolism.

      (4) Data on the effects of Met30 deletion on cell growth are currently not included, and relevant controls should be included to ensure observed phenotypes are not due to general growth defects.

      We are grateful to the reviewer for this constructive feedback. To address the potential impact of met30 deletion on cell growth, we have included new data (Suppl. Fig. 4A) demonstrating that the deletion of a single copy of met30 in diploid S. cerevisiae does not compromise overall growth under nitrogen-limiting conditions, as the ∆met30 strain grows similarly to the wild-type strain.

      (5) Expanding RT-qPCR and data from transcriptomic analyses to include sulfur metabolism genes and key cAMP pathway genes to confirm the proposed cAMP-independent mechanism during virulence characterization is necessary.

      We thank the reviewer for this valuable feedback. The transcriptional analysis of the sulfur metabolism genes in the presence of 2DG and the ∆∆pfk1 strain is shown in Figures 4D and 4I. 

      To confirm that glycolysis is critical for fungal morphogenesis in a cAMP-PKA pathway-independent manner under nitrogen-limiting conditions in C. albicans, we performed cAMP add-back assays. Interestingly, corroborating our S. cerevisiae data, the exogenous addition of cAMP failed to rescue the hyphal differentiation defect caused by the perturbation of glycolysis through 2DG addition or by the deletion of the pfk1 gene, under nitrogen-limiting conditions in C. albicans. These data are now included in Suppl. Fig. 5B.

      (6) Enhancing the introduction and discussion by providing a clearer rationale for gene selection and more detailed references to established pathways (cAMP-PKA, MAPK, Snf1/HXT regulation, gpa2 involvement) is needed to reinstate the hypothesis.

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      (7) Reducing redundancy in the text and improving figure consistency, particularly by ensuring that the gene sets depicted in Figure 6 are represented across all datasets, would strengthen the interconnections among sections.

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      References

      Barford JP, Hall RJ. 1979. An examination of the crabtree effect in Saccharomyces cerevisiae: The role of respiratory adaptation. J Gen Microbiol. https://doi.org/10.1099/00221287-114-2-267

      Blaiseau, P. L., & Thomas, D. (1998). Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. The EMBO journal, 17(21), 6327–6336. https://doi.org/10.1093/emboj/17.21.6327

      Chebaro, Y., Lorenz, M., Fa, A., Zheng, R., & Gustin, M. (2017). Adaptation of Candida albicans to Reactive Sulfur Species. Genetics, 206(1), 151–162. https://doi.org/10.1534/genetics.116.199679

      De Deken R. H. (1966). The Crabtree effect: a regulatory system in yeast. Journal of general microbiology, 44(2), 149–156. https://doi.org/10.1099/00221287-44-2-149

      Düring-Olsen, L., Regenberg, B., Gjermansen, C., Kielland-Brandt, M. C., & Hansen, J. (1999). Cysteine uptake by Saccharomyces cerevisiae is accomplished by multiple permeases. Current genetics, 35(6), 609–617. https://doi.org/10.1007/s002940050459

      Gancedo J. M. (2001). Control of pseudohyphae formation in Saccharomyces cerevisiae. FEMS microbiology reviews, 25(1), 107–123. https://doi.org/10.1111/j.1574-6976.2001.tb00573.x

      Gimeno, C. J., Ljungdahl, P. O., Styles, C. A., & Fink, G. R. (1992). Unipolar cell divisions in the yeast S. cerevisiae lead to filamentous growth: regulation by starvation and RAS. Cell, 68(6), 1077–1090. https://doi.org/10.1016/0092-8674(92)90079-r

      Huang, C. W., Walker, M. E., Fedrizzi, B., Gardner, R. C., & Jiranek, V. (2017). Yeast genes involved in regulating cysteine uptake affect production of hydrogen sulfide from cysteine during fermentation. FEMS yeast research, 17(5), 10.1093/femsyr/fox046. https://doi.org/10.1093/femsyr/fox046

      Kosugi, A., Koizumi, Y., Yanagida, F., & Udaka, S. (2001). MUP1, high affinity methionine permease, is involved in cysteine uptake by Saccharomyces cerevisiae. Bioscience, biotechnology, and biochemistry, 65(3), 728–731. https://doi.org/10.1271/bbb.65.728

      Kraidlova, L., Schrevens, S., Tournu, H., Van Zeebroeck, G., Sychrova, H., & Van Dijck, P. (2016). Characterization of the Candida albicans Amino Acid Permease Family: Gap2 Is the Only General Amino Acid Permease and Gap4 Is an S-Adenosylmethionine (SAM) Transporter Required for SAM-Induced Morphogenesis. mSphere, 1(6), e00284-16. https://doi.org/10.1128/mSphere.00284-16

      Lauinger, L., Andronicos, A., Flick, K., Yu, C., Durairaj, G., Huang, L., & Kaiser, P. (2024). Cadmium binding by the F-box domain induces p97-mediated SCF complex disassembly to activate stress response programs. Nature communications, 15(1), 3894. https://doi.org/10.1038/s41467-024-48184-6

      Lombardi, L., Salzberg, L. I., Cinnéide, E. Ó., O'Brien, C., Morio, F., Turner, S. A., Byrne, K. P., & Butler, G. (2024). Alternative sulphur metabolism in the fungal pathogen Candida parapsilosis. Nature communications, 15(1), 9190. https://doi.org/10.1038/s41467-024-53442-8

      Menant, A., Barbey, R., & Thomas, D. (2006). Substrate-mediated remodeling of methionine transport by multiple ubiquitin-dependent mechanisms in yeast cells. The EMBO journal, 25(19), 4436–4447. https://doi.org/10.1038/sj.emboj.7601330

      Ralser, M., Wamelink, M. M., Kowald, A., Gerisch, B., Heeren, G., Struys, E. A., Klipp, E., Jakobs, C., Breitenbach, M., Lehrach, H., & Krobitsch, S. (2007). Dynamic rerouting of the carbohydrate flux is key to counteracting oxidative stress. Journal of biology, 6(4), 10. https://doi.org/10.1186/jbiol61

      Rouillon, A., Barbey, R., Patton, E. E., Tyers, M., & Thomas, D. (2000). Feedback-regulated degradation of the transcriptional activator Met4 is triggered by the SCF(Met30) complex. The EMBO journal, 19(2), 282–294. https://doi.org/10.1093/emboj/19.2.282

      Schrevens, S., Van Zeebroeck, G., Riedelberger, M., Tournu, H., Kuchler, K., & Van Dijck, P. (2018). Methionine is required for cAMP-PKA-mediated morphogenesis and virulence of Candida albicans. Molecular microbiology, 108(3), 258–275. https://doi.org/10.1111/mmi.13933

      Shee, S., Singh, S., Tripathi, A., Thakur, C., Kumar T, A., Das, M., Yadav, V., Kohli, S., Rajmani, R. S., Chandra, N., Chakrapani, H., Drlica, K., & Singh, A. (2022). Moxifloxacin-Mediated Killing of Mycobacterium tuberculosis Involves Respiratory Downshift, Reductive Stress, and Accumulation of Reactive Oxygen Species. Antimicrobial agents and chemotherapy, 66(9), e0059222. https://doi.org/10.1128/aac.00592-22

      Shrivastava, M., Feng, J., Coles, M., Clark, B., Islam, A., Dumeaux, V., & Whiteway, M. (2021). Modulation of the complex regulatory network for methionine biosynthesis in fungi. Genetics, 217(2), iyaa049. https://doi.org/10.1093/genetics/iyaa049

      Smothers, D. B., Kozubowski, L., Dixon, C., Goebl, M. G., & Mathias, N. (2000). The abundance of Met30p limits SCF(Met30p) complex activity and is regulated by methionine availability. Molecular and cellular biology, 20(21), 7845–7852. https://doi.org/10.1128/MCB.20.21.7845-7852.2000

      Thomas, D., & Surdin-Kerjan, Y. (1997). Metabolism of sulfur amino acids in Saccharomyces cerevisiae. Microbiology and molecular biology reviews : MMBR, 61(4), 503–532. https://doi.org/10.1128/mmbr.61.4.503-532.1997

      Yadav, A. K., & Bachhawat, A. K. (2011). CgCYN1, a plasma membrane cystine-specific transporter of Candida glabrata with orthologues prevalent among pathogenic yeast and fungi. The Journal of biological chemistry, 286(22), 19714–19723. https://doi.org/10.1074/jbc.M111.240648

      Yen, J. L., Su, N. Y., & Kaiser, P. (2005). The yeast ubiquitin ligase SCFMet30 regulates heavy metal response. Molecular biology of the cell, 16(4), 1872–1882. https://doi.org/10.1091/mbc.e04-12-1130

    1. eLife Assessment

      This study presents DeepTX, a valuable methodological tool that integrates mechanistic stochastic models with single-cell RNA sequencing data to infer transcriptional burst kinetics at genome scale. The approach is broadly applicable and of interest to subfields such as systems biology, bioinformatics, and gene regulation. The evidence supporting the findings is solid, with appropriate validation on synthetic data and thoughtful discussion of limitations related to identifiability and model assumptions.

    2. Joint Public Review:

      In this work, the authors present DeepTX, a computational tool for studying transcriptional bursting using single-cell RNA sequencing (scRNA-seq) data and deep learning. The method aims to infer transcriptional burst dynamics, including key model parameters and the associated steady-state distributions, directly from noisy single-cell data. The authors apply DeepTX to datasets from DNA damage experiments, revealing distinct regulatory patterns: IdU treatment in mouse stem cells increases burst size, promoting differentiation, while 5FU alters burst frequency in human cancer cells, driving apoptosis or survival depending on dose. These findings underscore the role of burst regulation in mediating cell fate responses to DNA damage.

      The main strength of this study lies in its methodological contribution. DeepTX integrates a non-Markovian mechanistic model with deep learning to approximate steady-state mRNA distributions as mixtures of negative binomial distributions, enabling genome-scale parameter inference with reduced computational cost. The authors provide a clear discussion of the framework's assumptions, including reliance on steady-state data and the inherent unidentifiability of parameter sets, and they outline how the model could be extended to other regulatory processes.
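
      To make the distributional form mentioned above concrete, the following is a minimal sketch of a negative binomial mixture evaluated as a probability mass function. It is an illustration only and not the DeepTX implementation: the function names, the (shape, success probability) parameterisation, and the two-component parameter values are all assumptions chosen for the example.

```python
import math


def nb_log_pmf(k, r, p):
    """Log-probability of count k under a negative binomial distribution.

    Assumed parameterisation: P(k) = Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k,
    with real-valued shape r > 0 and success probability 0 < p < 1.
    """
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )


def nb_mixture_pmf(k, weights, shapes, probs):
    """Probability of count k under a weighted mixture of negative binomials."""
    return sum(
        w * math.exp(nb_log_pmf(k, r, p))
        for w, r, p in zip(weights, shapes, probs)
    )


# Hypothetical two-component mixture (values chosen for illustration only).
weights = [0.6, 0.4]   # mixture weights, must sum to 1
shapes = [2.0, 8.0]    # shape parameter r of each component
probs = [0.5, 0.2]     # success probability p of each component

# Evaluate on a truncated support; the tail beyond 400 is negligible here.
pmf = [nb_mixture_pmf(k, weights, shapes, probs) for k in range(400)]
total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))
```

      Each component has mean r(1 - p)/p, so the mixture mean is the weighted sum of the component means; a bimodal mRNA distribution can be represented by components with well-separated means.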

      The revised manuscript addresses the original concerns raised by the reviewers, particularly those related to sample size requirements, distributional assumptions, and the biological interpretation of the inferred parameters. The authors have also included an extensive discussion of the limitations of the methodological framework, including the constraints associated with relying on snapshot data, as well as a broader contextualisation of DeepTX within the landscape of existing tools that link mechanistic modelling and single-cell transcriptomics.

      Overall, this work represents a valuable contribution to the integration of mechanistic models with high-dimensional single-cell data. It will be of interest to researchers in systems biology, bioinformatics, and computational modelling.

      Comments on revisions:

      We thank the authors for their thorough revision and for carefully addressing the points raised in the previous review. At this stage, the reviewers have no further concerns.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Joint Public Review:

      In this work, the authors present DeepTX, a computational tool for studying transcriptional bursting using single-cell RNA sequencing (scRNA-seq) data and deep learning. The method aims to infer transcriptional burst dynamics, including key model parameters and the associated steady-state distributions, directly from noisy single-cell data. The authors apply DeepTX to datasets from DNA damage experiments, revealing distinct regulatory patterns: IdU treatment in mouse stem cells increases burst size, promoting differentiation, while 5FU alters burst frequency in human cancer cells, driving apoptosis or survival depending on dose. These findings underscore the role of burst regulation in mediating cell fate responses to DNA damage.

      The main strength of this study lies in its methodological contribution. DeepTX integrates a non-Markovian mechanistic model with deep learning to approximate steady-state mRNA distributions as mixtures of negative binomial distributions, enabling genome-scale parameter inference with reduced computational cost. The authors provide a clear discussion of the framework's assumptions, including reliance on steady-state data and the inherent unidentifiability of parameter sets, and they outline how the model could be extended to other regulatory processes.

      The revised manuscript addresses many of the original concerns, particularly regarding sample size requirements, distributional assumptions, and the biological interpretation of inferred parameters. However, the framework remains limited by the constraints of snapshot data and cannot yet resolve dynamic heterogeneity or causality. The manuscript would also benefit from a broader contextualisation of DeepTX within the landscape of existing tools linking mechanistic modelling and single-cell transcriptomics. Finally, the interpretation of pathway enrichment analyses still warrants clarification.

      Overall, this work represents a valuable contribution to the integration of mechanistic models with high-dimensional single-cell data. It will be of interest to researchers in systems biology, bioinformatics, and computational modelling.

      Recommendations for the authors:

      We thank the authors for their thorough revision and for addressing many of the points raised during the initial review. The revised manuscript presents an improved and clearer account of the methodology and its implications. However, several aspects would benefit from further clarification and refinement to strengthen the presentation and avoid overstatement.

      (1) Contextualization within the existing literature

      The manuscript would benefit from placing DeepTX more clearly in the context of other computational tools developed to connect mechanistic modelling and single-cell RNA sequencing data. This is an active area of research with notable recent contributions, including Sukys and Grima (bioRxiv, 2024), Garrido-Rodriguez et al. (PLOS Comp Biol, 2021), and Maizels (2024). Positioning DeepTX in relation to these and other relevant efforts would help readers appreciate its specific advances and contributions.

      We sincerely thank you for this valuable suggestion. We agree that situating DeepTX within the broader landscape of computational approaches linking mechanistic modeling and single-cell RNA sequencing data will clarify its contributions and advances. In this revised version, we explicitly discuss how DeepTX compares and relates to other work in this active area in a dedicated paragraph in the Discussion section.

      Specifically, we note that the DeepTX research paradigm contributes to a growing body of work aiming to link mechanistic models of gene regulation with scRNA-seq data. Maizels provided a comprehensive review of computational strategies for incorporating dynamic mechanisms into single-cell transcriptomics (Maizels RJ, 2024). In this context, RNA velocity is one of the most important examples, as it infers short-term transcriptional trends based on splicing kinetics and a deterministic ODE model. However, such approaches are limited by their deterministic assumptions and cannot fully capture the stochastic nature of gene regulation. DeepTX can be viewed as an extension of this framework to stochastic modelling, explicitly addressing transcriptional bursting kinetics under DNA damage. Similarly, DeepCycle, developed by Sukys and Grima (Sukys A & Grima R, 2025), investigates transcriptional burst kinetics during the cell cycle, employing a stochastic age-dependent model and a neural network to infer burst parameters while correcting for measurement noise. By contrast, MIGNON integrates genomic variation data and static transcriptomic measurements into a mechanistic pathway model (HiPathia) to infer pathway-level activity changes, rather than gene-level stochastic transcriptional dynamics (Garrido-Rodriguez M et al., 2021). In this sense, DeepTX and MIGNON are complementary, with DeepTX resolving burst kinetics at the single-gene level and MIGNON emphasizing pathway responses to genomic perturbations, which could inspire future extensions of DeepTX that incorporate sequence-level information.

      (2) Interpretation of GO analysis

      The interpretation of the GO enrichment results in Figure 4D should be revised. While the text currently associates the enriched terms with signal transduction and cell cycle G2/M phase transition, the most significant terms relate to mitotic cell cycle checkpoint signaling. This distinction should be made clear in the main text, and the conclusions drawn from the GO analysis should be aligned more closely with the statistical results.

      We sincerely appreciate this insightful comment. We have carefully re-examined the GO enrichment results shown in Figure 4D and agree that the most significantly enriched terms correspond to mitotic cell cycle checkpoint signaling and signal transduction in response to DNA damage, rather than general G2/M phase transition processes. Accordingly, we have revised the main text to highlight the biological significance of mitotic cell cycle checkpoint signaling.

      Specifically, we now emphasize that DNA damage and mitotic checkpoint activation are closely interconnected, and we highlight two key points. (1) The mitotic checkpoint serves as a crucial safeguard to ensure accurate chromosome segregation and maintain genomic stability under DNA damage conditions. Activation of the mitotic checkpoint can influence cell fate decisions and differentiation potential (Kim EM & Burke DJ, 2008; Lawrence KS et al., 2015). (2) Sustained activation of the spindle assembly checkpoint (SAC) has been reported to induce mitotic slippage and polyploidization, which in turn may enhance the differentiation potential of embryonic stem cells (Mantel C et al., 2007). These revisions ensure that our interpretation is consistent with the statistical enrichment results and better reflects the underlying biological processes implicated by the data.

      (3) Justification for training on simulated data

      The decision to train the model on simulated data should be clearly justified. While the advantage of having access to ground-truth parameters is understood, the manuscript would benefit from a discussion of the limitations of this approach, particularly in terms of generalizability to real datasets. Moreover, it is worth noting that many annotated scRNA-seq datasets are publicly available and could, in principle, be used to complement the training strategy.

      We thank you for this insightful comment. We chose to train DeepTXsolver on simulated data because no experimental dataset currently provides genome-wide transcriptional burst kinetics with known ground truth, which is essential for supervised learning. Simulation enables us to (i) generate large, fully annotated datasets spanning the biologically relevant parameter space, (ii) expose the solver to diverse bursting regimes (e.g., low/high burst frequency, small/large burst size, unimodal/bimodal distributions), and (iii) quantitatively benchmark model accuracy, parameter identifiability, and robustness prior to deployment on real scRNA-seq data.

      We acknowledge, however, that simulation-based training has inherent limitations in terms of generalizability. Real biological systems may deviate from the idealized bursting model, exhibit more complex noise structures, or display parameter distributions that differ from those in simulations. Moreover, the lack of ground-truth parameters in experimental scRNA-seq datasets prevents an absolute evaluation of inference accuracy. In future work, publicly available annotated scRNA-seq datasets could be used to complement this simulation-based training strategy and enhance generalizability. We have revised the manuscript to explicitly discuss both the rationale for using simulated data and the potential limitations of this approach.
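
      The idea of generating labelled training data by simulation, as described in this response, can be illustrated with a minimal sketch. Note the assumptions: this example uses the simple Markovian two-state (telegraph) model with a Gillespie simulation, not the non-Markovian multi-step model used by DeepTX, and the function names, parameter ranges, and simulation time are hypothetical choices made for illustration.

```python
import math
import random


def simulate_telegraph_cell(k_on, k_off, k_syn, k_deg, t_end, rng):
    """Gillespie simulation of one cell under a two-state telegraph model.

    The gene starts OFF with zero mRNA (an assumption); returns the mRNA
    count at time t_end.
    """
    t, gene_on, mrna = 0.0, False, 0
    while True:
        deg_rate = k_deg * mrna
        syn_rate = k_syn if gene_on else 0.0
        switch_rate = k_off if gene_on else k_on
        total = deg_rate + syn_rate + switch_rate
        t += rng.expovariate(total)      # waiting time to the next reaction
        if t >= t_end:
            return mrna
        u = rng.random() * total         # choose which reaction fires
        if u < deg_rate:
            mrna -= 1
        elif u < deg_rate + syn_rate:
            mrna += 1
        else:
            gene_on = not gene_on


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def make_training_set(n_param_sets, n_cells, rng, t_end=20.0):
    """Return (ground-truth parameters, simulated counts) pairs.

    Rates are in units of the degradation rate, which is fixed to 1.
    The parameter ranges below are hypothetical.
    """
    dataset = []
    for _ in range(n_param_sets):
        params = {
            "k_on": sample_log_uniform(0.1, 10.0, rng),
            "k_off": sample_log_uniform(0.1, 10.0, rng),
            "k_syn": sample_log_uniform(1.0, 50.0, rng),
            "k_deg": 1.0,
        }
        counts = [
            simulate_telegraph_cell(t_end=t_end, rng=rng, **params)
            for _ in range(n_cells)
        ]
        dataset.append((params, counts))
    return dataset


training_set = make_training_set(n_param_sets=3, n_cells=100, rng=random.Random(42))
```

      Each entry pairs a known parameter set with a simulated snapshot of counts, which is the kind of supervision that experimental scRNA-seq data cannot provide. The finite simulation time only approximates steady state, which is one example of how simulated data can differ from the idealised assumption.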

      (4) Benchmarking against external methods

      The performance of DeepTX is primarily compared to a prior method from the same group. To strengthen the methodological claims, it would be preferable to include benchmarking against additional established tools from the broader literature. This would offer a more objective evaluation of the performance gains attributed to DeepTX.

      We thank you for this constructive suggestion. We fully agree that benchmarking DeepTX against additional established tools from the broader literature would provide a more comprehensive and objective evaluation of DeepTX. In the revised manuscript, we have included comparative analyses with other widely used methods, including nnRNA (from the Shahrezaei group (Tang W et al., 2023)), txABC (from our group (Luo S et al., 2023)), txBurst (from the Sandberg group (Larsson AJM et al., 2019)), and txInfer (from the Junhao group (Gu J et al., 2025)) (Supplementary Figure S4). The comparative results indicate that our method demonstrates superior performance in both efficiency and accuracy.

      (5) Interpretation of Figures 4-6

      The revised figures are clear and informative; however, the associated interpretations in the main text remain too strong relative to the type of analysis performed. For instance, in Figure 4, it is suggested that changes in burst size are linked to DNA damage-induced signalling cascades that affect cell cycle progression and fate decisions. While this is a plausible hypothesis, GO and GSEA analyses are correlative by nature and not sufficient to support such a mechanistic claim on their own. These analyses should be presented as exploratory, and the strength of the conclusions drawn should be tempered accordingly. Similar caution should be applied to the interpretations of Figures 5 and 6.

      We thank you for this important comment. In the revised manuscript, we have carefully moderated the interpretation of the GO and GSEA results in Figures 4, 5, and 6. Specifically, we now present these analyses as exploratory and emphasize their correlative nature, avoiding causal claims that go beyond the scope of the data. The text has been rephrased to highlight the observed associations rather than implying direct causal relationships.

      For Figure 4, we emphasize that while it is tempting to hypothesize that enhanced burst size may contribute to DNA damage-related checkpoint activation and thereby influence cell cycle progression and differentiation, our current results only indicate an association between burst size enhancement and pathways involved in DNA damage response and checkpoint signaling.

      For Figure 5, we emphasize that although our GO analysis cannot establish causality, the results are consistent with an association between 5-FU-induced changes in burst kinetics and pathways related to oxidative stress and apoptosis. Based on this, we propose a model outlining a potential process through which DNA damage may ultimately lead to cellular apoptosis.

      For Figure 6, we emphasize that these enrichment results suggest that high-dose 5FU treatment may be associated with processes such as telomerase activation and mitochondrial function maintenance, both of which have been implicated in cell survival and apoptosis evasion in previous experimental studies. For example, prior work indicates that hTERT translocation can activate telomerase pathways to support telomere maintenance and reduce oxidative stress, which is thought to contribute to apoptosis resistance. While our enrichment analysis cannot establish causality, the observed transcriptional bursting changes are consistent with these reported survival-associated mechanisms.

      (6) Discussion section framing

      The initial paragraphs of the discussion section make broad biological claims about the role of transcriptional bursting in cellular decision-making. While transcriptional bursting is undoubtedly relevant, the manuscript would benefit from a more cautious framing. It would be more appropriate to foreground the methodological contributions of DeepTX, and to present the biological insights as hypotheses or observations that may guide future experimental investigation, rather than as established conclusions.

      We thank you for this insightful comment. We have revised the discussion to clarify and appropriately temper our claims regarding transcriptional bursting. First, we now explicitly recognize that transcriptional bursting is one of multiple contributors to cellular variability, rather than the sole or dominant factor driving cellular decision-making. Second, we have restructured the opening of the discussion to prioritize the methodological contributions of DeepTX, highlighting its strength as a framework for inferring genome-wide burst kinetics from scRNA-seq data. Finally, the biological insights derived from our analysis are now presented as correlative observations and potential hypotheses, which may inform and guide future experimental investigations, rather than as definitive mechanistic conclusions.

      Small Comments

      (1) Presentation of discrete distributions: In several figures (e.g., Figure 2B and Supplementary Figures S4, S6, and S8), the comparisons between empirical mRNA distributions and DeepTX-inferred distributions are visually represented using connecting lines, which may give the impression that continuous distributions are being compared to discrete ones. Given the focus on transcriptional bursting, a process inherently tied to discrete stochastic events, this representation could be misleading. The figure captions and visual style should be revised to clarify that all distributions are discrete and to avoid potential confusion. In general, it is recommended to avoid connecting points in discrete distributions with lines, as this can suggest interpolation or comparison with continuous distributions. This applies to Figures 2A and 2B in particular.

      We thank you for this valuable suggestion. To prevent any potential misinterpretation of discrete distributions as continuous ones, we have revised the visual representation of the empirical and DeepTX-inferred mRNA distributions in Figure 2B and Supplementary Figures S4, S6, and S8. Specifically, we have replaced the line plots with step plots, which more accurately capture the discrete nature of transcriptional bursting. Additionally, we have updated the figure captions to clearly state that all distributions are discrete.
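
      As an illustration of the difference between a connected line and a step representation, the sketch below computes an empirical probability mass function from counts and the vertices of a piecewise-constant outline. It is a plotting-library-independent example, not the code used for the revised figures; the function names and the toy counts are assumptions.

```python
from collections import Counter


def empirical_pmf(counts):
    """Empirical probabilities for every integer k from 0 to max(counts)."""
    n = len(counts)
    tally = Counter(counts)
    return [tally.get(k, 0) / n for k in range(max(counts) + 1)]


def step_outline(pmf):
    """Vertices of a piecewise-constant outline of a discrete distribution.

    Each integer k is drawn as a flat segment over [k - 0.5, k + 0.5], so no
    values are interpolated between neighbouring counts.
    """
    xs, ys = [], []
    for k, prob in enumerate(pmf):
        xs.extend([k - 0.5, k + 0.5])
        ys.extend([prob, prob])
    return xs, ys


# Toy mRNA counts from six hypothetical cells.
counts = [0, 0, 1, 3, 3, 3]
pmf = empirical_pmf(counts)
xs, ys = step_outline(pmf)
```

      Plotting `xs` against `ys` as a polyline gives a step curve; in matplotlib, a comparable result should be obtainable directly with `Axes.step(range(len(pmf)), pmf, where="mid")`. Note that a count of 2, which never occurs in the toy data, is shown with probability zero rather than being bridged by a sloped line.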

      (2) Transcription is always a multi-step process. While the manuscript aims to model additional complexity introduced by DNA damage, the current phrasing (e.g., on page 5) could be read as implying that transcription becomes multi-step only under damage conditions. This should be clarified.

      We thank you for this helpful observation. We agree that transcription is inherently a multi-step process under all conditions. To avoid any possible misunderstanding, we have revised the text to clarify this point.

      Specifically, we now explain that many previous studies have employed simplified two-state models to approximate transcriptional dynamics. However, gene expression is inherently a multi-step process, and this cannot be neglected under conditions of DNA damage. DNA damage can slow or even stop RNA Pol II movement and cause many macromolecules to be recruited for damage repair. This affects the spatially localized behavior of the promoter, producing dwell times in the inactive and active promoter states that cannot be approximated by a simple two-state model. Our work adopts a multi-step model because it is more appropriate for capturing the additional complexity introduced by DNA damage.
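
      To make the dwell-time argument concrete, the sketch below compares a single-step (exponential) dwell time with a dwell time composed of several sequential steps. It assumes irreversible steps with equal rates, which is a simplification and not the DeepTX model itself; the function names and parameter values are hypothetical. Under this assumption the squared coefficient of variation of the dwell time is 1/k for k steps, so a multi-step state produces a narrower, non-exponential dwell-time distribution that a two-state model cannot reproduce.

```python
import random
import statistics


def dwell_time(n_steps, rate_per_step, rng):
    """One dwell time: the sum of n_steps sequential exponential waiting times."""
    return sum(rng.expovariate(rate_per_step) for _ in range(n_steps))


def squared_cv(n_steps, mean_dwell=1.0, n_samples=20000, seed=0):
    """Estimate variance / mean**2 of the dwell time by simulation."""
    rng = random.Random(seed)
    # Scale the per-step rate so the mean dwell time is the same for any n_steps.
    rate = n_steps / mean_dwell
    samples = [dwell_time(n_steps, rate, rng) for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    return statistics.pvariance(samples, mu=mean) / mean**2


cv2_single = squared_cv(1)  # single-step state: theory gives 1
cv2_multi = squared_cv(5)   # five equal sequential steps: theory gives 1/5
```

      Both cases have the same mean dwell time, so any difference in the resulting mRNA distribution comes from the shape of the dwell-time distribution rather than from its average.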

      (3) The first sentence of the discussion section overstates the importance of transcriptional bursting. While it is a key source of variability, it is not the only nor always the dominant one. Furthermore, its role in DNA damage response remains an emerging hypothesis rather than a general principle. The claims in this section should be moderated accordingly.

      We thank you for this valuable feedback. In the revised discussion, we have moderated the statements in the opening paragraph to better reflect the current understanding. Specifically, we now acknowledge that transcriptional bursting represents one of multiple sources of variability and is not always the dominant contributor. In addition, we have reframed the role of transcriptional bursting in DNA damage response as an emerging hypothesis, rather than a general principle. To further address this concern, we replaced conclusion-like statements with more cautious, hypothesis-oriented phrasing, presenting our observations as potential directions for future experimental validation.

      References

      Maizels, R.J. 2024. A dynamical perspective: moving towards mechanism in single-cell transcriptomics. Philos Trans R Soc Lond B Biol Sci 379: 20230049. DOI: https://dx.doi.org/10.1098/rstb.2023.0049, PMID: 38432314

      Sukys, A., Grima, R. 2025. Cell-cycle dependence of bursty gene expression: insights from fitting mechanistic models to single-cell RNA-seq data. Nucleic Acids Research 53. DOI: https://dx.doi.org/10.1093/nar/gkaf295, PMID: 40240003

      Garrido-Rodriguez, M., Lopez-Lopez, D., Ortuno, F.M., Peña-Chilet, M., Muñoz, E., Calzado, M.A., Dopazo, J. 2021. A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways. PLoS Computational Biology 17: e1008748. DOI: https://dx.doi.org/10.1371/journal.pcbi.1008748, PMID: 33571195

      Kim, E.M., Burke, D.J. 2008. DNA damage activates the SAC in an ATM/ATR-dependent manner, independently of the kinetochore. PLoS Genet 4: e1000015. DOI: https://dx.doi.org/10.1371/journal.pgen.1000015, PMID: 18454191

      Lawrence, K.S., Chau, T., Engebrecht, J. 2015. DNA damage response and spindle assembly checkpoint function throughout the cell cycle to ensure genomic integrity. PLoS Genet 11: e1005150. DOI: https://dx.doi.org/10.1371/journal.pgen.1005150, PMID: 25898113

      Mantel, C., Guo, Y., Lee, M.R., Kim, M.K., Han, M.K., Shibayama, H., Fukuda, S., Yoder, M.C., Pelus, L.M., Kim, K.S., Broxmeyer, H.E. 2007. Checkpoint-apoptosis uncoupling in human and mouse embryonic stem cells: a source of karyotpic instability. Blood 109: 4518-4527. DOI: https://dx.doi.org/10.1182/blood-2006-10-054247, PMID: 17289813

      Tang, W., Jørgensen, A.C.S., Marguerat, S., Thomas, P., Shahrezaei, V. 2023. Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics. Bioinformatics 39. DOI: https://dx.doi.org/10.1093/bioinformatics/btad395, PMID: 37354494

      Luo, S., Zhang, Z., Wang, Z., Yang, X., Chen, X., Zhou, T., Zhang, J. 2023. Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model. Royal Society Open Science 10: 221057. DOI: https://dx.doi.org/10.1098/rsos.221057, PMID: 37035293

      Larsson, A.J.M., Johnsson, P., Hagemann-Jensen, M., Hartmanis, L., Faridani, O.R., Reinius, B., Segerstolpe, A., Rivera, C.M., Ren, B., Sandberg, R. 2019. Genomic encoding of transcriptional burst kinetics. Nature 565: 251-254. DOI: https://dx.doi.org/10.1038/s41586-018-0836-1, PMID: 30602787

      Gu, J., Laszik, N., Miles, C.E., Allard, J., Downing, T.L., Read, E.L. 2025. Scalable inference and identifiability of kinetic parameters for transcriptional bursting from single cell data. Bioinformatics. DOI: https://dx.doi.org/10.1093/bioinformatics/btaf581, PMID: 41131798

    1. eLife Assessment

      This study provides important insights into mural cell dynamics and vascular pathology using a zebrafish model of cerebral small vessel disease. The authors present convincing evidence that partial loss of foxf2 function results in progressive, cell-autonomous defects in pericytes accompanied by endothelial abnormalities across the lifespan. By leveraging advanced in vivo imaging and genetic approaches, the work establishes zebrafish as a powerful and relevant model for dissecting the cellular mechanisms underlying cerebral small vessel disease.

    2. Reviewer #1 (Public review):

      Summary:

      The paper by Graff et al. investigates the function of foxf2 in zebrafish to understand the progression of cerebral small vessel disease. The authors use a partial loss of foxf2 (zebrafish possess two foxf2 genes, foxf2a and foxf2b, and the authors mainly analyze homozygous mutants in foxf2a) to investigate the role of foxf2 signaling in regulating pericyte biology. They find that the number of pericytes is reduced in foxf2a mutants and that the remaining pericytes display alterations in their morphologies. The authors further find that mutant animals can develop to adulthood but that in adult animals, both endothelial and pericyte morphologies are affected. They also show that mutant pericytes can partially repopulate the brain after genetic ablation.

      Strengths:

      The paper is well written and easy to follow. The authors now include pericyte marker gene analysis and solid quantifications of the observed phenotypes.

      Weaknesses:

      None left.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the developmental and lifelong consequences of reduced foxf2 dosage in zebrafish, a gene associated with human stroke risk and cerebral small vessel disease (CSVD). The authors show that a ~50% reduction in foxf2 function through homozygous loss of foxf2a leads to a significant decrease in brain pericyte number, along with striking abnormalities in pericyte morphology, including enlarged soma and extended processes, during larval stages. These defects are not corrected over time but instead persist and worsen with age, ultimately affecting the surrounding endothelium. The study also makes an important contribution by characterizing pericyte behavior in wild-type zebrafish using a clever pericyte-specific Brainbow approach, revealing novel interactions such as pericyte process overlap not previously reported in mammals.

      Strengths:

      This work provides mechanistic insight into how subtle, developmental changes in mural cell biology and coverage of the vasculature can drive long-term vascular pathology. The authors make strong use of zebrafish imaging tools, including longitudinal analysis in transgenic lines to follow pericyte number and morphology over larval development and then applied tissue clearing and whole brain imaging at 3 and 11 months to further dissect the longitudinal effects of foxf2a loss. The ability to track individual pericytes in vivo reveals cell-intrinsic defects and process degeneration with high spatiotemporal resolution. Their use of a pericyte-specific Zebrabow line also allows, for the first time, detailed visualization of pericyte-pericyte interactions in the developing brain, highlighting structural features and behaviors that challenge existing models based on mouse studies. Together, these findings make the zebrafish a valuable model for studying the cellular dynamics of CSVD.

      Weaknesses:

      I originally suggested quantifying pericyte coverage across brain regions to address potential lineage-specific effects due to the distinct developmental origins of forebrain (neural crest-derived) and hindbrain (mesoderm-derived) pericytes. However, I appreciate the authors' response referencing recent work from their lab (Ahuja, 2024), which demonstrates that both neural crest and mesoderm contribute to pericyte lineages in the midbrain and hindbrain. The convergence of these lineages into a shared transcriptional state by 30 hpf, as shown by their single-cell RNA-seq data, makes it unlikely that regional quantification would provide meaningful lineage-specific insight. I agree with the authors that lineage tracing experiments often suffer from low sample sizes, and their updated findings challenge earlier compartmental models of pericyte origin. I therefore appreciate their rationale for not pursuing regional quantification and consider this concern addressed. Furthermore, my other two points regarding quantification of foxf2 levels and overall vascular changes have been thoroughly addressed in the revised manuscript. These additions significantly strengthen the paper's conclusions and improve the overall rigor of the study.

    4. Reviewer #3 (Public review):

      Summary:

      The goal of the work by Graff et al. is to model CSVD in the zebrafish using foxf2a mutants. The mutants show loss of cerebral pericyte coverage that persists through adulthood, but it seems foxf2a does not regulate the regenerative capacity of these cells. The findings are interesting and build on previous work from the group. Limitations of the work include little mechanistic insight into how foxf2a alters pericyte recruitment/differentiation/survival/proliferation in this context, and the overlap of these studies with previous work in foxf2a/b double mutants. However, the data analysis is clean and compelling and the findings will contribute to the field.

      Comments on revisions:

      The authors have addressed all of my original concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents valuable findings that advance our understanding of mural cell dynamics and vascular pathology in a zebrafish model of cerebral small vessel disease. The authors provide compelling evidence that partial loss of foxf2 function leads to progressive, cell-intrinsic defects in pericytes and associated endothelial abnormalities across the lifespan, leveraging powerful in vivo imaging and genetic tools. The strength of evidence could be further improved by additional mechanistic insight and quantitative or lineage-tracing analyses to clarify how pericyte number and identity are affected in the mutant model.

      Thank you to the reviewers for insightful comments and for the time spent reviewing the manuscript. We have strengthened the data through responding to the comments.

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Graff et al. investigates the function of foxf2 in zebrafish to understand the progression of cerebral small vessel disease. The authors use a partial loss of foxf2 (zebrafish possess two foxf2 genes, foxf2a and foxf2b, and the authors mainly analyze homozygous mutants in foxf2a) to investigate the role of foxf2 signaling in regulating pericyte biology. They find that the number of pericytes is reduced in foxf2a mutants and that the remaining pericytes display alterations in their morphologies. The authors further find that mutant animals can develop to adulthood, but that in adult animals, both endothelial and pericyte morphologies are affected. They also show that mutant pericytes can partially repopulate the brain after genetic ablation.

      (1) Weaknesses: The results are mainly descriptive, and it is not clear how they will advance the field at their current state, given that a publication on mice has already examined the loss of foxf2 phenotype on pericyte biology (Reyahi, 2015, Dev. Cell).

      The Reyahi paper was the earliest report of foxf2 mutant brain pericytes and remains illuminating. The work was technically very well executed. Our manuscript expands on and, at times, contradicts their findings. We realized that we did not fully address this in our discussion, which has now been updated. The biggest difference between the two studies is in the direction of change in pericytes after foxf2 knockout, a major finding in both papers. This is where it is important to understand the differences in methods. Reyahi et al. used a conditional knockout under Wnt1:Cre, which will ablate pericytes derived from neural crest, but not those derived from mesoderm, nor will it affect foxf2 expression in endothelial cells. Our model is a full constitutive knockout of the gene in all brain pericytes and endothelial cells. For GOF, Reyahi used a transgenic model with a human FOXF2 BAC integrated into the mouse germline.

      Both studies are important. We do not know enough about human phenotypes in patients with stroke-associated human FOXF2 SNVs to know the direction of change in pericyte numbers. We showed that the SNVs reduce FOXF2 gene expression in vitro (Ryu, 2022). Here we demonstrate dosage sensitivity in fish (showing phenotypes when 1 of 4 foxf2a + foxf2b alleles is lost, Figure 1F), supporting that slight reductions of FOXF2 in humans could lead to severe brain vessel phenotypes. For this reason, our work is complementary to the previously published work and suggests that future studies should focus on understanding the role of dosage, cell autonomy, and human pericyte phenotypes with respect to FOXF2. While some experiments are parallel in mouse and fish, we go further to look at cell death and regeneration, and to understand the consequences on the whole brain vasculature.

      (2) Reyahi et al. showed that loss of foxf2 in mice leads to a marked downregulation of pdgfrb expression in perivascular cells. In contrast to expectation, perivascular cell numbers were higher in mutant animals, but these cells did not differentiate properly. The authors use a transgenic driver line expressing gal4 under the control of the pdgfrb promoter and observe a reduction in pericyte (pdgfrb-expressing) cells in foxf2a mutants. In light of the mouse data, this result might be due to a similar downregulation of pdgfrb expression in fish, which would lead to a downregulation of gal4 expression and hence reduced labelling of pericytes. The authors show a reduction of pdgfrb expression also in zebrafish in foxf2b mutants (Chauhan et al., The Lancet Neurology 2016).

      Reyahi detected more pericytes in the Wnt1:Cre mouse, while we detected fewer in the foxf2a (and foxf2a;foxf2b) mutants. This may be because of different methods. For instance, because the mouse knockout is not a constitutive Foxf2 knockout, the observed increase in pericytes may be because mesodermal-derived pericytes proliferate more highly when the neural crest-derived pericytes are absent. Or does endothelial foxf2 activate pericyte proliferation when foxf2 is lost in some pericytes? It is also possible that mouse foxf2 has a different role from its fish ortholog. Despite these differences, there are common conclusions from both models. For instance, both mouse and fish show foxf2 controls capillary pericyte numbers, albeit in different directions. Both show hemorrhage and loss of vascular stability as a result. Both papers identify the developmental window as critical for setting up the correct numbers of pericytes.  

      As the reviewer suggested, it was important to test whether pdgfrb is downregulated in fish as it is in mice. To do this, we measured pdgfrb expression in foxf2 mutants using hybridization chain reaction (HCR). The results show no change in pdgfrb mRNA in foxf2a mutants in two independent experiments (Fig S3). Independently, we integrated pdgfrb transgene intensity (using a single allele of the transgene so there are no dose effects) in foxf2a mutants vs. wildtype. We found no difference (Fig S3), suggesting that pdgfrb is a reliable reporter for counting pericytes in the foxf2a knockout. The reviewer is correct that we previously showed downregulation of pdgfrb in foxf2b mutants at 4 dpf using colorimetric ISH. foxf2a and foxf2b are unlinked, independent genes (~400 M years apart in evolution) and may have different regulation.

      (3) It would be important to clarify whether, also in zebrafish, foxf2a/foxf2b mutants have reduced or augmented numbers of perivascular cells and how this compares to the data in the mouse.  

      We discuss methodological differences between Reyahi and our work in point (1) above. The reduction in pericytes in foxf2a;foxf2b mutants has been previously published (Ryu, 2022, Supplemental Figure 1) and is shown again here in Supplemental Figure 2. Numbers are reduced in double mutants up to 10 dpf, suggesting no recovery. Further, in response to reviewer comments, we have quantified pericytes in the whole fish brain (Figure 3E-G) and show reduced pericytes in the adult, reduced vessel network length, and, importantly, reduced pericyte density. In aggregate, our data show pericyte reduction at 5 developmental stages from embryo through adult. The reason for different results from the mouse is unknown and may reflect a technical difference (constitutive vs Wnt1:Cre) or a species difference.

      (4) The authors should perform additional characterization of perivascular cells using marker gene expression (for a list of markers, see e.g., Shih et al. Development 2021) and/or genetic lineage tracing.

      This is a good point. We have added HCR analysis of additional markers. Results show co-expression of foxf2a, foxf2b, nduf4la2 and pdgfrb in brain pericytes (Fig 2, Fig S3).

      (5) The authors motivate using foxf2a mutants as a model of reduced foxf2 dosage, "similar to human heterozygous loss of FOXF2". However, it is not clear how the different foxf2 genes in zebrafish interact with each other transcriptionally. Is there upregulation of foxf2b in foxf2a mutants and vice versa? This is important to consider, as Reyahi et al. showed that foxf2 gene dosage in mice appears to be important, with an increase in foxf2 gene dosage (through transgene expression) leading to a reduction in perivascular cell numbers.

      We agree that dosage is a very important concept and show phenotypes in foxf2a heterozygotes (Fig 1F). To test the potential compensation from foxf2b, we have added qPCR for foxf2b in foxf2a mutants as well as HCR of foxf2b in foxf2a mutants (Fig S3C,D). There is no change in foxf2b expression in foxf2a mutants. We discuss dosage in our discussion.

      (6) Figures 3 and 4 lack data quantification. The authors describe the existence of vascular defects in adult fish, but no quantifiable parameters or quantifications are provided. This needs to be added.

      This query was technically challenging to address, but very worthwhile. We have not seen published methods for quantifying brain pericytes along with the vascular network (certainly not in zebrafish adults), so we developed new methods of analyzing whole brain vascular parameters of cleared adult brains (Figure S6) using a combination of segmentation methods for pericytes, endothelium and smooth muscle. We have added another author (David Elliott), as he was instrumental in designing the methods. We find a significant decrease in vessel network length in foxf2a mutants at 3 months and 6 months (Figures 3F and 4G). Similarly, we show a lower number of brain pericytes in foxf2a mutants (Figure 3E). Finally, we added whole brain analysis of smooth muscle coverage (Figure 4) and show no change in vSMC number or coverage of vessels at 5 and 10 dpf or adult, respectively, pointing to pericytes being the cells most affected. Thank you, this query pushed us in a very productive direction. These methods will be extremely useful in the future!

      (7) The analysis of pericyte phenotypes and morphologies is not clear. On page 6, the authors state: "In the wildtype brain, adult pericytes have a clear oblong cell body with long, slender primary processes that extend from the cytoplasm with secondary processes that wrap around the circumference of the blood vessel." Further down on the same page, the authors note: "In wildtype adult brains, we identified three subtypes of pericytes, ensheathing, mesh and thin-strand, previously characterized in murine models." In conclusion, not all pericytes have long, slender primary processes, but there are at least three different sub-types? Did the authors analyze how they might be distributed along different branch orders of the vasculature, as they are in the mouse?

      We have reworded the text on page 5/6 to be clearer that embryonic pericytes are thin-strand only. Additional pericyte subtypes develop later and are seen in the mature vasculature of the adult. We could not find a way to accurately analyze pericyte subtypes in the adult brain. The imaging analysis to count pericytes used soma, as machine learning algorithms have been developed to count nuclei but not to analyze processes.

      (8) Which type of pericyte is affected in foxf2a mutant animals? Can the authors identify the branch order of the vasculature for both wildtype and mutant animals and compare which subtype of pericyte might be most affected? Are all subtypes of pericytes similarly affected in mutant animals? There also seems to be a reduction in smooth muscle cell coverage.

      Please see the response to (7) about pericyte subtypes. In response to the reviewer’s query, we have now analyzed vSMCs in the embryonic and adult brain. In the embryonic brain we see no statistical differences in vSMC number at 5 and 10 dpf (Figure 4). In the adult, vSMC length (total length of vSMCs in a brain) and vSMC coverage (proportion of brain vessels with vSMCs) are not significantly different. This data is important because it suggests that foxf2a has a more important role in pericytes than in vSMCs.

      (9) Regarding pericyte regeneration data (Figure 7): Are the values in Figure 7D not significantly different from each other (no significance given)?

      Any graphs missing bars have no significance and were left off for clarity. We have stated this in the statistical methods.  

      (10) In the discussion, the authors state that "pericyte processes have not been studied in zebrafish". Ando et al. (Development 2016) studied pericyte processes in early zebrafish embryos, and Leonard et al. (Development 2022) studied zebrafish pericytes and their processes in the developing fin.

      We apologize; this was not meant to say that pericyte processes had not been studied before, and we have reworded this to make the intent of the sentence clear. We were trying to emphasize that we are the first to quantify processes at different stages, especially in foxf2 mutants. Processes change morphology over development, especially after 5 dpf, something that our data captures. Our images are of stages that have not been previously characterized. We added a reference to Mae et al., who found similar process length changes in a mouse knockout of a different gene, and to Leonard, who previously showed overlap of processes in a different context in fish.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the developmental and lifelong consequences of reduced foxf2 dosage in zebrafish, a gene associated with human stroke risk and cerebral small vessel disease (CSVD). The authors show that a ~50% reduction in foxf2 function through homozygous loss of foxf2a leads to a significant decrease in brain pericyte number, along with striking abnormalities in pericyte morphology, including enlarged soma and extended processes, during larval stages. These defects are not corrected over time but instead persist and worsen with age, ultimately affecting the surrounding endothelium. The study also makes an important contribution by characterizing pericyte behavior in wild-type zebrafish using a clever pericyte-specific Brainbow approach, revealing novel interactions such as pericyte process overlap not previously reported in mammals.

      Strengths:

      This work provides mechanistic insight into how subtle, developmental changes in mural cell biology and coverage of the vasculature can drive long-term vascular pathology. The authors make strong use of zebrafish imaging tools, including longitudinal analysis in transgenic lines to follow pericyte number and morphology over larval development, and then applied tissue clearing and whole brain imaging at 3 and 11 months to further dissect the longitudinal effects of foxf2a loss. The ability to track individual pericytes in vivo reveals cell-intrinsic defects and process degeneration with high spatiotemporal resolution. Their use of a pericyte-specific Zebrabow line also allows, for the first time, detailed visualization of pericyte-pericyte interactions in the developing brain, highlighting structural features and behaviors that challenge existing models based on mouse studies. Together, these findings make the zebrafish a valuable model for studying the cellular dynamics of CSVD.

      Weaknesses:

      (11) While the findings are compelling, several aspects could be strengthened. First, quantifying pericyte coverage across distinct brain regions (forebrain, midbrain, hindbrain) would clarify whether foxf2a loss differentially impacts specific pericyte lineages, given known regional differences in developmental origin, with forebrain pericytes being neural crest-derived and hindbrain pericytes being mesoderm-derived.

      In recently published work from our lab, we showed that both neural crest and mesodermal cells contribute to pericytes in both the midbrain and hindbrain, and we could not confirm earlier work suggesting more rigid compartmental origins (Ahuja, 2024). In that paper, we noted that lineage experiments are often limited by small sample sizes, which may be why this had not been discovered before. This makes us skeptical that counting different regions would allow us to interpret data about neural crest and mesoderm. Further, Ahuja 2024 shows by single-cell sequencing that pericyte intermediate progenitors from both mesoderm and neural crest are indistinguishable at 30 hpf and have converged on a common phenotype.

      (12) Second, measuring foxf2b expression in foxf2a mutants would better support the interpretation that total FOXF2 dosage is reduced in a graded fashion in heterozygote and homozygote foxf2a mutants.

      We have done both qPCR for foxf2b in foxf2a mutants and HCR (quantitative ISH). This is now reported in Fig S3. 

      (13) Finally, quantifying vascular density in adult mutants would help determine whether observed endothelial changes are a downstream consequence of prolonged pericyte loss. Correlating these vascular changes with local pericyte depletion would also help clarify causality.

      We have added this data to Figures 3 and 4. Please also see response (6).

      Reviewer #3 (Public review):

      Summary:

      The goal of the work by Graff et al. is to model CSVD in the zebrafish using foxf2a mutants. The mutants show loss of cerebral pericyte coverage that persists through adulthood, but it seems foxf2a does not regulate the regenerative capacity of these cells. The findings are interesting and build on previous work from the group. Limitations of the work include little mechanistic insight into how foxf2a alters pericyte recruitment/differentiation/survival/proliferation in this context, and the overlap of these studies with previous work in foxf2a/b double mutants. However, the data analysis is clean and compelling, and the findings will contribute to the field.

      (14) Please make Figures 5C and 5E red-green colorblind friendly.

      Thank you. We have changed the colors to light blue and yellow to be colorblind friendly.

      Reviewer #3 (Recommendations for the authors):

      (15) I'm not sure this reviewer totally agrees with the assessment that foxf2a loss of function, while foxf2b remains normal, is the same as FOXF2 heterozygous loss of function in humans. The discussion of the gene dosage needs to be better framed, and the authors should carry out qPCR to show that foxf2b levels are not altered in the foxf2a mutant background.

      We have added data on foxf2b expression in foxf2a mutants to Fig S3. We have updated the results.

      (16) Figure 4/SF7- is the aneurysm phenotype derived from the ECs or pericytes? Cell-type-specific rescues would be interesting to determine if phenotypes are rescued, especially the developmental phenotypes (it is appreciated that carrying out rescue experiments until adulthood is complex). When is the earliest time point that aneurysm-like structures are seen?

      This is a fascinating question, especially as we show that endothelial cells (vessel network length) are affected in the adult mutants. The foxf2a mutants that we work with here are constitutive knockouts. While a strategy to rescue foxf2a in specific lineages is being developed in the laboratory, this will require a multi-generation breeding effort to get drivers, transgenes and mutants on the same background, and these fish are not currently available. Thank you for this comment; it is something we want to follow up on.

      (17) Figure 5 - This is very nice analysis.

      Thank you! We think it is informative too.

      (18) Figure 6 - needs to contain control images

      We have added wildtype images to figure 6A.

      (19) Figure 7- vessel images should be shown to demonstrate the specificity of NTR treatment to the pericytes.

      We have added the vessel images to Figure 7. We apologize for the omission.

    1. eLife Assessment

      This valuable study uses fiber photometry, implantable lenses, and optogenetics, to show that a subset of subthalamic nucleus neurons are active during movement, and that active but not passive avoidance depends in part on STN projections to substantia nigra. The strength of the evidence for these claims is solid and this paper may be of interest to basic and applied behavioural neuroscientists working on movement or avoidance.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a robust set of experiments that provide new insights into the role of STN neurons during active and passive avoidance tasks. These forms of avoidance have received comparatively less attention in the literature than the more extensively studied escape or freezing responses, despite being extremely relevant to human behaviour and more strongly influenced by cognitive control.

      Strengths:

      Understanding the neural infrastructure supporting avoidance behaviour would be a fundamental milestone in neuroscience. The authors employ sophisticated methods to delineate the role of STN neurons during avoidance behaviours. The work is thorough and the evidence presented is compelling. Experiments are carefully constructed, well-controlled, and the statistical analyses are appropriate.

    3. Reviewer #2 (Public review):

      Summary:

      Zhou, Sajid et al. present a study investigating STN involvement in signaled movement. They use fiber photometry, implantable lenses, and optogenetics during active avoidance experiments to evaluate this. The data are useful for the scientific community and the overall evidence for their claims is solid, but many aspects of the findings are confusing. The authors present a huge collection of data, and it is somewhat difficult to extract the key information and the meaningful implications resulting from these data.

      Strengths:

      The study is comprehensive in using many techniques and many stimulation powers and frequencies and configurations.

    4. Reviewer #3 (Public review):

      Summary:

      The authors use calcium recordings from STN to measure STN activity during spontaneous movement and in a multi-stage avoidance paradigm. They also use optogenetic inhibition and lesion approaches to test the role of STN during the avoidance paradigm. The paper reports a large amount of data and makes many claims, some seem well supported to this Reviewer, others not so much.

      Strengths:

      Well-supported claims include data showing that during spontaneous movements, especially contraversive ones, STN calcium activity is increased using bulk photometry measurements. Single-cell measures back this claim but also show that it is only a minority of STN cells that respond strongly, with most showing no response during movement, and a similar number showing smaller inhibitions during movement.

      Photometry data during cued active avoidance procedures show that STN calcium activity sharply increases in response to auditory cues, and during cued movements to avoid a footshock. Optogenetic and lesion experiments are consistent with an important role for STN in generating cue-evoked avoidance. And a strength of these results is that multiple approaches were used.

      [Editors' note: The authors provided a good explanation regarding the difference between interpreting 'caution' in the healthy vs impaired situation, and this addressed one of the remaining major concerns from the last round of review.]

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      One possible remaining conceptual concern that might require future work is determining whether STN primarily mediates higher-level cognitive avoidance or if its activation primarily modulates motor tone.

      Our results using viral and electrolytic lesions (Fig. 11) and optogenetic inhibition of STN neurons (Fig. 10) show that signaled active avoidance is virtually abolished, and this effect is reproduced when we selectively inhibit STN fibers in the midbrain (Fig. 12). Inhibition of STN projections in either the substantia nigra pars reticulata (SNr) or the midbrain reticular tegmentum (mRt) eliminates cued avoidance responses while leaving escape responses intact. Importantly, mice continue to escape during US presentation after lesions or during photoinhibition, demonstrating that basic motor capabilities and the ability to generate rapid defensive actions are preserved.

      These findings argue against the idea that STN’s role in avoidance reflects a nonspecific suppression or facilitation of motor tone, even if the STN also contributes to general movement control. Instead, they show that STN output is required for generating “cognitively” guided cued actions that depend on interpreting sensory information and applying learned contingencies to decide when to act. Thus, while STN activity can modulate movement parameters, the loss-of-function results point to a more selective role in supporting cued, goal-directed avoidance behavior rather than a general adjustment of motor tone.

      Reviewer #2 (Public review):

      All previous weaknesses have been addressed. The authors should explain how inhibition of the STN impairing active avoidance is consistent with the STN encoding cautious action. If 'caution' is related to avoid latency, why does STN lesion or inhibition increase avoid latency, and therefore increase caution? Wouldn't the opposite be more consistent with the statement that the STN 'encodes cautious action'?

      The reviewer’s interpretation treats any increase in avoidance latency as evidence of “more caution,” but this holds only when animals are performing the avoidance behavior normally. In our intact animals, avoidance rates remain high across AA1 → AA2 → AA3, and the active avoidance trials (CS1) used to measure latency are identical across tasks (e.g., in AA2 the only change is that intertrial crossings are punished). Under these conditions, changes in latency genuinely reflect adjustments in caution, because the behavior itself is intact, actions remain tightly coupled to the cue, and the trials are identical.

      This logic does not apply when STN function is disrupted. STN inhibition or lesions reduce avoidance to near chance levels; the few crossings that do occur are poorly aligned to the CS and many likely reflect random movement rather than a cued avoidance response. Once performance collapses, latency can no longer be assumed to reflect the same cognitive process. Thus, interpreting longer latencies during STN inactivation as “more caution” would be erroneous, and we never make that claim.

      A simple analogy may help clarify this distinction. Consider a pedestrian deciding when to cross the street after a green light. If the road is deserted (like AA1), the person may step off the curb quickly. If the road is busy with many cars that could cause harm (like AA2), they may wait longer to ensure that all cars have stopped. This extra hesitation reflects caution, not an inability to cross. However, if the pedestrian is impaired (e.g., cannot clearly see the light, struggles to coordinate movements, or cannot reliably make decisions), a delayed crossing would not indicate greater caution—it would reflect a breakdown in the ability to perform the behavior itself. The same principle applies to our data: we interpret latency as “caution” only when animals are performing the active avoidance behavior normally, success rates remain high, and the trial rules are identical. Under STN inhibition or lesion, when active avoidance collapses, the latency of the few crossings that still occur can no longer be interpreted as reflecting caution. We have added these points to the Discussion.

      Reviewer #3 (Public review):

      Original Weaknesses:

      I found the experimental design and presentation convoluted and some of the results over-interpreted.

      We appreciate the reviewer’s comment, but the concern as stated is too general for us to address in a concrete way. The revised manuscript has been substantially reorganized, with simplified terminology, streamlined figures, and removal of an entire set of experiments to avoid over-interpretation. We are confident that the experimental design and results are now presented clearly and without extrapolation beyond the data. If there are specific points the reviewer finds convoluted or over-interpreted, we would be happy to address them directly.

      As presented, I don't understand this idea that delayed movement is necessarily indicative of cautious movements. Is the distribution of responses multi-modal in a way that might support this idea; or do the authors simply take a normal distribution and assert that the slower responses represent 'caution'? Even if responses are multi-modal and clearly distinguished by 'type', why should readers think that delayed responses imply cautious responding instead of, say, habituation or sensitization to cue/shock; variability in attention, motivation, or stress; or merely uncertainty, which seems plausible given what I understand of the task design where the same mice are repeatedly tested in changing conditions? This relates to a major claim (i.e., in the title).

      We appreciate the reviewer’s question and address each component directly.

      (1) What we mean by “caution” and how it is operationalized

      In our study, caution is defined operationally as a systematic increase in avoidance latency when the behavioral demand becomes higher, while the trial structure and required response remain unchanged. Specifically, CS1 trials are identical in AA1, AA2, and AA3. Thus, when mice take longer to initiate the same action under more demanding contexts, the added time reflects additional evaluation before acting, consistent with long-established interpretations of latency shifts in cognitive psychology (see papers by Donders, Sternberg, Posner) and interpretations of deliberation time in the speed-accuracy tradeoff literature.

      (2) Why this interpretation does not rely on multi-modal response distributions

      We do not claim that “cautious” responses form a separate mode in the latency distribution. The distributions are unimodal, and caution is inferred from condition-dependent shifts in these distributions across identical trials, not from the existence of multiple peaks (see Zhou et al., 2022). Latency shifts across conditions with identical trial structure are widely used as behavioral indices of deliberation or caution.

      (3) Why alternative explanations (habituation/sensitization, motivation, attention, stress, uncertainty) do not account for these latency changes

      Importantly, nothing changes in CS1 trials between AA1 and AA2 with respect to the cue, shock, or required response. Therefore:

      - Habituation/sensitization to the cue or shock cannot explain the latency shift (the stimuli and trial type are unchanged). We have previously examined cue-evoked orienting responses and their habituation in detail (Zhou et al., 2023), and those measurements are dissociable from the latency effects described here.

      - Motivation and attention are unlikely to change selectively for identical CS1 trials when the task manipulation only adds a contingency to intertrial crossings.

      - Uncertainty also does not increase for CS1 trials; they remain fully predictable and unchanged between conditions.

      - Stress is too broad a construct to be meaningful unless clearly operationalized; moreover, any stress differences that arise from task structure would covary with caution rather than replace the interpretation.

      (4) Clarifying “types” of responses

      The reviewer’s question about “response types” appears to conflate behavioral latencies with the neuronal response “types” defined in the manuscript. The term “type” in this paper refers to neuronal activation derived from movement-based clustering, not to distinct behavioral categories of avoidance, which we term modes.

      In sum, we interpret increased CS1 latency as “caution” only when performance remains intact and trial structure is identical between conditions; under those criteria, latency reliably reflects additional cognitive evaluation before acting, rather than nonspecific changes in sensory processing, motivation, etc.
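      The operational definition above (a within-animal shift in CS1 latency between task contexts, with the trials themselves unchanged) can be sketched in a few lines. This is only an illustration under stated assumptions: the latency values, mouse names, and the `median_shift` helper are hypothetical and are not taken from the manuscript, whose actual inferences rest on mixed-effects models.

```python
import statistics

# Hypothetical CS1 avoidance latencies (seconds) per mouse in two task contexts.
# All values are made up for illustration only.
latencies = {
    "mouse1": {"AA1": [2.0, 2.4, 2.2], "AA2": [3.0, 3.4, 3.1]},
    "mouse2": {"AA1": [1.8, 2.0, 2.1], "AA2": [2.5, 2.9, 2.6]},
    "mouse3": {"AA1": [2.5, 2.3, 2.6], "AA2": [3.2, 3.0, 3.5]},
}


def median_shift(by_task, baseline="AA1", test="AA2"):
    """Within-animal change in median latency between two task contexts."""
    return statistics.median(by_task[test]) - statistics.median(by_task[baseline])


# One shift per animal: the animal, not the trial, is the unit of comparison.
shifts = {mouse: median_shift(by_task) for mouse, by_task in latencies.items()}
mean_shift = statistics.mean(list(shifts.values()))

for mouse, shift in shifts.items():
    print(f"{mouse}: latency shift AA1 -> AA2 = {shift:.2f} s")
print(f"mean shift across animals = {mean_shift:.2f} s")
```

      A consistently positive shift across animals, with avoidance rates still high, is the pattern the response describes as caution; the same arithmetic would not be interpretable once avoidance performance collapses.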

      Related to the last, I'm struggling to understand the rationale for dividing cells into 'types' based on their physiological responses in some experiments.

      There is longstanding precedent in systems neuroscience for classifying neurons by their physiological response patterns, because neurons that respond similarly often play similar functional roles. For example, place cells, grid cells, direction cells, in vivo, and regular spiking, burst firing, and tonic firing in vitro are all defined by characteristic activity patterns in response to stimuli rather than anatomy or genetics alone. In the same spirit, our classifications simply reflect clusters of neurons that exhibit similar ΔF/F dynamics around behaviorally relevant events, such as movement sensitivity or avoidance modes. This is a standard analytic approach used in many studies. Thus, our rationale is not arbitrary: the “classes” and “types” arise from data-driven clustering of physiological responses, consistent with widespread practice, and they help reveal functional distinctions within the STN that would otherwise remain obscured.

      In several figures the number of subjects used was not described. This is necessary. Also necessary is some assessment of the variability across subjects.

      All the results described include the number of animals. To eliminate uncertainty, we now also include this information in figure legends.

      The only measure of error shown in many figures relates trial-to-trial or event variability, which is minimal because in many cases it appears that hundreds of trials may have been averaged per animal, but this doesn't provide a strong view of biological variability (i.e., are results consistent across animals?).

      The concern appears to stem from a misunderstanding of what the mixed-effects models quantify. Although the figure panels often show session-averaged traces for clarity, all statistical inferences in the paper are made at the level of animals, not trials. Mixed-effects modeling is explicitly designed for hierarchical datasets such as ours, where many trials are nested within sessions, which are themselves nested within animals.

      In our models, animal is the clustering (random) factor, and sessions are nested within animals, so variability across animals is directly estimated and used to compute the population-level effects. This approach is not only appropriate but is the most stringent and widely recommended method for analyzing behavioral and neural data with repeated measures. In other words, the significance tests and confidence intervals already fully incorporate biological variability across animals.

      Thus, although hundreds of trials per animal may be illustrated for visualization, the inferences reflect between-animal consistency, not within-animal trial repetition. The fact that the mixed-effects results are robust across animals supports the biological reliability of the findings.
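      The difference between trial-level and animal-level variability can be illustrated with a simplified two-stage summary. This sketch is not the mixed-effects model used in the paper; it is a deliberately simpler stand-in that averages trials within sessions and sessions within animals, and all values and names (`data`, `animal_means`, `sem`) are hypothetical.

```python
import math
import statistics

# Hypothetical measurements (e.g., peak dF/F) nested as animal -> session -> trials.
data = {
    "mouse1": {"s1": [1.0, 1.2, 0.8], "s2": [1.1, 0.9, 1.0]},
    "mouse2": {"s1": [2.0, 2.1, 1.9], "s2": [2.2, 1.8, 2.0]},
    "mouse3": {"s1": [3.0, 2.9, 3.1], "s2": [3.1, 3.0, 2.9]},
}


def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


def animal_means(nested):
    """Average trials within each session, then sessions within each animal."""
    return {
        animal: statistics.mean(
            [statistics.mean(trials) for trials in sessions.values()]
        )
        for animal, sessions in nested.items()
    }


per_animal = animal_means(data)
all_trials = [
    value
    for sessions in data.values()
    for trials in sessions.values()
    for value in trials
]

sem_pooled = sem(all_trials)                  # treats every trial as independent
sem_animals = sem(list(per_animal.values()))  # animals are the unit of replication

print(f"SEM pooling all trials: {sem_pooled:.3f}")
print(f"SEM across animals:     {sem_animals:.3f}")
```

      In this toy dataset the pooled-trial SEM is smaller than the across-animal SEM, which is why error bars computed over trials can look minimal even when animals differ. A mixed-effects model with animal as a random factor handles this nesting within a single fit rather than by explicit averaging.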

      It is not clear if or how spread of expression outside of target STN was evaluated, and if or how or how many mice were excluded due to spread or fiber placements. Inadequate histological validation is presented and neighboring regions that would be difficult to completely avoid, such as paraSTN may be contributing to some of the effects.

      The STN is a compact structure with clear anatomical boundaries, and our injections were rigorously validated to ensure targeting specificity. As detailed in the Methods, every mouse underwent histological verification, and injections were quantified using the Brain Atlas Analyzer app (available on OriginLab), which we developed to align serial sections to the Allen Brain Atlas. This approach provides precise, slice-by-slice confirmation of viral spread. We have performed thousands of AAV injections and probe implants in our lab, incorporating over the years highly reliable stereotaxic procedures with multiple depth and angle checks and tools. For this study specifically, fewer than 10% of mice were excluded due to off-target expression or fiber/lesion placement. None of the included cases showed spread into adjacent structures.

      Regarding paraSTN: anatomically, paraSTN is a very small extension contiguous with STN. Our study did not attempt to dissociate subregions within STN, and the viral expression patterns we report fall within the accepted boundaries of STN. Importantly, none of our photometry probes or miniscope lenses sampled paraSTN, so contributions from that region are extremely unlikely to account for any of our neural activity results.

      Finally, our paper employs five independent loss-of-function approaches—optogenetic inhibition of STN neurons, selective inhibition of STN projections to the midbrain (in two sites: SNr and mRt), and STN lesions (electrolytic and viral). All methods converge on the same conclusion, providing strong evidence that the effects we report arise from manipulation of STN itself rather than from neighboring regions.

      Raw example traces are not provided.

      We do not think raw traces are useful here. All figures contain average traces to reflect the average activity of the estimated populations, which are already clustered per classes and types.

      The timeline of the spontaneous movement and avoidance sessions was not clear, nor was the number of events or sessions per animal and how this was set. It is not clear if there was pre-training or habituation, if many or variable sessions were combined per animal, what the time gaps between sessions were, or if or how any of these parameters might influence interpretation of the results.

      As noted, we have enhanced the description of the sessions, including the number of animals and sessions; sessions are daily and equal in number for every animal within each group of experiments. The sessions are part of the random effects in the model. In addition, we now include schematics to facilitate understanding of the procedures.

      Comments on revised version:

      The authors removed the optogenetic stimulation experiments, but then also added a lot of new analyses. Overall the scope of their conclusions is essentially unchanged. Part of the eLife model is to leave it to the authors' discretion how they choose to present their work. But my overall view of it is unchanged. There are elements that I found clear, well executed, and compelling. But there are other elements that I found difficult to understand and where I could not follow or concur with their conclusions.

      We respectfully disagree with the assertion that the scope of our conclusions remains unchanged. The revised manuscript differs in several fundamental ways:

      (1) Removal of all optogenetic excitation experiments

      These experiments were a substantial portion of the original manuscript, and their removal eliminated an entire set of claims regarding the causal control of cautious responding by STN excitation. The revised manuscript no longer makes these claims.

      (2) Addition of analyses that directly address the reviewers’ central concerns

      The new analyses using mixed-effects modeling, window-specific covariates, and movement/baseline controls were added precisely because reviewers requested clearer dissociation of sensory, motor, and task-related contributions. These additions changed not only the presentation but the interpretation of the neural signals. We now conclude that STN encodes movement, caution, and aversive signals in separable ways, not that it exclusively or causally regulates caution.

      (3) Clear narrowing of conclusions

      Our current conclusions are more circumscribed and data-driven than in the original submission. For example, we removed all claims that STN activation “controls caution,” relying instead on loss-of-function data showing that STN is necessary for performing cued avoidance—not for generating cautious latency shifts. This is a substantial conceptual refinement resulting directly from the review process.

      (4) Reorganization to improve clarity

      Nearly every section has been restructured, including terminology (mode/type/class), figure organization, and explanations of behavioral windows. These revisions were implemented to ensure that readers can follow the logic of the analyses.

      We appreciate the reviewer’s recognition that several elements were clear and compelling. For the remaining points they found difficult to understand, we have addressed each one in detail in the response and revised the manuscript accordingly. If there are still aspects that remain unclear, we would welcome explicit identification of those points so that we can clarify them further.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Show individual data points on bar plots

      - partially addressed. Individual data points are still not shown.

      Wherever feasible, we display individual data points (e.g., Figures 1 and 2) to convey variability directly. However, in cases where figures depict hundreds of paired (repeated-measures) data points, showing all points without connecting them would not be appropriate, while linking them would make the figures visually cluttered and uninterpretable. All plots and traces include measures of variability (SEM), and the raw data will be shared on Dryad. When error bars are not visible, they are smaller than the trace thickness or bar line; for example, in Figure 5B, the black circles and orange triangles include error bars, but they are smaller than the symbol size.

      Also, to minimize visual clutter, only a subset of relevant comparisons is highlighted with asterisks, whereas all relevant statistical results, comparisons, and mouse/session numbers are fully reported in the Results section, with statistical analyses accounting for the clustering of data within subjects and sessions.

      (2) The active avoidance experiments are confusing when they are introduced in the results section. More explanation of what paradigms were used and what each CS means at the time these are introduced would add clarity. For example AA1, AA2 etc are explained only with references to other papers, but a brief description of each protocol and a schematic figure would really help.

      - partially addressed. A schematic figure showing the timeline would still be helpful.

      As suggested, we have added an additional panel to Fig. 5A with a schematic describing the AA1-3 tasks. In addition, the avoidance protocols are described briefly but clearly in the Results section (second paragraph of “STN neurons activate during goal-directed avoidance contingencies”) and in greater detail in the Methods section. As stated, these tasks were conducted sequentially, and mice underwent the same number of sessions per procedure, which are indicated. All relevant procedural information has been included in these sections. Mice underwent daily sessions and learnt these tasks within 1-2 sessions, progressing sequentially across tasks with an equal number of sessions per task (7 per task), and the resulting data were combined and clustered by mouse/session in the statistical models.

      (3) How do the Class 1, 2, 3 avoids relate to Class 1 , 2, 3 neural types established in Figure 3? It seems like they are not related, and if that is the case they should be named something different from each other to avoid confusion.

      -not sufficiently addressed. The new naming system of neural 'classes' and 'types' helps with understanding that these are completely different ways of separating subpopulations within the STN. However, it is still unclear why the authors re-type the neurons based on their relation to avoids, when they classify the neurons based on their relationship to speed earlier. And it is unclear whether these neural classes and neural types have anything to do with each other. Are the neural Types related to the neural classes in any way? and what is the overlap between neural types vs classes? Which separation method is more useful for functionally defining STN populations?

      The remaining confusion stems from treating several independent analyses as if they were different versions of the same classification. In reality, each analysis asks a distinct question, and the resulting groupings are not expected to overlap or correspond. We clarify this explicitly below.

      - Movement onset neuron classes (Class A, B, C; Fig. 3):

      These classes categorize neurons based on how their ΔF/F changes around spontaneous movement onset. This analysis identifies which neurons encode the initiation and direction of movement. For instance, Class B neurons (15.9%) were inhibited as movement slowed before onset but did not show sharp activation at onset, whereas Class C neurons (27.6%) displayed a pronounced activation time-locked to movement initiation. Directional analyses revealed that Class C neurons discharged strongly during contraversive turns, while Class B neurons showed a weaker ipsiversive bias. Because neurons were defined per session and many of these recordings did not include avoidance-task sessions, these movement-onset classes were not used in the avoidance analyses.

      - Movement-sensitivity neuron classes (Class 1, 2, 3, 4; Fig. 7):

      These classes categorize neurons based on the cross-correlation between ΔF/F and head speed, capturing how each neuron’s activity scales with movement features across the entire recording session. This analysis identifies neurons that are strongly speed-modulated, weakly speed-modulated, or largely insensitive to movement. These movement-sensitivity classes were then carried forward into the avoidance analyses to ask how neurons with different kinematic relationships participate during task performance; for example, whether neurons that are insensitive to movement nonetheless show strong activation during avoidance actions.

      - Avoidance modes (Mode 1, 2, 3; Fig. 8):

      Here we classify actions, not neurons. K-means clustering is applied to the movement-speed time series during CS1 active avoidance trials only, which allows us to identify distinct action modes or variants (fast-onset versus delayed avoidance responses). This action-based classification ensures that we compare neural activity across identical movements, eliminating a major confound in studies that do not explicitly separate action variants. First, we examine how population activity differs across these avoidance modes, reflecting neural encoding of the distinct actions themselves. Second, within each mode, we then classify neurons into “types,” which simply describes how different neurons activate during that specific avoidance action (as noted next).

      - Neuron activation types within each mode (Type a, b, c; Fig. 9):

      This analysis extends the mode-based approach by classifying neuronal activation patterns only within each specific avoidance mode. For each mode, we apply k-means clustering to the ΔF/F time series to identify three activation types, e.g., neurons showing little or no response, neurons showing moderate activation, and neurons showing strong or sharply timed activation. Because all trials within a mode have identical movement profiles, these activation types capture the variability of neural responses to the same avoidance behavior. Importantly, these activation “types” (a, b, c) are not global neuron categories. They do not correspond to, nor are they intended to map onto, the movement-based neuron classes defined earlier. Instead, they describe how neurons differ in their activation during a particular behavioral mode, that is, within a specific set of behaviorally matched trials. Because modes are defined at the trial level, the neurons contributing to each mode can differ: some neurons have trials belonging to one mode, others to two or all three. Thus, Type a/b/c groupings are not fixed properties of neurons. To prevent confusion, we refer to them explicitly as neuronal activation types, emphasizing that they characterize mode-specific response patterns rather than global cell identities.

      In conclusion, the categorizations serve entirely different analytical purposes and should not be interpreted as competing classifications. The mode-specific “types” do not reclassify or replace the movement-sensitivity classes; they capture how neurons differ within a single, well-defined avoidance action, while the movement classes reflect how neurons relate to movements in general. Each classification addresses a different set of questions, and overlap between them is not expected.
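      The movement-sensitivity classification described above rests on the cross-correlation between each neuron's ΔF/F and head speed. A minimal sketch of that computation follows, assuming equally sampled traces and a Pearson correlation at each lag; the traces, the function names, and the lag range are hypothetical, and any thresholds used to assign Class 1-4 in the paper are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cross_correlation(signal, speed, max_lag):
    """Correlate signal[t + lag] with speed[t] for each lag in [-max_lag, max_lag].

    A peak at a positive lag means the signal follows the speed trace.
    """
    n = len(signal)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            s, v = signal[lag:], speed[: n - lag]
        else:
            s, v = signal[: n + lag], speed[-lag:]
        result[lag] = pearson(s, v)
    return result


# Hypothetical head-speed trace and a dF/F trace that copies it two samples later.
speed = [0, 0, 1, 3, 1, 0, 0, 2, 4, 2, 0, 0, 0, 1, 0, 0]
dff = [0, 0] + speed[:-2]

xc = cross_correlation(dff, speed, max_lag=4)
peak_lag = max(xc, key=xc.get)
print(f"peak correlation {xc[peak_lag]:.2f} at lag {peak_lag} samples")
```

      The peak value and its lag summarize how strongly, and with what delay, a neuron's activity tracks movement; a neuron with a flat cross-correlation would fall into a movement-insensitive group under this kind of scheme.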

      To make this as clear as possible we added the following paragraph to the Results:  

      “To avoid confusion between analyses, it is important to note that the movement-sensitivity classes defined here (Class 1–4; Fig. 7) are conceptually distinct from both the movement-onset classes (Class A–C; Fig. 3) and the neuronal activation “types” introduced later in the avoidance-mode analysis. The Class 1–4 grouping reflects how neurons relate to movement across the entire session, based on their cross-correlation with speed. The onset classes A–C capture neural activity specifically around spontaneous movement initiation during general exploration. In contrast, the later activation “types” are derived within each avoidance mode and describe how neurons differ in their activation patterns during identical CS1 avoidance responses. These classifications answer different questions about STN function and are not intended to correspond to one another.”
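      The avoidance modes discussed in this response come from k-means clustering of movement-speed time series on CS1 trials. The following is a minimal, dependency-free sketch of that idea, assuming each trial is a fixed-length speed trace and Euclidean distance is used. The traces, the choice of initial centroids, and the function names are hypothetical; the authors' actual implementation and parameters are not specified here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_trace(traces):
    """Point-by-point average of a list of equal-length traces."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]


def kmeans(traces, init_centroids, n_iter=20):
    """Plain k-means: assign each trace to its nearest centroid, then update."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(t, centroids[k]))
            for t in traces
        ]
        for k in range(len(centroids)):
            members = [t for t, lab in zip(traces, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_trace(members)
    return labels, centroids


# Hypothetical speed traces (5 time bins after CS1 onset), one per trial.
trials = [
    [0, 5, 8, 3, 1],  # fast-onset
    [0, 6, 7, 2, 1],  # fast-onset
    [0, 2, 6, 6, 2],  # intermediate
    [0, 3, 5, 7, 2],  # intermediate
    [0, 0, 1, 6, 8],  # delayed
    [0, 1, 1, 5, 9],  # delayed
]

# Deterministic initialization from one example trial per expected mode.
labels, centroids = kmeans(trials, [trials[0], trials[2], trials[4]])
print(labels)
```

      Because the labels are assigned per trial, the set of neurons contributing to each mode depends on which trials were recorded in their sessions, which is why the mode-specific activation types are not fixed properties of individual neurons.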

      (4) Similarly having 3 different cell types (a,b,c) in the active avoidance seems unrelated to the original classification of cell types (1,2,3), and these are different for each class of avoid. This is very confusing and it is unclear how any of these types relate to each other. Presumably the same mouse has all three classes of avoids, so there are recordings from each cell during each type of avoid. So the authors could compare one cell during each avoid and determine whether it relates to movement or sound or something else. It is interesting that types a,b,c have the exact same proportions in each class of avoid, and really makes it important to investigate if these are the exact same cells or not. Also, these mice could be recorded during open field so the original neural classification (class 1, 2,3) could be applied to these same cells and then the authors can see whether each cell type defined in the open field has different response to the different avoid types. As it stands, the paper simply finds that during movement and during avoidance behaviors different cells in the STN do different things.

      - Similarly, the authors somewhat addressed the neural types issue, but figure 9 still has 9 different neural types and it is unclear whether the same cells that are type 'a' in mode 1 avoids are also type 'a' in mode 2 avoids, or do some switch to type b? Is there consistency between cell types across avoid modes? The authors show that type 'c' neurons are differentially elevated in mode 3 vs 2, but also describe neurons as type '2c' and statistically compare them to type '1c' neurons. Are these the same neurons? or are type 2c neurons different cells vs type 1c neurons? This is still unclear and requires clarification to be interpretable.

      We believe the remaining confusion arises from treating the different classification schemes as if they were alternative labels applied to the same neurons, when in fact they serve entirely separate analytical purposes and may not include the same neurons (see previous point). Because these classifications answer different questions, they are not expected to overlap, nor is overlap required for the interpretations we draw. It is therefore not appropriate to compare a neuron’s “type” in one avoidance mode to its movement class, or to ask whether types a/b/c across different modes are “the same cells,” since modes are defined by trial-level movement clustering rather than by neuron identity. Importantly, Types a/b/c are not intended as a new global classification of neurons; they simply summarize the variability of neuronal responses within each behaviorally matched mode. We agree that future studies could expand our findings, but that is beyond the already wide scope of the present paper. Our current analyses demonstrate a key conceptual point: when movement is held constant (via modes), STN neurons still show heterogeneous, outcome- and caution-related patterns, indicating encoding that cannot be reduced to movement alone.

      Relatedly, was the association with speed used to define each neural "class" done in the active avoidance context or in a separate (e.g. open field) experiment? This is not clear in the text.

      The cross-correlation classes were derived from the entire recording session, which included open-field and avoidance-task recordings. The tasks include long intertrial periods with spontaneous movements. We found no difference in classes when we included only a portion of the session, such as the open field, or when we excluded the avoidance interval where actions occur.

      Finally, in figure 7, why is there a separate avoid trace for each neural class? With the GRIN lens, the authors are presumably getting a sample of all cell types during each avoid, so why do the avoids differ depending on the cell type recorded?

      The entire STN population is not recorded within a single session; each session contributes only a subset of neurons to the dataset. Consequently, each neural class is composed of neurons drawn from partially non-overlapping sets of sessions, each with its own movement traces. For this reason, we plot avoidance traces separately for each neural class to maintain strict within-session correspondence between neural activity and the behavior collected in the same sessions. This prevents mixing behavioral data across sessions that did not contribute neurons to that class and ensures that all neural–behavioral comparisons remain appropriately matched. We have clarified this rationale in the revised manuscript. We note that averaging movement across classes—as is often done—would obscure these distinctions and would not preserve the necessary correspondence between neural activity and behavior. This is also clarified in Results.

      (5) The use of the same colors to mean two different things in figure 9 is confusing. AA1 vs AA2 shouldn't be the same colors as light-naïve vs light signaling CS.

      -addressed, but the authors still sometimes use the same colors to mean different things in adjacent figures (e.g. the red, blue, black colors in figure 1 and figure 2 mean totally different things) and use different colors within the same figure to represent the same thing (Figure 9AB vs Figure 9CD). This is suboptimal.

      Following the reviewer’s suggestion, in Figure 2, we changed the colors, so readers do not assume they are related to Fig. 1.

      In Figure 9, we changed the colors in C,D to match the colors in A,B.

      (6) The exact timeline of the optogenetics experiments should be presented as a schematic for understandability. It is not clear which conditions each mouse experienced in which order. This is critical to the interpretation of figure 9 and the reduction of passive avoids during STN stimulation. Did these mice have the CS1+STN stimulation pairing or the STN+US pairing prior to this experiment? If they did, the stimulation of the STN could be strongly associated with either punishment or with the CS1 that predicts punishment. If that is the case, stimulating the STN during CS2 could be like presenting CS1+CS2 at the same time and could be confusing. The authors should make it clear whether the mice were naïve during this passive avoid experiment or whether they had experienced STN stimulation paired with anything prior to this experiment.

      -addressed

      (7) Similarly, the duration of the STN stimulation should be made clear on the plots that show behavior over time (e.g. Figure 9E).

      -addressed

      (8) There is just so much data and so many conditions for each experiment here. The paper is dense and difficult to read. It would really benefit readability if the authors put only the key experiments and key figure panels in the main text and moved much of the repetitive figure panels to supplemental figures. The addition of schematic drawings for behavioral experiment timing and for the different AA1, AA2, AA3 conditions would also really improve clarity.

      -partially addressed. The paper is still dense and difficult to read. No experimental schematics were added.

      As suggested, we have now added the schematic to Fig. 5A.

      New Comments:

      (9) Description of the animals used and institutional approval are missing from the methods.

      The information on animal strains and institutional approval is already included in the manuscript. The first paragraph of the Methods section states:

      “… All procedures were reviewed and approved by the institutional animal care and use committee and conducted in adult (>8 weeks) male and female mice. …”

      Additionally, the next subsection, “Strains and Adeno-Associated Viruses (AAVs),” fully specifies all mouse lines used. We therefore believe that the required descriptions of animals and institutional approval are already present and meet standard reporting.

    1. eLife Assessment

      The authors combine a modeling approach, using a digital twin, with electrophysiological evidence in two species to assess the role of inhibition in shaping selectivity in the visual cortex. The results provide an important advance beyond the classic view of sensory coding by providing compelling evidence that many neurons in visual areas exhibit dual-feature selectivity. Overall, the work exceptionally showcases how in silico experiments can generate concrete hypotheses about neuronal coding that are difficult to discover experimentally.

    2. Reviewer #1 (Public review):

      This manuscript used deep learning to highlight the role of inhibition in shaping selectivity in primary and higher visual cortex. The findings hint at hitherto unknown axes of structured inhibition operating in cortical networks with a potentially key role in object recognition.

      The multi-species approach of testing the model in macaque and mouse is excellent, as it improves the chances that the observed findings are a general property of mammalian visual cortex. However, it would be useful to delineate any notable differences between these species, which are to be expected given their lifestyle.

      The overall performance of the model appears to be excellent in V1, with over 80% performance, but it falls substantially in V4. It would be important to consider the implications of this finding; for example, in the context of studying temporal lobe structures that are central to recognizing objects. Would one expect that model performance decreases further here, and what measures could be taken to avoid this? Or is this type of model better restricted to V1 or even LGN?

      While the manuscript delineates novel axes of inhibitory interactions, it remains unclear what exactly these axes are and how they arise. What are the steps that need to be taken to make progress along these lines?

    3. Reviewer #2 (Public review):

      The classic view of sensory coding states that (excitatory) neurons are active to some preferred stimuli and otherwise silent. In contrast, inhibitory neurons are considered broadly tuned. Due to the gigantic potential image space, it is hard to comprehensively map the tuning of individual neurons. In this tour de force study, Franke et al. combine electrophysiological recordings in macaque (V1, V4) and mouse (V1, LM, LI) visual cortex with large-scale screens based on digital twin models, as well as beautiful systems identification (most/least activating stimuli). Based on these digital twins, they discover dual-feature selectivity (which they validate both in macaques and mice). Dual-feature selectivity involves a bidirectional modulation of firing rates around an elevated baseline. Neurons are excited by specific preferred features and systematically suppressed by distinct, non-preferred features. This tuning was identified by excellently combining advances in AI & high-throughput ephys.

      The study is comprehensive and convincing. Overall, this work showcases how in silico experiments can generate concrete hypotheses about neuronal coding that are difficult to discover experimentally, but that can be experimentally validated! I think this work is of substantial interest to the neuroscience community. I'm sure it will motivate many future experimental and computational studies. In particular, it will be of great interest to understand when and how the brain leverages dual-feature selectivity. The discussion of the article is already an interesting starting point for these considerations.

      Strengths:

      (1) Using computational models to predict neuronal responses allowed them to go through millions of images, which may not be possible in vivo.

      (2) The cross-species and cross-area consistency of the results is another major strength, suggesting that the findings may reflect a fundamental strategy of mammalian cortical processing.

      (3) They show that the feature causing peak excitation in one neuron often drives suppression in another. This may be an efficient coding scheme where the population covers the visual manifold. I'd like to understand better why the authors believe that this shows that there are low-dimensional subspaces based on preferred and non-preferred stimulus features (vs. many more, but some axes are stronger).

    4. Author response:

      We thank the reviewers for their constructive and helpful feedback on our manuscript. We are delighted that they found the study to be "comprehensive and convincing" and a "tour de force" in its combination of electrophysiological recordings with large-scale digital twin screening. We appreciate that the reviewers highlighted the strengths of our multi-species approach and the "cross-species and cross-area consistency" of the results, noting that the work showcases how in silico experiments can generate concrete, experimentally validatable hypotheses.

      The reviewers also raised several important points that we plan to address in the final version of the manuscript to improve clarity and interpretation. These center on:

      Model performance in V4: Reviewer #1 raised questions regarding the comparative drop in model performance in V4 and the implications for the validity of the results (including the use of "high confidence" neurons and a request for clarification on the number of animals in the V4 dataset).

      Species differences: Both reviewers noted the value of the macaque-mouse comparison but requested a more explicit delineation of the differences between these species given their distinct ethological niches.

      The nature of inhibitory dimensions: The reviewers asked for further details on how to identify these inhibitory dimensions and the specific relationship between excitation and inhibition. We believe unraveling these mechanisms represents an exciting direction for future work, and we will explicitly mention this in the Discussion section of the final manuscript, alongside a clearer contextualization with prior literature.

      Technical clarifications: Reviewer #2 requested clarifications on specific technical details, such as the skewness thresholds used for sparsity analysis.

      In the final version of the manuscript, we will address these points by adding necessary clarifications to the text—including confirming the animal cohort details—explicitly contrasting the mouse and macaque data to highlight coding differences, and expanding our discussion. We will also ensure all technical inquiries, such as those regarding skewness and reference citations, are fully resolved.

      We believe addressing these points will significantly strengthen the manuscript.

    1. eLife Assessment

      This paper represents a valuable contribution to our understanding of how LFP oscillations and beta band coordination between the hippocampus and prefrontal cortex of rats may relate to learning. Enthusiasm for the reported results was moderated by the concern that some key analyses need to be done, and highly relevant details about task, data, and statistics were missing. Consequently, the reviewers considered the evidence to be incomplete in this version of the manuscript.

    2. Reviewer #1 (Public review):

      Wang, Zhou et al. investigated coordination between the prefrontal cortex (PFC) and the hippocampus (Hp), during reward delivery, by analyzing beta oscillations. Beta oscillations are associated with various cognitive functions, but their role in coordinating brain networks during learning is still not thoroughly understood. The authors focused on the changes in power, peak frequencies, and coherence of beta oscillations in two regions when rats learn a spatial task over days. Inconsistent with the authors' hypothesis, beta oscillations in those two regions during reward delivery were not coupled in spectral or temporal aspects. They were, however, able to show reverse changes in beta oscillations in PFC and Hp as the animal's performance got better. The authors were also able to show a small subset of cell populations in PFC that are modulated by both beta oscillations in PFC and sharp wave ripples in Hp. A similarly modulated cell population was not observed in Hp. These results are valuable in pointing out distinct periods during a spatial task when two regions modulate their activity independently from each other.

      The authors included a detailed analysis of the data to support their conclusions. However, some clarifications would help their presentation, as well as help readers to have a clear understanding.

      (1) The crucial time point of the analysis is the goal entry. However, it needs a better explanation in the methods or in figures of what a goal entry in their behavioral task means.

      (2) Regarding Figure 2, the authors have mentioned in the methods that PFC tetrodes have targeted both hemispheres. It might be trivial, but a supplementary graph or a paragraph about differences or similarities between contralateral and ipsilateral tetrodes to Hp might help readers.

      (3) The authors have looked at changes in burst properties over days of training. For the coincidence of beta bursts between PFC and Hp, is there a change in the coincidence of bursts depending on the day or performance of the animal?

      (4) Regarding the changes in performance through days as well as variance of the beta burst frequency variance (Figures 3C and 4C); was there a change in the number of the beta bursts as animals learn the task, which might affect variance indirectly?

      (5) In the behavioral task, within a session, animals needed to alternate between two wells, but the central arm (1) was in the same location. Did the authors alternate the location of well number 1 between days to different arms? It is possible that having well number 1 in the same location through days might have an effect on beta bursts, as they would get more rewards in well number 1?

      (6) The animals did not increase their performance in the F maze as much as they increased it in the Y maze. It would be more helpful to see a comparison between mazes in Figure 5 in terms of beta burst timing. It seems that unrewarded trials have earlier beta bursts in the Y maze compared to the F maze. Also, is there a difference in beta burst frequencies of rewarded and unrewarded trials?

      (7) For individual cell analysis, the authors recorded from Hp and the behavioral task involved spatial learning. It would be helpful to readers if the authors described the place field properties of the cells they have recorded from. It is known that reward cells firing near reward locations have a higher rate to participate in a sharp wave ripple. Factoring in the place field properties of the cells into the analysis might give a clearer picture of the lack of modulation of HP cells by beta and sharp wave ripples.

    3. Reviewer #2 (Public review):

      (1) When presenting the power spectra for the representative example (Figure 1), it would be appropriate to display a broader frequency band, including delta, theta, and gamma (up to ~100 Hz), rather than only the beta band. What was the rat's locomotor state (e.g., running speed) after entering the reward location, during which the LFPs were recorded? If the rats stopped at the goal but still consumed the reward (i.e., exhibited very low running speed), theta rhythms might still occasionally occur, and sharp-wave ripples (SWRs) could be observed during rest. Do beta bursts also occur during navigation prior to goal entry? It would be beneficial to display these rhythmic activities continuously across both the navigation and goal entry phases. Additionally, given that the hippocampal theta rhythm is typically around 7-8 Hz, while a peak at approximately 15-16 Hz is visible in the power spectra in Figure 1C, the authors should clarify whether the 22 Hz beta activity represents a genuine oscillation rather than a harmonic of the theta rhythm.

      (2) The authors claim that beta activity is independent between CA1 and PFC, based on the low coherence between these regions. However, it is challenging to discern beta-specific coherence in CA1; instead, coherence appears elevated across a broader frequency band (Figure 2 and Figure 2-1D). An alternative explanation could be that the uncoupled beta between CA1 and PFC results from low local beta coherence within CA1 itself.

      (3) In Figure 2-1E-F, visual inspection of the box plots reveals minimal differences between PFC-Ind and PFC-Coin/CA1-Coin conditions, despite reported statistical significance. It may be necessary to verify whether the significance arises from a large sample size.

      (4) In Figure 3 and Figure 4, although differences in power and frequency appear to change significantly across days, these changes are not easily discernible by visual inspection. It is worth considering whether these variations are related to increased task familiarity over days, potentially accompanied by higher running speeds.

      (5) The stronger spiking modulation by local beta oscillations shown in Figure 6 could also be interpreted in the context of uncoupled beta between CA1 and PFC. In this analysis, only spikes occurring during beta bursts should be included, rather than all spikes within a trial. The authors should verify the dataset used and consider including a representative example illustrating beta modulation of single-unit spiking.

      (6) As observed in Figure 7D, CA1 beta bursts continue to occur even after 2.5 seconds following goal entry, when SWRs begin to emerge. Do these oscillations alternate over time, or do they coexist with some form of cross-frequency coupling?

    4. Reviewer #3 (Public review):

      Summary:

      This paper explored the role of beta rhythms in the context of spatial learning and mPFC-hippocampal dynamics. The authors characterized mPFC and hippocampal beta oscillations, examining how their coordination and their spectral profiles related to learning and prefrontal neuronal firing. Rats performed two tasks, a Y-maze and an F-maze, with the F-maze task being more cognitively demanding. Across learning, prefrontal beta oscillation power increased while beta frequency decreased. In contrast, hippocampal beta power and beta frequency decreased. This was particularly the case for the well-performed and well-learned Y-maze paradigm. The authors identified the timing of beta oscillations, revealing an interesting shift in beta burst timing relative to reward entry as learning progressed. They also discovered an interesting population of prefrontal neurons that were tuned to both prefrontal beta and hippocampal sharp-wave ripple events, revealing a spectrum of SWR-excited and SWR-inhibited neurons that were differentially phase locked to prefrontal beta rhythms.

      In sum, the authors set out to examine how beta rhythms and their coordination were related to learning and goal occupancy. The authors identified a set of learning and goal-related correlates at the level of LFP and spike-LFP interactions, but did not report on spike-behavioral correlates.

      Strengths:

      Pairing dual recordings of medial prefrontal cortex (mPFC) and CA1 with learning of spatial memory tasks is a strength of this paper. The authors also discovered an interesting population of prefrontal neurons modulated by both beta and CA1 sharp-wave ripple (SWR) events, showing a relationship between SWR-excited and SWR-inhibited neurons and beta oscillation phase.

      Weaknesses:

      The authors report on a task where rats were performing sub-optimally (F-maze), weakening claims. Likewise, it is questionable as to whether mPFC and hippocampus are dually required to perform a no-delay Y-maze task at day 5, where rats are performing near 100%. There would be little reason to suspect strong oscillatory coupling when task performance is poor and/or independent of mPFC-HPC communication (Jones and Wilson, 2005), potentially weakening conclusions about independent beta rhythms. Moreover, there is little detail provided about sample sizes and how data sampling is being performed (e.g., rats, sessions, or trials), raising generalizability concerns.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Wang, Zhou et al. investigated coordination between the prefrontal cortex (PFC) and the hippocampus (Hp), during reward delivery, by analyzing beta oscillations. Beta oscillations are associated with various cognitive functions, but their role in coordinating brain networks during learning is still not thoroughly understood. The authors focused on the changes in power, peak frequencies, and coherence of beta oscillations in two regions when rats learn a spatial task over days. Inconsistent with the authors' hypothesis, beta oscillations in those two regions during reward delivery were not coupled in spectral or temporal aspects. They were, however, able to show reverse changes in beta oscillations in PFC and Hp as the animal's performance got better. The authors were also able to show a small subset of cell populations in PFC that are modulated by both beta oscillations in PFC and sharp wave ripples in Hp. A similarly modulated cell population was not observed in Hp. These results are valuable in pointing out distinct periods during a spatial task when two regions modulate their activity independently from each other.

      The authors included a detailed analysis of the data to support their conclusions. However, some clarifications would help their presentation, as well as help readers to have a clear understanding.

      (1) The crucial time point of the analysis is the goal entry. However, it needs a better explanation in the methods or in figures of what a goal entry in their behavioral task means.

      We appreciate Reviewer 1 pointing out this shortcoming and will clarify the description in the revised manuscript. Each goal is located at the end of the arm, and is equipped with a reward delivery unit. The unit has an infrared sensor. The rat breaks the infrared beam when it enters the goal.

      (2) Regarding Figure 2, the authors have mentioned in the methods that PFC tetrodes have targeted both hemispheres. It might be trivial, but a supplementary graph or a paragraph about differences or similarities between contralateral and ipsilateral tetrodes to Hp might help readers.

      We will provide the requested analysis in the full revision. We saw both hemispheres had similar properties.

      (3) The authors have looked at changes in burst properties over days of training. For the coincidence of beta bursts between PFC and Hp, is there a change in the coincidence of bursts depending on the day or performance of the animal?

      We will provide the requested analysis in the full revision.

      (4) Regarding the changes in performance through days as well as variance of the beta burst frequency variance (Figures 3C and 4C); was there a change in the number of the beta bursts as animals learn the task, which might affect variance indirectly?

      The analysis we can do here is to control for differences in the number of bursts for each category (days/performance quintile) by resampling the data to match the burst count between categories.
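
      One way to picture the burst-count control described above is the following minimal sketch. It repeatedly subsamples every category (day or performance quintile) down to the burst count of the smallest category and averages the variance over resamples. The function name `matched_variance`, the number of resamples, and the toy frequencies are assumptions for illustration, not the authors' implementation.

```python
import random
from statistics import variance


def matched_variance(bursts_by_category, n_resamples=1000, seed=0):
    """Mean variance of burst frequency per category after count matching.

    bursts_by_category maps a label (e.g. a day) to a list of burst frequencies.
    Each resample draws, without replacement, as many bursts as the smallest
    category contains, so every category contributes the same burst count.
    """
    rng = random.Random(seed)
    n_min = min(len(freqs) for freqs in bursts_by_category.values())
    if n_min < 2:
        raise ValueError("each category needs at least two bursts")
    result = {}
    for label, freqs in bursts_by_category.items():
        total = 0.0
        for _ in range(n_resamples):
            total += variance(rng.sample(freqs, n_min))
        result[label] = total / n_resamples
    return result, n_min


# Hypothetical toy data: many bursts on day 1, few on day 5.
toy = {
    "day1": [20.0 + 0.1 * i for i in range(50)],
    "day5": [18.0 + 0.5 * i for i in range(10)],
}
matched, n_used = matched_variance(toy, n_resamples=50, seed=1)
```

      With the counts equalized in this way, any remaining difference in variance between categories cannot be attributed to the number of bursts alone.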

      (5) In the behavioral task, within a session, animals needed to alternate between two wells, but the central arm (1) was in the same location. Did the authors alternate the location of well number 1 between days to different arms? It is possible that having well number 1 in the same location through days might have an effect on beta bursts, as they would get more rewards in well number 1?

      The central arm remained the same across days since we needed the animals to learn the alternation task. In our experience, the animal needs a few days to learn the alternation rule when we switch the central arm location. For this experiment, we were interested in the initial learning process, so we kept the central arm constant. Switching the central arm location is a great suggestion for a follow-up experiment in which we can examine the effects that a change in reward contingency has on beta bursts.

      (6) The animals did not increase their performance in the F maze as much as they increased it in the Y maze. It would be more helpful to see a comparison between mazes in Figure 5 in terms of beta burst timing. It seems that unrewarded trials have earlier beta bursts in the Y maze compared to the F maze. Also, is there a difference in beta burst frequencies of rewarded and unrewarded trials?

      We will add this analysis in the revised manuscript.

      (7) For individual cell analysis, the authors recorded from Hp and the behavioral task involved spatial learning. It would be helpful to readers if the authors described the place field properties of the cells they have recorded from. It is known that reward cells firing near reward locations have a higher rate to participate in a sharp wave ripple. Factoring in the place field properties of the cells into the analysis might give a clearer picture of the lack of modulation of HP cells by beta and sharp wave ripples.

      This is a great suggestion, and we will address this in the full revision.

      Reviewer #2 (Public review):

      We thank Reviewer 2 for their helpful comments and will address these in full in the revision. These are great suggestions to provide greater detail on the spectral and behavioral data at the goal.

      (1) When presenting the power spectra for the representative example (Figure 1), it would be appropriate to display a broader frequency band, including delta, theta, and gamma (up to ~100 Hz), rather than only the beta band.

      We will show more examples of power spectra with a wider frequency range. We did examine the wider spectra and noticed power in the beta frequency band was more prominent than others.

      What was the rat's locomotor state (e.g., running speed) after entering the reward location, during which the LFPs were recorded?

      We will add the time aligned speed profile to the spectra and raw data examples. Because goal entry is defined as the time the animals break the infrared beam at the goal (response to Reviewer 1), the rat would have come to a stop.

      If the rats stopped at the goal but still consumed the reward (i.e., exhibited very low running speed), theta rhythms might still occasionally occur, and sharp-wave ripples (SWRs) could be observed during rest.

      We typically find low theta power in the hippocampus after the animal reaches the goal location and as it consumes reward. Reviewer 2 is correct about occasional theta power at the goal. We have observed this but mostly before the animal leaves the goal location. We did find SWRs during goal periods. One example is shown in Fig. 7A.

      Do beta bursts also occur during navigation prior to goal entry?

      We did not find consistent beta bursts in PFC or CA1 on approach to goal entry. We can provide the analyses in our full revision. In our initial exploratory analysis, we found that beta bursts were most prominent after goal entry, which led us to focus on post-goal entry beta for this manuscript. However, beta oscillations in the hippocampus during locomotion or exploration have been reported (Ahmed & Mehta, 2012; Berke et al., 2008; França et al., 2014; França et al., 2021; Iwasaki et al., 2021; Lansink et al., 2016; Rangel et al., 2015).

      It would be beneficial to display these rhythmic activities continuously across both the navigation and goal entry phases. Additionally, given that the hippocampal theta rhythm is typically around 7-8 Hz, while a peak at approximately 15-16 Hz is visible in the power spectra in Figure 1C, the authors should clarify whether the 22 Hz beta activity represents a genuine oscillation rather than a harmonic of the theta rhythm.

      To ensure we fully address this concern, we can provide further spectral analysis in our revised manuscript to show that theta power in CA1 is reduced after goal entry. We were initially concerned about the possibility that the 22 Hz power in CA1 may be a harmonic rather than a standalone oscillation band. If these are harmonics of theta, we should expect to find coincident theta at the time of bursts in the beta frequency. In Fig. 1B and Fig. 2A, we show examples of the raw LFP traces from CA1. Here, the detected bursts are not accompanied by visible theta frequency activity. In PFC, we do not always see persistent theta frequency oscillations as we do in CA1. In PFC, we found beta bursts were frequent and visually identifiable when examining the LFP. We provided examples of the PFC LFP (Fig. 1B, Fig. 1-1, and Fig. 2A). In these cases, we see clear beta frequency oscillations lasting several cycles, and these are not accompanied by any oscillations in the theta frequency in the LFP trace.

      (2) The authors claim that beta activity is independent between CA1 and PFC, based on the low coherence between these regions. However, it is challenging to discern beta-specific coherence in CA1; instead, coherence appears elevated across a broader frequency band (Figure 2 and Figure 2-1D). An alternative explanation could be that the uncoupled beta between CA1 and PFC results from low local beta coherence within CA1 itself.

      This is a legitimate concern, and we used three methods to characterize coherence and coordination between the two regions. First, we calculated coherence for tetrode pairs for times when the animal was at goals (Fig. 2B), which provides a general estimation of coherence across frequencies but lacks any temporal resolution. Second, we calculated burst-aligned coherence (Fig. 2-1), which provides temporal resolution relative to the burst, but the multi-taper method is constrained by the time-frequency resolution trade-off. Third, we quantified the timing between the burst peaks (Fig. 2D), which describes timing differences, but the peaks of the bursts may not be symmetric. Thus, each method has its own caveats, but we drew our conclusion from the combination of results from these three analyses, which pointed to similar conclusions.

      Reviewer 2 is correct in pointing out the uniformly high coherence within CA1 across the frequency range we examined. When we inspected the raw LFP across multiple tetrodes in CA1, the traces were similar to each other (Fig. 2A). This likely reflects the uniformity of the LFP across recording sites in CA1, which is consistent with the coherence values we saw across the frequency range (Fig. 2B). We found that coherence between tetrode pairs within CA1 was statistically higher across the range than coherence between tetrode pairs in PFC (Fig. 2B and C); thus, our results are unlikely to be explained by low beta coherence within CA1 itself. The burst-aligned coherence using a multi-taper method also supports this: the coherence values within CA1 at the time of CA1 bursts are ~0.8-0.9.
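
      The third method above, timing between burst peaks, can be illustrated with a minimal sketch: for each burst peak in one region, find the closest burst peak in the other region and record the signed time difference. The function name `nearest_peak_lags`, the nearest-neighbor pairing rule, and the toy peak times are assumptions for illustration; the response does not specify how peaks were paired.

```python
from bisect import bisect_left


def nearest_peak_lags(ref_peaks, other_peaks):
    """Signed lag (other - ref, in seconds) to the nearest peak in other_peaks.

    A negative lag means the nearest peak in the other region came first.
    """
    other = sorted(other_peaks)
    if not other:
        return []
    lags = []
    for t in ref_peaks:
        k = bisect_left(other, t)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = other[max(k - 1, 0): k + 1]
        nearest = min(candidates, key=lambda s: abs(s - t))
        lags.append(nearest - t)
    return lags


# Hypothetical burst peak times, in seconds after goal entry.
pfc_peaks = [1.0, 5.0]
ca1_peaks = [0.8, 5.5, 9.0]
lags = nearest_peak_lags(pfc_peaks, ca1_peaks)
```

      A histogram of such lags that is broad and not concentrated near zero would be consistent with bursts that are not tightly coordinated, subject to the caveat noted above that burst peaks may not be symmetric.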

      (3) In Figure 2-1E-F, visual inspection of the box plots reveals minimal differences between PFC-Ind and PFC-Coin/CA1-Coin conditions, despite reported statistical significance. It may be necessary to verify whether the significance arises from a large sample size.

      We will include the sample sizes for each of the boxplots; these should be the same as for the power comparison in Fig. 2-1 A-C. The LFPs within a one-second window centered around the bursts are usually very similar, and the multi-taper method will return high coherence values. The p-values from statistical comparisons between the boxes are corrected using the Benjamini-Hochberg method.
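
      For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the following is a minimal sketch of the standard step-up adjustment. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not values from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

      The adjustment controls the false discovery rate across the set of comparisons; it does not by itself indicate how large an effect is, so reporting sample sizes alongside it remains informative.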

      (4) In Figure 3 and Figure 4, although differences in power and frequency appear to change significantly across days, these changes are not easily discernible by visual inspection. It is worth considering whether these variations are related to increased task familiarity over days, potentially accompanied by higher running speeds.

      We agree with Reviewer 2 that familiarity increases across days, and the animal is likely running faster. The analysis for Fig. 3 and 4 includes only data from periods when the animal was at the goal and was not moving. We used linear mixed effects models to quantify the relationship between power, frequency and day or behavioral quintile.

      (5) The stronger spiking modulation by local beta oscillations shown in Figure 6 could also be interpreted in the context of uncoupled beta between CA1 and PFC. In this analysis, only spikes occurring during beta bursts should be included, rather than all spikes within a trial. The authors should verify the dataset used and consider including a representative example illustrating beta modulation of single-unit spiking.

      We agree with Reviewer 2 that the stronger modulation to local beta is another piece of evidence indicating uncoupled beta between the two regions. We appreciate this suggestion and will add examples illustrating beta modulation for single units. We want to clarify that the spikes were only from periods when the animal was at the goal location on each trial and do not include the running period between goals.

      (6) As observed in Figure 7D, CA1 beta bursts continue to occur even after 2.5 seconds following goal entry, when SWRs begin to emerge. Do these oscillations alternate over time, or do they coexist with some form of cross-frequency coupling?

      This is a very interesting and helpful suggestion. Although we found SWRs generally appear later than beta bursts, it is possible the two are related on a finer timescale pointing to coordination. Our cross-correlation analysis between PFC and CA1 beta bursts only showed the relationship on the timescale of seconds. We will show a higher time-resolution version of this analysis in the revision.

      Reviewer #3 (Public review):

      Summary:

      This paper explored the role of beta rhythms in the context of spatial learning and mPFC-hippocampal dynamics. The authors characterized mPFC and hippocampal beta oscillations, examining how their coordination and their spectral profiles related to learning and prefrontal neuronal firing. Rats performed two tasks, a Y-maze and an F-maze, with the F-maze task being more cognitively demanding. Across learning, prefrontal beta oscillation power increased while beta frequency decreased. In contrast, hippocampal beta power and beta frequency decreased. This was particularly the case for the well-performed and well-learned Y-maze paradigm. The authors identified the timing of beta oscillations, revealing an interesting shift in beta burst timing relative to reward entry as learning progressed. They also discovered an interesting population of prefrontal neurons that were tuned to both prefrontal beta and hippocampal sharp-wave ripple events, revealing a spectrum of SWR-excited and SWR-inhibited neurons that were differentially phase locked to prefrontal beta rhythms.

      In sum, the authors set out to examine how beta rhythms and their coordination were related to learning and goal occupancy. The authors identified a set of learning and goal-related correlates at the level of LFP and spike-LFP interactions, but did not report on spike-behavioral correlates.

      Strengths:

      Pairing dual recordings of medial prefrontal cortex (mPFC) and CA1 with learning of spatial memory tasks is a strength of this paper. The authors also discovered an interesting population of prefrontal neurons modulated by both beta and CA1 sharp-wave ripple (SWR) events, showing a relationship between SWR-excited and SWR-inhibited neurons and beta oscillation phase.

      Weaknesses:

      Moreover, there is little detail provided about sample sizes and how data sampling is being performed (e.g., rats, sessions, or trials), raising generalizability concerns.

      We appreciate Reviewer 3’s thoughtful suggestions for making our claims convincing. We will include information about sample sizes and address each detailed recommendation in the revised manuscript.

      The authors report on a task where rats were performing sub-optimally (F-maze), weakening claims.

      Our experiment was designed to allow us to examine, within the same animal, a well-performed task (Y) and a less well-performed task (F). This contrast allows us to determine differences in neural correlates. We can further dissect the relevant differences to take advantage of this experimental design.

      Likewise, it is questionable as to whether mPFC and hippocampus are dually required to perform a no-delay Y-maze task at day 5, where rats are performing near 100%.

      We agree with Reviewer 3 that the mPFC and hippocampus may not be required when the animal reaches stable performance on day 5 (Deceuninck & Kloosterman, 2024). The data we collected spans the full range of early learning (day 1) to proficiency (day 5). We wanted to understand the dynamics of beta across these learning stages.

      Recent studies suggest mPFC and hippocampus are likely to be needed, in some capacity, for learning continuous spatial alternation tasks on a range of maze geometries. Lesions, inactivation or waking activity perturbation of hippocampus or hippocampus and mPFC on the W maze alternation task slowed learning (Jadhav et al., 2012; Kim & Frank, 2009; Maharjan et al., 2018). More recently, optogenetic silencing of mPFC after sharp wave ripples on the Y maze alternation affected performance when the center arm was switched (den Bakker et al., 2023). The Y and F mazes in our study both share the continuous alternation rule, where the animal needed to avoid visiting a previously visited location on the outbound choice relative to the center, and always return to the center location.

      Further, the performance characteristics of the outbound and inbound components of our Y task are similar to those of the W task. We have analyzed the “inbound” and “outbound” performance of the animals on the Y maze alternation task, and they are similar to the W maze alternation task. The “inbound”, or reference location, component is learned quickly, whereas the “outbound”, or alternation, component is learned slowly. We can add this analysis to the revised manuscript.

      There would be little reason to suspect strong oscillatory coupling when task performance is poor and/or independent of mPFC-HPC communication (Jones and Wilson, 2005) potentially weakening conclusions about independent beta rhythms.

      Although many studies have examined the oscillatory coupling properties at the theta frequency between mPFC-HPC (Hyman et al., 2005; Jones & Wilson, 2005; Siapas et al., 2005), our understanding of beta frequency coordination between the two regions is less established, especially at goal locations. Beta frequency coordination at goal locations may or may not follow similar properties to theta frequency coupling. In this manuscript we are reporting the properties of goal-location beta frequency activity in mPFC-HPC networks. We are not aware of prior work describing these properties at this stage of a spatial navigation task, especially their coordination in time.

      References

      Ahmed, O. J., & Mehta, M. R. (2012). Running speed alters the frequency of hippocampal gamma oscillations. J Neurosci, 32(21), 7373-7383. https://doi.org/10.1523/JNEUROSCI.5110-11.2012

      Berke, J. D., Hetrick, V., Breck, J., & Greene, R. W. (2008). Transient 23-30 Hz oscillations in mouse hippocampus during exploration of novel environments. Hippocampus, 18(5), 519-529. https://doi.org/10.1002/hipo.20435

      Deceuninck, L., & Kloosterman, F. (2024). Disruption of awake sharp-wave ripples does not affect memorization of locations in repeated-acquisition spatial memory tasks. eLife, 13. https://doi.org/10.7554/eLife.84004

      den Bakker, H., Van Dijck, M., Sun, J. J., & Kloosterman, F. (2023). Sharp-wave-ripple-associated activity in the medial prefrontal cortex supports spatial rule switching. Cell Rep, 42(8), 112959. https://doi.org/10.1016/j.celrep.2023.112959

      França, A. S., do Nascimento, G. C., Lopes-dos-Santos, V., Muratori, L., Ribeiro, S., Lobão-Soares, B., & Tort, A. B. (2014). Beta2 oscillations (23-30 Hz) in the mouse hippocampus during novel object recognition. Eur J Neurosci, 40(11), 3693-3703. https://doi.org/10.1111/ejn.12739

      França, A. S. C., Borgesius, N. Z., Souza, B. C., & Cohen, M. X. (2021). Beta2 Oscillations in Hippocampal-Cortical Circuits During Novelty Detection. Front Syst Neurosci, 15, 617388. https://doi.org/10.3389/fnsys.2021.617388

      Hyman, J. M., Zilli, E. A., Paley, A. M., & Hasselmo, M. E. (2005). Medial prefrontal cortex cells show dynamic modulation with the hippocampal theta rhythm dependent on behavior. Hippocampus, 15(6), 739-749. https://doi.org/10.1002/hipo.20106

      Iwasaki, S., Sasaki, T., & Ikegaya, Y. (2021). Hippocampal beta oscillations predict mouse object-location associative memory performance. Hippocampus, 31(5), 503-511. https://doi.org/10.1002/hipo.23311

      Jadhav, S. P., Kemere, C., German, P. W., & Frank, L. M. (2012). Awake hippocampal sharp-wave ripples support spatial memory. Science (New York, N.Y.), 336(6087), 1454-1458. https://doi.org/10.1126/science.1217230

      Jones, M. W., & Wilson, M. A. (2005). Theta Rhythms Coordinate Hippocampal–Prefrontal Interactions in a Spatial Memory Task. PLoS Biology, 3(12). https://doi.org/10.1371/journal.pbio.0030402

      Kim, S. M., & Frank, L. M. (2009). Hippocampal Lesions Impair Rapid Learning of a Continuous Spatial Alternation Task. PLoS ONE, 4(5). https://doi.org/10.1371/journal.pone.0005494

      Lansink, C. S., Meijer, G. T., Lankelma, J. V., Vinck, M. A., Jackson, J. C., & Pennartz, C. M. (2016). Reward Expectancy Strengthens CA1 Theta and Beta Band Synchronization and Hippocampal-Ventral Striatal Coupling. J Neurosci, 36(41), 10598-10610. https://doi.org/10.1523/JNEUROSCI.0682-16.2016

      Maharjan, D. M., Dai, Y. Y., Glantz, E. H., & Jadhav, S. P. (2018). Disruption of dorsal hippocampal - prefrontal interactions using chemogenetic inactivation impairs spatial learning. Neurobiol Learn Mem, 155, 351-360. https://doi.org/10.1016/j.nlm.2018.08.023

      Rangel, L. M., Chiba, A. A., & Quinn, L. K. (2015). Theta and beta oscillatory dynamics in the dentate gyrus reveal a shift in network processing state during cue encounters. Front Syst Neurosci, 9, 96. https://doi.org/10.3389/fnsys.2015.00096

      Siapas, A. G., Lubenov, E. V., & Wilson, M. A. (2005). Prefrontal Phase Locking to Hippocampal Theta Oscillations. Neuron, 46(1), 141-151. https://doi.org/10.1016/j.neuron.2005.02.028

    1. eLife Assessment

      This important study uncovers a previously unrecognized light-responsive pathway in C. elegans that depends on live food bacteria and is mediated by the bZIP factors ZIP-2/CEBP-2 and the cytochrome P450 enzyme, CYP-14A5. The authors show that this bacteria-linked pathway modulates long-term memory and can be harnessed as a low-cost light-inducible expression system, opening new directions for sensory biology and genetic engineering in worms. The exact means by which live bacteria modulate light signal that activates ZIP-2/CEBP-2 in the worm remains to be elucidated. The evidence supporting the pathway's role uses multiple genetic, transcriptional, and behavioural assays, and is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to understand how animals respond to visible light in an animal without eyes. To do so, they used the C. elegans model, which lacks eyes but nonetheless exhibits robust responses to visible light at several wavelengths. Here, the authors report a promoter that is activated by visible light and independent of known pathways of light responses.

      Strengths:

      The authors convincingly demonstrate that visible light activates the expression of the cyp-14A5 promoter driven gene expression in a variety of contexts and report the finding that this pathway is activated via the ZIP-2 transcriptionally regulated signaling pathway.

      Weaknesses:

      Because the ZIP-2 pathway has been reported to be activated predominantly by changes in the bacterial food source of C. elegans -- or exposure of animals to pathogens -- it remains unclear if visible light activates a pathway in C. elegans (animals) or if visible light potentially is sensed by the bacteria on the plate, which also lack eyes. Specifically, it is possible that the plates are seeded with excess E. coli, that E. coli is altered by light in some way, and in this context alters its behavior in such a way that activates a known bacterially responsive pathway in the animals. Consistent with this possibility, the authors found that heat-killed bacteria prevented the reporter activation in animals. This weakness would not affect the ability to use this novel discovery as a tool, which would still be useful to the field.

    3. Reviewer #2 (Public review):

      Summary:

      Ji, Ma and colleagues report the discovery of a mechanism in C. elegans that mediates transcriptional responses to low intensity light stimuli. They find that light-induced transcription requires a pair of bZIP transcription factors and induces expression of a cytochrome P450 effector. This unexpected light-sensing mechanism is required for physiologically relevant gene expression that controls behavioral plasticity. The authors further show that this mechanism can be co-opted to create light-inducible transgenes.

      Strengths:

      The authors rigorously demonstrate that ambient light stimuli regulate gene expression via a mechanism that requires the bZIP factors ZIP-2 and CEBP-2. Transcriptional responses to light stimuli are measured using transgenes and using measurements of endogenous transcripts. The study shows proper genetic controls for these effects. The study shows that this light-response does not require known photoreceptors, is tuned to specific wavelengths, and is highly unlikely to be an artifact of temperature-sensing. The study further shows that the function of ZIP-2 and CEBP-2 in light-sensing can be distinguished from their previously reported role in mediating transcriptional responses to pathogenic bacteria. The study includes experiments that demonstrate that regulatory motifs from a known light-response gene can be used to confer light-regulated gene expression, demonstrating sufficiency and suggesting an application of these discoveries in engineering inducible transgenes. Finally, the study shows that ambient light and the transcription factors that transduce it into gene expression changes are required to stabilize a learned olfactory behavior, suggesting a physiological function for this mechanism.

      Weaknesses:

      The study implies but does not show that the effects of ambient light on stabilizing a learned olfactory behavior are through the described pathway. To show this clearly, the authors should determine whether ambient light has any further effects on learning in mutants lacking CYP-14A5, ZIP-2, or CEBP-2.

    4. Reviewer #3 (Public review):

      Ji et al. report a novel and interesting light-induced transcriptional response pathway in the eyeless roundworm Caenorhabditis elegans that involves a cytochrome P450 family protein (CYP-14A5) and functions independently from previously established photosensory mechanisms. The authors also demonstrate the potential for this pathway to enable robust light-induced control of gene expression and behavior, albeit with some restrictions. Despite the limitations of this tool, including those presented by the authors, it could prove useful for the community. Overall, the evidence supporting the claims of the authors is convincing, and the authors' work suggests numerous interesting lines of future inquiry.

      (1) Although the exact mechanisms underlying photoactivation of this pathway remain unclear, light-dependent induction of CYP-14A5 requires bZIP transcription factors ZIP-2 and CEBP-2 that have been previously implicated in worm responses to pathogens. Notably, this light response requires live food bacteria, suggesting a microbial contribution to this phenomenon. The nature of the microbial contribution to the light response is unknown but very interesting.

      (2) The authors suggest that light-induced CYP-14A5 activity in the C. elegans hypoderm can unexpectedly and cell-non-autonomously contribute to retention of an olfactory memory. How retention of the olfactory memory is enhanced by light generally remains unclear. Additional experiments, including verification of light-dependent changes in CYP-14A5 levels in the olfactory memory behavioral setup, would help further interpret these otherwise interesting results.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors set out to understand how animals respond to visible light in an animal without eyes. To do so, they used the C. elegans model, which lacks eyes, but nonetheless exhibits robust responses to visible light at several wavelengths. Here, the authors report a promoter that is activated by visible light and independent of known pathways of light responses.

      Strengths:

      The authors convincingly demonstrate that visible light activates the expression of the cyp-14A5 promoter-driven gene expression in a variety of contexts and report the finding that this pathway is activated via the ZIP-2 transcriptionally regulated signaling pathway.

      Weaknesses:

      Because the ZIP-2 pathway has been reported to be activated predominantly by changes in the bacterial food source of C. elegans -- or exposure of animals to pathogens -- it remains unclear if visible light activates a pathway in C. elegans (animals) or if visible light potentially is sensed by the bacteria on the plate, which also lack eyes. Specifically, it is possible that the plates are seeded with excess E. coli, that E. coli is altered by light in some way, and in this context, alters its behavior in such a way that activates a known bacterially responsive pathway in the animals. This weakness would not affect the ability to use this novel discovery as a tool, which would still be useful to the field, but it does leave some questions about the applicability to the original question of how animals sense light in the absence of eyes.

      Thank you for the insightful questions and suggestions. We have now performed the key experiment requested. New data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. We now include this result in the paper, together with a revised discussion of the bacteria-modulated mechanism, but note that this bacterial requirement does not alter the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity likely influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the intrinsic regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50. Accordingly, we have revised the Results and Discussion to reflect the appropriate scope.

      Reviewer #2 (Public review):

      Summary:

      Ji, Ma, and colleagues report the discovery of a mechanism in C. elegans that mediates transcriptional responses to low-intensity light stimuli. They find that light-induced transcription requires a pair of bZIP transcription factors and induces expression of a cytochrome P450 effector. This unexpected light-sensing mechanism is required for physiologically relevant gene expression that controls behavioral plasticity. The authors further show that this mechanism can be co-opted to create light-inducible transgenes.

      Strengths:

      The authors rigorously demonstrate that ambient light stimuli regulate gene expression via a mechanism that requires the bZIP factors ZIP-2 and CEBP-2. Transcriptional responses to light stimuli are measured using transgenes and using measurements of endogenous transcripts. The study shows proper genetic controls for these effects. The study shows that this light-response does not require known photoreceptors, is tuned to specific wavelengths, and is highly unlikely to be an artifact of temperature-sensing. The study further shows that the function of ZIP-2 and CEBP-2 in light-sensing can be distinguished from their previously reported role in mediating transcriptional responses to pathogenic bacteria. The study includes experiments that demonstrate that regulatory motifs from a known light-response gene can be used to confer light-regulated gene expression, demonstrating sufficiency and suggesting an application of these discoveries in engineering inducible transgenes. Finally, the study shows that ambient light and the transcription factors that transduce it into gene expression changes are required to stabilize a learned olfactory behavior, suggesting a physiological function for this mechanism.

      Weaknesses:

      The study implies but does not show that the effects of ambient light on stabilizing a learned olfactory behavior are through the described pathway. To show this clearly, the authors should determine whether ambient light has any effect on mutants lacking CYP-14A5, ZIP-2, or CEBP-2. Other minor edits to the text and figures are suggested.

      We appreciate the reviewer’s comment. Our study indeed implies that ambient light stabilizes learned olfactory behavior through effects on the described pathway. Importantly, the existing data already address this point. Mutants lacking CYP-14A5, ZIP-2, or CEBP-2 display impaired olfactory memory even when exposed to ambient light, indicating that these genes are required for the behavioral effect of light. Consistent with this, ambient light robustly induces cyp-14A5p::GFP in wild-type animals but fails to do so in zip-2 and cebp-2 mutants, demonstrating that light-dependent transcriptional activation is blocked upstream in these pathway mutants. Together, these results support the conclusion that ambient light acts through the ZIP-2 → CEBP-2 → CYP-14A5 pathway to stabilize memory. Minor textual and figure revisions have been made where helpful to clarify this point.

      Reviewer #3 (Public review):

      Ji et al. report a novel and interesting light-induced transcriptional response pathway in the eyeless roundworm Caenorhabditis elegans that involves a cytochrome P450 family protein (CYP-14A5) and functions independently from previously established photosensory mechanisms. Although the exact mechanisms underlying photoactivation of this pathway remain unclear, light-dependent induction of CYP-14A5 requires bZIP transcription factors ZIP-2 and CEBP-2 that have been previously implicated in worm responses to pathogens. The authors then suggest that light-induced CYP-14A5 activity in the C. elegans hypoderm can unexpectedly and cell-non-autonomously contribute to retention of an olfactory memory. Finally, the authors demonstrate the potential for this pathway to enable robust light-induced control of gene expression and behavior, albeit with some restrictions. Overall, the evidence supporting the claims of the authors is convincing, and the authors' work suggests numerous interesting lines of future inquiry.

      (1) The authors determine that light, but not several other stressors tested (temperature, hypoxia, and food deprivation), can induce transcription of cyp-14A5. The authors use these experiments to suggest the potential specificity of the induction of CYP-14A5 by light. Given the established relationship between light and oxidative stress and the authors' later identification of ZIP-2, testing the effect of an oxidative stressor or pathogen exposure on transcription of cyp-14A5 would further strengthen the validity of this statement and potentially shed some insight into the underlying mechanisms.

      We appreciate the reviewer’s thoughtful suggestion. We would like to clarify that the “specificity” we refer to is the strong and preferential induction of cyp-14A5 by light among pathogen or detoxification-related genes, rather than an assertion that cyp-14A5 is exclusively light-responsive. This does not preclude the possibility that cyp-14A5 can also be activated under other conditions. Indeed, prior work from the Troemel laboratory has identified cyp-14A5 as one of many pathogen-inducible genes, consistent with its role in stress physiology. Our data show that classical pathogen-responsive genes (e.g., irg-1) are not induced by light, whereas cyp-14A5 is strongly induced, highlighting the selective engagement of this cytochrome P450 by light under the conditions tested. We have revised the text to clarify this point.

      (2) The authors suggest that short-wavelength light more robustly increases transcription of cyp-14A5 compared to equally intense longer wavelengths (Figure 2F and 2G). Here, however, the authors report intensities in lux of wavelengths tested. Measurements of and reporting the specific spectra of the incident lights and their corresponding irradiances (ideally, in some form of mW/mm2 - see Ward et al., 2008, Edwards et al., 2008, Bhatla and Horvitz, 2015, De Magalhaes Filho et al., 2018, Ghosh et al., 2021, among others, for examples) is critical for appropriate comparisons across wavelengths and facilitates cross-checking with previous studies of C. elegans light responses. On a related and more minor note, the authors place an ultraviolet shield in front of a visible light LED to test potential effects of ultraviolet light on transcription of cyp-14A5. A measurement of the spectrum of the visible light LED would help confirm if such an experiment was required. Regardless, the principal conclusions the authors made from these experiments will likely remain unchanged.

      Thank you. We have revised the text to clarify this point. “Using controlled light versus dark conditions, we confirmed the finding from an integrated cyp-14A5p::GFP reporter and observed its robust widespread GFP expression in many tissues induced by moderate-intensity (500-3000 Lux, 16-48 hr duration) LED light exposure (Fig. 1A). The photometric Lux range is approximately 0.1–0.60 mW/cm² in radiometric (total radiant power) metric given the spectrum of the LED light source.”
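
      The photometric-to-radiometric conversion quoted above can be sketched as follows. This is an illustrative calculation only, not the authors' procedure: the function name `lux_to_mw_per_cm2` is hypothetical, and the luminous efficacy of 500 lm/W is an assumption chosen because it reproduces the quoted 0.1–0.60 mW/cm² range. The true value depends on the measured spectrum of the LED source.

```python
def lux_to_mw_per_cm2(illuminance_lux, efficacy_lm_per_w):
    """Convert illuminance (lux = lm/m^2) to irradiance in mW/cm^2.

    efficacy_lm_per_w is the luminous efficacy of radiation of the source;
    it must come from the source spectrum and is an assumed input here.
    """
    if efficacy_lm_per_w <= 0:
        raise ValueError("luminous efficacy must be positive")
    # lm/m^2 divided by lm/W gives W/m^2.
    irradiance_w_per_m2 = illuminance_lux / efficacy_lm_per_w
    # 1 W/m^2 = 1000 mW / 10000 cm^2 = 0.1 mW/cm^2.
    return irradiance_w_per_m2 * 0.1


ASSUMED_EFFICACY_LM_PER_W = 500.0  # assumption, not a measured value

low_mw_cm2 = lux_to_mw_per_cm2(500, ASSUMED_EFFICACY_LM_PER_W)
high_mw_cm2 = lux_to_mw_per_cm2(3000, ASSUMED_EFFICACY_LM_PER_W)
print(f"{low_mw_cm2:.2f} to {high_mw_cm2:.2f} mW/cm^2")
```

      Because the efficacy differs between wavelengths, equal lux values at two LED colors do not correspond to equal irradiance, which is the reason the reviewer asks for spectra and irradiances when comparing wavelengths.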

      (3) The authors report an interesting observation that animals exposed to ambient light (~600 lux) exhibit significantly increased memory retention compared to those maintained in darkness (Figure 4). Furthermore, light deprivation within the first 2-4 hours after learning appears to eliminate the effect of light on memory retention. These processes depend on CYP-14A5, loss of which can be rescued by re-expression of cyp-14A5 in mutant animals using a hypoderm-specific- and non-light-inducible- promoter. Taken together, the authors argue convincingly that hypodermal expression of cyp-14A5 can contribute to the retention of the olfactory memory. More broadly, these experiments suggest that cell-non-autonomous signaling can enhance retention of olfactory memory. How retention of the olfactory memory is enhanced by light generally remains unclear. In addition, the authors' experiments in Figure 1B demonstrate - at least by use of the transcriptional reporter - that light-dependent induction of cyp-14A5 transcription at 500 - 1000 lux is minimal and especially so at short duration exposures. Additional experiments, including verification of light-dependent changes in CYP-14A5 levels in the olfactory memory behavioral setup, would help further interpret these otherwise interesting results.

      We thank the reviewer for these thoughtful comments. We agree that understanding how light enhances memory retention at a mechanistic level is an important direction for future work. Regarding the light intensities used in Figure 1B, we would like to clarify that 500–1000 lux does produce a measurable and statistically significant induction of cyp-14A5p::GFP, although the magnitude is lower than that observed at higher intensities. We interpret this modest induction as physiologically relevant: intermediate light levels appear sufficient to engage the CYP-14A5–dependent program required for memory stabilization, whereas stronger light intensities are detrimental to learning and reduce behavioral performance. Thus, the behavioral paradigm uses a light regime that activates the pathway without introducing stress-associated confounders.

      (4) The experiments in Figure 4 nicely validate the usage of the cyp-14A5 promoter as a potential tool for light-dependent induction of gene expression. Despite the limitations of this tool, including those presented by the authors, it could prove useful for the community.

      Thank you and we agree. In addition, we have included in the revised manuscript the single-copy integration strains based on UAS-GAL4 that produced similar results as transgenic strains and will be even more flexible and useful for the community.

      Recommendations for the authors:

      Reviewing Editor Comments:

      While appreciating the quality and presentation of this important study, we had two major concerns that the authors need to address.

      (1) Bacteria-versus-worm origin:

      To rule out a bacterially derived stimulus, we suggest testing whether cyp-14A5p::GFP is inducible without bacteria (or killed bacteria). Checking whether the canonical immune reporters irg-5p::GFP and gst-4p::GFP are also light-inducible will further clarify this point.

      We have now performed the key experiment requested by the reviewers. Interesting new data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. Importantly, this requirement does not alter any of the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50.

      We included the data (Fig. 2D) to show that the canonical immune reporter irg-1p::GFP is not induced by the light condition that robustly induced cyp-14A5p::GFP, and gst-4p::GFP is only very mildly induced (Fig. S1J).

      (2) Pathway-behaviour link:

      The behavioural relevance of the newly described pathway is intriguing, but it needs direct support. Ideally, this would require comparing memory in WT, zip-2-/-, cebp-2-/-, and cyp-14A5-/- under both dark and light conditions. But at the very least, it would require testing if constitutive CYP-14A5 rescue in the dark bypasses the requirement of light.

      We respectfully submit that additional experiments are not required to support the behavioral conclusions. Our model posits that cyp-14A5 is required but not sufficient for memory stabilization, being one component within a broader set of light-induced genes. Thus, constitutive hypodermal expression of cyp-14A5 would not be expected to bypass the requirement for ambient light. The existing data are fully consistent with this framework and the conclusions of the paper.

      Reviewer #1 (Recommendations for the authors):

      Overall, I think this paper is interesting to the field of C. elegans researchers at a minimum, as a light-inducible gene expression system might have a variety of uses throughout the diverse research paradigms that use this model system. With that said, I have a couple of suggestions that I think would substantially impact the ability to interpret these findings, which might be useful for broader implications of the study.

      (1) Most importantly, the supplemental table of RNA-seq data should likely be updated and discussed further beyond the cyp-14A5 findings. First, the authors report 7,902 genes are differentially expressed in response to light and then break these into upregulated and downregulated genes. But there are only 1,785 upregulated genes and 3,632 downregulated genes. This adds up to 5417 genes, but doesn't match the 7,902 genes reported to change, and I could not find in the text if some other filters were applied that might explain this not adding up.

      Thank you for this helpful comment. We agree that the exact numbers depend on statistical thresholds and are therefore somewhat arbitrary. To avoid implying unwarranted precision, we have revised the text to state that “thousands of genes are differentially regulated by light.”

      (2) Among the upregulated genes in response to light are irg-5, irg-4, irg-6, irg-8, and gst-4. Indeed, all of these well-studied genes (or most) show even more induction by light than cyp-14A5. It is my opinion that this result needs further criticism as there are existing GFP reporters for gst-4 and irg-5 that are similarly well studied to irg-1, which is in the paper (and is not upregulated). In my opinion, the authors should test if they see activation of the irg-4 and gst-4 GFP reporters by light as well. This would not only validate their RNA-seq but might provide more important evidence for the field, as these other reporters are not considered light-inducible previously. If they are, several major studies might be impacted by this.

      Thank you for the comments. We have irg-1p::GFP and gst-4p::GFP in the lab but could not find reporters for the other genes mentioned at the CGC. Neither of the two reporters showed light induction as strong as that of cyp-14A5p::GFP (Figs. 2D and S1J). It is possible that irg-1 and gst-4 RNA levels are up-regulated but that this is not reflected in our transgenic reporters, which use their promoters to drive GFP expression. The stronger light induction of cyp-14A5p::GFP is unlikely to be caused by the multi-copy nature of the transgene, since newly generated single-copy integration strains based on the UAS-GAL4 system produced similarly robust light induction (Fig. S1I; see Methods).

      (3) Along the same lines, if at least 4 (and likely more) well characterized immune response genes are activated by light and these genes are known to mostly respond to differences in C. elegans bacterial food source/diet, then it stands to reason that maybe in this experimental context the light is not acting on "animals" at all, but rather triggering changes in E. coli (i.e. changing E. coli metabolism or pathogenicity like properties). If true, then perhaps the light affects bacteria in such a way that it activates a previously known bacterial pathogen response mechanism. This should be easy to test by seeing if this reporter is still activated by light in the presence of diverse bacterial diets, which are available from the CGC (CeMBio collection, for example). This is likely very important to the conclusions of the manuscript as it relates to animals sensing light, but might not be as important to the use of this system as a tool.

      Thank you for the insightful questions and suggestions. Interesting new data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. Importantly, this requirement does not alter any of the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50. We have revised the Results and Discussion to reflect the appropriate scope of our study and implications of the new findings.

      (4) Lastly, it seems unlikely that nearly half the C. elegans genome is transcriptionally regulated by light (or nearly half of the detected genes in the RNA-seq results). It seems likely that this list of 7,902 genes contains false positives. I would suggest tightening the filter, for example moving to padj < 0.01 instead of 0.05, or adding a 4-fold change filter (2-fold and 0.01 still results in 5000+ genes changing, which might explain the difference in up and down genes just being due to different padj filters). Along these lines, it is worth noting that the padj appears to be generated using DESeq2, and one of the first assumptions of DESeq2 is that the median expressed genes do not change, and there is a normalization. However, if MOST genes do change in expression, then one of the fundamental assumptions of DESeq2 is not valid, which would mean it might not be an appropriate analysis tool. Perhaps there is some other normalization that could be done before running DESeq2 due to some other noise present in the RNA-seq runs?

      Thank you for this helpful comment. We agree that the exact numbers depend on statistical thresholds and are therefore somewhat arbitrary. To avoid implying unwarranted precision, we have revised the text to state that “thousands of genes are differentially regulated by light.”
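As an illustration of how strongly such counts depend on the chosen cut-offs, the following minimal Python sketch counts up- and down-regulated genes under two threshold settings. The rows are invented placeholders, not data from the study. The column names `padj` and `log2FoldChange` follow the DESeq2 results table, while the function name `count_de` and all gene labels are hypothetical. The last lines simply repeat the reviewer's arithmetic on the reported totals.

```python
# Hypothetical rows standing in for a DESeq2 results table; the values are
# invented for illustration and are not taken from the study.
rows = [
    {"gene": "geneA", "log2FoldChange": 2.4, "padj": 0.0004},
    {"gene": "geneB", "log2FoldChange": -1.2, "padj": 0.03},
    {"gene": "geneC", "log2FoldChange": 0.3, "padj": 0.04},
    {"gene": "geneD", "log2FoldChange": -2.6, "padj": 0.008},
    {"gene": "geneE", "log2FoldChange": 1.1, "padj": None},  # padj can be NA
]


def count_de(rows, padj_max, min_abs_log2fc):
    """Return (n_up, n_down) for genes passing both thresholds."""
    up = down = 0
    for row in rows:
        if row["padj"] is None or row["padj"] >= padj_max:
            continue  # not significant at this adjusted p-value cut-off
        if abs(row["log2FoldChange"]) < min_abs_log2fc:
            continue  # effect size too small
        if row["log2FoldChange"] > 0:
            up += 1
        else:
            down += 1
    return up, down


lenient = count_de(rows, padj_max=0.05, min_abs_log2fc=0.0)
strict = count_de(rows, padj_max=0.01, min_abs_log2fc=2.0)  # 4-fold = |log2FC| >= 2

# The reviewer's arithmetic on the reported totals.
reported_up, reported_down, reported_total = 1785, 3632, 7902
unexplained = reported_total - (reported_up + reported_down)
```

On these toy rows the lenient setting keeps four genes and the strict setting keeps two. The `unexplained` value is the gap between the reported total and the sum of the reported up- and down-regulated genes.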

      (5) Minor point - I would delete the reference to ER in line 92. While most CYPs do localize to the ER, the images shown are not clearly ER and probably do not have enough resolution to make claims about subcellular localization. To me, it would be easier to just delete this claim as it is not required for the main claims of the manuscript.

      Reference deleted.

      Reviewer #2 (Recommendations for the authors):

      I have one request for clarification that likely requires additional data. Figure 3 shows that ambient light stabilizes learned changes to chemotaxis and further shows that CYP-14A5 has a similar function. The implication is that light promotes CYP-14A5 expression, which somehow promotes memory consolidation. The authors should test whether memory consolidation in cyp-14A5, zip-2, or cebp-2 mutants is no longer affected by ambient light.

      It is also possible to test whether forced expression of CYP-14A5 can bypass the effect of 'no light' conditions on memory consolidation.

      Thank you for the comments. We respectfully submit that additional experiments are not required to support the behavioral conclusions. Our model posits that cyp-14A5 is required but not sufficient for memory stabilization, one component within a broader set of light-induced genes. Thus, constitutive hypodermal expression of cyp-14A5 would not be expected to bypass the requirement for ambient light. The existing data are fully consistent with this framework and conclusions of the paper.

      I have several minor suggestions relating to the text and figures.

      (1) In the introduction, the authors assert that little is known about non-visual light sensing and then list many examples of molecular mechanisms of non-visual light-sensing. They should emphasize that non-visual light sensing is important and accomplished by diverse molecular mechanisms.

      Agree and revised accordingly.

      (2) Check spacing between gene names (line 109).

      Corrected.

      (3) There should be a new paragraph break when the uORF experiments are described (line 146).

      Corrected.

      (4) 'Phenoptosis' is an esoteric word. Please define it (line 206).

      Corrected.

      (5) 'p' in the transgene name cyp-14A5p::nlp-22 is in italics, unlike the rest of the manuscript.

      Corrected.

      (6) 'Acknowledgment' should be 'Acknowledgments' (line 384).

      Corrected.

      (7) The color map in panel 1B should have units.

      The color map uses arbitrary units (now added) to highlight relative, not absolute, differences.

      (8) In panel 1E, it is confusing to have 'DARK' denoted by reddish bars and 'LIGHT' denoted by bluish bars. Perhaps 'DARK' is black/dark grey and 'LIGHT' is white?

      Corrected.

      (9) In panel 1D, it takes a minute to find the purple diamond. Please mark up the volcano plot to make it easier.

      Corrected.

      Reviewer #3 (Recommendations for the authors):

      The authors generally present convincing experiments detailing interesting results in a well-written manuscript.

      One quick note: the same Bhatla and Horvitz (2015) papers appear to be cited twice [line 52].

      Corrected.

    1. eLife Assessment

      This important study presents a methodologically rigorous framework for stability-guided fine-mapping, extending PICS and generalizing to methods such as SuSiE, supported by comprehensive simulations and functional enrichment analyses. The evidence is now convincing, demonstrating improved causal variant recovery and offering a robust alternative for cross-population fine-mapping. The approach will be of particular interest to statistical geneticists, computational biologists, and biomedical researchers who rely on fine-mapping to interpret genetic association signals.

    2. Reviewer #1 (Public review):

      Aw et al. have proposed that stability analysis can be useful for cross-population fine-mapping. In addition, the authors have performed extensive analyses using functional data to understand the cases where the top eQTL and the stable eQTL are the same or different.

      Comments on revisions:

      The authors have answered all my concerns.

    3. Reviewer #2 (Public review):

      Aw et al. present a new stability-guided fine-mapping method by extending the previously proposed PICS method. They applied their stability-based method to fine-map cis-eQTLs in the GEUVADIS dataset and compared it against residualization-based approaches. They evaluated the performance of the proposed method using publicly available functional annotations and demonstrated that the variants identified by their stability-based method show enrichment for these functional annotations.

      The authors have substantially strengthened the manuscript by addressing the major concerns raised in the initial review. I acknowledge that they have conducted comprehensive simulation studies to show the performance of their proposed approach and that they have extended their approach to SuSiE ("Stable SuSiE") to demonstrate the broader applicability of the stability-guided principle beyond PICS.

      One remaining question is the interpretation of matching variants with very low stable posterior probabilities (~0), which the authors have analyzed in detail but without fully conclusive findings. I agree with the authors that this event is relatively rare and the current sample size is limited but this might be something to keep in mind for future studies.

    4. Author response:

      The following is the authors’ response to the latest reviews:

      "One remaining question is the interpretation of matching variants with very low stable posterior probabilities (~0), which the authors have analyzed in detail but without fully conclusive findings. I agree with the authors that this event is relatively rare and the current sample size is limited but this might be something to keep in mind for future studies."

      Fine-mapping stability on matching variants with very low stable posterior probability

      We thank Reviewer 2 for encouraging us to think more about how low stable posterior probability matching variants can be interpreted. We describe a few plausible interpretations, even though – as Reviewer 2 and we have both acknowledged – our present experiments do not point to a clear and conclusive account.

      One explanation is that the locus captured by the variant might not be well-resolved, in the sense that many correlated variants exist around the locus. Thus, the variant itself is unlikely to be causal, but the set of variants in high LD with it may contain the true causal variant, or the causal variant itself may not have been sequenced but lies in that locus. A comparison of LD patterns across ancestries at the locus would be helpful here.

      Another explanation rests on the following observation. For a variant to be matching between top and stable PICS and to also have very small stable PP, it has to have the largest PP after residualization on the ALL slice but also have positive PP with gene expression on many other slices. In other words, failing to control for potential confounders shrinks the PP. If one assumes that the matching variant is truly causal, then our observation points to an example of negative confounding (aka suppressor effect). This can occur when the confounders (PCs) are correlated with allele dosage at the causal variant in a different direction than their correlation with gene expression, so that the crude association between unresidualized gene expression and causal variant allele dosage is biased toward 0.
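The suppressor effect described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: a single continuous confounder stands in for the PCs, allele dosage is treated as continuous rather than 0/1/2, and the effect sizes `beta` and `gamma` are arbitrary. The confounder is positively correlated with dosage but lowers expression, so the crude slope is pulled toward 0, while the slope after residualizing both variables on the confounder is close to the assumed causal effect.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, c):
    """Residuals of y after regressing out a single covariate c."""
    n = len(y)
    b = slope(c, y)
    mc, my = sum(c) / n, sum(y) / n
    return [yi - my - b * (ci - mc) for yi, ci in zip(y, c)]


rng = random.Random(0)
n = 5000
beta, gamma = 1.0, 1.5  # assumed causal effect and confounder effect

pc = [rng.gauss(0, 1) for _ in range(n)]        # confounder (stand-in for a PC)
dosage = [c + rng.gauss(0, 1) for c in pc]      # dosage rises with the confounder
expr = [beta * g - gamma * c + rng.gauss(0, 1)  # confounder lowers expression
        for g, c in zip(dosage, pc)]

crude = slope(dosage, expr)  # unresidualized association
adjusted = slope(residualize(dosage, pc), residualize(expr, pc))
```

Under these assumptions the population value of the crude slope is beta - gamma / 2 = 0.25, whereas the adjusted slope targets beta = 1.0.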

      Although our present study does not allow us to systematically confirm either interpretation – since we found that matching variants were depleted in causal variants in our simulations, violating the second argument, but we also found functional enrichment in analyses of GEUVADIS data though only 17 matching variants with low stable PP were reported – we believe a larger-scale study using larger cohort sizes (at least 1000 individuals per ancestry) and many more simulations (to increase yield of such cases) would be insightful.

      ———

      The following is the authors’ response to the original reviews:

      Reviewer #1:

      Major comments:

      (1) It would be interesting to see how much fine-mapping stability can improve the fine-mapping results in cross-population. One can simulate data using true genotype data and quantify the amount the fine-mapping methods improve utilizing the stability idea.

      We agree, and have performed simulation studies where we assume that causal variants are shared across populations. Specifically, by mirroring the simulation approach described in Wang et al. (2020), we generated 2,400 synthetic gene expression phenotypes across 22 autosomes, using GEUVADIS gene expression metadata (i.e., gene transcription start site) to ensure largely cis expression phenotypes were simulated. We additionally generated 1,440 synthetic gene expression phenotypes that incorporate environmental heterogeneity, to motivate our pursuit of fine-mapping stability in the first place (see Response to Reviewer 2, Comment 6). These are described in Results section “Simulation study”:

      We evaluated the performance of the PICS algorithm, specifically comparing the approach incorporating stability guidance against the residualization approach that is more commonly used — similar to our application to the real GEUVADIS data. We additionally investigated two ways of “combining” the residualization and stability guidance approaches: (1) running stability-guided PICS on residualized phenotypes; (2) prioritizing matching variants returned by both approaches. See Response to Reviewer 2, Comment 5.

      (2) I would be very interested to see how other fine-mapping methods (FINEMAP, SuSiE, and CAVIAR) perform via the stability idea.

      Thank you for this valuable comment. We ran SuSiE on the same set of simulated datasets. Specifically, we ran a version that uses residualized phenotypes (supposedly removing the effects of population structure), and also a version that incorporates stability. The second version is similar to how we incorporate stability in PICS. We investigated the performance of Stable SuSiE in a similar manner to our investigation of PICS. First we compared the performance relative to SuSiE that was run on residualized phenotypes. Motivated by our finding in PICS that prioritizing matching variants improves causal variant recovery, we did the same analysis for SuSiE. This analysis is described in Results section “Stability guidance improves causal variant recovery in SuSiE.”

      We reported overall matching frequencies and causal variant recovery rates of top and stable variants for SuSiE in Figures 2C&D.

      Frequencies with which Stable and Top SuSiE variants match, stratified by the simulation parameters, are summarized in Supplementary File 2C (reproduced for convenience in Response to Reviewer 2, Comment 3). Causal variant recovery rates split by the number of causal variants simulated, and stratified by both signal-to-noise ratio and the number of credible sets included, are reported in Figure 2—figure supplements 16-18. We reproduce Figure 2—figure supplement 18 (three causal variants scenario) below for convenience. Analogous recovery rates for matching versus non-matching top or stable variants are reported in Figure 2—figure supplements 19, 21 and 23.

      (3) I am a little bit concerned about the PICS's assumption about one causal variant. The authors mentioned this assumption as one of their method limitations. However, given the utility of existing fine-mapping methods (FINEMAP and SuSiE), it is worth exploring this domain.

      Thank you for raising this fair concern. We explored this domain, by considering simulations that include two and three causal variants (see Response to Reviewer 2, Comment 3). We looked at how well PICS recovers causal variants, and found that each potential set largely does not contain more than one causal variant (Figure 2—figure supplements 20 and 22). This can be explained by the fact that PICS potential sets are constructed from variants with a minimum linkage disequilibrium to a focal variant. On the other hand, in SuSiE, we observed multiple causal variants appearing in lower credible sets when applying stability guidance (Figure 2—figure supplements 21 and 23). A more extensive study involving more fine-mapping methods and metrics specific to violation of the one causal variant assumption could be pursued in future work.

      Reviewer #2:

      Aw et al. present a new stability-guided fine-mapping method by extending the previously proposed PICS method. They applied their stability-based method to fine-map cis-eQTLs in the GEUVADIS dataset and compared it against what they call a residualization-based method. They evaluated the performance of the proposed method using publicly available functional annotations and claimed the variants identified by their proposed stability-based method are more enriched for these functional annotations.

      While the reviewer acknowledges the contribution of the present work, there are a couple of major concerns as described below.

      Major:

      (1) It is critical to evaluate the proposed method in simulation settings, where we know which variants are truly causal. While I acknowledge their empirical approach using the functional annotations, a more unbiased, comprehensive evaluation in simulations would be necessary to assess its performance against the existing methods.

      Thank you for this point. We agree. We have performed a simulation study where we assume that causal variants are shared across populations (see response to Reviewer 1, Comment 1). Specifically, by mirroring the simulation approach described in Wang et al. (2020), we generated 2,400 synthetic gene expression phenotypes across 22 autosomes, using GEUVADIS gene expression metadata (i.e., gene transcription start site) to ensure cis expression phenotypes were simulated.

      (2) Also, simulations would be required to assess how the method is sensitive to different parameters, e.g., LD threshold, resampling number, or number of potential sets.

      Thank you for raising this point. The underlying PICS algorithm was not proposed by us, so we followed the default parameter settings (LD threshold, r² = 0.5; see Taylor et al., 2021 Bioinformatics) to focus on how stability considerations will impact the existing fine-mapping algorithm. We attempted to derive the asymptotic joint distribution of the p-values, but it was too difficult. Hence, we used 500 permutations because such a large number would allow large-sample asymptotics to kick in. However, following your critical suggestion we varied the number of potential sets in our analyses of simulated data. We briefly mention this in the Results.

      “In the Supplement, we also describe findings from investigations into the impact of including more potential sets on matching frequency and causal variant recovery…”

      A detailed write-up is provided in Supplementary File 1 Section S2 (p.2):

      “The number of credible or potential sets is a parameter in many fine-mapping algorithms. Focusing on stability-guided approaches, we consider how including more potential sets for stable fine-mapping algorithms affects both causal variant recovery and matching frequency in simulations…

      Causal variant recovery. We investigate both Stable PICS and Stable SuSiE. Focusing first on simulations with one causal variant, we observe a modest gain in causal variant recovery for both Stable PICS and Stable SuSiE, most noticeably when the number of sets was increased from 1 to 2 under the lowest signal-to-noise ratio setting…”

      We observed that increasing the number of potential sets helps with recovering causal variants for Stable PICS (Figure 2—figure supplements 13-15). This observation also accounts for the comparable power that Stable PICS has with SuSiE in simulations with low signal-to-noise ratio (SNR), when we increase the number of credible sets or potential sets (Figure 2—figure supplements 10-12).

      (3) Given the previous studies have identified multiple putative causal variants in both GWAS and eQTL, I think it's better to model multiple causal variants in any modern fine-mapping methods. At least, a simulation to assess its impact would be appreciated.

      We agree. In our simulations we considered up to three causal variants in cis, and evaluated how well the top three Potential Sets recovered all causal variants (Figure 2—figure supplements 13-15; Figure 2—figure supplement 15). We also reported the frequency of variant matches between Top and Stable PICS stratified by the number of causal variants simulated in Supplementary File 2B and 2C. Note Supplementary File 2C is for results from SuSiE fine-mapping; see Response to Reviewer 1, Comment 2.

      Supplementary File 2B. Frequencies with which Stable and Top PICS have matching variants for the same potential set. For each SNR/ “No. Causal Variants” scenario, the number of matching variants is reported in parentheses.

      Supplementary File 2C. Frequencies with which Stable and Top SuSiE have matching variants for the same credible set. For each SNR/ “No. Causal Variants” scenario, the number of matching variants is reported in parentheses.

      (4) Relatedly, I wonder what fraction of non-matching variants are due to the lack of multiple causal variant modeling.

      PICS handles multiple causal variants by including more potential sets to return, owing to the important caveat that causal variants in high LD cannot be statistically distinguished. For example, if one believes there are three causal variants that are not too tightly linked, one could make PICS return three potential sets rather than just one. To answer the question using our simulation study, we subsetted our results to just scenarios where the top and stable variants do not match. This mimics the exact scenario of having modeled multiple causal variants but still not yielding matching variants, so we can investigate whether these non-matching variants are in fact enriched in the true causal variants.

      Because we expect causal variants to appear in some potential set, we specifically considered whether these non-matching causal variants might match along different potential sets across the different methods. In other words, we compared the stable variant with the top variant from another potential set for the other approach (e.g., Stable PICS Potential Set 1 variant vs Top PICS Potential Set 2 variant). First, we computed the frequency with which such pairs of variants match. A high frequency would demonstrate that, even if the corresponding potential sets do not have a variant match, there could still be a match between non-corresponding potential sets across the two approaches, which shows that multiple causal variant modeling boosts identification of matching variants between both approaches — regardless of whether the matching variant is in fact causal.

      Low frequencies were observed. For example, when restricting to simulations where Top and Stable PICS Potential Set 1 variants did not match, about 2-3% of variants matched between the Potential Set 1 variant in Stable PICS and Potential Sets 2 and 3 variants in Top PICS; or between the Potential Set 1 variant in Top PICS and Potential Sets 2 and 3 variants in Stable PICS (Supplementary File 2D). When looking at non-matching Potential Set 2 or Potential Set 3 variants, we do see an increase in matching frequencies (between 10-20%) between Potential Set 2 variants and other potential set variants between the different approaches. However, these percentages are still small compared to the matching frequencies we observed between corresponding potential sets (e.g., for simulations with one causal variant this was 70-90% between Top and Stable PICS Potential Set 1, and for simulations with two and three causal variants this was 55-78% and 57-79% respectively).

      We next checked whether these “off-diagonal” matching variants corresponded to the true causal variants simulated. Here we find that the causal variant recovery rate is mostly less than the corresponding rate for diagonally matching variants, which together with the low matching frequency suggests that the enrichment of causal variants of “off-diagonal” matching variants is much weaker than in the diagonally matching approach. In other words, the fraction of non-matching (causal) variants due to the lack of multiple causal variant modeling is low.

      We discuss these findings in Supplementary File 1 Section S2 (bottom of p.2).
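The comparison of corresponding ("diagonal") and non-corresponding ("off-diagonal") potential sets described in this response can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `match_frequencies`, the input layout (one list of lead variant IDs per simulation) and the toy variant IDs are all assumptions. Only one direction is shown (stable variant against the top variants of the other sets); swapping the two arguments gives the other direction.

```python
def match_frequencies(stable_sets, top_sets):
    """Compare lead variants of corresponding and non-corresponding sets.

    stable_sets / top_sets: one list per simulation, holding the lead variant
    ID of Potential Sets 1..K (index 0 is Potential Set 1).
    Returns, per potential set, the diagonal matching frequency and the
    off-diagonal matching frequency among simulations without a diagonal match.
    """
    k = len(stable_sets[0])
    n_sims = len(stable_sets)
    diagonal = [0] * k
    unmatched = [0] * k
    off_diagonal = [0] * k
    for stable, top in zip(stable_sets, top_sets):
        for i in range(k):
            if stable[i] == top[i]:
                diagonal[i] += 1
                continue
            unmatched[i] += 1
            # Does the stable variant match the top variant of another set?
            if stable[i] in (top[j] for j in range(k) if j != i):
                off_diagonal[i] += 1
    return [
        {
            "potential_set": i + 1,
            "diagonal": diagonal[i] / n_sims,
            "off_diagonal": off_diagonal[i] / unmatched[i] if unmatched[i] else None,
        }
        for i in range(k)
    ]


# Toy input: four hypothetical simulations, three potential sets each.
top = [["rs1", "rs2", "rs3"], ["rs4", "rs5", "rs6"],
       ["rs7", "rs8", "rs9"], ["rs10", "rs11", "rs12"]]
stable = [["rs1", "rs2", "rs3"], ["rs5", "rs4", "rs6"],
          ["rs7", "rs99", "rs9"], ["rs10", "rs11", "rs12"]]
result = match_frequencies(stable, top)
```

Checking whether the matched variants are causal would additionally require the simulated causal variant IDs, which this sketch leaves out.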

      (5) I wonder if you can combine the stability-based and the residualization-based approach, i.e., using the residualized phenotypes for the stability-based approach. Would that further improve the accuracy or not?

      This is a good idea, thank you for suggesting it. We pursued this combined approach on simulated gene expression phenotypes, but did not observe significant gains in causal variant recovery (Figure 2B; Figure 2—figure supplements 2, 13 and 15). We reported this in the Results section “Searching for matching variants between Top PICS and Stable PICS improves causal variant Recovery.”

      “We thus explore ways to combine the residualization and stability-driven approaches, by considering (i) combining them into a single fine-mapping algorithm (we call the resulting procedure Combined PICS); and (ii) prioritizing matching variants between the two algorithms. Comparing the performance of Combined PICS against both Top and Stable PICS, however, we find no significant difference in its ability to recover causal variants (Figure 2B)...”

      However, we also confirmed in our simulations that prioritizing matching variants between the two approaches led to gains in causal variant recovery (Figure 2D; Figure 2—figure supplements 4, 19, 20 and 22). We reported this in the Results section “Searching for matching variants between Top PICS and Stable PICS improves causal variant Recovery.”

      “On the other hand, matching variants between Top and Stable PICS are significantly more likely to be causal. Across all simulations, a matching variant in Potential Set 1 is 2.5X as likely to be causal than either a non-matching top or stable variant (Figure 2D) — a result that was qualitatively consistent even when we stratified simulations by SNR and number of causal variants simulated (Figure 2—figure supplements 19, 20 and 22)...”

      This finding is consistent with our analysis of real GEUVADIS gene expression data, where we reported larger functional significance of matching variants relative to non-matching variants returned by either Top or Stable PICS.

      (6) The authors state that confounding in cohorts with diverse ancestries poses potential difficulties in identifying the correct causal variants. However, I don't see that they directly address whether the stability approach is mitigating this. It is hard to say whether the stability approach is helping beyond what simpler post-hoc QC (e.g., thresholding) can do.

      Thank you for raising this fair point. Here is a model we have in mind. Gene expression phenotypes (Y) can be explained by both genotypic effects (G, as in genotypic allelic dosage) and the environment (E): Y = G + E. However, both G and E depend on ancestry (A), so that Y = G|A + E|A. Suppose that the causal variants are shared across ancestries, so that (G|A=a) = G for all ancestries a. Suppose however that environments are heterogeneous by ancestry: (E|A=a) = e(a) for some function e that depends non-trivially on a. This would violate the exchangeability of exogenous E in the full sample, but by performing fine-mapping on each ancestry stratum, the exchangeability of exogenous E is preserved. This provides theoretical justification for the stability approach.

      We next turned to simulations, where we investigated 1,440 simulated gene expression phenotypes capturing various ways in which ancestry induces heterogeneity in the exogenous E variable (simulation details in Lines 576-610 of Materials and Methods). We ran Stable PICS, as well as a version of PICS that did not residualize phenotypes or apply the stability principle. We observed that (i) causal variant recovery performance was not significantly different between the two approaches (Figure 2—figure supplements 24-32); but (ii) disagreement between the approaches can be considerable, especially when the signal-to-noise ratio is low (Supplementary File 2A). For example, in a set of simulations with three causal variants, with SNR = 0.11 and E heterogeneous by ancestry by letting E be drawn from N(2σ, σ²) for only GBR individuals (the rest are N(0, σ²)), there was disagreement between Potential Set 1 and 2 variants in 25% of simulations, though recovery rates were similar (probability of recovering at least one causal variant: 75% for Plain PICS and 80% for Stable PICS). These points suggest that confounding in cohorts can reduce power in methods not adjusting or accounting for ancestral heterogeneity, but can be remedied by approaches that do so. We report this analysis in the Results section “Simulations justify exploration of stability guidance.”
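The heterogeneous-environment example above (E drawn from N(2σ, σ²) for GBR individuals and N(0, σ²) otherwise) can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' simulation code: the function name `draw_environment`, the value σ = 1 and the equal group size of 2,000 per population are hypothetical, and the second parameter of N(·, ·) is read as a variance. The sketch shows that the mean of E is constant within each ancestry stratum but differs between strata, which is what breaks exchangeability in the pooled sample.

```python
import random
from statistics import mean


def draw_environment(populations, sigma, shifted_pop="GBR", rng=None):
    """Draw E ~ N(2*sigma, sigma^2) for shifted_pop and N(0, sigma^2) otherwise."""
    rng = rng or random.Random()
    return [rng.gauss(2 * sigma if pop == shifted_pop else 0.0, sigma)
            for pop in populations]


rng = random.Random(1)
sigma = 1.0
# Hypothetical, equal-sized groups labelled with population codes.
labels = [pop for pop in ("CEU", "FIN", "GBR", "TSI", "YRI") for _ in range(2000)]
env = draw_environment(labels, sigma, rng=rng)

# Mean of E within each ancestry stratum, and in the pooled sample.
stratum_means = {
    pop: mean(e for e, p in zip(env, labels) if p == pop)
    for pop in sorted(set(labels))
}
pooled_mean = mean(env)
```

With these settings the GBR stratum mean sits near 2σ, the other strata near 0, and the pooled mean near 0.4σ, so an analysis of the full sample mixes individuals whose environmental term has different distributions.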

      In the current version of our work, we have evaluated, using both simulations and empirical evidence, different ways to combine approaches to boost causal variant recovery. Our simulation study shows that prioritizing matching variants across multiple methods improves causal variant recovery. On GEUVADIS data, where we might not know which variants are causal, we already demonstrated that matching variants are enriched for functional annotations. Therefore, our analyses justify that the adverse consequence of confounding on reducing fine-mapping accuracy can be mitigated by prioritizing matching variants between algorithms including those that account for stability.

      (7) For non-matching variants, I wonder what the difference of posterior probabilities is between the stable and top variants in each method. If the difference is small, maybe it is due to noise rather than signal.

      We have reported differences in posterior probabilities returned by Stable and Top PICS for GEUVADIS data; see Figure 3—figure supplement 1. For completeness, we compute the differences in posterior probabilities and summarize these differences both as histograms and as numerical summary statistics.

      Potential Set 1

      - Number of non-matching variants = 9,921

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 1.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 1.

      Potential Set 2

      - Number of non-matching variants = 14,454

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 2.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 2.

      Potential Set 3

      - Number of non-matching variants = 16,814

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 3.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 3.
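      For readers who wish to reproduce this kind of summary, the differences (Stable Posterior Probability – Top Posterior Probability) could be tabulated and binned roughly as sketched below. The posterior values are made up for illustration, and the function names and bin edges are assumptions rather than the authors' analysis code.

```python
import statistics

def summarize_differences(stable, top):
    """Summary statistics of (stable - top) posterior probabilities."""
    diffs = [s - t for s, t in zip(stable, top)]
    return {
        "n": len(diffs),
        "min": min(diffs),
        "median": statistics.median(diffs),
        "mean": statistics.fmean(diffs),
        "max": max(diffs),
    }

def histogram(diffs, n_bins=4, lo=-1.0, hi=1.0):
    """Counts of differences in equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for d in diffs:
        # Clamp so that d == hi falls into the last bin.
        idx = min(int((d - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical posterior probabilities for four non-matching variant pairs.
stable = [0.10, 0.40, 0.02, 0.30]
top = [0.70, 0.45, 0.90, 0.25]

summary = summarize_differences(stable, top)
counts = histogram([s - t for s, t in zip(stable, top)])
print(summary)
print(counts)
```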

      We also compared the difference in posterior probabilities between non-matching variants returned by Stable PICS and Top PICS for our 2,400 simulated gene expression phenotypes. Focusing on just Potential Set 1 variants, we find two equally likely scenarios, as demonstrated by two distinct clusters of points in a “posterior probability-posterior probability” plot. The first is, as pointed out, a small difference in posterior probability (points lying close to y=x). The second, however, reveals stable variants with very small posterior probability (of order 4 × 10<sup>–5</sup> to 0.05) but with a non-matching top variant taking on posterior probability well distributed along [0,1]. Moving down to Potential Sets 2 and 3, the distribution of pairs of posterior probabilities appears less clustered, indicating less tendency for posterior probability differences to be small (Figure 2—figure supplement 8).

      Here are the histograms and numerical summary statistics.

      Potential Set 1

      - Number of non-matching variants = 663 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 4.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 4.

      Potential Set 2

      - Number of non-matching variants = 1,429 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 5.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 5.

      Potential Set 3

      - Number of non-matching variants = 1,810 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 6.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 6.

      (8) It's a bit surprising that you observed matching variants with (stable) posterior probability ~ 0 (SFig. 1). What are the interpretations for these variants? Do you observe functional enrichment even for low posterior probability matching variants?

      Thank you for this question. We have performed a thorough analysis of matching variants with very low stable posterior probability, which we define as having a posterior probability < 0.01 (Supplementary File 1 Section S11). Here, we briefly summarize the analysis and key findings.

      Analysis

      First, such variants occur very rarely: only 8 across all three potential sets in simulations, and 17 across all three potential sets for GEUVADIS (the latter variants are listed in Supplementary File 2E). We begin interpreting these variants by looking at allele frequency heterogeneity by ancestry, support size (defined as the number of variants with positive posterior probability in the ALL slice*), and the number of slices including the stable variant (i.e., the stable variant reported positive posterior probability for the slice).

      *Note that the stable variant posterior probability need not be at least 1/(Support Size). This is because the algorithm may have picked a SNP that has a lower posterior probability in the ALL slice (i.e., not the top variant) but happens to appear in the largest number of other slices (i.e., a stable variant).
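      The footnote can be made concrete with a toy example. The sketch below is not the Stable PICS implementation: the slice names, variant IDs, posterior values, and the helper `pick_top_and_stable` are hypothetical, and it assumes a simplified rule in which the top variant maximizes posterior probability in the ALL slice while the stable variant is the one with positive posterior probability in the largest number of slices (ties broken by ALL-slice posterior).

```python
def support_size(posteriors):
    """Number of variants with positive posterior probability in one slice."""
    return sum(1 for p in posteriors.values() if p > 0)

def pick_top_and_stable(slices):
    """Return (top, stable) variants under the simplified rule described above."""
    all_slice = slices["ALL"]

    # Top variant: highest posterior probability in the ALL slice.
    top = max(all_slice, key=all_slice.get)

    def n_slices(variant):
        # Count slices in which the variant has positive posterior probability.
        return sum(1 for post in slices.values() if post.get(variant, 0.0) > 0)

    # Stable variant: appears in the most slices; ties broken by ALL-slice posterior.
    stable = max(all_slice, key=lambda v: (n_slices(v), all_slice[v]))
    return top, stable

# Hypothetical posterior probabilities per slice.
slices = {
    "ALL": {"rsA": 0.60, "rsB": 0.25, "rsC": 0.10, "rsD": 0.05},
    "slice1": {"rsA": 0.70, "rsD": 0.30},
    "slice2": {"rsB": 0.55, "rsD": 0.45},
    "slice3": {"rsC": 0.80, "rsD": 0.20},
}

top, stable = pick_top_and_stable(slices)
size = support_size(slices["ALL"])
print(top, stable, size, slices["ALL"][stable], 1 / size)
```

      In this toy case the stable variant is present in every slice yet carries an ALL-slice posterior probability below 1/(Support Size), which is exactly the situation the footnote describes.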

      For variants arising from simulations, because we know the true causal variants, we check if these variants are causal. For GEUVADIS fine-mapped variants, we rely on functional annotations to compare their relative enrichment against other matching variants that did not have very low stable posterior probability.

      Findings

      While we caution against generalizing from observations reported here, which are based on very small sample sizes, we noticed the following. In simulations, matching variants with very low stable posterior probability are largely depleted in causal variants, although factors such as the number of slices including the stable variant may still be useful. In GEUVADIS, however, these variants can still be functionally enriched. We reported three examples in Supplementary File 1 Section S11 (pp. 8-9 of Supplement), where the variants were enriched in either VEP or biologically interpretable functional annotations, and were also reported in earlier studies. We partially reproduce our report below for convenience.

      “However, we occasionally found variants that stand out for having large functional annotation scores. We list one below for each potential set.

      - Potential Set 1 reported the variant rs12224894 from fine-mapping ENSG00000255284.1 (accession code AP006621.3) in Chromosome 11. This variant stood out for lying in the promoter flanking region of multiple cell types and being relatively enriched for GC content with a 75bp flanking region. This variant has been reported as a cis eQTL for AP006632 (using whole blood gene expression, rather than lymphoblastoid cell line gene expression in this study) in a clinical trial study of patients with systemic lupus erythematosus (Davenport et al., 2018). Its nearest gene is GATD1, a ubiquitously expressed gene that codes for a protein and is predicted to regulate enzymatic and catabolic activity. This variant appeared in all 6 slices, with a moderate support size of 23.

      - Potential Set 2 reported the variant rs9912201 from fine-mapping ENSG00000108592.9 (mapped to FTSJ3) in Chromosome 17. Its FIRE score is 0.976, which is close to the maximum FIRE score reported across all Potential Set 2 matching variants. This variant has been reported as a SNP in high LD to a GWAS hit SNP rs7223966 in a pan-cancer study (Gong et al., 2018). This variant appeared in all 6 slices, with a moderate support size of 32.

      - Potential Set 3 reported the variant rs625750 from fine-mapping ENSG00000254614.1 (mapped to CAPN1-AS1, an RNA gene) in Chromosome 11. Its FIRE score is 0.971 and its B statistic is 0.405 (region under selection), which lie at the extreme quantiles of the distributions of these scores for Potential Set 3 matching variants with stable posterior probability at least 0.01. Its associated mutation has been predicted to affect transcription factor binding, as computed using several position weight matrices (Kheradpour and Kellis, 2014). This variant appeared in just 3 slices, possibly owing to the considerable allele frequency difference between ancestries (maximum AF difference = 0.22). However, it has a small support size of 4 and a moderately high Top PICS posterior probability of 0.64.

      To summarize, our analysis of GEUVADIS fine-mapped variants demonstrates that matching variants with very low stable posterior probability could still be functionally important, even for lower potential sets, conditional on supportive scores in interpretable features such as the number of slices containing the stable variant and the posterior probability support size…”

    1. eLife Assessment

      This fundamental work reveals that the accessibility of the unstructured C-terminal tails of α- and β-tubulins differs with the state of the microtubule lattice. Their accessibility increases with the expansion of the lattice induced by GTP and certain MAPs, which can then dictate the subsequent interactions between MAPs and microtubules, and post-translational modifications of tubulin tails. The evidence supporting the conclusion is compelling, although the characterisation of the probes does not answer whether they directly affect the lattice or expose the C-terminal tails of tubulin. The probes can be used as tools in the future to study differences in microtubule lattice assembly under different conditions both in vitro and in vivo. This work will be of great interest to the cytoskeleton field.

    2. Reviewer #1 (Public review):

      Summary:

      This is a careful and comprehensive study demonstrating that effector-dependent conformational switching of the MT lattice from compacted to expanded deploys the alpha tubulin C-terminal tails so as to enhance their ability to bind interactors.

      Strengths:

      The authors use 3 different sensors for the exposure of the alpha CTTs. They show that all 3 sensors report exposure of the alpha CTTs when the lattice is expanded by GMPCPP, or KIF1C, or a hydrolysis-deficient tubulin. They demonstrate that expansion-dependent exposure of the alpha CTTs works in tissue culture cells as well as in vitro.

      Appraisal:

      The authors have gone to considerable lengths to test their hypothesis that microtubule expansion favours deployment of the alpha tubulin C-terminal tail, allowing its interactors, including detyrosinase enzymes, to bind. There is a real prospect that this will change thinking in the field. One very interesting possibility, touched on by the authors, is that the requirement for MAP7 to engage kinesin with the MT might include a direct effect of MAP7 on lattice expansion.

      Impact:

      The possibility that the interactions of MAPs and motors with a particular MT or region feed forward to determine its future interaction patterns is made much more real. Genuinely exciting.

    3. Reviewer #2 (Public review):

      The unstructured α- and β-tubulin C-terminal tails (CTTs), which differ between tubulin isoforms, extend from the surface of the microtubule, are post-translationally modified, and help regulate the function of MAPs and motors. Their dynamics and extent of interactions with the microtubule lattice are not well understood. Hotta et al. explore this using a set of three distinct probes that bind to the CTTs of tyrosinated (native) α-tubulin. Under normal cellular conditions, these probes associate with microtubules only to a limited extent, but this binding can be enhanced by various manipulations thought to alter the tubulin lattice conformation (expanded or compact). These include small-molecule treatment (Taxol), changes in nucleotide state, and the binding of microtubule-associated proteins and motors. Overall, the authors conclude that microtubule lattice "expanders" promote probe binding, suggesting that the CTT is generally more accessible under these conditions. Consistent with this, detyrosination is enhanced. Mechanistically, molecular dynamics simulations indicate that the CTT may interact with the microtubule lattice at several sites, and that these interactions are affected by the tubulin nucleotide state.

      Strengths and weaknesses:

      Key strengths of the work include the use of three distinct probes that yield broadly consistent findings, and a wide variety of experimental manipulations (drugs, motors, MAPs) that collectively support the authors' conclusions, alongside a careful quantitative approach.

      The challenges of studying the dynamics of a short, intrinsically disordered protein region within the complex environment of the cellular microtubule lattice, amid numerous other binders and regulators, should not be understated. While it is very plausible that the probes report on CTT accessibility as proposed, the possibility of confounding factors (e.g., effects on MAP or motor binding) cannot be ruled out. Sensitivity to the expression level clearly introduces additional complications. Likewise, for each individual "expander" or "compactor" manipulation, one must consider indirect consequences (e.g., masking of binding sites) in addition to direct effects on the lattice; however, this risk is mitigated by the collective observations all pointing in the same direction.

      The discussion does a good job of placing the findings in context and acknowledging relevant caveats and limitations. Overall, this study introduces an interesting and provocative concept, well supported by experimental data, and provides a strong foundation for future work. This will be a valuable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate how the structural state of the microtubule lattice influences the accessibility of the α-tubulin C-terminal tail (CTT). By developing and applying new biosensors, they reveal that the tyrosinated CTT is largely inaccessible under normal conditions but becomes more accessible upon changes to the tubulin conformational state induced by taxol treatment, MAP expression, or GTP-hydrolysis-deficient tubulin. The combination of live imaging, biochemical assays, and simulations suggests that the lattice conformation regulates the exposure of the CTT, providing a potential mechanism for modulating interactions with microtubule-associated proteins. The work addresses a highly topical question in the microtubule field and proposes a new conceptual link between lattice spacing and tail accessibility for tubulin post-translational modification. Future work is required to determine whether CTT exposure in the microtubule lattice is sensitive to additional factors present in vivo but not in vitro.

      Strengths:

      (1) The study targets a highly relevant and emerging topic: the structural plasticity of the microtubule lattice and its regulatory implications.

      (2) The biosensor design represents a methodological advance, enabling direct visualization of CTT accessibility in living cells.

      (3) Integration of imaging, biochemical assays, and simulations provides a multi-scale perspective on lattice regulation.

      (4) The proposed conceptual framework, with lattice conformation as a determinant of post-translational modification accessibility, is novel and potentially impactful for understanding microtubule regulation.

      [Editors' note: the authors have responded to the reviewers and this version was assessed by the editors.]

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a careful and comprehensive study demonstrating that effector-dependent conformational switching of the MT lattice from compacted to expanded deploys the alpha tubulin C-terminal tails so as to enhance their ability to bind interactors.

      Strengths:

      The authors use 3 different sensors for the exposure of the alpha CTTs. They show that all 3 sensors report exposure of the alpha CTTs when the lattice is expanded by GMPCPP, or KIF1C, or a hydrolysis-deficient tubulin. They demonstrate that expansion-dependent exposure of the alpha CTTs works in tissue culture cells as well as in vitro.

      Weaknesses:

      There is no information on the status of the beta tubulin CTTs. The study is done with mixed isotype microtubules, both in cells and in vitro. It remains unclear whether all the alpha tubulins in a mixed isotype microtubule lattice behave equivalently, or whether the effect is tubulin isotype-dependent. It remains unclear whether local binding of effectors can locally expand the lattice and locally expose the alpha CTTs.

      Appraisal:

      The authors have gone to considerable lengths to test their hypothesis that microtubule expansion favours deployment of the alpha tubulin C-terminal tail, allowing its interactors, including detyrosinase enzymes, to bind. There is a real prospect that this will change thinking in the field. One very interesting possibility, touched on by the authors, is that the requirement for MAP7 to engage kinesin with the MT might include a direct effect of MAP7 on lattice expansion.

      Impact:

      The possibility that the interactions of MAPs and motors with a particular MT or region feed forward to determine its future interaction patterns is made much more real. Genuinely exciting.

      We thank the reviewer for their positive response to our work. We agree that it will be important to determine if the bCTT is subject to regulation similar to the aCTT. However, this will first require the development of sensors that report on the accessibility of the bCTT, which is a significant undertaking for future work.

      We also agree that it will be important to examine whether all tubulin isotypes behave equivalently in terms of exposure of the aCTT in response to conformational switching of the microtubule lattice.

      We thank the reviewer for the comment about local expansion of the microtubule lattice. We believe that Figure 3 does show that local binding of effectors can locally expand the lattice and locally expose the alpha-CTTs. We have added text to clarify this.

      Reviewer #2 (Public review):

      The unstructured α- and β-tubulin C-terminal tails (CTTs), which differ between tubulin isoforms, extend from the surface of the microtubule, are post-translationally modified, and help regulate the function of MAPs and motors. Their dynamics and extent of interactions with the microtubule lattice are not well understood. Hotta et al. explore this using a set of three distinct probes that bind to the CTTs of tyrosinated (native) α-tubulin. Under normal cellular conditions, these probes associate with microtubules only to a limited extent, but this binding can be enhanced by various manipulations thought to alter the tubulin lattice conformation (expanded or compact). These include small-molecule treatment (Taxol), changes in nucleotide state, and the binding of microtubule-associated proteins and motors. Overall, the authors conclude that microtubule lattice "expanders" promote probe binding, suggesting that the CTT is generally more accessible under these conditions. Consistent with this, detyrosination is enhanced. Mechanistically, molecular dynamics simulations indicate that the CTT may interact with the microtubule lattice at several sites, and that these interactions are affected by the tubulin nucleotide state.

      Strengths:

      Key strengths of the work include the use of three distinct probes that yield broadly consistent findings, and a wide variety of experimental manipulations (drugs, motors, MAPs) that collectively support the authors' conclusions, alongside a careful quantitative approach.

      Weaknesses:

      The challenges of studying the dynamics of a short, intrinsically disordered protein region within the complex environment of the cellular microtubule lattice, amid numerous other binders and regulators, should not be understated. While it is very plausible that the probes report on CTT accessibility as proposed, the possibility of confounding factors (e.g., effects on MAP or motor binding) cannot be ruled out. Sensitivity to the expression level clearly introduces additional complications. Likewise, for each individual "expander" or "compactor" manipulation, one must consider indirect consequences (e.g., masking of binding sites) in addition to direct effects on the lattice; however, this risk is mitigated by the collective observations all pointing in the same direction.

      The discussion does a good job of placing the findings in context and acknowledging relevant caveats and limitations. Overall, this study introduces an interesting and provocative concept, well supported by experimental data, and provides a strong foundation for future work. This will be a valuable contribution to the field.

      We thank the reviewer for their positive response to our work. We are encouraged that the reviewer feels that the Discussion section does a good job of putting the findings, challenges, and possibility of confounding factors and indirect effects in context. 

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate how the structural state of the microtubule lattice influences the accessibility of the α-tubulin C-terminal tail (CTT). By developing and applying new biosensors, they reveal that the tyrosinated CTT is largely inaccessible under normal conditions but becomes more accessible upon changes to the tubulin conformational state induced by taxol treatment, MAP expression, or GTP-hydrolysis-deficient tubulin. The combination of live imaging, biochemical assays, and simulations suggests that the lattice conformation regulates the exposure of the CTT, providing a potential mechanism for modulating interactions with microtubule-associated proteins. The work addresses a highly topical question in the microtubule field and proposes a new conceptual link between lattice spacing and tail accessibility for tubulin post-translational modification.

      Strengths:

      (1) The study targets a highly relevant and emerging topic: the structural plasticity of the microtubule lattice and its regulatory implications.

      (2) The biosensor design represents a methodological advance, enabling direct visualization of CTT accessibility in living cells.

      (3) Integration of imaging, biochemical assays, and simulations provides a multi-scale perspective on lattice regulation.

      (4) The proposed conceptual framework, with lattice conformation as a determinant of post-translational modification accessibility, is novel and potentially impactful for understanding microtubule regulation.

      Weaknesses:

      There are a number of weaknesses in the paper, many of which can be addressed textually. Some of the supporting evidence is preliminary and would benefit from additional experimental validation and clearer presentation before the conclusions can be considered fully supported. In particular, the authors should directly test in vitro whether Taxol addition can induce lattice exchange (see comments below).

      We thank the reviewer for their positive response to our work. We have altered the text and provided additional experimental validation as requested (see below).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The resolution of the figures is insufficient.

      (2) The provision of scale bars is inconsistent and insufficient.

      (3) Figure 1E, the scale bar looks like an MT.

      (4) Figure 2C, what does the grey bar indicate?

      (5) Figure 2E, missing scale bar.

      (6) Figure 3 C, D, significance brackets misaligned.

      (7) Figure 3E, consider using the same alpha-beta tubulin / MT graphic as in Figure 1B.

      (8) Figure 5E, show cell boundaries for consistency?

      (9) Figure 6D, stray box above the y-axis.

      (11) Figure S3A, scale bar wrong unit again.

      (12) S3B "fixed" and mount missing scale bar in the inset.

      (13) S4 scale bars without scale, inconsistency in scale bars throughout all the figures.

      We apologize for issues with the figures. We have corrected all of the issues indicated by the reviewer.

      (10) Figure 6F, surprising that 300 mM KCl washes out rigor-binding kinesin.

      We thank the reviewer for this important point. To address the reviewer’s concern, we have added a new supplementary figure (new Figure 6 – Figure Supplement 1) which shows that the washing step removes strongly-bound (apo) KIF5C(1-560)-Halo<sup>554</sup> protein from the microtubules. In addition, we have made a correction to the Materials and Methods section noting that ATP was added in addition to the KCl in the wash buffer. We apologize for omitting this detail in the original submission. We also added text noting that the wash out step was based on Shima et al., 2018 where the observation chamber was washed with either 1 mM ATP and 300 mM K-Pipes or with 10 mM ATP and 500 mM K-Pipes buffer. In our case, the chamber was washed with 3 mM ATP and 300 mM KCl. It is likely that the addition of ATP facilitates the detachment of strongly-bound KIF5C.

      (14) Supplementary movie, please identify alpha and beta tubulins for clarity. Please identify residues lighting up in interaction sites 1, 2 & 3.

      Thank you for the suggestions. We have made the requested changes to the movie.

      Reviewer #2 (Recommendations for the authors):

      There appear to have been some minor issues (perhaps with .pdf conversion) that leave some text and images pixelated in the .pdf provided, alongside some slightly jarring text and image positioning (e.g., Figure 5E panels). The authors should carefully look at the figures to ensure that they are presented in the clearest way possible.

      We apologize for these issues with the figures. We have reviewed the figures carefully to ensure that they are presented in the clearest way possible.

      The authors might consider providing a more definitive structural description of compact vs expanded lattice, highlighting what specific parameters are generally thought to change and by what magnitude. Do these differ between taxol-mediated expansion or the effects of MAPs?

      Thank you for the suggestion. We have added additional information to the Introduction section.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 should include a schematic overview of all constructs used in the study. A clear illustration showing the probe design, including the origin and function of each component (e.g., tags, domains), would improve clarity.

      Thank you for the suggestion. We have added new illustrations to Figure 1 showing the origin and design (including domains and tags) of each probe.

      (2) Add Western blot data for the 4×CAP-Gly construct to Figure 1C for completeness.

      We thank the reviewer for this suggestion. We carried out a far-western blot using the purified 4xCAPGly-mEGFP protein to probe GST-Y, GST-ΔY, and GST-ΔC2 proteins (new Figure 1 – Figure Supplement 1C). We note that some bleed-through signal can be seen in the lanes containing GST-ΔY and GST-ΔC2 protein due to the imaging requirements and exposure needed to visualize the 4xCAPGly-mEGFP protein. Nevertheless, the blot shows that the purified CAPGly sensor specifically recognizes the native (tyrosinated) CTT sequence of TUBA1A.

      (3) Essential background information on the CAP-Gly domain, SXIP motif, and EB proteins is missing from the Introduction. These concepts appear abruptly in the Results and should be properly introduced.

      Thank you for the suggestion. We have added additional information to the Introduction section about the CAP-Gly domain. However, we feel that introducing the SXIP motif and EB proteins at this point would detract from the flow of the Introduction and we have elected to retain this information in the Results section when we detail development of the 4xCAPGly probe.

      (4) In Figure 2E, it remains possible that the CAP-Gly domain displacement simply follows the displacement of EB proteins. An experiment comparing EB protein localization upon Taxol treatment would clarify this relationship.

      We thank the reviewer for raising this important point. To address the reviewer’s concern, we utilized HeLa cells stably expressing EB3-GFP. We performed live-cell imaging before and after Taxol addition (new Figure 2 – Figure Supplement 1C). EB3-EGFP was lost from the microtubule plus ends within minutes and did not localize to the now-expanded lattice.

      (5) Statements such as "significantly increased" (e.g., line 195) should be replaced with quantitative information (e.g., "1.5-fold increase").

      We have made the suggested changes to the text.

      (6) Phrases like "became accessible" should be revised to "became more accessible," as the observed changes are relative, not absolute. The current wording implies a binary shift, whereas the data show a modest (~1.5-fold) increase.

      We have made the suggested changes to the text.

      (7) Similarly, at line 209, the terms "minimally accessible" versus "accessible" should be rephrased to reflect the small relative change observed; saturation of accessibility is not demonstrated.

      We have made the suggested changes to the text.

      (8) Statements that MAP7 "expands the lattice" (line 222) should be made cautiously; to my knowledge, that has not been clearly established in the literature.

      We thank the reviewer for this important comment. We have added text indicating that MAP7’s ability to induce or preserve an expanded lattice has not been clearly established.

      (9) In Figures 3 and 4, the overexpression of MAP7 results in a strikingly peripheral microtubule network. Why is there this unusual morphology?

      The reviewer raises an interesting question. We are not sure why the overexpression of MAP7 results in a strikingly peripheral microtubule network, but we suspect this is unique to the HeLa cells we are using. We have observed a more uniform MAP7 localization in other cell types [e.g. COS-7 cells (Tymanskyj et al. 2018)], consistent with the literature [e.g. BEAS-2B cells (Shen and Ori-McKenney 2024), HeLa cells (Hooikaas et al. 2019)].

      (10) In Supplementary Figure 5C, the Western blot of detyrosination levels is inconsistent with the text. Untreated cells appear to have higher detyrosination than both wild-type and E254A-overexpressing cells. Do you have any explanation?

      We thank the reviewer for this important comment. We do not have an explanation at this point but plan to revisit this experiment. Unfortunately, the authors who carried out this work recently moved to a new institution and it will be several months before they are able to get the cell lines going and repeat the experiment. We thus elected to remove what was Supp Fig 5C until we can revisit the results. We believe that the important results are in what is now Figure 5 - Figure Supplement 1A,B, which shows that the expression levels of the WT and E254A proteins are similar to each other.

      (11) The image analysis method in Figures 5B and 5D requires clarification. It appears that "density" was calculated from skeletonized probe length over total area, potentially using a strict intensity threshold. It looks like low-intensity binding has been excluded; otherwise, the density would be the same from the images. If so, this should be stated explicitly. A more appropriate analysis might skeletonize and integrate total fluorescence intensity relative to the overall microtubule network.

      We have added additional information to the Materials and Methods section to clarify the image analysis. We appreciate the reviewer’s valuable feedback and the suggestion to use the integrated total fluorescence intensity, which is a theoretically sound approach. While we agree that integrated intensity is a valid metric for specific applications, its appropriate use depends on two main preconditions:

      (1) Consistent microscopy image acquisition conditions.

      (2) Consistent probe expression levels across all cells and experiments.

      We successfully maintained consistent image acquisition conditions (e.g., exposure time) throughout the experiment. However, despite generating stably expressing sensor cell lines to minimize variation, there remains an inherent biological variability in probe expression levels between individual cells. Integrated intensity is highly susceptible to this cell-to-cell variability. Relying on it would lead to a systematic error where differences in the total amount of expressed probe would be mistaken for differences in Y-aCTT accessibility.

      The density metric (skeletonized probe length / total cell area) was deliberately chosen as it serves as a geometric measure rather than an intensity-based normalization. The density metric quantifies the proportion of the microtubule network that is occupied by Y-aCTT-labeled structures, independent of fluorescence intensity. Thus, the density metric provides a more robust and interpretable measure of Y-aCTT accessibility under the variable expression conditions inherent to our experimental system. Therefore, we believe that this geometric approach represents the most appropriate analysis for our image dataset.
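      The contrast between the two metrics can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the toy images, the threshold value of 50, and all function names are hypothetical, and the filament is drawn one pixel wide so that thresholding alone stands in for a real skeletonization step.

```python
# Illustrative toy example only; names, images, and threshold are assumptions.

def threshold(image, level):
    """Binary mask of pixels whose intensity exceeds `level`."""
    return [[value > level for value in row] for row in image]


def skeleton_density(skeleton_mask, cell_mask):
    """Skeleton length (pixel count inside the cell) divided by cell area."""
    length = sum(bool(s) and bool(c)
                 for s_row, c_row in zip(skeleton_mask, cell_mask)
                 for s, c in zip(s_row, c_row))
    area = sum(bool(c) for row in cell_mask for c in row)
    return length / area


def integrated_intensity(image, cell_mask):
    """Sum of pixel intensities inside the cell."""
    return sum(value
               for i_row, c_row in zip(image, cell_mask)
               for value, c in zip(i_row, c_row) if c)


def toy_cell(filament_intensity):
    """5x5 image with one horizontal, one-pixel-wide filament in row 2."""
    image = [[0] * 5 for _ in range(5)]
    image[2] = [filament_intensity] * 5
    return image


cell_mask = [[True] * 5 for _ in range(5)]   # whole field is "cell" (area 25)
low_expr = toy_cell(100)                     # low probe expression
high_expr = toy_cell(300)                    # same geometry, 3x expression

density_low = skeleton_density(threshold(low_expr, 50), cell_mask)
density_high = skeleton_density(threshold(high_expr, 50), cell_mask)
intensity_low = integrated_intensity(low_expr, cell_mask)
intensity_high = integrated_intensity(high_expr, cell_mask)
```

      In this toy case the density is identical for both cells, whereas the integrated intensity scales with expression level. The sketch also makes the reviewer's point visible: the density still depends on the chosen threshold, because any labeled filament that falls below it is not counted.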

      (12) In Figure 5D, the fold-change data are difficult to interpret due to the compressed scale. Replotting is recommended. The text should also discuss the relative fold changes between E254A and Taxol conditions, Figure 2H.

      We appreciate the reviewer's insightful comment. We agree that the presence of significant outliers led to a compressed Y-axis scale in Figure 5D, obscuring the clear difference between the WT-tubulin and E254A-tubulin groups. As suggested, we have replotted Figure 5D using a broken Y-axis to effectively expand the relevant lower range of the data while still accurately representing all data points, including the outliers. We believe that the revised graph significantly enhances the clarity and interpretability of these results. For Figure 2, we have added the relative fold changes to the text as requested.

      (13) Figure 6. The authors should directly test in vitro whether Taxol addition can induce lattice exchange, for example, by adding Taxol to GDP-microtubules and monitoring probe binding. Including such an assay would provide critical mechanistic evidence and substantially strengthen the conclusions. I was waiting for this experiment since Figure 2.

      We thank the reviewer for this suggestion. As suggested, we generated GDP-MTs from HeLa tubulin and added them to two flow chambers. We then flowed the YL1/2<sup>Fab</sup>-EGFP probe into the chambers in the presence of DMSO (vehicle control) or Taxol. Static images were taken and the fluorescence intensity of the probe on microtubules in each chamber was quantified. There was a slight but not statistically significant difference in probe binding between control and Taxol-treated GDP-MTs (Author response image 1). While disappointing, these results underscore our conclusion (Discussion section) that microtubule assembly in vitro may not produce a lattice state resembling that in cells, either due to differences in protofilament number and/or buffer conditions and/or the lack of MAPs during polymerization.

      Author response image 1.

      References

      Hooikaas, P. J., Martin, M., Muhlethaler, T., Kuijntjes, G. J., Peeters, C. A. E., Katrukha, E. A., Ferrari, L., Stucchi, R., Verhagen, D. G. F., van Riel, W. E., Grigoriev, I., Altelaar, A. F. M., Hoogenraad, C. C., Rudiger, S. G. D., Steinmetz, M. O., Kapitein, L. C. and Akhmanova, A. (2019). MAP7 family proteins regulate kinesin-1 recruitment and activation. J Cell Biol, 218, 1298-1318.

      Shen, Y. and Ori-McKenney, K. M. (2024). Microtubule-associated protein MAP7 promotes tubulin posttranslational modifications and cargo transport to enable osmotic adaptation. Dev Cell, 59, 1553-1570.

      Tymanskyj, S. R., Yang, B. H., Verhey, K. J. and Ma, L. (2018). MAP7 regulates axon morphogenesis by recruiting kinesin-1 to microtubules and modulating organelle transport. Elife, 7.

    1. eLife Assessment

      This manuscript presents useful insights into the molecular basis underlying the positive cooperativity between the co-transported substrates (galactoside sugar and sodium ion) in the melibiose transporter MelB. Building on years of previous studies, this convincing study improves on the resolution of previously published structures and reports the presence of a water molecule in the sugar binding site that would appear to be key for its recognition, introduces further structures bound to different substrates, and utilizes binding and transport assays, as well as HDX-MS and molecular dynamics simulations to further understand the positive cooperativity between sugar and the co-transported sodium cation. The work will be of interest to biologists and biochemists working on cation-coupled symporters, which mediate the transport of a wide range of solutes across cell membranes.

    2. Reviewer #1 (Public review):

      While the structure of the melibiose permease in both outward and inward-facing forms has been solved previously, there remain unanswered questions regarding its mechanism. Hariharan et al. set out to address this with further crystallographic studies complemented with ITC and hydrogen-deuterium exchange (HDX) mass spectrometry. They first report 4 different crystal structures of galactose derivatives to explore molecular recognition, showing that the galactose moiety itself is the main source of specificity. Interestingly, they observe a water-mediated hydrogen bonding interaction with the protein and suggest that this water molecule may be important in binding.

      The results from the crystallography appear sensible, though the resolution of the data is low with only the structure with NPG better than 3Å. Support for the conclusion of the water molecule in the binding site, as interpreted from the density, is given by MD studies.

      The HDX also appears to be well done and is explained reasonably well in the revision.

    3. Reviewer #3 (Public review):

      Summary:

      The melibiose permease from Salmonella enterica serovar Typhimurium (MelBSt) is a member of the Major Facilitator Superfamily (MFS). It catalyzes the symport of a galactopyranoside with Na⁺, H⁺, or Li⁺, and serves as a prototype model system for investigating cation-coupled transport mechanisms. In cation-coupled symporters, a coupling cation typically moves down its electrochemical gradient to drive the uphill transport of a primary substrate; however, the precise role and molecular contribution of the cation in substrate binding and translocation remain unclear. In a prior study, the authors showed that the binding affinity for melibiose is increased in the presence of Na⁺ by about 8-fold, but the molecular basis for the cooperative mechanism remains unclear. The objective of this study was to better understand the allosteric coupling between the Na⁺ and melibiose binding sites. To verify the sugar-recognition specific determinants, the authors solved the outward-facing crystal structures of a uniport mutant D59C with four sugar ligands containing different numbers of monosaccharide units (α-NPG, melibiose, raffinose, or α-MG). The structure with α-NPG bound has improved resolution (2.7 Å) compared to a previously published structure and to those with other sugars. These structures show that the specificity is clearly directed toward the galactosyl moiety. However, the increased affinity for α-NPG involves its hydrophobic phenyl group, positioned at 4 Å-distance from the phenyl group of Tyr26, which forms a strong stacking interaction. Moreover, a water molecule bound to OH-4 in the structure with α-NPG was proposed to contribute to the sugar recognition and appears on the pathway between the two specificity-determining pockets. Next, the authors analyzed by hydrogen-to-deuterium exchange coupled to mass spectrometry (HDX-MS) the changes in structural dynamics of the transporter induced by melibiose, Na⁺, or both. The data support the conclusion that the binding of the coupling cation at a remote location stabilizes the sugar-binding residues to switch to a higher-affinity state. Therefore, the coupling cation in this symporter was proposed to be an allosteric activator.

      Strengths:

      (1) The manuscript is generally well written.

      (2) This study builds on the authors' accumulated knowledge of the melibiose permease and integrates structural and HDX-MS analyses to better understand the communication between the sodium ion and sugar binding sites. A high sequence coverage was obtained for the HDX-MS data (86-87%), which is high for a membrane protein.

      The revised manuscript shows clear improvement, and the authors have addressed my concerns in a satisfactory manner. Of note, I noticed two mistakes that should be corrected:

      - page 11. Unless I am mistaken, the sentence "In contrast, Na+ alone or with melibiose primarily caused deprotections" should be corrected with "protections". The authors may wish to verify this sentence and also the previous one in the main text.

      - Figure 8 displays two cytoplasmic gates (one of them should be periplasmic)

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This manuscript presents useful insights into the molecular basis underlying the positive cooperativity between the co-transported substrates (galactoside sugar and sodium ion) in the melibiose transporter MelB. Building on years of previous studies, this work improves on the resolution of previously published structures and reports the presence of a water molecule in the sugar binding site that would appear to be key for its recognition, introduces further structures bound to different substrates, and utilizes HDX-MS to further understand the positive cooperativity between sugar and the co-transported sodium cation. Although the experimental work is solid, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion, as well as a clearer description of the new insight that is obtained in relation to previous studies. The work will be of interest to biologists and biochemists working on cation-coupled symporters, which mediate the transport of a wide range of solutes across cell membranes.

      We express our gratitude to the associate editor, review editor, and reviewers for their favorable evaluation of this manuscript, as well as their constructive comments and encouragement. Their feedback has been integrated to fortify the evidence, refine the data analysis, and elevate the presentation of the results, thereby enhancing the overall quality and clarity of the manuscript.

      A brief summary of the modifications in this revision:

      (a) We performed four new experiments: 1) intact cell [<sup>3</sup>H]raffinose transport assay; 2) intact cell p-nitrophenol detection to demonstrate α-NPG transport; 3) ITC binding assay for the D59C mutant; and 4) molecular dynamics to simulate water-1 in the sugar-binding site and the dynamics of side chains in the Na<sup>+</sup>- and melibiose-binding pockets. All data consistently support the conclusions drawn in this article.

      (b) We have added a new figure to show the apo state dynamics (the new Fig. 5a,b) and annotated the amino acid residue positions and marked positions in sugar- or Na<sup>+</sup>-binding pockets.

      (c) As suggested by reviewer-3, we have moved the individual mapping of ligand effects on HDX data to the main figure, combined with the residual plots, and marked the amino-acid residue positions.

      (d) We have added more deuterium uptake plots to cover all residues in the sugar- or Na<sup>+</sup>-binding pockets in the current figure 7 (previously figure 6).

      (e) We have added a new figure 8 showing the positions at the well-studied cytoplasmic gating salt-bridge network and other loops likely important for conformational changes, along with a membrane topology marked with the HDX data. We have added a new figure 9 from MD simulations.

      Reviewer #1:

      While the structure of the melibiose permease in both outward and inward-facing forms has been solved previously, there remain unanswered questions regarding its mechanism. Hariharan et al set out to address this with further crystallographic studies complemented with ITC and hydrogen-deuterium exchange (HDX) mass spectrometry.

      (1) They first report 4 different crystal structures of galactose derivatives to explore molecular recognition, showing that the galactose moiety itself is the main source of specificity. Interestingly, they observe a water-mediated hydrogen bonding interaction with the protein and suggest that this water molecule may be important in binding.

      We thank you for understanding what we've presented in this manuscript.

      (2) The results from the crystallography appear sensible, though the resolution of the data is low, with only the structure with NPG better than 3Å. However, it is a bit difficult to understand what novel information is being brought out here and what is known about the ligands. For instance, are these molecules transported by the protein or do they just bind? They measure the affinity by ITC, but draw very few conclusions about how the affinity correlates with the binding modes. Can the protein transport the trisaccharide raffinose?

      The four structures with bound sugars of different sizes were used to identify the binding motif on both the primary substrate (sugar) and the transporter (MelB<sub>St</sub>). Although the resolutions of the structures complexed with melibiose, raffinose, or α-MG are relatively low, the size and shape of the densities in each structure are consistent with the corresponding sugar molecules, which provides valuable data for confirming the pose of the bound sugar proposed previously. In this revision, we further refined the α-NPG-bound structure to 2.60 Å. The identified water-1 in this study further confirms the orientation of C4-OH. Notably, this transporter does not recognize or transport glucosides in which the orientation of the C4-OH at the glucopyranosyl ring is opposite. To verify the water in the sugar-binding site, we initiated a new collaborative study using MD simulations. Results showed that Wat-1 exhibited nearly full occupancy when melibiose was present, regardless of whether Na<sup>+</sup> was bound at the cation-binding site.

      As detailed in the Summary, we added two additional sets of transport assays and confirmed that raffinose and α-NPG are transportable substrates of MelB<sub>St</sub>. For α-NPG transport, we measured the end products of the process—enzyme hydrolysis and membrane diffusion of p-nitrophenol released from intracellular α-NPG.

      As a bonus, based on the WT-like downhill α-NPG transport activity by the D59C uniporter mutant that failed in active transport against a sugar concentration gradient, we further emphasized that the sugar translocation pathway is isolated from the cation-binding site. The new data strongly support the allosteric effects of cation binding on sugar-binding affinity. Thank you for this helpful suggestion.

      A meaningful analysis of ITC data heavily depends on the quality of the data. My laboratory has extensive experience with ITC and has gained rich, insightful mechanistic knowledge of MelB<sub>St</sub>. Unfortunately, because of the low affinity for raffinose and α-MG, no further information can be convincingly obtained. Therefore, we did not dissect the enthalpic and entropic contributions but focused on the K<sub>d</sub> value and binding stoichiometry.

      (3) The HDX also appears to be well done; however, in the manuscript as written, it is difficult to understand how this relates to the overall mechanism of the protein and the conformational changes that the protein undergoes.

      We are sorry for not presenting our data clearly in the initial submission. In this revised manuscript, we have made numerous improvements, as described in the Summary. These enhancements in the HDX data analysis provided new mechanistic insights into the allosteric effects, leading us to conclude that protein dynamics and conformational transitions are coupled with sugar-binding affinity. Na<sup>+</sup> binding restricts protein conformational flexibility, thereby increasing sugar-binding affinity. The HDX study revealed that the major dynamic region includes a sugar-binding residue, Arg149, which also plays a gating role. Structurally, this dual-function residue undergoes significant displacement during the sugar-affinity-coupled conformational transition, thereby coupling the sugar binding and structural dynamics.

      Reviewer #2:

      This manuscript from Hariharan, Shi, Viner, and Guan presents x-ray crystallographic structures of membrane protein MelB and HDX-MS analysis of ligand-induced dynamics. This work improves on the resolution of previously published structures, introduces further sugar-bound structures, and utilises HDX to explore in further depth the previously observed positive cooperativity with the cotransported cation Na<sup>+</sup>. The work presented here builds on years of previous study and adds substantial new details into how Na<sup>+</sup> binding facilitates melibiose binding and deepens the fundamental understanding of the molecular basis underlying the symport mechanism of cation-coupled transporters. However, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion.

      We appreciate this reviewer's time in reading our previous articles related to this manuscript.

      Comments on Crystallography and biochemical work:

      (1) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      This figure is a stereo view of a density map created in cross-eye style. In this revision, we changed this figure to Fig. 3 and showed only the density for sugar and water-1. 

      (2) It is slightly unclear what the ITC measurements add to this current manuscript. The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but it is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend. Additionally, the D59C mutant utilised here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na<sup>+</sup>. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium.

      We thank this reviewer for the helpful suggestions. We have performed the suggested ITC measurements with the D59C mutant. The purpose of the ITC experiments was to demonstrate that MelB<sub>St</sub> can bind raffinose and α-MG to support the crystal structures.

      Comments on HDX-MS work:

      While the use of HDX-MS to deepen the understanding of ligand allostery is an elegant use of the technique, this reviewer advises the authors to refer to the Masson et al. (2019) recommendations for the HDX-MS article (https://doi.org/10.1038/s41592-019-0459-y) on how to best present this data. For example:

      All authors value this reviewer's comments and suggestions, which have been included in this revision.

      (1) The Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilised protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      Yes, a lipid/detergent removal step was included in this study and previous ones, and this information was clearly described in the Methods.

      (2) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      We have updated Table S2 and addressed the reviewer's request for the details of the HDX experiments.

      (3) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      We have prepared and presented deuterium uptake time-course plots for any peptides with ΔD > threshold in Fig. S5a-c.

      (4) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, this reviewer understands that working with dynamic transporters can lead to increased data variation; a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      We appreciate this comment and have cited the suggested article on the hybrid significance method. We fully acknowledge that using a cutoff of P < 0.05 can increase the likelihood of false-positive identifications. By applying multiple levels of statistical testing, we determined that P < 0.05 is an appropriate threshold for this study. The threshold values were presented in the residual plots and explained in the text. For the previous Fig. 6 (renamed Fig. S4b in the current version), we have reported the P value. *, < 0.05; **, < 0.01. (The text for 0.01 was not visible in the previous version. Sorry for the confusion.)
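      As a rough illustration of the decision rule behind a hybrid significance test, the sketch below calls a peptide difference significant only when two criteria are met together: |ΔD| exceeds a global threshold, and Welch's t-test on the replicates passes at P < 0.05. It is a simplified sketch under stated assumptions, not the procedure used in the manuscript: the threshold value (0.3 Da), the replicate uptake values, and the function names are hypothetical, and the t-test is evaluated against a small table of standard two-tailed critical values with the Welch degrees of freedom rounded down, which is conservative.

```python
import math
from statistics import mean, variance

# Standard two-tailed critical t values at alpha = 0.05, keyed by integer df.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Requires at least two replicates per state and non-zero total variance.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def hybrid_significant(a, b, delta_threshold):
    """True only if |dD| > global threshold AND Welch's test passes (P < 0.05)."""
    delta = mean(a) - mean(b)
    t, df = welch_t(a, b)
    # Round df down and cap at 10: both choices make the test more conservative.
    t_crit = T_CRIT_05[max(1, min(10, math.floor(df)))]
    return abs(delta) > delta_threshold and abs(t) > t_crit


# Hypothetical deuterium uptake (Da) for one peptide, three replicates per state.
apo = [2.10, 2.14, 2.06]
bound = [1.50, 1.55, 1.45]          # large, reproducible protection
slightly_lower = [2.00, 2.04, 1.96]  # reproducible but below the threshold
noisy_apo = [2.0, 3.0, 1.0]
noisy_bound = [1.0, 2.2, 0.1]        # large mean difference, poor precision

clear_change = hybrid_significant(apo, bound, delta_threshold=0.3)
small_change = hybrid_significant(apo, slightly_lower, delta_threshold=0.3)
noisy_change = hybrid_significant(noisy_apo, noisy_bound, delta_threshold=0.3)
```

      Requiring both criteria rejects small differences that happen to be precise as well as large differences that are poorly reproduced, which is the motivation for combining the two tests.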

      (5) Line 316 states a significant difference is seen in dynamics, how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement in solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity. The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there is no difference between the dynamics of each site.

      The current Table S3 (combined from previous Tables S3 and S4 as suggested) was prepared to provide an overall view of the dynamic regions with SD values provided. For other questions, if we understand correctly, this reviewer asked us to comment on the effects of solvent accessibility or hydrophobic regions on the overall dynamics outside the binding residues of the peptides that cover them. Because HDX rates are influenced by two linked factors, solvent accessibility and hydrogen-bonding interactions that reflect structural dynamics, poor solvent accessibility in buried regions should result in low deuterium uptake. The peptides in our dataset that include the Na<sup>+</sup>-binding site showed lower HDX, likely due to limited solvent accessibility and lower structural stability. It is unclear what this reviewer meant by "increased dynamics at peptides covering the Na binding site on overall more dynamic helices." We did not observe increased dynamics in peptides covering the Na<sup>+</sup>-binding site; instead, all Na<sup>+</sup>-binding residues and nearby sugar-binding residues have lower degrees of deuteriation.

      (6) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      Thanks for this suggestion. The previous datasets were collected in the presence of Na<sup>+</sup>. In the current study, we also have two Na<sup>+</sup>-containing datasets. Both showed similar results: the multiple overlapping peptides covering the sugar-binding residues on helices I and V have higher HDX rates than those peptides covering the Na<sup>+</sup>-binding residues, even when Na<sup>+</sup> was present.

      (7) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      Thank you for this suggestion. Comparing HDX-MS between the WT and the D59C mutant is certainly interesting, especially with the increasing amount of structural, biochemical, and biophysical data now available for this mutant. However, due to limited resources, we might consider it later.

      (8) Have the authors considered utilising Li<sup>+</sup> to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      We have shown that Li<sup>+</sup> also works positively with melibiose. Li<sup>+</sup> binds to MelB<sub>St</sub> with a higher affinity than Na<sup>+</sup> and modifies MelB<sub>St</sub> differently. It is important to study this thoroughly and separately. To answer the second question, H<sup>+</sup> is a weak coupling cation with little effect on melibiose binding. Since its pKa is around 6.5, only a small population of MelB<sub>St</sub> is protonated at pH 7.5. The order of sugar-binding cooperativity is highest with Na<sup>+</sup>, then Li<sup>+</sup>, and finally H<sup>+</sup>.
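      The statement that only a small population is protonated at pH 7.5 follows from the Henderson-Hasselbalch relation. The short calculation below is a sketch that assumes a single titratable site with the pKa of about 6.5 quoted above; the function name is hypothetical.

```python
def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch: fraction of a single titratable site that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# One pH unit above the pKa, the deprotonated form outnumbers the protonated form 10:1.
fraction_at_ph_7_5 = protonated_fraction(7.5, 6.5)
print(f"Protonated fraction at pH 7.5: {fraction_at_ph_7_5:.3f}")
```

      Under this single-site assumption, roughly 9% of the transporter (1 in 11) would be protonated at pH 7.5.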

      (9) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to the OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      The sugar translocation free-energy landscape simulations showed that both helix bundles move relative to the membrane plane. This analysis aimed to clarify a hypothesis in the field—that the MFS transporter can use an asymmetric mode to perform the conformational transition between inward- and outward-facing states. In the case of MelB<sub>St</sub>, we clearly demonstrated that both domains move and each helix bundle moves as a unit. So only a small number of helices and loops showed labeling changes. Thanks for the suggestion about comparing with XylE. We have included that in the discussion.

      (10) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na<sup>+</sup>-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate, this mechanically less stable state of MelB is predominant.". It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

      We thank this reviewer for reading the single-molecule force spectroscopy (SMFS) study on MelB<sub>St</sub>. The C3 loop mentioned in this SMFS article is partially covered in the Mel vs. apo and Mel plus Na<sup>+</sup> vs. apo datasets, and there is more coverage in the Na<sup>+</sup> vs. apo dataset. In either condition, no deprotection was detected. The labeling time point might not be long enough to detect it.

      Reviewer #3:

      Summary:

      The melibiose permease from Salmonella enterica serovar Typhimurium (MelB<sub>St</sub>) is a member of the Major Facilitator Superfamily (MFS). It catalyzes the symport of a galactopyranoside with Na<sup>+</sup>, H<sup>+</sup>, or Li<sup>+</sup>, and serves as a prototype model system for investigating cation-coupled transport mechanisms. In cation-coupled symporters, a coupling cation typically moves down its electrochemical gradient to drive the uphill transport of a primary substrate; however, the precise role and molecular contribution of the cation in substrate binding and translocation remain unclear. In a prior study, the authors showed that the binding affinity for melibiose is increased in the presence of Na<sup>+</sup> by about 8-fold, but the molecular basis for the cooperative mechanism remains unclear. The objective of this study was to better understand the allosteric coupling between the Na<sup>+</sup> and melibiose binding sites. To verify the sugar-recognition specific determinants, the authors solved the outward-facing crystal structures of a uniport mutant D59C with four sugar ligands containing different numbers of monosaccharide units (α-NPG, melibiose, raffinose, or α-MG). The structure with α-NPG bound has improved resolution (2.7 Å) compared to a previously published structure and to those with other sugars. These structures show that the specificity is clearly directed toward the galactosyl moiety. However, the increased affinity for α-NPG involves its hydrophobic phenyl group, positioned at 4 Å-distance from the phenyl group of Tyr26, which forms a strong stacking interaction. Moreover, a water molecule bound to OH-4 in the structure with α-NPG was proposed to contribute to the sugar recognition and appears on the pathway between the two specificity-determining pockets. Next, the authors analyzed by hydrogen-to-deuterium exchange coupled to mass spectrometry (HDX-MS) the changes in structural dynamics of the transporter induced by melibiose, Na<sup>+</sup>, or both. The data support the conclusion that the binding of the coupling cation at a remote location stabilizes the sugar-binding residues to switch to a higher-affinity state. Therefore, the coupling cation in this symporter was proposed to be an allosteric activator.

      Strengths:

      (1) The manuscript is generally well written.

      (2) This study builds on the authors' accumulated knowledge of the melibiose permease and integrates structural and HDX-MS analyses to better understand the communication between the sodium ion and sugar binding sites. A high sequence coverage was obtained for the HDX-MS data (86-87%), which is high for a membrane protein.

      We thank the reviewer for the positive comments.

      Weaknesses:

      (1) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373.

      A water molecule can be modeled at a resolution ranging from 2.4 to 3.2 Å, and the quality of the model depends on the map quality and water location. In this revision, we refined the resolution to 2.6 Å using the same dataset and also performed all-atom MD simulations. All results support the occupancy of water-1 in the sugar-bound MelB<sub>St</sub>.

      (2) Site-directed mutagenesis could help strengthen the conclusions of the authors. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr121, Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the claims of the authors regarding the allosteric communication between the two substrate-binding sites.

      The authors thank this reviewer for the thoughtful suggestions. MelB<sub>St</sub> has been subjected to Cys-scanning mutagenesis (https://doi.org/10.1016/j.jbc.2021.101090). Placing a Cys residue at Gln372 significantly decreased the transport initial rate, accumulation, and melibiose fermentation, with minimal effect on protein expression, as shown in Figure 2 of this JBC article, which could support its role in the binding pocket. The T373C mutant retained most of the WT's activities. Our previous studies showed that Thr121 is only responsible for Na<sup>+</sup> binding in MelB<sub>St</sub>, and mutations decreased protein stability; now, HDX reveals that this is the rigid position. Additionally, our previous studies indicated that Arg295 is another conformationally important residue. In this version, we have added more HDX analysis to explore the relationship between the two substrate-binding sites with conformational dynamics, especially focusing on the gating salt-bridge network including Arg295, which has provided meaningful new insights.

      (3) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter was less commented, could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations? The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). It might be worth it for data presentation to also analyze the deuterium uptake difference by comparing the conditions sodium ion+melibiose vs melibiose alone. It would make the effect of Na<sup>+</sup> on the structural dynamics of the melibiose-bound transporter more visible. Similarly, the deuterium uptake difference between sodium ion+melibiose vs sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na<sup>+</sup>-bound transporter.

      Thanks for this important question. We have added more discussion of the deprotected data and prepared a new Fig. 8b to highlight the melibiose-binding-induced flexibility in several loops, especially the gating area on both sides of the membrane. We also proposed that these changes might facilitate the formation of the transition-competent state. The overall effects induced by substrate binding are relatively small, and the datasets for apo and Na were collected separately, so comparing melibiose&Na<sup>+</sup> versus Na<sup>+</sup> might not be as precise. In fact, the Na<sup>+</sup> effects on the sugar-binding site can be clearly seen in the deuterium uptake plots shown in Figures 7-8, by comparing the first and last panels.
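      The pairwise comparison discussed above (for example, Na<sup>+</sup>+melibiose versus melibiose alone) amounts to a per-peptide subtraction of deuterium uptake. The following is a minimal sketch, not the authors' analysis pipeline: the `Uptake` type, the function name, and all numbers are hypothetical, and the quadrature error propagation assumes independent measurement errors, which may not hold for datasets collected separately.

```python
from math import sqrt
from typing import NamedTuple


class Uptake(NamedTuple):
    """Deuterium uptake of one peptide at one time point (hypothetical container)."""
    mean: float  # mean uptake in Da across replicates
    sd: float    # standard deviation across replicates


def uptake_difference(state_a: Uptake, state_b: Uptake) -> Uptake:
    """Return delta-D = D(state_a) - D(state_b).

    The SDs are combined in quadrature, which assumes the two
    measurements have independent errors.
    """
    return Uptake(
        mean=state_a.mean - state_b.mean,
        sd=sqrt(state_a.sd ** 2 + state_b.sd ** 2),
    )


# Hypothetical values for a single peptide, for illustration only.
melibiose_alone = Uptake(mean=3.0, sd=0.3)
na_plus_melibiose = Uptake(mean=2.0, sd=0.4)

delta = uptake_difference(na_plus_melibiose, melibiose_alone)
# A negative mean indicates less uptake (more protection) when Na+ is added.
print(f"delta-D = {delta.mean:.2f} +/- {delta.sd:.2f} Da")
```

      A negative difference would correspond to Na<sup>+</sup>-induced protection of the melibiose-bound transporter; whether such a difference is meaningful still depends on the significance criteria applied to the dataset.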

      (4) For non-specialists, it would be beneficial to better introduce and explain the choice of using D59C for the structural analyses.

      Asp59 is the only site that responds to the binding of all coupling cations: Na<sup>+</sup>, Li<sup>+</sup>, or H<sup>+</sup>. Notably, the thermostable D59C mutant selectively abolishes all cation binding and the associated cotransport activities, but it retains intact sugar binding and undergoes conformational transitions like the WT, as demonstrated by electroneutral transport reactions, including the α-NPG transport shown in this article and the melibiose exchange and fermentation shown previously. Therefore, the structural data derived from this mutant are significant and offer important mechanistic insights into sugar transport, which support the conclusion that Na<sup>+</sup> functions as an allosteric activator.

      (5) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions it corresponds to. Only one peptide is indicated (213-226). I would recommend indicating more of them in areas where deuterium changes are substantial.

      We appreciate this comment and have modified the plots by marking the residue positions and labeling several peptides with significant HDX in Fig. 5b. We also provide a deuteriation map based on peptide coverage (Fig. 5a).

      (6) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

      This is an intriguing mechanistic question. In this HDX study, we found that the cation-binding pocket and nearby sugar-binding residues are conformationally rigid, while some sugar-binding residues farther from the cation-binding pocket are flexible. We concluded that conformational dynamics regulate sugar-binding affinity, but the increase in Na<sup>+</sup>-binding affinity caused by melibiose is not related to protein dynamics. Our previous interpretation based on structural data remains our preferred explanation: the bound melibiose physically prevents the release of Na<sup>+</sup> or Li<sup>+</sup> from the cation-binding pocket. We also proposed the mechanism of intracellular Na<sup>+</sup> release in the 2024 JBC paper (https://doi.org/10.1016/j.jbc.2024.107427); after sugar release, the rotamer change of Asp55 will help Na<sup>+</sup> exit the cation pocket into the empty sugar pocket, and the negative membrane potential inside the cell will further facilitate movement from MelB<sub>St</sub> to the cytosol.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) It would help the reader if the previous work were introduced more clearly, and if the results of the experiments reported in this manuscript were put into the context of the previous work. Lines 283-296 discuss observations that are similar to previously reported structures as well as novel interpretations. It would help the reader to be clearer about what the new observations are.

      Thank you for the important comment. We have revised accordingly by adding related citations and the words “as shown previously” where we state our previous observations.

      (2) The affinity by ITC is measured for various ligands, but very few conclusions are drawn about how the affinity correlates with the binding modes. Are the other ligands that are investigated in this study transported by the protein, or do they just bind? Can the protein transport the trisaccharide raffinose? The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but this is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend.

      Additionally, the D59C mutant utilized here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na<sup>+</sup>. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium. For non-specialists, please better introduce and explain the choice of using D59C for the structural analyses.

      Thank you for the meaningful comments. We have comprehensively addressed all the concerns and suggestions as listed in the summary of this revision. Notably, the D59C mutant does not catalyze any electrogenic melibiose transport involving cation translocation, but it does catalyze downhill translocation of galactosides, as shown by the downhill α-NPG transport assay in Fig. 1a. The intact downhill transport by the D59C mutant further supports the allosteric coupling between the cation- and sugar-binding sites.

      The binding isotherms and poor affinities from the ITC measurements do not support further analysis of the binding mode: none showed a sigmoidal curve, so the enthalpy change cannot be accurately determined. Nevertheless, we appreciate this comment.

      (3) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #1.

      (4) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373. Please change line 278 to state "this OH-4 water molecule is likely part of sugar binding".

      We have addressed these concerns in the response to the Public Reviews at reviewer-3 #1.

      (5) Line 290-296: The Thr121 is not represented in any figures, while the Lys377 is. Their relative positioning between sugar water and sodium is not made clear by any figure.

      Thanks for this comment. This information is now clearly presented in Figs. 7-8. Lys377 is closer to the cation site and relatively far from the sugar-binding site.

      (6) Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilized protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      (7) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (8) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (9) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, and this reviewer understands that working with dynamic transporters can lead to increased data variation, a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (10) The table (S3) and figure (S4) showing uncovered residues is an unclear interpretation of the data; this would be better given as a peptide sequence coverage heat map. This would also be more informative for the redundancy in covered regions, too. In this way, S3 and S4 can be combined.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (11) Residual plots in Figure 5 could be improved by a topological map to indicate how peptide number resembles the protein amino acid sequence.

      Thank you for the request. Because Figure 6 is already large, we added a transmembrane topology plot colored by the HDX results in Fig. 8c.

      (12) The presentation of data in S5 could be clarified. Does the number of results given in the brackets indicate overlapping peptides? What are the lengths of each of these peptides? Classical HDX data presentation utilizes blue for protection and red for deprotection. The use of yellow ribbons to show protection in non-sugar binding residues takes some interpretation and could be clarified by also depicting in a different blue. I also don't see the need to include ribbon and cartoon representation when also using colors to depict protection and deprotection. The authors should change or clarify this choice.

      We have moved this figure into the current Fig. 6b as suggested by Reviewer-3. To address the questions about the figure legend: the number of results shown in brackets does indicate overlapping peptides. As for the lengths of these peptides, the sequences of each peptide are shown in Figures 7-8 and are also included in Supplemental Figure S5. Regarding the use of color, both blue and green were used to distinguish peptides protecting the substrate-binding site from other regions. The ribbon and cartoon representations are provided for clarity, as the cartoon style hides many helices.

      (13) In Table S5, the difference between valid points and protection is unclear. And what is indicated by numbers in brackets or slashes? Additionally, it should be highlighted again here that single-residue information is inferred from peptide-level data. By value, are the authors referring to peptide-level differential data?

      Please review our responses in the Public Reviews at reviewer-2 #5.

      (14) Line 316 states a significant difference is seen in dynamics, how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement in solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity? The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there isn't a difference between the dynamics of each site.

      Please review our responses in the Public Reviews at reviewer-2 #5.

      (15) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      Please review our responses in the Public Reviews.

      (16) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to the OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences instead between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      Please review our responses in the Public Reviews.

      (17) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na<sup>+</sup>-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate this mechanically less stable state of MelB is predominant.". It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

      Please review our responses in the Public Reviews.

      (18) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter was less commented, could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations? The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). It might be worth it for data presentation to also analyze the deuterium uptake difference by comparing the conditions sodium ion+melibiose vs melibiose alone. It would make the effect of Na<sup>+</sup> on the structural dynamics of the melibiose-bound transporter more visible. Similarly, the deuterium uptake difference between sodium ion+melibiose vs sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na<sup>+</sup>-bound transporter.

      Please review our responses in the Public Reviews.

      (19) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions it corresponds to. Only one peptide is indicated (213-226); I would recommend indicating more of them, in areas where deuterium changes are substantial.

      Please review our responses in the Public Reviews.

      (20) Figure 6, please indicate in the legend what the black and blue lines are (I assume black is for the apo?)

      We are sorry that we did not make this clear. Yes, black was used for the apo state and blue was used for all bound states.

      (21) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

      Please review our responses in the Public Reviews.

      Addressing the following three points would strengthen the manuscript, but also involve a significant amount of additional experimental work. If the authors decide not to carry out the experiments described below, they can still improve the assessment by focusing on points (1-21) described above.

      (22) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      Please review our responses in the Public Reviews.

      (23) Have the authors considered utilising Li<sup>+</sup> to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      Please review our responses in the Public Reviews.

      (24) Site-directed mutagenesis could help strengthen the conclusions. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr 121 and Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the authors' claims regarding allosteric communication between the two substrate-binding sites.

      Please review our responses in the Public Reviews.

    1. eLife Assessment

      This important study uses standard single-cell RNA-seq analyses combined with methods from the social sciences to reduce heterogeneity in gene expression in Drosophila imaginal wing disc cells treated with 4000 rads of ionizing radiation. The use of this methodology from social sciences is novel in Drosophila and allows them to identify a subpopulation of cells that is disproportionately responsible for much of the radiation-induced gene expression. Their compelling analyses reveal genes that are expressed regionally after irradiation, including ligands and transcription factors that have been associated with regeneration, as well as others whose roles in response to irradiation are unknown. This paper would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

    2. Reviewer #1 (Public review):

      Summary:

      The authors analyze transcription in single cells before and after 4000 rads of ionizing radiation. They use Seurat v5 for their analyses, which allows them to show that most of the genes cluster along the proximal-distal axis. Due to the high heterogeneity in the transcripts, they use the Herfindahl-Hirschman index (HHI) from economics, which measures market concentration. Using the HHI, they find that genes involved in several processes (like cell death, response to ROS, DNA damage response (DDR)) are relatively similar across clusters. However, ligands activating the JAK/STAT, Pvr, and JNK pathways and transcription factors Ets21C and dysf are upregulated regionally. The JAK/STAT ligands Upd1,2,3 require p53 for their upregulation after irradiation, but the normal expression of Upd1 in unirradiated discs is p53-independent. This analysis also identified a cluster of cells that expressed tribbles, encoding a factor that downregulates mitosis-promoting String and Twine, that appears to be G2/M arrested and expressed numerous genes involved in apoptosis, DDR, the aforementioned ligands and TFs. As such, the tribbles-high cluster contains much of the heterogeneity.
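      For readers unfamiliar with the HHI mentioned above, the index is the sum of squared shares. The following is a minimal sketch of how it could be applied to one gene across cell clusters; the function name, cluster labels, and expression values are hypothetical, and the exact quantity the study uses as a "share" (for example, induced expression per cluster) is an assumption here.

```python
from typing import Mapping


def herfindahl_hirschman_index(expression_by_cluster: Mapping[str, float]) -> float:
    """Return the HHI of one gene: the sum of squared per-cluster shares.

    With n clusters, the value ranges from 1/n (evenly spread expression)
    to 1.0 (all expression concentrated in a single cluster).
    """
    values = list(expression_by_cluster.values())
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("total expression must be positive")
    return sum((v / total) ** 2 for v in values)


# Hypothetical per-cluster expression values, for illustration only.
uniform_gene = {"pouch": 10.0, "hinge": 10.0, "notum": 10.0, "PE": 10.0}
regional_gene = {"pouch": 37.0, "hinge": 1.0, "notum": 1.0, "PE": 1.0}

print(herfindahl_hirschman_index(uniform_gene))   # evenly spread: low HHI
print(herfindahl_hirschman_index(regional_gene))  # concentrated: high HHI
```

      In this toy example, the evenly expressed gene scores at the lower bound for four clusters, while the regionally concentrated gene scores close to 1, which is the sense in which a high HHI flags regionally upregulated genes.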

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      Weaknesses:

      The authors have addressed my concerns in the revised article.

    3. Reviewer #2 (Public review):

      This manuscript investigates the question of cellular heterogeneity using the response of Drosophila wing imaginal discs to ionizing radiation as a model system. A key advance here is the focus on quantitatively expressing various measures of heterogeneity, leveraging single-cell RNAseq approaches. To achieve this goal, the manuscript creatively uses a metric from the social sciences called the HHI to quantify the spatial heterogeneity of expression of individual genes across the identified cell clusters. Inter- and intra-regional levels of heterogeneity are revealed. Some highlights include identification of spatial heterogeneity in expression of ligands and transcription factors after IR. Expression of some of these genes shows dependence on p53. An intriguing finding, made possible by using an alternative clustering method focusing on cell cycle progression, was the identification of a high-trbl subset of cells characterized by concordant expression of multiple apoptosis, DNA damage repair, ROS related genes, certain ligands and transcription factors, collectively representing HIX genes. This high-trbl set of cells may correspond to an IR-induced G2/M arrested cell state.

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      The authors responded well to my suggestions for improvement, which were incorporated in the revised version of the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Cruz and colleagues report a single cell RNA sequencing analysis of irradiated Drosophila larval wing discs. This is a pioneering study because prior analyses used bulk RNAseq analysis so differences at single cell resolution were not discernable. To quantify heterogeneity in gene expression, the authors make clever use of a metric used to study market concentration, the Herfindahl-Hirschman Index. They make several important observations including region-specific gene expression coupled with heterogeneity within each region and the identification of a cell population (high Trbl) that seems disproportionately responsible for radiation-induced gene expression.

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occurs in response to uniform induction of damage by X-rays in a single layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration and development.

      Weaknesses:

      The authors have addressed my concerns adequately with changes made in the revised version.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewing Editor Comment:

      The reviewers felt that the study could be improved by (1) better integrating the results with the existing literature in the field

      (1) In the Introduction and Results section of the manuscript, we had made every attempt to cite the relevant literature. (Reviewer 1 stated that “The literature is appropriately cited”). We agree with the Reviewing Editor that rather than simply cite the relevant literature, we could have done a better job of integrating our findings with what has been previously discovered by others. We have attempted to do this in the revised manuscript. Also, we have included many additional citations in the Introduction and in the first section of the Results where work by others has provided a framework for interpreting our single-cell studies.

      and (2) manipulating Trib expression and analyzing the expression of 1-2 HIX genes.

      (2) We are grateful for this suggestion. As suggested by the Reviewing Editor we have attempted to increase and decrease trbl expression and assess the effect on expression of two genes, Swim and CG15784.

      We increased trbl levels in the wing pouch using rn-Gal4, tub-Gal80<sup>ts</sup> and UAS-trbl. By transferring larvae for 24 h from 18 °C to 31 °C, we were able to induce trbl expression in the wing pouch. When these larvae were irradiated at 4000 rad, we found reduced levels of apoptosis in the wing pouch of discs that overexpressed trbl (Figure 7-figure supplement 1). This indicated that upregulation of trbl is radioprotective. Consistent with our findings, others have previously shown that upregulation of trbl and stalling in the G2 phase of the cell cycle protects cells from JNK-induced apoptosis (Cosolo et al., 2019, PMID:30735120) or that downregulating the G2/M progression promoting factor string protects cells from X-ray radiation induced apoptosis (Ruiz-Losada et al., 2021, PMID:34824391).

      As suggested by the Reviewing Editor, we also examined the effect of trbl overexpression on the induction of two “highly induced by X-ray irradiation (HIX)” genes, Swim and CG15784. Increasing trbl expression had no effect on the induction of Swim and caused only a modest decrease in the induction of CG15784 (Figure 7-figure supplement 2). Thus, increasing trbl expression is, in itself, insufficient to promote HIX gene expression, indicating that other factors are necessary for HIX gene induction.

      We also attempted to reduce trbl expression, using three different RNAi lines. While some of these lines have been used previously by others to reduce trbl expression under unirradiated conditions (Cosolo et al., 2019, PMID:30735120), we nevertheless wanted to check if they reduced trbl induction following irradiation. For each of the three lines, we observed no obvious reduction in trbl RNA following irradiation when visualized using HCR (Author response image 1). Thus, any effects on gene expression that we observe could not be attributed to a decrease in trbl expression. We have therefore included the images showing a lack of knockdown in this Response to Reviews document but not included these experiments in the revised manuscript.

      Author response image 1.

      RNA in situ hybridizations using the hybridization chain reaction performed using probes to trbl. In A-F, the RNAi is expressed using nubbin-Gal4. In G-I the RNAi is expressed using rn-Gal4, tub-Gal80<sup>ts</sup>. white-RNAi was used as a control (A, B, G, H). Three different RNAi lines directed against trbl were tested: Vienna lines VDRC 106774 (C, D) and VDRC 22113 (E, F), and Bloomington line BL42523. In no case was a reduction in trbl RNA upregulation in the wing pouch following 4000 rad observed, except for one disc (n = 6) of VDRC 106774 crossed to nubbin-Gal4.

      Reviewer #1 (Public review):

      Summary:

      The authors analyze transcription in single cells before and after 4000 rads of ionizing radiation. They use Seurat v5 for their analyses, which allows them to show that most of the genes cluster along the proximal-distal axis. Due to the high heterogeneity in the transcripts, they use the Herfindahl-Hirschman index (HHI) from economics, which measures market concentration. Using the HHI, they find that genes involved in several processes (like cell death, response to ROS, DNA damage response (DDR)) are relatively similar across clusters. However, ligands activating the JAK/STAT, Pvr, and JNK pathways and transcription factors Ets21C and dysf are upregulated regionally. The JAK/STAT ligands Upd1,2,3 require p53 for their upregulation after irradiation, but the normal expression of Upd1 in unirradiated discs is p53-independent. This analysis also identified a cluster of cells that expressed tribbles, encoding a factor that downregulates mitosis-promoting String and Twine, that appears to be G2/M arrested and expressed numerous genes involved in apoptosis, DDR, the aforementioned ligands, and TFs. As such, the tribbles-high cluster contains much of the heterogeneity.

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      For each condition, at least 5 discs were imaged, and up to 15 discs in some cases. We tried to choose a representative disc for each condition after looking at all of them. All discs imaged under each condition are shown below; the disc chosen for the figure is indicated with an asterisk. All scale bars are 100 μm.

      Author response image 2.

      Images for discs shown in Manuscript Figure 1, panels B, C

      Author response image 3.

      Images for discs shown in Manuscript Figure 1, panels D, E

      Author response image 4.

      Images used in Manuscript Figure 1, F, G

      Author response image 5.

      Images used in Manuscript Figure 1H, I

      Author response image 6.

      Images used in Manuscript Figure 1J, K

      Author response image 7.

      Images used in Manuscript Figure 1L, M

      (2) Some of the figures are unclear.

      It is unclear to us exactly which figures the Reviewer is referring to. Perhaps this is the same issue mentioned below in “Recommendations for the authors”. We address it below.

      Reviewer #1 (Recommendations for the authors):

      (1) Regarding Figure 1, what is stained in blue? Is it DAPI? If so, this should be added to the figure legend.

      Thank you for pointing out this omission. This has been addressed in the revised manuscript.

      It is very difficult to see blue on black, so could the authors please outline the discs?

      Alternatively, they could show DAPI in green and the markers (pH2Av, etc) in magenta.

      We used DAPI (blue) as a way of outlining the discs. While we appreciate the reviewer’s concern, after reviewing the images, we found that the blue is clearly visible when the document is viewed on the screen. It is less obvious if the document is printed on some kinds of printers. Since boosting this channel would make the signal from the other channels more difficult to see, we left the images as they were.

      (2) Figure 3, Figure Supplement 2, panel B. It is not possible to read the gene names in the panel's current form. Please break this up into 4 lines (as much as possible from the current 2).

      Thank you for this suggestion. We have done this in the revised manuscript.

      Reviewer #2 (Public review):

      This manuscript investigates the question of cellular heterogeneity using the response of Drosophila wing imaginal discs to ionizing radiation as a model system. A key advance here is the focus on quantitatively expressing various measures of heterogeneity, leveraging single-cell RNAseq approaches. To achieve this goal, the manuscript creatively uses a metric from the social sciences called the HHI to quantify the spatial heterogeneity of expression of individual genes across the identified cell clusters. Inter- and intra-regional levels of heterogeneity are revealed. Some highlights include the identification of spatial heterogeneity in the expression of ligands and transcription factors after IR. Expression of some of these genes shows dependence on p53. An intriguing finding, made possible by using an alternative clustering method focusing on cell cycle progression, was the identification of a high-trbl subset of cells characterized by concordant expression of multiple apoptosis, DNA damage repair, ROS-related genes, certain ligands, and transcription factors, collectively representing HIX genes. This high-trbl set of cells may correspond to an IR-induced G2/M arrested cell state.

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      Thank you for your assessment of the work.

      Reviewer #2 (Recommendations for the authors):

      I suggest two major points for improvement:

      (1) It is important to test whether manipulation of trbl levels (i.e., overexpression, knockdown, mutation) would result in measurable biological outcomes after IR, such as altered HIX gene expression, altered cell cycle progression, or both. This may help disentangle the question of whether high trbl expression and correlated HIX gene expression are a cause or consequence of G2/M stalling.

      We have described these experiments at the beginning of this Response to Reviews document when addressing the comments made by the Reviewing Editor. Please see Figure 7, figure supplements 1 and 2. These experiments suggest that upregulation of trbl offers some protection from radiation-induced death, yet it is itself insufficient to induce expression of two HIX genes tested. As we have also described earlier, three different RNAi lines tested did not reduce trbl upregulation after irradiation.

      (2) A more extensive characterization of the high-trbl cell state would also be appropriate, particularly in terms of their relationship to the cell cycle.

      We attempted to address this issue in two ways. First, we used the expression of a trbl-gfp transgene and RNA in-situ hybridization experiments to visualize the distribution of the high-trbl cells (shown in new manuscript figure, Figure 6-figure supplement 3). When examining trbl RNA in irradiated discs, there is no obvious demarcation between cells that express high levels of trbl and other cells. This is also apparent in the UMAP shown in Figure 6A and A’. Most cells seem to express trbl; cells in the “high trbl” cluster simply express more trbl than others. We observed cells expressing trbl and PCNA as well as cells expressing only one of those two genes at detectable levels. Thus, it was not possible to distinguish the “high trbl” cells from other cells by this approach.

      We decided instead to focus on examining the expression of other cell-cycle genes in the high-trbl cluster. We have added a paragraph in the Results section that details our findings. Many transcriptional changes are indeed consistent with stalling in G2, such as high levels of trbl and low levels of string (stg). The inference that the cells are in G2 is also consistent with reduced levels of genes that are normally expressed at other stages of the cell cycle: G1 genes such as E2f1 and Dp, S-phase genes such as several Mcm genes, PCNA and RnrS, and genes that encode mitotic proteins such as polo, Incenp and claspin. There are, however, several anomalies, such as slightly increased expression of the early-G1 cyclin, CycD, and the retinoblastoma ortholog Rbf. Thus, at least as assessed by the transcriptome, this cluster may not correspond to a cell state that is found under normal physiological conditions.

      (3) Minor: p. 12, line 3. Figure 5A is mentioned, but it seems that it should be 4A instead.

      Thank you for pointing this out. We have addressed this in our revisions.

      Reviewer #3 (Public review):

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      We thank the reviewer for these important comments. The generation of single-cell RNA-seq data from irradiated cells is tricky. Many cells have already died. Even those that do not incorporate propidium iodide are likely in early stages of apoptosis or are physiologically unhealthy and likely made it through our FACS filters. Indeed, in irradiated samples up to 57% of sequenced cells were not included in our analysis since their RNA content seemed to be of low quality. It is therefore likely that our data are biased towards cells that are less damaged. As advised by the reviewer, we will include a clearer discussion of these issues as well as the time course of events and how our analysis captures RNA levels only at a single time point.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      Work by others (Ruiz-Losada et al., 2021, PMID:34824391) has shown that almost 80% of cells have a 4C DNA content 4 h after 4,000 rad X-ray irradiation. The high-trbl cluster accounts for only 18% of cells and can therefore account for only a minority of cells with a 4C DNA content.

      Thus clusters 0, 1 and 2 could potentially contain other populations that also have a 4C DNA content. Importantly, similar proportions of cells in these clusters are also observed in unirradiated discs.

      We expect that clusters 1 and 2 are largely comprised of cells in G2/M. Together, these clusters are marked by some genes previously found to be higher in FACS separated G2 cells compared to G1 cells (Liang et al., 2014, PMID: 24684830). These genes include Det, aurA, and ana1. Strangely, cluster 0 is not strongly marked by any of the 175 cell cycle genes used in our clustering (eff being the strongest marker) and has a lower-than-average expression of 165/175 cell cycle genes. Cluster 0 is however marked by the genes ac and sc, which are known to be expressed in proneuronal cell clusters interspersed throughout the disc that stall in G2 and form mitotically quiescent domains (Usui & Kimura 1992, Development, 116 (1992), pp. 601-610 (no PMID); Nègre et al., 2003, PMID: 12559497). Given these observations, we hypothesize that cluster 0 is largely comprised of stalled G2 cells like those found in ac/sc-expressing proneural clusters.

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      We have found that the locations of elevated PCNA expression do not always correlate with the location of EdU incorporation either by examining scRNA-seq data or by using HCR to detect PCNA. PCNA expression is far more widespread as we now show in Figure 6-figure supplement 3.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      The data show that the high-trbl cells are higher in Ets21C transcripts relative to other cell-cycle-based clusters after irradiation. This does not imply that high-trbl-cells in all regions of the disc upregulate Ets21C equally. Ets21C expression is likely heterogeneous in both ways – by location in the disc and by cell-cycle state.

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

      We can separate the cells in our dataset into male and female cells by expression of lncRNA:roX1/2. When we do this, we see X-ray induced dysf expressed similarly in both male and female cells. We think that it is therefore unlikely that this difference in expression can be attributed to cell sex. Another possibility is that dysf upregulation might be acutely sensitive to the developmental stage of the disc. This would require experiments with very precisely-staged larvae. We have not investigated this further as it is not a central issue in our paper.
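
      The separation by cell sex described above can be sketched as follows, relying on the fact that the roX1 and roX2 RNAs are male-specific in Drosophila. This is a minimal sketch under stated assumptions: the dictionary layout, the `min_counts` threshold, and the example counts are hypothetical, and the authors' actual thresholds and pipeline are not given in this response. Transcript dropout in single-cell data means that some male cells could be misclassified as female by a rule this simple.

```python
from statistics import mean


def infer_sex(rox1_counts, rox2_counts, min_counts=1):
    """Call a cell male if combined roX1/roX2 counts reach min_counts (assumed threshold)."""
    return "male" if (rox1_counts + rox2_counts) >= min_counts else "female"


def mean_expression_by_sex(cells, gene):
    """Average expression of `gene` in inferred male and female cells."""
    groups = {"male": [], "female": []}
    for cell in cells:
        sex = infer_sex(cell.get("roX1", 0), cell.get("roX2", 0))
        groups[sex].append(cell.get(gene, 0))
    return {sex: (mean(vals) if vals else None) for sex, vals in groups.items()}


# Hypothetical counts for four cells (illustrative only).
cells = [
    {"roX1": 12, "roX2": 3, "dysf": 4},
    {"roX1": 0, "roX2": 0, "dysf": 5},
    {"roX1": 7, "roX2": 0, "dysf": 6},
    {"roX1": 0, "roX2": 0, "dysf": 3},
]
print(mean_expression_by_sex(cells, "dysf"))
```

      Similar group means in the two inferred sexes would be consistent with the conclusion that the variability in dysf induction is not explained by cell sex.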

      Reviewer #3 (Recommendations for the authors):

      Please check the color-coding in Figure 1A. The region marked as pouch appears to include hinge folds that express Zfh2 (a hinge marker) in Figure 2A (even after accounting for low Zfh2 expression in part of the pouch).

      We have corrected this and have marked the pouch region based on the analysis of expression of different hinge and pouch markers by Ayala-Camargo et al. 2013 (PMID 2398534).

      The statement 'Furthermore, within tissues, stem cells are most sensitive while differentiated cells are relatively radioresistant' needs to be qualified, as there are differences in radiosensitivity of adult versus embryonic stem cells (e.g., PMID: 30588339)

      We thank the reviewer for bringing this point to our attention and for pointing us to an article that addresses this issue in detail. We appreciate that our statement was rather simplistic – we have modified it and added two additional references.

    1. eLife Assessment

      This important study, which tackles the challenge of analyzing genome integrity and instability in unicellular pathogens by introducing a novel single-cell genomics approach, presents compelling evidence that this new tool outperforms standard whole-genome amplification techniques. While thorough and rigorous, the work's impact would increase by providing scripts and data, as well as a description of the biological relevance that would make this method more appealing to the broad community studying genetic heterogeneity in diverse organisms.

    2. Reviewer #1 (Public review):

      Summary:

      Negreira, G. et al clearly presented the challenges of conducting genomic studies in unicellular pathogens and of addressing questions related to the balance between genome integrity and instability, pivotal for survival under the stressful conditions these organisms face and for their evolutionary success. This underlies the need for powerful approaches to perform single-cell DNA analyses suited to the small and plastic Leishmania genome. Accordingly, their goal was to develop such a novel method and demonstrate its robustness.

      In this study, the authors combined semi-permeable capsules (SPCs) with primary template-directed amplification (PTA) and adapted the system to the Leishmania genome, which is about 100 times smaller than the human genome and exhibits remarkable plasticity and mosaic aneuploidy. Given the size and organization of the Leishmania genome, the challenges were substantial; nevertheless, the authors successfully demonstrated that PTA not only works for Leishmania but also represents a significantly improved whole-genome amplification (WGA) method compared with standard approaches. They showed that SPCs provide a superior alternative for cell encapsulation, increasing throughput. The methodology enabled high-resolution karyotyping and the detection of fine-scale copy number variations (CNVs) at the single-cell level. Furthermore, it allowed discrimination between genotypically distinct cells within mixed populations.

      Strengths:

      This is a high-impact study that will likely contribute to our understanding of DNA replication and the genetic plasticity of Leishmania, including its well-documented aneuploidy, somy variations, CNVs, and SNPs - all key elements for elucidating various aspects of the parasite's biology, such as genome evolution, genetic exchange, and mechanisms of drug resistance.

      Overall, the authors clearly achieved their objectives, providing a solid rationale for the study and demonstrating how this approach can advance the investigation of Leishmania's small, plastic genome and its frequent natural strain mixtures within hosts. This methodology may also prove valuable for genomic studies of other single-celled organisms.

      Weaknesses:

      The discussion section could be enriched to help readers understand the significance of the work, for instance, by more clearly pointing out the obstacles to a better understanding of DNA replication in Leishmania. Or else, when they discuss the results obtained at the level of nucleotide information and the relevance of being able to compare, in their case, the two strains, they could refer to the implications of this level of precision to those studying clonal strains or field isolates, drug resistance or virulence in a more detailed way.

    3. Reviewer #2 (Public review):

      Summary:

      Negreira et al. present an application of a novel single-cell genomics approach to investigate the genetic heterogeneity of Leishmania parasites. Leishmania, while also representing a major global disease with hundreds of thousands of cases annually, serves as a model to test the rigor of the sequencing strategy. Its complex karyotypic nature necessitates a method that is capable of resolving natural variation to better understand genome dynamics. Importantly, an earlier single-cell genomics platform (10x Chromium) is no longer available, and new methods need to be evaluated to fill in this gap.

      The study was designed to evaluate whether a capsule-based cell capture method combined with primary template-directed amplification (PTA) could maintain levels of genomic heterogeneity represented in an equal mixture of two Leishmania strains. This was a high bar, given the relatively small protozoan genome and prior studies that showed limitations of single-cell genomics, especially for gene-level copy number changes. Overall, the study found that semi-permeable capsules (SPC) are an effective way to isolate high-quality single cells. Additionally, short reads from amplified genomes effectively maintained the relative levels of variation in the two strains on the chromosome, gene copy, and individual base level. Thus, this method will be useful to evaluate adaptive strategies of Leishmania. Many researchers will also refer to these studies to set up SPC collection and PTA methods for their organism of choice.

      Strengths:

      (1) The use of SPC and PTA in a non-bacterial organism is novel. The study displays the utility of these methods to isolate and amplify single genomes to a level that can be sequenced, despite being a motile organism with a GC-rich genome.

      (2) The authors clearly outlined their optimization strategy and provided numerous quality-control metrics that inspire confidence in the success of achieving even chromosomal coverage relative to ploidy.

      (3) The use of two distinct Leishmania strains with known clonal status provided strong evidence that PTA-based amplification could reflect genome differences and displayed the utility of the method for studies of rare genotypes.

      (4) Evaluating the SPCs pre- and post-amplification with microscopy is a practical and robust way of determining the success of SPC formation and PTA.

      (5) The authors show that the PTA-based approach easily resolved major genotypic ploidy in agreement with a prior 10x Chromium-based study. The new method had improved resolution of drug resistance genotypes in the form of both copy-number variations and single-nucleotide polymorphisms.

      (6) In general, the authors are very thorough in describing the methods, including those used to optimize PTA lysis and amplification steps (fresh vs frozen cells, naked DNA vs sorted cells, etc). This demonstrates a depth of knowledge about the procedure and leaves few unanswered questions.

      (7) The custom, multifaceted, computational assessment of coverage evenness is a major strength of the study and demonstrates that the authors acknowledge potential computational factors that could impact the analysis.

      Weaknesses:

      (1) The rationale behind some experimental/analysis choices is not well-described. For example, the rationale behind methanol fixation and heat-lysis is unclear. Additionally, the choice of various methods to assess "evenness" is not justified (e.g. why are multiple methods needed? What is the strength of each method?). Also, there is no justification for using 100k reads for subsampling. Finally, what exactly constitutes a "confidently-called SNP"?

      (2) In the methods, the STD protocol lists a 15-minute amplification at 45 °C, whereas the PTA protocol involves 10 h at 37 °C. This is a dramatic difference in incubation time and should be addressed when comparing results from the two methods. It is not really a fair comparison when you look at coverage levels; of course, a 10-hour incubation is going to yield more reads than a 15-minute incubation.

      (3) There is a lack of quantitative evaluations of the SPCs. e.g. How many capsules were evaluated to assess doublets? How many capsules were detected as Syto5 positive in a successful vs an unsuccessful experiment?

      (4) The authors do not address some of the amplification results obtained under various conditions. For example, why did temperature-based lysis of STD4 lead to amplification failure? Also, what is the reason for fewer "true" cells (higher background) in the PTA samples compared to the STD samples? Is this related to issues with barcoding or, alternatively, substandard amplification as indicated by lower read amounts in some capsules (knee plots in Figure 1C)?

      (5) The paper presents limited biological relevance. Without this, the paper describes an improvement in genome amplification methods and some proof-of-concept analyses. Using a 1:1 mixture of parasites with different genotypes, the authors display the utility of the method to resolve genetic diversity, but they don't seek to understand the limits of detecting this diversity. For somy, the authors do not comment on the mixed karyotypes from the HU3 cells (Figure 3F) other than to state that this line was not clonal. For CNVs, the two loci evaluated were detected at relatively high copy number (according to Figure 4C, they are between 4 and 20 copies). Thus, the sensitivity of CNV detection from this data remains unclear; can this approach detect lower-level CNVs like duplications, or minor CNVs that do not show up in every cell?

      (6) The authors state that Leishmania can carry extrachromosomal copies of important genes. There is no discussion about how the presence of these molecules would affect the amplification steps and CNV detection. For example, the phi29 enzyme is very processive with circular molecules; does its presence lead to overamplification and overrepresentation in the data? Is this evident in the current study? This information would be useful for organisms that carry this type of genetic element.

      (7) The manuscript is missing a comparison with other similar studies in the field. For example, how does this coverage level compare to those achieved for other genomes? Can this method achieve amplification levels needed to assess larger genomes? Has there been any evaluation of base composition effects since Leishmania is a GC-rich genome?

      (8) Cost is mentioned as a benefit of the SPC platform, and savings are achieved when working in a plate format, but no details are included on how this was evaluated.

      (9) The Zenodo link for custom scripts does not exist, and code cannot be evaluated.

    4. Reviewer #3 (Public review):

      In this manuscript, Negreira et al. propose a new scDNAseq method, using semi-permeable capsules (SPCs) and primary template-directed amplification (PTA). The authors optimize several metrics to improve their predictions, such as determining GC bias, intra-chromosomal fluctuation (ICF, a metric to differentiate replicative and non-replicative cells) and intra-chromosomal coefficient of variation (ICCV, a measure of chromosome read distribution). The coverage evenness was evaluated using the Gini index and the median absolute pairwise difference between the counts of two consecutive bins. They validate the proposed method using two Leishmania donovani strains isolated from different countries, BPK081 (low genomic variability) and HU3 (high genomic variability). Then, they showed that the method outperforms WGA and has similar accuracy to the discontinued 10X-scDNA (10X Genomics), further improving on short CNV identification. The authors also show that the method can identify somy variations, insertions/deletions and SNP variations across cells. This is a timely and very relevant work that has a wide applicability in copy number variation assessment using single-cell data.
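
      The two coverage-evenness measures named above can be sketched on a list of per-bin read counts for one cell. This is an illustrative sketch, not the authors' implementation: the function names and example counts are hypothetical, and the pairwise difference is computed on raw counts exactly as worded in this summary, whereas published variants often work on normalized or log-transformed counts.

```python
from statistics import median


def gini_index(counts):
    """Gini index of bin counts: 0 for perfectly even coverage, larger when uneven."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one bin with a non-zero count")
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def median_abs_pairwise_difference(counts):
    """Median of |count[i+1] - count[i]| over consecutive bins."""
    if len(counts) < 2:
        raise ValueError("need at least two bins")
    return median(abs(b - a) for a, b in zip(counts, counts[1:]))


# Hypothetical bin counts for two cells (illustrative only).
even_cell = [10, 10, 10, 10]
uneven_cell = [0, 0, 0, 40]

print(gini_index(even_cell), gini_index(uneven_cell))
print(median_abs_pairwise_difference([10, 12, 9, 9]))
```

      The Gini index summarizes how unequally reads are distributed over all bins, while the consecutive-bin difference captures local, bin-to-bin noise, so the two measures respond to different kinds of amplification bias.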

      I really appreciate this work. My congratulations to the authors. All my comments below only aim to improve an already solid manuscript.

      (1) Data availability: Although the authors provide a Zenodo link, the data is restricted. I also could not access the GitHub link in the Zenodo website: https://github.com/gabrielnegreira/2025_scDNA_paper. The authors should make these files available.

      (2) 2-SPC-PTA and SPC-STD cell count comparison: The authors have consistently proven that the SPC-PTA method was superior to SPC-STD. However, there are a few points that should be clarified regarding the SPC-PTA results. Is there an explanation for the lower proportion of SPC to true cells success in SPC-STD, which reflects the bimodal distribution for the reads per cell in SPC-PTA2 and a three-to-multimodal distribution in SPC-PTA1 in Figure 1B? Also, in Table 1, does the number of reads reflect the number of reads in all sequenced SPCs or only in the true cells? If it is in the SPCs, I suggest that the authors add a new column in the table with the "Number of reads in true cells" to account for this discrepancy.

      (3) The authors should evaluate the results with a higher coverage for SPC-PTA. I understand that the authors subsampled the total reads to 100,000 to allow cross-sample comparisons, especially between SPC-STD and SPC-PTA. However, as they concluded that SPC-PTA was far superior, and the samples SPC-PTA1 and SPC-PTA2 had an "elbow" of 650,493 and 448,041, respectively, it might be interesting to revisit some of the estimations using only SPC-PTA samples and a higher coverage cutoff, such as 400,000.

      (4) Doublet detection: I suggest that the authors be a little more careful with their definition of doublets. The doublet detection was based on diagnostic SNPs from the two strains, BPK081 and HU3, which identify doublets between two very different and well-characterised strains. However, this method will probably not identify strain-specific doublets. This is of minor importance for cloned and stable strains with few passages, such as BPK081, but might be more relevant in more heterogeneous strains, such as HU3. Strain-specific doublets might also be relevant in other scenarios, such as multiclonal infections with different populations from the same strain in the same geographic area. One positive point is that the "between strain doublet count" was low, so probably the within-strain doublet count should be low too. The manuscript would benefit from a discussion in this regard.
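
      The limitation raised in this point can be made concrete with a small sketch of SNP-based doublet calling. It assumes each capsule is summarized by the number of reads supporting the strain A allele and the strain B allele at diagnostic sites; the function name, the labels, and both thresholds (`min_reads`, `max_minor_fraction`) are hypothetical and are not taken from the manuscript.

```python
def classify_capsule(reads_a, reads_b, min_reads=10, max_minor_fraction=0.1):
    """Assign a capsule to strain A, strain B, or a between-strain doublet.

    reads_a / reads_b: reads carrying the strain A / strain B allele at
    diagnostic SNP positions. Thresholds are illustrative assumptions.
    """
    total = reads_a + reads_b
    if total < min_reads:
        return "unassigned"
    minor_fraction = min(reads_a, reads_b) / total
    if minor_fraction > max_minor_fraction:
        return "doublet"
    return "strain_A" if reads_a > reads_b else "strain_B"


# Hypothetical capsules (illustrative only).
print(classify_capsule(50, 1))    # mostly A alleles
print(classify_capsule(30, 25))   # both allele sets present
print(classify_capsule(100, 2))   # two strain A cells would look like this too
```

      Because two cells of the same strain contribute the same diagnostic alleles, a within-strain doublet produces the same allele balance as a single cell and cannot be flagged by this kind of rule, which is the concern expressed above.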

      (5) Nucleotide sequence variants and phylogeny: I believe that a more careful description of the phylogenetic analysis and some limitations of the sequence variant identification would benefit the manuscript.

      (5.1) As described in the methods, the authors intentionally selected two fairly different Leishmania donovani strains, HU3 and BPK081, and confirmed that the sequence variant methodology can separate cells from each strain. It is a solid proof of concept. However, most of the multiclonal infections in natural scenarios would be caused by parasite populations that diverge by fewer SNPs, and will be significantly harder to detect. Hence, I suggest that a short discussion about this is important.

      (5.2) The authors should expand on the description of the phylogenetic tree. In the HU3 on Figure 5F left panel, most of the variation is observed in ~8 cells, which goes from position 0 to position ~28.000.

      Most of the other cells are in very short branches, from ~29.000 to 30.4000 (5F right panel). Assuming that this representation is a phylogram, as the branches are short, these cells diverge by approximately 100-2000 SNPs. It is unexpected (but not impossible) that such ~8 divergent cells be maintained uniquely (or in very low counts) in the culture, unless this is a multiclonal infection. I would carefully investigate these cells. They might be doublets or have more missing data than other cells. I would also suggest that a quick discussion about this should be added to the manuscript.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Negreira, G. et al clearly presented the challenges of conducting genomic studies in unicellular pathogens and of addressing questions related to the balance between genome integrity and instability, pivotal for survival under the stressful conditions these organisms face and for their evolutionary success. This underlies the need for powerful approaches to perform single-cell DNA analyses suited to the small and plastic Leishmania genome. Accordingly, their goal was to develop such a novel method and demonstrate its robustness.

      In this study, the authors combined semi-permeable capsules (SPCs) with primary template-directed amplification (PTA) and adapted the system to the Leishmania genome, which is about 100 times smaller than the human genome and exhibits remarkable plasticity and mosaic aneuploidy. Given the size and organization of the Leishmania genome, the challenges were substantial; nevertheless, the authors successfully demonstrated that PTA not only works for Leishmania but also represents a significantly improved whole-genome amplification (WGA) method compared with standard approaches. They showed that SPCs provide a superior alternative for cell encapsulation, increasing throughput. The methodology enabled high-resolution karyotyping and the detection of fine-scale copy number variations (CNVs) at the single-cell level. Furthermore, it allowed discrimination between genotypically distinct cells within mixed populations.

      Strengths:

      This is a high-impact study that will likely contribute to our understanding of DNA replication and the genetic plasticity of Leishmania, including its well-documented aneuploidy, somy variations, CNVs, and SNPs - all key elements for elucidating various aspects of the parasite's biology, such as genome evolution, genetic exchange, and mechanisms of drug resistance.

      Overall, the authors clearly achieved their objectives, providing a solid rationale for the study and demonstrating how this approach can advance the investigation of Leishmania's small, plastic genome and its frequent natural strain mixtures within hosts. This methodology may also prove valuable for genomic studies of other single-celled organisms.

      We thank the reviewer for the positive feedback and appreciation of the potential applications for the methodology we describe here.

      Weaknesses:

      The discussion section could be enriched to help readers understand the significance of the work, for instance, by more clearly pointing out the obstacles to a better understanding of DNA replication in Leishmania. Or else, when they discuss the results obtained at the level of nucleotide information and the relevance of being able to compare, in their case, the two strains, they could refer to the implications of this level of precision to those studying clonal strains or field isolates, drug resistance or virulence in a more detailed way.

      We thank the reviewer for the suggestions. Indeed, single-cell DNA sequencing has successfully revealed cell-to-cell variability in replication timing and fork progression in mammalian cells[1,2] and we believe that the SPC-PTA workflow could be used in similar studies in Leishmania to complement bulk-based observations[3,4]. Regarding nucleotide information, it is indeed of high relevance to detect minor circulating variants with potential virulence impact and/or effect on drug resistance which could be missed by bulk sequencing. This includes the ability to detect co-occurring variants with potential epistatic effects. These topics will be further developed in the revised version. Finally, we will explicitly discuss how this methodology can be applied beyond Leishmania, to investigate genome plasticity, adaptation, and evolutionary processes in other organisms.

      Reviewer #2 (Public review):

      Summary:

      Negreira et al. present an application of a novel single-cell genomics approach to investigate the genetic heterogeneity of Leishmania parasites. Leishmania, while also representing a major global disease with hundreds of thousands of cases annually, serves as a model to test the rigor of the sequencing strategy. Its complex karyotypic nature necessitates a method that is capable of resolving natural variation to better understand genome dynamics. Importantly, an earlier single-cell genomics platform (10x Chromium) is no longer available, and new methods need to be evaluated to fill in this gap.

      The study was designed to evaluate whether a capsule-based cell capture method combined with primary template-directed amplification (PTA) could maintain levels of genomic heterogeneity represented in an equal mixture of two Leishmania strains. This was a high bar, given the relatively small protozoan genome and prior studies that showed limitations of single-cell genomics, especially for gene-level copy number changes. Overall, the study found that semi-permeable capsules (SPC) are an effective way to isolate high-quality single cells. Additionally, short reads from amplified genomes effectively maintained the relative levels of variation in the two strains on the chromosome, gene copy, and individual base level. Thus, this method will be useful to evaluate adaptive strategies of Leishmania. Many researchers will also refer to these studies to set up SPC collection and PTA methods for their organism of choice.

      Strengths:

      (1) The use of SPC and PTA in a non-bacterial organism is novel. The study displays the utility of these methods to isolate and amplify single genomes to a level that can be sequenced, despite being a motile organism with a GC-rich genome.

      (2) The authors clearly outlined their optimization strategy and provided numerous quality-control metrics that inspire confidence in the success of achieving even chromosomal coverage relative to ploidy.

      (3) The use of two distinct Leishmania strains with known clonal status provided strong evidence that PTA-based amplification could reflect genome differences and displayed the utility of the method for studies of rare genotypes.

      (4) Evaluating the SPCs pre- and post-amplification with microscopy is a practical and robust way of determining the success of SPC formation and PTA.

      (5) The authors show that the PTA-based approach easily resolved major genotypic ploidy in agreement with a prior 10x Chromium-based study. The new method had improved resolution of drug resistance genotypes in the form of both copy-number variations and single-nucleotide polymorphisms.

      (6) In general, the authors are very thorough in describing the methods, including those used to optimize PTA lysis and amplification steps (fresh vs frozen cells, naked DNA vs sorted cells, etc). This demonstrates a depth of knowledge about the procedure and leaves few unanswered questions.

      (7) The custom, multifaceted, computational assessment of coverage evenness is a major strength of the study and demonstrates that the authors acknowledge potential computational factors that could impact the analysis.

      We deeply appreciate the positive and encouraging feedback on our manuscript.

      Weaknesses:

      (1) The rationale behind some experimental/analysis choices is not well-described. For example, the rationale behind methanol fixation and heat-lysis is unclear. Additionally, the choice of various methods to assess "evenness" is not justified (e.g. why are multiple methods needed? What is the strength of each method?). Also, there is no justification for using 100k reads for subsampling. Finally, what exactly constitutes a "confidently-called SNP"?

      The methanol fixation prior to lysis is part of the original protocol described in the Single-Microbe Genome Barcoding Kit manual and was meant to facilitate lysis and DNA denaturation in bacterial cells (for which the kit was originally developed). However, in our preliminary tests with bulk samples – described in the supplementary material – we noticed a strong negative effect on lysis efficiency/DNA recovery when parasites were fixed with methanol. Thus, we decided to test the effect of skipping this step in the single-cell DNA workflow. We kept the SPC_STD1 sample to have a safe control where the full workflow described in the kit manual was followed.

      As we were unsure if the standard lysis (25 ˚C for 15 minutes) would work efficiently for Leishmania, we included the heat-lysis (99˚C for 15 minutes) as well as the longer incubation lysis (25 ˚C for 1h). These modifications were listed as validated alternatives in the kit's manual.

      The 100k reads threshold was chosen based on the number of reads found in the 'true cell' with the lowest read count.

      Regarding variant calling, a variant was considered confidently called if it was covered, at single-cell level, by at least one deduplicated read with Phred quality above Q30 and mapping quality (MAPQ) also above 30.
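
      The rule stated above can be sketched as a small filter. This is only an illustration of the criterion as worded in the response, not the authors' pipeline: the `ReadSupport` record and the function name are hypothetical, and treating "above" as a strict inequality is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the response; strict ">" is an assumption.
MIN_BASE_QUALITY = 30
MIN_MAPQ = 30


@dataclass(frozen=True)
class ReadSupport:
    """One read covering a variant site in one cell (hypothetical record)."""
    base_quality: int      # Phred quality of the base at the site
    mapping_quality: int   # MAPQ of the alignment
    is_duplicate: bool     # True if flagged as a PCR/optical duplicate


def is_confidently_called(reads):
    """True if at least one deduplicated read passes both quality thresholds."""
    return any(
        not r.is_duplicate
        and r.base_quality > MIN_BASE_QUALITY
        and r.mapping_quality > MIN_MAPQ
        for r in reads
    )
```

      Under this reading, a single passing read is enough to call the site in a given cell, which is why deduplication is part of the criterion.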

      In the revised version, we will include these explanations and improve the explanation of the metrics used to estimate coverage quality.

      (2) In the methods, the STD protocol lists a 15-minute amplification at 45C whereas the PTA protocol involves 10h at 37C. This is a dramatic difference in incubation time and should be addressed when comparing results from the two methods. It is not really a fair comparison when you look at coverage levels; of course, a 10-hour incubation is going to yield more reads than a 15-minute incubation.

      We agree with the reviewer that the longer incubation period of PTA might explain the higher read count seen in the PTA samples, although the differences in amplification kinetics (linear in PTA, exponential in STD) and potential differences in amplification saturation points make the two difficult to compare. For instance, an updated version of PTA (ResolveDNA V2) uses a shorter amplification time (2.5 h) and achieves amplification levels similar to those of the 10 h incubation, suggesting that PTA amplification saturates well before 10 h. In any case, all quality check metrics were computed with the cells subsampled to 100k reads to mitigate the effect of read count differences on data quality.
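
      Subsampling every cell to a fixed depth can be sketched as below. This is a minimal sketch under stated assumptions: reads are represented as a list of identifiers, sampling is uniform without replacement, and cells below the target depth are reported as `None`. The function name and the seed handling are hypothetical and are not taken from the authors' scripts.

```python
import random


def subsample_reads(read_ids, target=100_000, seed=0):
    """Draw `target` reads without replacement; return None if the cell is too shallow."""
    if len(read_ids) < target:
        return None
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.sample(read_ids, target)
```

      Equalising depth in this way means differences in evenness metrics between conditions cannot be attributed to read count alone.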

      (3) There is a lack of quantitative evaluations of the SPCs. e.g. How many capsules were evaluated to assess doublets? How many capsules were detected as Syto5 positive in a successful vs an unsuccessful experiment?

      We agree with the reviewer, but during experimental execution SPCs were only assessed qualitatively via microscopy, following the Single-cell microbe DNA barcoding kit manual. No quantitative analysis was done and therefore we do not have these data. Doublet detection was done in silico, based on the detection of SPCs containing mixed genomes from the two strains used in the study, as described in the Materials and Methods. As pointed out by another reviewer, this only allows the detection of inter-strain doublets. In the revised version, we will explain this and add an estimation of total doublets based on the inter-strain doublet rate.

      (4) The authors do not address some of the amplification results obtained under various conditions. For example, why did temperature-based lysis of STD4 lead to amplification failure? Also, what is the reason for fewer "true" cells (higher background) in the PTA samples compared to the STD samples? Is this related to issues with barcoding or, alternatively, substandard amplification as indicated by lower read amounts in some capsules (knee plots in Figure 1C)?

      After exchange with the technical support team of the SPC generator kit, it was clarified that the heat lysis done in STD4 should have had a shorter incubation time (10 minutes instead of 15 minutes). We suspect that the longer incubation time, combined with the higher temperature and the harsh lysis condition with 0.8M KOH might have damaged SPCs and therefore DNA might have leaked out of them before WGA. In the microscopy images, SPCs in STD4 show a swollen aspect not seen in the other samples. In the revised version we will explain this more clearly.

      (5) The paper presents limited biological relevance. Without this, the paper describes an improvement in genome amplification methods and some proof-of-concept analyses. Using a 1:1 mixture of parasites with different genotypes, the authors display the utility of the method to resolve genetic diversity, but they don't seek to understand the limits of detecting this diversity. For somy, the authors do not comment on the mixed karyotypes from the HU3 cells (Figure 3F) other than to state that this line was not clonal. For CNVs, the two loci evaluated were detected at relatively high copy number (according to Figure 4C, they are between 4 and 20 copies). Thus, the sensitivity of CNV detection from this data remains unclear; can this approach detect lower-level CNVs like duplications, or minor CNVs that do not show up in every cell?

      As described above we will include more discussion on potential biological relevance of the method in the revised version of the manuscript. In the revised version we will attempt to use dedicated bioinformatic tools to discover de novo CNVs, as per the suggestion of other reviewers. This might also allow us to determine the detection limit of the methodology for CNVs.

      (6) The authors state that Leishmania can carry extrachromosomal copies of important genes. There is no discussion about how the presence of these molecules would affect the amplification steps and CNV detection. For example, the phi29 enzyme is very processive with circular molecules; does its presence lead to overamplification and overrepresentation in the data? Is this evident in the current study? This information would be useful for organisms that carry this type of genetic element.

      We believe our data, which are based on short-read sequencing, do not allow us to differentiate between intra-chromosomal CNVs and linear or circular episomal CNVs, so we cannot determine whether circular CNVs are over-amplified. Of note, we have previously demonstrated that the M-locus CNV in chromosome 36 is intrachromosomal, not circular (episomal)[5].

      (7) The manuscript is missing a comparison with other similar studies in the field. For example, how does this coverage level compare to those achieved for other genomes? Can this method achieve amplification levels needed to assess larger genomes? Has there been any evaluation of base composition effects since Leishmania is a GC-rich genome?

      We believe the SPC-PTA workflow can be applied to organisms with larger genomes as PTA was developed specifically for mammalian cells[6], and also because, in our hands, it outperformed the 10X scDNA solution, which was developed for mammals.

      We believe direct comparison with other studies regarding coverage levels is elusive because other steps in the workflow apart from the WGA, such as the library preparation (PCR-based in our case), as well as genome features like GC content, size, and presence of repetitive regions, can also affect coverage levels and evenness. One strength of our approach was the use of a single sample (the 50/50 mix of two L. donovani strains) for all conditions, thus removing potential parasite-specific biases. In addition, the application of a multiplexing system during barcoding allowed us to combine all samples prior to library preparation, thus removing potential differences introduced by this step.

      Regarding the effect of GC content, we did notice a positive bias in all samples in regions with higher GC content, which had to be corrected in silico. This is the opposite of the negative bias observed in a previous study[7], likely due to differences in WGA and/or library preparation. In the revised version, we will include a supplementary figure showing the GC bias.

      (8) Cost is mentioned as a benefit of the SPC platform, and savings are achieved when working in a plate format, but no details are included on how this was evaluated.

      In the revised version we will provide precise cost estimates and the rationale for the estimation.

      (9) The Zenodo link for custom scripts does not exist, and code cannot be evaluated.

      The full Zenodo link (https://doi.org/10.5281/zenodo.17094083) will be included in the revised version.

      Reviewer #3 (Public review):

      Summary

      In this manuscript, Negreira et al. propose a new scDNAseq method, using semi-permeable capsules (SPCs) and primary template-directed amplification (PTA). The authors optimize several metrics to improve their predictions, such as determining GC bias, intra-chromosomal fluctuation (ICF, a metric to differentiate replicative and non-replicative cells) and intra-chromosomal coefficient of variation (ICCV, which describes chromosome read distribution). The coverage evenness was evaluated using the Gini index and the median absolute pairwise difference between the counts of two consecutive bins. They validate the proposed method using two Leishmania donovani strains isolated from different countries, BPK081 (low genomic variability) and HU3 (high genomic variability). Then, they showed that the method outperforms WGA and has similar accuracy to the discontinued 10X-scDNA (10X Genomics), further improving on short CNV identification. The authors also show that the method can identify somy variations, insertions/deletions and SNP variations across cells. This is a timely and very relevant work that has a wide applicability in copy number variation assessment using single-cell data.
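
      For readers unfamiliar with the two evenness metrics named in this summary, a minimal sketch follows. It uses the textbook Gini index over per-bin read counts and a median absolute difference between consecutive bins; scaling counts by the per-cell mean before differencing is an assumption made here so that cells with different depths are comparable, and the function names are hypothetical. The authors' exact definitions may differ.

```python
from statistics import median


def gini_index(counts):
    """Gini index of per-bin read counts: 0 for perfectly even coverage."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one bin with reads")
    # Rank-weighted sum over the ascending counts (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def median_abs_pairwise_diff(counts):
    """Median absolute difference between consecutive bins, on mean-scaled counts."""
    if len(counts) < 2:
        raise ValueError("need at least two bins")
    mean = sum(counts) / len(counts)
    if mean == 0:
        raise ValueError("need at least one bin with reads")
    scaled = [c / mean for c in counts]
    return median(abs(b - a) for a, b in zip(scaled, scaled[1:]))
```

      The two metrics are complementary: the Gini index ignores bin order and captures overall inequality, whereas the consecutive-bin difference captures local, bin-to-bin noise along the chromosome.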

      Strengths

      I really appreciate this work. My congratulations to the authors. All my comments below only aim to improve an already solid manuscript.

      We thank the reviewer for the enthusiasm and positive feedback.

      Weaknesses

      (1) Data availability: Although the authors provide a Zenodo link, the data is restricted. I also could not access the GitHub link in the Zenodo website: https://github.com/gabrielnegreira/2025_scDNA_paper. The authors should make these files available.

      Both the Zenodo (https://doi.org/10.5281/zenodo.17094083) and the GitHub (https://github.com/gabrielnegreira/2025_scDNA_paper) repositories are now publicly available.

      (2) SPC-PTA and SPC-STD cell count comparison: The authors have consistently proven that the SPC-PTA method was superior to SPC-STD. However, there are a few points that should be clarified regarding the SPC-PTA results. Is there an explanation for the lower proportion of SPC to true cells success in SPC-STD, which reflects the bimodal distribution for the reads per cell in SPC-PTA2 and a three-to-multimodal distribution in SPC-PTA1 in Figure 1B? Also, in Table 1, does the number of reads reflect the number of reads in all sequenced SPCs or only in the true cells? If it is in the SPCs, I suggest that the authors add a new column in the table with the "Number of reads in true cells" to account for this discrepancy.

      The reason for the higher presence of 'background' SPCs in the PTA samples is not clear, but we hypothesize that it could be due to PTA favoring amplification of small, free-floating DNA molecules that might have been trapped in cell-free SPCs, as PTA works with shorter amplicons. Also, the longer incubation time used in PTA (10 h) might have allowed enhanced amplification of low quantities of free-floating DNA to detectable levels. Regarding Table 1, indeed it only shows the total number of reads per sample. In the revised version we will add the suggested column to Table 1.

      (3) The authors should evaluate the results with a higher coverage for SPC-PTA. I understand that the authors subsampled the total reads to 100,000 to allow cross-sample comparisons, especially between SPC-STD and SPC-PTA. However, as they concluded that the SPC-PTA was far superior, and the samples SPC-PTA1 and SPC-PTA2 had an "elbow" of 650,493 and 448,041, respectively, it might be interesting to revisit some of the estimations using only SPC-PTA samples and a higher coverage cutoff, such as 400,000.

      We believe the 100,000 cutoff is already high for aneuploidy analysis, as we have successfully reconstructed parasite karyotypes with 20,000 reads per cell[8], so a higher cutoff will likely not improve it. For CNV analysis, in the revised version, we will try to identify de novo CNVs using dedicated bioinformatic tools, as per other reviewers' suggestions. There, we will also test if a higher CNV detection sensitivity is achieved using the suggested 400,000 reads cutoff for the PTA samples.

      (4) Doublet detection: I suggest that the authors be a little more careful with their definition of doublets. The doublet detection was based on diagnostic SNPs from the two strains, BPK081 and HU3, which identify doublets between two very different and well-characterised strains. However, this method will probably not identify strain-specific doublets. This is of minor importance for cloned and stable strains with few passages, as BPK081, but might be more relevant in more heterogeneous strains, as HU3. Strain-specific doublets might also be relevant in other scenarios, as multiclonal infections with different populations from the same strain in the same geographic area. One positive point is that the "between strain doublet count" was low, so probably the within-strain doublet count should be low too. The manuscript would benefit from a discussion on this regard.

      We fully agree with the reviewer. We will make it clear in the revised version that we quantify inter-strain doublets only, and we will also provide an estimation of total doublets based on the inter-strain doublet rate.
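
      One common way to extrapolate from inter-strain doublets to total doublets, borrowed from species-mixing experiments, is sketched below. It is an assumption that the authors will use this approach: the sketch presumes random co-encapsulation, ignores capsules holding more than two cells, and takes the strain fractions as known. The function name and its parameters are hypothetical.

```python
def estimate_total_doublet_rate(inter_strain_rate, frac_a=0.5):
    """Scale the observed inter-strain doublet rate to an estimated total doublet rate.

    With strain fractions a and b = 1 - a, a random doublet is mixed with
    probability 2ab, so total = observed_mixed / (2ab).
    """
    if not 0 < frac_a < 1:
        raise ValueError("frac_a must be strictly between 0 and 1")
    if not 0 <= inter_strain_rate <= 1:
        raise ValueError("inter_strain_rate must be a proportion")
    frac_b = 1 - frac_a
    p_mixed = 2 * frac_a * frac_b  # chance that a random doublet is inter-strain
    return inter_strain_rate / p_mixed
```

      For an exact 50/50 mix, half of all doublets are expected to be inter-strain, so the estimated total rate is simply twice the observed inter-strain rate.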

      (5) Nucleotide sequence variants and phylogeny: I believe that a more careful description of the phylogenetic analysis and some limitations of the sequence variant identification would benefit the manuscript.

      (5.1) As described in the methods, the authors intentionally selected two fairly different Leishmania donovani strains, HU3 and BPK081, and confirmed that the sequence variant methodology can separate cells from each strain. It is a solid proof of concept. However, most of the multiclonal infections in natural scenarios would be caused by parasite populations that diverge by fewer SNPs, and will be significantly harder to detect. Hence, I suggest that a short discussion about this is important.

      We will add a short discussion clarifying the limitations, while noting that our data demonstrate the ability of the approach to resolve very closely related cells, as illustrated by the fine-scale genetic differences observed within the clonal BPK081 population and by the detection of rare variants at targeted loci. We will also emphasize that the sensitivity to detect closely related genotypes depends on sequencing depth and the genomic regions considered.

      (5.2) The authors should expand on the description of the phylogenetic tree. In the HU3 on Figure 5F left panel, most of the variation is observed in ~8 cells, which goes from position 0 to position ~28.000. Most of the other cells are in very short branches, from ~29.000 to 30.4000 (5F right panel). Assuming that this representation is a phylogram, as the branches are short, these cells diverge by approximately 100-2000 SNPs. It is unexpected (but not impossible) that such ~8 divergent cells be maintained uniquely (or in very low counts) in the culture, unless this is a multiclonal infection. I would carefully investigate these cells. They might be doublets or have more missing data than other cells. I would also suggest that a quick discussion about this should be added to the manuscript.

      In the revised version we will improve the description of the phylogenetic analysis. We will also investigate the 8 mentioned cells in more depth to determine whether confounding factors might have led to their discrepancy. The possibility of multiclonal infection in HU3 is not excluded, as this strain was not cloned after isolation.

      References:

      (1) Dileep, V. & Gilbert, D. M. Single-cell replication profiling to measure stochastic variation in mammalian replication timing. Nat. Commun. 9, 427 (2018).

      (2) Miura, H. et al. Single-cell DNA replication profiling identifies spatiotemporal developmental dynamics of chromosome organization. Nat. Genet. 51, 1356–1368 (2019).

      (3) Marques, C. A. et al. Genome-wide mapping reveals single-origin chromosome replication in Leishmania, a eukaryotic microbe. Genome Biol. 16, 230 (2015).

      (4) Damasceno, J. D. et al. Leishmania major chromosomes are replicated from a single high-efficiency locus supplemented by thousands of lower efficiency initiation events. Cell Rep. 44, 116094 (2025).

      (5) Imamura, H. et al. Evolutionary genomics of epidemic visceral leishmaniasis in the Indian subcontinent. eLife 5, e12613 (2016).

      (6) Gonzalez-Pena, V. et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc. Natl. Acad. Sci. 118, e2024176118 (2021).

      (7) Imamura, H. et al. Evaluation of whole genome amplification and bioinformatic methods for the characterization of Leishmania genomes at a single cell level. Sci. Rep. 10, 15043 (2020).

      (8) Negreira, G. H. et al. High throughput single-cell genome sequencing gives insights into the generation and evolution of mosaic aneuploidy in Leishmania donovani. Nucleic Acids Res. 50, 293–305 (2022).

    1. eLife Assessment

      This work reports the characterization of newly identified genetic variants of SLC4A1 in patients with distal renal tubular acidosis. Cell culture studies supplemented with histological analysis of a previously established disease mouse model provide convincing evidence that some of the variants increase intracellular pH, reduce ATP synthesis, and attenuate autophagic degradative flux. The study is valuable in establishing a mechanistic framework for future exploration of the link between intracellular pH and mutations in SLC4A1 in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      This study is an evaluation of patient variants in the kidney isoform of AE1 linked to distal renal tubular acidosis. Drawing on observations in the mouse kidney, this study extends findings to autophagy pathways in a kidney epithelial cell line.

      Strengths:

      Experimental data are convincing and nicely done.

      The revised manuscript incorporates most of the reviewer recommendations and presents a more cohesive story that is easier to read and assess. The data are convincing, of suitable quality and nicely presented. Statistical evaluation is rigorous. The link between kAE1 mutants and cell metabolism and autophagy is novel and provides insights on pathological observations in dRTA.

    3. Reviewer #2 (Public review):

      Context and significance:

      Distal renal tubular acidosis (dRTA) can be caused by mutations in a Cl-/HCO3- exchanger (kAE1) encoded by the SLC4A1 gene. The precise mechanisms underlying the pathogenesis of the disease due to these mutations are unclear, but it is thought that loss of the renal intercalated cells (ICs) that express kAE1 and/or aberrant autophagy pathway function in the remaining ICs may contribute to the disease. Understanding how mutations in SLC4A1 affect cell physiology and cells within the kidney, a major goal of this study, is an important first step to unraveling the pathophysiology of this complex heritable kidney disease.

      Summary:

      The authors identify a number of new mutations in the SLC4A1 gene in patients with diagnosed dRTA that they use for heterologous experiments in vitro. They also use a dRTA mouse model with a different SLC4A1 mutation for experiments in mouse kidneys. Contrary to previous work that speculated dRTA was caused mainly by trafficking defects of kAE1, the authors observe that their new mutants (with the exception of Y413H) traffic and localize at least partly to the basolateral membrane of polarized heterologous mIMCD3 cells, an immortalized murine collecting duct cell line. They go on to show that the remaining mutants induce abnormalities in the expression of autophagy markers and increased numbers of autophagosomes, along with an alkalinized intracellular pH. They also reported that cells expressing the mutated kAE1 had increased mitochondrial content coupled with lower rates of ATP synthesis. The authors also observed a partial rescue of the effects of kAE1 variants through artificially acidifying the intracellular pH. Taken together, this suggests a mechanism for dRTA independent of impaired kAE1 trafficking and dependent on intracellular pH changes that future studies should explore.

      Strengths:

      The authors corroborate their findings in cell culture with a well characterized dRTA KI mouse and provide convincing quantification of their images from the in vitro and mouse experiments. The data largely support the claims as stated. Some of the mutants induce different strengths of effects on autophagy and the various assays than others, and it is not clear why this is from the data in the manuscript. The authors provide discussion of potential reasons for these differences that future studies could explore.

      Weaknesses:

      The pH effects of their mutants are only explored in vitro, and the in vitro system has a number of differences from a living mouse kidney or ex vivo kidney slice.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have identified novel dRTA causing SLC4A1 mutations and studied the resulting kAE1 proteins to determine how they cause dRTA. Based on a previous study on mice expressing the dRTA kAE1 R607H variant, the authors hypothesize that kAE1 variants cause an increase in intracellular pH which disrupts autophagic and degradative flux pathways. The authors clone these new kAE1 variants and study their transport function and subcellular localization in mIMCD cells. The authors show increased abundance of LC3B II in mIMCD cells expressing some of the kAE1 variants, as well as reduced autophagic flux using eGFP-RFP-LC3. These data, as well as the abundance of autophagosomes, serve as the key evidence that these kAE1 mutants disrupt autophagy. Furthermore, the authors demonstrate that decreasing the intracellular pH abrogates the expression of LC3B II in mIMCD cells expressing mutant SLC4A1. Lastly, the authors argue that mitochondrial function, and specifically ATP synthesis, is suppressed in mIMCD cells expressing dRTA variants and that mitochondria are less abundant in AICs from the kidney of R607H kAE1 mice. Overall, the authors provide evidence about how new kAE1 mutants may cause dRTA.

      Strengths:

      The authors cloned novel dRTA causing kAE1 mutants into expression vectors to study the subcellular localization and transport properties of the variants. The immunofluorescence images are generally of high quality and the authors do well to include multiple samples for all of their western blots.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      This study is an evaluation of patient variants in the kidney isoform of AE1 linked to distal renal tubular acidosis. Drawing on observations in the mouse kidney, this study extends findings to autophagy pathways in a kidney epithelial cell line. 

      Strengths: 

      Experimental data are convincing and nicely done.

      Thank you

      Weaknesses: 

      Some data are lacking or not explained clearly. Mutations are not consistently evaluated throughout the study, which makes it difficult to draw meaningful conclusions.

      We have revised our manuscript to clarify some earlier explanations and provided rationale for focusing on specific variants throughout the study.

      Reviewer #2 (Public review):

      Context and significance: 

      Distal renal tubular acidosis (dRTA) can be caused by mutations in a Cl-/HCO3- exchanger (kAE1) encoded by the SLC4A1 gene. The precise mechanisms underlying the pathogenesis of the disease due to these mutations are unclear, but it is thought that loss of the renal intercalated cells (ICs) that express kAE1 and/or aberrant autophagy pathway function in the remaining ICs may contribute to the disease. Understanding how mutations in SLC4A1 affect cell physiology and cells within the kidney, a major goal of this study, is an important first step to unraveling the pathophysiology of this complex heritable kidney disease. 

      Summary: 

      The authors identify a number of new mutations in the SLC4A1 gene in patients with diagnosed dRTA that they use for heterologous experiments in vitro. They also use a dRTA mouse model with a different SLC4A1 mutation for experiments in mouse kidneys. Contrary to previous work that speculated dRTA was caused mainly by trafficking defects of kAE1, the authors observe that their new mutants (with the exception of Y413H, which they only use in Figure 1) traffic and localize at least partly to the basolateral membrane of polarized heterologous mIMCD3 cells, an immortalized murine collecting duct cell line. They go on to show that the remaining mutants induce abnormalities in the expression of autophagy markers and increased numbers of autophagosomes, along with an alkalinized intracellular pH. They also reported that cells expressing the mutated kAE1 had increased mitochondrial content coupled with lower rates of ATP synthesis. The authors also observed a partial rescue of the effects of kAE1 variants through artificially acidifying the intracellular pH. Taken together, this suggests a mechanism for dRTA independent of impaired kAE1 trafficking and dependent on intracellular pH changes that future studies should explore. 

      Strengths: 

      The authors corroborate their findings in cell culture with a well-characterized dRTA KI mouse and provide convincing quantification of their images from the in vitro and mouse experiments.

      Thank you  

      Weaknesses: 

      The data largely support the claims as stated, with some minor suggestions for improving the clarity of the work. Some of the mutants induce different strengths of effects on autophagy and the various assays than others, and it is not clear why this is from the present manuscript, given that they propose pHi as the unifying mechanism.

      We have modified our manuscript to discuss the various strengths of the mutants and emphasize that alteration of cytosolic pH by kAE1 variants may not be the only mechanism leading to dRTA.  

      Reviewer #3 (Public review):

      Summary: 

      The authors have identified novel dRTA causing SLC4A1 mutations and studied the resulting kAE1 proteins to determine how they cause dRTA. Based on a previous study on mice expressing the dRTA kAE1 R607H variant, the authors hypothesize that kAE1 variants cause an increase in intracellular pH, which disrupts autophagic and degradative flux pathways. The authors clone these new kAE1 variants and study their transport function and subcellular localization in mIMCD cells. The authors show increased abundance of LC3B II in mIMCD cells expressing some of the kAE1 variants, as well as reduced autophagic flux using eGFP-RFP-LC3. These data, as well as the abundance of autophagosomes, serve as the key evidence that these kAE1 mutants disrupt autophagy. Furthermore, the authors demonstrate that decreasing the intracellular pH abrogates the expression of LC3B II in mIMCD cells expressing mutant SLC4A1. Lastly, the authors argue that mitochondrial function, and specifically ATP synthesis, is suppressed in mIMCD cells expressing dRTA variants and that mitochondria are less abundant in AICs from the kidney of R607H kAE1 mice. While the manuscript does reveal some interesting new results about novel dRTA causing kAE1 mutations, the quality of the data to support the hypothesis that these mutations cause a reduction in autophagic flux can be improved. In particular, the precise method of how the western blots and the immunofluorescence data were quantified, with included controls, would enhance the quality of the data and offer more supportive evidence of the authors' conclusions. 

      Strengths: 

      The authors cloned novel dRTA causing kAE1 mutants into expression vectors to study the subcellular localization and transport properties of the variants. The immunofluorescence images are generally of high quality, and the authors do well to include multiple samples for all of their western blots.

      Thank you

      Weaknesses: 

      Inconsistent results are reported for some of the variants. For example, R295H causes intracellular alkalinization but also has no effect on intracellular pH when measured by BCECF. The authors also appear to have performed these in vitro studies on mIMCD cells that were not polarized, and therefore, the localization of kAE1 to the basolateral membrane seems unlikely, based upon images included in the manuscript. Additionally, there is no in vivo work to demonstrate that these kAE1 variants alter intracellular pH, including the R607H mouse, which is available to the authors. The western blots are of varying quality, and it is often unclear which of the bands are being quantified. For example, LAMP1 is reported at 100 kDa, the authors show three bands, and it is unclear which one(s) are used to quantify protein abundance. Strikingly, the authors report a nonsensical value for their quantification of LC3B II in Figure 2, where the ratio of LC3B II to total LC3B (I + II) is greater than one. The control experiments with starvation and bafilomycin are not supportive and significantly reduce enthusiasm for the authors' findings regarding autophagy. There are labeling errors between the manuscript and the figures, which suggest a lack of vigilance in the drafting process.

      The R295H variant was identified in a dRTA patient and as such, it was important to report it. However, this is the first mutation located in the amino-terminus of the protein, which may be involved in protein-protein interactions, so other mechanisms may cause dRTA for this variant. We have therefore modified our manuscript to state that alteration of cytosolic pH may not be the only mechanism leading to dRTA. At this time, we are not able to measure cytosolic pH in vivo and hope to be able to do it in the future.

      In our revised manuscript, we also show cell surface biotinylation results supporting that plasma membrane abundance of the kAE1 S525F and R589H variants is not significantly different than WT in non-polarized mIMCD3 cells (Figure 3 A&B), in line with the predominant basolateral localization of the variants in polarized cells (Figure 1C). Therefore, these two mutant proteins are not mis-trafficked in non-polarized cells.  Finally, we have clarified which bands have been used for quantification and corrected quantifications (including ratio measurements).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) R295H is recessively inherited, whereas Y413H is dominantly inherited: this is interesting and may be linked to their cellular expression and function. Is this information known for the other mutations examined in this study? 

      The S525F and R589H dRTA variants have both been reported to exhibit autosomal dominant inheritance. This information is now updated in lines 146 and 158-159.

      (2) R589H expression levels are evaluated in the Western blot of Figure 1, but localization and activity are not examined in Figure 2. However, R589H is included in autophagy experiments shown in later figures. Similarly, mutant R607H is the subject of several experiments further into the manuscript, but no initial analysis is provided for this variant. 

      Protein abundance and localization of the R589H mutant in mIMCD3 cells have been shown in our previous publication in Supplementary Fig 5D and Supplementary Fig 2J [1]. This is now indicated on lines 158-159. Our previous paper also presented a detailed study of the R607H dRTA mutant, the mouse model corresponding to the human R589H mutation. This is now indicated on lines 70, 118-119 and 180. The present study builds upon those published findings.

      (3) This inconsistency is confusing, detracts from the usefulness of the study, and makes the comparative analysis of mutations incomplete. It is difficult to extrapolate from published studies in MDCK1 cells, which show different results on trafficking. 

      The mIMCD3 cell line, which more closely resembles the physiology of the mouse collecting duct than MDCK cells, was selected for this study and our previous one [1]. Accordingly, the results obtained are better aligned with in vivo evidence. In contrast, differences in mutant protein expression and localization observed in other cell lines, like the MDCK cells, are likely attributable to differences in their cellular origin. 

      (4) In Figure 2, could the authors explain why total LC3B is graphed for the data shown in mouse lysates, whereas the ratio of bands is analysed for cell lysates? Both sets of data show the two LC3B bands.

      Total LC3B levels were significantly increased in the mutant compared to WT; however, no significant difference was observed in the lipidation ratio. For this reason, that graph is not shown in the main paper but has been included in the Supplementary Figure 1D. 

      (5) In Figure 3, representative fluorescence images should be shown for all cell lines.

      We have now included representative immunofluorescence images for all cell lines in Figure 3C.

      (6) pH effects: Suggest that steady state pHi (Figure 3E) and rate of alkalization (Figure 1F) would be more effective together in Figure 1. The authors should show data for the effect of nigericin on cytoplasmic pH in Figure 3. If the rate of alkalinization in the mutant cells is reduced, shouldn't the intracellular steady state pH be more acidic? A cartoon depicting the transporter activity in the cell and the expected changes in pHi would be helpful. Is there a way to activate/inhibit NHE1 and rescue the effect of the mutant kAE1? It is unclear if the link between the mutant kAE1 and mitochondrial ATP production is a consequence of the intracellular pH or an indirect effect.

      We opted to keep the effect of nigericin on pHi in Supplementary Fig. 1A, given that Figure 3 already contains 11 panels. In intercalated cells, the kAE1 protein physiologically exports one bicarbonate ion in exchange for the import of one chloride ion; hence, reduced transport activity would result in a more alkaline intracellular pH. To clarify this point, we have included a diagram in Figure 1E as suggested. However, in the assay used to calculate the rate of intracellular alkalinisation, the transporter functions in the opposite direction, i.e. extruding chloride and importing bicarbonate (see the transport assay protocol in the Methods). Therefore, in this assay (Figure 1G), defective chloride/bicarbonate exchange activity results in a reduced rate of intracellular alkalinisation. This is now explained on lines 169-172.

      Disruption of NHE1 function would impair sodium homeostasis and, as such, potentially affect the activity of other proteins associated with acid-base balance and autophagy in collecting duct cells. Therefore, any resulting effects may not be confidently attributed specifically to the mutant kAE1. With nigericin, we aimed to alter pHi while affecting other ion concentrations as little as possible. Due to space considerations, Figure 1 has been reorganised to include the rate of alkalinisation and pHi (panels F and G). 

      Reviewer #2 (Recommendations for the authors):

      (1) The authors could improve the readability of this manuscript for a general audience by clarifying and summarizing the respective phenotype(s)/effect(s) of the different mutants in some kind of table in the main figures. It is hard to keep track of the different disease mutants alongside the KI mouse mutations, as the text frequently discusses multiple mutants at a time. 

      As requested, we added two tables (Supplementary Tables 1 & 2) in Supplementary files summarizing the data obtained in this study. We hope this will help the readership to keep track of each variant’s phenotype.

      (2) The subtitle of the results section of Figure 2 should be reworded to reflect that  whole kidney lysates are used for the KI mice and not the other mutants.

      As requested, the title in the Results section has been modified (lines 178-179).

      (3) More discussion of why the different mutants cause different strengths of phenotypes should be included.

      Different variants induce different degrees of functional defects, as seen in Figure 1F & G. The kAE1 R295H variant, the only dRTA-causing amino acid substitution in the amino-terminal cytosolic domain, does not affect the transporter’s function or the cells’ pHi. Therefore, this variant may cause dRTA via a different pathway than the transport-defective S525F or partially inactive R589H variants, which both affect pHi. Our study does not exclude that dRTA may be caused by defects other than pHi alterations, including defective protein-protein interactions. This discussion is now included in the manuscript on lines 386-391.

      Reviewer #3 (Recommendations for the authors):

      In general, I found the subject matter of this manuscript interesting and of value to the scientific community. The interpretation of the data and how much it supports the conclusion that "kAE1 variants increases pHi which alters mitochondrial function and leads to reduced cellular energy levels that eventually attenuate energy-dependent autophagic pathways" is largely incomplete. There are significant concerns about the quantification of Western blot data. Additionally, including the R607H variant in the in vitro experiments would improve the interpretation and extrapolation of in vitro data to the kidney.

      We apologize for the confusion with R589H and R607H variants. The R607H mutant is the murine ortholog to the human R589H dRTA variation. To clarify this, we have added this information on line 180, in addition to lines 118-119 and line 70.

      Suggestions:

      (1) Can an anion replacement experiment be performed in the mIMCD cells (no Cl or no HCO3) to determine that bicarbonate transport through AE1 is responsible for the reduced ATP rates in Figure 5? Inclusion of WT +dox control would be helpful to convince the reader of the effects.

      Because Seahorse real-time cell metabolism ATP rate measurements require specific, patented buffers of unspecified composition, it was not possible to modify the Cl⁻ or HCO₃⁻ content during the ATP measurement assay. All cell lines, including empty vector (EV) cells, were treated with doxycycline; thus, WT + dox was already included. The doxycycline-treated empty vector cell line served as a control to exclude specific effects of doxycycline on mitochondrial activity. This is now clarified in the Figure 5 legend, lines 655-656.

      (2) Can the authors measure pHi in fresh kidney sections from the R607H mouse?

      Unfortunately, we are not currently able to measure pHi in fresh kidney sections. Although we recognize that it would greatly benefit our study, establishing a new collaboration to perform this measurement would significantly delay the publication of this work; therefore, these results will not be available for the present manuscript. 

      (3) Does pH 7.0 media have any effect on autophagy, as shown in Figure 3? Why was pH 6.6 selected?

      The idea was to artificially acidify pHi in mutant cell lines (that have a steady state alkaline pHi) and assess whether this acidification corrects autophagy defects. We first determined that incubation in cell culture medium at pH 6.6 with 0.033 µM nigericin (final potassium concentration: 168 mM) for 2 hours provided optimal conditions, i.e. ensuring cell viability over the 2-hour period while effectively lowering intracellular pH to 6.9, as demonstrated in Supplementary Figure 1A-C.

      (4) In vitro experiments should be performed on polarized cells with kAE1 properly inserted in the basolateral membrane. Experiments on subconfluent, non-polarized cells do not support the hypothesis that transport functions of AE1 initiate the cascade of events attributed to these SLC4A1 mutations.

      To address this point, we have performed cell surface biotinylations on 70-80 % confluent mIMCD3 cells expressing kAE1 WT, S525F or R589H mutants and show that cell surface abundance of the mutants is not significantly different from the WT protein. This is now shown in Figure 3 A&B. As cell surface biotinylation provides a more quantitative assessment of protein cell surface abundance, we have removed the immunofluorescence images from non-polarised cells and replaced them with representative immunoblots from a cell surface biotinylation assay.

      Concerns:

      (1) No information about the B1 ATPase antibody used.

      Now provided in Supplementary Material, ATP6V1B1 Antibody from Bicell cat#20901.

      (2) No actin band in Figure 1E (as prepared).

      Actin bands are provided for each blot in Figure 1D.

      (3) Figures 1E and 1F are labelled wrong in the figure versus the results section. 

      Thank you for letting us know, this is now corrected.

      (4) The cortical sections shown in Figure 4 for the KI/KI do not appear to have the morphology of a CCD. The authors may want to consider including glomeruli to convince the reader of the localization of the tubules. Same concern with Figure 5G and I. The WT image in 5G does not have the morphology of a CCD. Principal cells should be predominant, and ICs should be dispersed.

      Both Figures 4 and 5 have been updated with images showing glomeruli (light blue “G” on the figure) with neighbouring and dispersed IC staining.

      (5) The quantification of LAMP1 in Figure 4 is unclear. How did the authors determine the boundary of AICs, and how did they calculate the volume of lysosomes? If a z-stack was used, how are the authors sure that their 10 µm section includes the entire AIC?

      The quantification of LAMP1 is detailed in the “Image analysis” and “Volocity” sections of the Supplementary Material. The boundary of each A-IC was manually defined in Volocity based on the presence of the H⁺-ATPase, before Volocity analysis of lysosomal volume as described in the Methods.

      The 10 µm sections are expected to include complete as well as partial A-ICs, but the frequency of these events should be the same between WT and variant sections; therefore, all cells displaying H⁺-ATPase signal were included in the analysis. 

      (6) Figure 5: There is no description of how ATP rates are calculated from the provided traces.

      We used the Agilent Seahorse XF ATP rate assay kit for this experiment. In this assay, the total ATP rate is the sum of the ATP production rates from glycolysis and oxidative phosphorylation. Glycolysis releases protons in a 1:1 ratio with ATP; hence, the glycolytic ATP rate is calculated from the glycolytic proton efflux rate (glycoPER). GlycoPER is determined by subtracting the respiration-linked proton efflux, measured by inhibiting complexes I and III, from the total proton efflux. This information is now added to the Supplementary Material, in the “Metabolic Flux analysis” section.
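
      The arithmetic in this response can be sketched as follows. This is only an illustration under stated assumptions, not the vendor's analysis software: the function name, argument names, and units are hypothetical, and the mitochondrial ATP rate is taken as an already computed input because the response does not describe how it is derived from oxygen consumption.

```python
def atp_rates(total_per, respiration_per, mito_atp_rate):
    """Combine proton efflux and mitochondrial ATP rates (all in pmol/min).

    total_per        -- total proton efflux rate (hypothetical input name)
    respiration_per  -- respiration-linked proton efflux rate, i.e. the part
                        removed by inhibiting complexes I and III
    mito_atp_rate    -- ATP production rate from oxidative phosphorylation,
                        assumed here to be supplied by the caller
    """
    # glycoPER: proton efflux attributable to glycolysis only
    glyco_per = total_per - respiration_per
    # Glycolysis releases protons 1:1 with ATP, so the rates are equal
    glyco_atp_rate = glyco_per
    # Total ATP rate is the sum of the two pathways
    total_atp_rate = glyco_atp_rate + mito_atp_rate
    return glyco_atp_rate, mito_atp_rate, total_atp_rate


# Illustrative numbers only, not data from the study
glyco, mito, total = atp_rates(total_per=100.0, respiration_per=30.0,
                               mito_atp_rate=200.0)
```

      Keeping the two pathways as separate return values makes it easy to report the glycolytic and mitochondrial fractions alongside the total.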

      (7) Figure labels in Figure 5 are wrong. It seems 5H (as presented) should actually be labeled 5G. In 5H (G?), why did some cells not have any TOM20 pixel intensity for S525F and R589H variants?

      Confocal image acquisition in this experiment was kept under the same settings to allow comparison between samples. Therefore, some cells show dimmer fluorescence than others. In the Figure 5 panels, all cells showed TOM20 pixel intensity. The Figure 5H panel has been relabelled Figure 5G.

      (8) In Figure 2, the summary graphs show analysis of more samples than are visible on the included western blots. What is the rationale for this? Why does S525F have 9 samples in BafA1 while R295H only has 3 (2H)? Yet, R295H has 6 samples in 2I. In 2D, S525F has at least 9 samples. Explain.

      Figure 2A-C shows representative immunoblots from several independently conducted experiments. Therefore, the final number of samples is higher than shown in Figure 2. This is now indicated in the Figure 2 legend, line 603. It became clear quite early in our study that the recessive kAE1 R295H variant does not behave similarly to the other variants studied, possibly because it affects the cytosolic domain, so we did not perform as many replicates for this variant as we did for the others. However, we felt it was valuable to the research community to report the characterization of this variant and decided to keep it in our study. 

      (9) In general, the actin loading does not appear to be equal between samples. And some figures show the same actin blot twice (2A, C) while some show independent actin bands for LC3B and p62. Equal loading seems a fairly significant control, considering the importance of quantification in the figures.

      In addition to performing protein assays, we systematically perform immunoblots with an anti-β-actin antibody to control for loading variability. When possible, two or three proteins, including actin, are detected on the same blot, provided their molecular weights differ enough. This sometimes results in β-actin being used as a loading control for two different proteins, as seen in Figures 2A and 2C. This is now indicated on lines 605-606.

      (10) In the Supplemental Figure 2, which band is being quantified for mature CTSD at 33kDa? Same for intermediate CTSD. The quantification of V-ATPase seems questionable based on the actin variance shown in the blot. Surely the ratio of the fourth sample is greater than 1.

      Supplementary Figure 2 has been updated to include arrows indicating which band was selected for the quantification. After verifying the measurements of band intensities from the “Image Lab” quantification software, we confirm the results, including that the fourth KI/KI sample has a ratio of 0.78 (Adj Total Band Vol (Int), lanes 10). Screenshots of the quantifications are attached below.

      Author response image 1.

      Author response image 2.

      (11) Why are the experiments performed on non-confluent IMCD cells? Figure 1D shows good basolateral localization of AE1, yet the other experiments in the manuscript appear to use IMCD cells in low confluent states, without proper localization of AE1. Figure 3A shows AE1 dispersed throughout the cytoplasm. Why have the authors decided to study the effects of an anion exchanger without it being properly localized to the basolateral membrane? Shouldn't all experiments be performed in polarized IMCDs? If AE1 isn't properly in the membrane, and the cells do not have defined apico-basolateral polarity, then what role can AE1-mediated intracellular pH change have on the results of the experiments? Were the pHi experiments in 3E performed on polarized cells? Or even 1F?

      To address this point, we have performed cell surface biotinylations on 70-80 % confluent mIMCD3 cells expressing kAE1 WT, S525F or R589H mutants and show that cell surface abundance of the mutants is not significantly different from the WT protein. This is now shown in Figure 3A & B. As it provides a more quantitative assessment of protein cell surface abundance, we have removed the immunofluorescence images from non-polarised cells and replaced them with a representative immunoblot from a cell surface biotinylation assay.

      (12) As mentioned in the public comments, how is the ratio A/(A+B) greater than 1? With A and B > 0. In Figure 3, the data is reasonable, but in Figure 2, the data is simply impossible. What is the explanation for this phenomenon? Why was this presentation of data approved? Is it supposedly a fold of WT, like 2K and 2L? Is the reader also to believe that total LC3B is 2-fold greater in KI/KI mice, as shown in 2K? My eyes, though not densitometry equipment, cannot confirm this. The actin bands are not equal. Yet again, there are 4 lanes of KI/KI mice, but the quantification shows 5 samples.

      The ratios in figure 2D, 2F, 2H and 2L have been re-calculated and corrected. As indicated above, immunoblots are representative and quantification of additional blots has been included in the graphs.
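
      The bound at issue in this exchange can be illustrated with a short sketch. It assumes hypothetical band intensities and helper names, and is not the authors' quantification pipeline: a ratio of the form II/(I + II) computed from non-negative intensities always lies between 0 and 1, whereas the same values expressed as a fold of a control mean can exceed 1.

```python
def lipidation_ratio(lc3b_i, lc3b_ii):
    """Return LC3B-II / (LC3B-I + LC3B-II) for non-negative band intensities."""
    if lc3b_i < 0 or lc3b_ii < 0:
        raise ValueError("band intensities must be non-negative")
    total = lc3b_i + lc3b_ii
    if total == 0:
        raise ValueError("at least one band must have a non-zero intensity")
    # Numerator is part of the denominator, so the result is within [0, 1]
    return lc3b_ii / total


def fold_of_control(values, control_mean):
    """Express values relative to a control mean; results may exceed 1."""
    return [v / control_mean for v in values]


# Illustrative intensities only, not data from the study
ratio = lipidation_ratio(lc3b_i=1.0, lc3b_ii=3.0)
relative = fold_of_control([ratio], control_mean=0.5)
```

      A value above 1 on such a graph therefore indicates either normalisation to a control or a calculation error, and the axis label should say which quantity is plotted.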

      (13) Spelling error Figure 4B: cels.

      Corrected

      References 

      [1] Mumtaz, R. et al. Intercalated Cell Depletion and Vacuolar H+-ATPase Mistargeting in an Ae1 R607H Knockin Model. Journal of the American Society of Nephrology 28, 1507–1520 (2017).

    1. eLife Assessment

      This important study reports convincing evidence of associations between 35 polygenic indices (PGIs) for social, behavioural, and psychological traits, as well as other health conditions (e.g., BMI) and all-cause mortality, based on data from Finnish population-based surveys and a twin cohort linked to administrative registers. PGIs for education, depression, alcohol use, smoking, BMI, and self-rated health showed the strongest associations with all-cause mortality, in the order of ~10% increment in risk per PGI standard deviation. Effect sizes from twin-difference analyses tended to be slightly larger than those from population cohorts, a pattern opposite that generally observed when testing PGI associations with their target phenotypes, and supporting the robustness of findings to confounding by population stratification.

    2. Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), where most studies use PRS as an instrument to attribute death to different causes. The presented study focuses on polygenic scores of non-fatal outcomes and separates the cause of death into "external" and "internal". The majority of the results are descriptive, and the data doesn't have the power to distinguish effect sizes of the interesting comparisons: (1) differences between external vs. internal (2) differences between PGI effect and measured phenotype.

      Comments on revised version:

      The authors answered my concerns well. I don't have any further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Comments on revised version:

      I am happy with the revision. No further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), where most studies use PRS as an instrument to attribute death to different causes. The presented study focuses on polygenic scores of non-fatal outcomes and separates the cause of death into "external" and "internal". The majority of the results are descriptive, and the data doesn't have the power to distinguish effect sizes of the interesting comparisons: (1) differences between external vs. internal (2) differences between PGI effect and measured phenotype. I have two main comments:

      (1) The authors should clarify whether the p-value reported in the text will remain significant after multiple testing adjustment. Some of the large effects might be significant; for example, Figure 2C

      We have now added Benjamini-Hochberg multiple-testing adjusted p-values in the text each time we present nominal p-values. Additionally, supplementary tables S5 and S6 provide multiple-adjusted p-values for all analysed PGIs.
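
      For readers unfamiliar with the adjustment, a minimal sketch of the Benjamini-Hochberg procedure follows. The function name and the example p-values are hypothetical, and the response does not say which software the authors used, so this illustrates the standard step-up method rather than their code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = p_values[i] * m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only, not results from the study
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      Each adjusted value is compared with the chosen false discovery rate (for example 0.05), and the adjustment depends on how many tests are included in the family.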

      Although this was not always the case, many comparisons remained significant after multiple testing adjustments, especially in Figure 2C that the reviewer commented on. In the revised version, we have placed more emphasis on describing these HRs that have low p-values after multiple-test adjustment. The revised text for Figure 2C in the Results section now reads:

      Panel C analyses mortality in three age-specific follow-up periods. The PGIs were more predictive of death in younger age groups, although the difference between the 25–64 and 65–79 age groups was small, except for the PGI of ADHD (HR=1.14, 95% CI 1.08; 1.21 for 25–64-year-olds; HR=1.04, 95% CI 1.00; 1.08 for 65–79-year-olds; p=0.008 for difference, p=0.27 after multiple-testing adjustment). PGIs predicted death only negligibly among those aged 80+, and the largest differences between the age groups 25–64 and 80+ were for PGIs of self-rated health (HR 0.87, 95% CI 0.82; 0.93 for 25–64-year-olds, HR 1.00, 95% CI 0.94; 1.04 for 80+ year-olds, p=2×10⁻⁴ for difference, p=0.006 after multiple-testing adjustment), ADHD (HR 1.14, 95% CI 1.08; 1.21 for 25–64-year-olds, HR 0.99, 95% CI 0.95; 1.03 for 80+ year-olds, p=7×10⁻⁴ for difference, p=0.012 after multiple-testing adjustment) and depressive symptoms (HR 1.12, 95% CI 1.06; 1.18 for 25–64-year-olds, HR 1.00, 95% CI 0.96; 1.04 for 80+ year-olds, p=0.002 for difference, p=0.032 after multiple-testing adjustment). Additionally, the difference in HRs between these age groups achieved significance after multiple testing adjustment at the conventional 5% level for PGIs of cigarettes per day, educational attainment, and ever smoking.

      We have also included the recent study by Argentieri et al. (2025) in the literature review, which was missing from our previous version. We appreciate the reference. Other references mentioned were already included in the previous version of the manuscript.

      (note that the small prediction accuracy of PGI in older age groups has been extensively studied, see Jiang, Holmes, and McVean, 2021, PLoS Genetics).

      We would like to thank the reviewer for suggesting the relevant reference by Jiang et al. We have now expanded on the discussion of age-specific differences in the discussion section and included this reference.

      (2) The authors might check if PGI+Phenotype has improved performance over Phenotype only. This is similar to Model 2 in Table 1, but slightly different.

      The reviewer raises an interesting angle to approach the analysis. We have now added an analysis assessing the information criteria and the significance of improvement between nested models in Supplementary table S8. All the tested PGI+phenotype models show improvement over the phenotype-only model that is statistically significant at all conventional levels when tested by likelihood-ratio tests between nested models. Additionally, improvement was found when using Akaike and Bayesian (Schwarz) information criteria (albeit sometimes modest in size). We have added a passage in the results section briefly summarising this analysis:

      Supplementary table S8 presents information criteria and significance tests on corresponding models. Models with PGI+phenotype (Models 2a–f) showed improvement over models with the phenotype only (Models 1a, 1c, 1e, 1g, 1i, 1k) in terms of both the Akaike information criterion (AIC) and the Bayesian (Schwarz) information criterion (BIC), with p=0.0006 or lower in all comparisons. The full Model 4 again showed improvement over the model with all PGIs jointly (Model 3b, with a p=0.0002 or p=0.00002, depending on continuous/categorical phenotype measurement), which had a lower AIC but not BIC.
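
      The nested-model comparison described here can be sketched as follows. The sketch assumes that each PGI enters as a single continuous term (one extra parameter) and uses hypothetical log-likelihoods; the function names are illustrative, and the choice of n in the BIC (observations or events) is an assumption the response does not specify.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion; penalises parameters by ln(n)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def likelihood_ratio_test_1df(log_lik_reduced, log_lik_full):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns the test statistic and its p-value from the chi-squared
    distribution with 1 degree of freedom, computed via the identity
    P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = max(2 * (log_lik_full - log_lik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Illustrative log-likelihoods only, not values from the study
stat, p = likelihood_ratio_test_1df(log_lik_reduced=-105.0, log_lik_full=-100.0)
```

      Because the BIC penalty grows with sample size, a model can improve the AIC without improving the BIC, which is the pattern reported for the full model.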

      Reviewer #2 (Public review): 

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Weaknesses:

      It is unclear whether the PGIs used for each trait represent the most current or optimal versions based on the latest GWAS data.

      To our reading, this comment is closely related to the “recommendations for the author” number 3 by reviewer 2, and we thus address them together. 

      If the Finnish data used in this study also contributed to the development of some of the PGIs, there is a risk of overestimating their associations with mortality due to overfitting or "double-dipping." Similar inflation of effect sizes has been observed in studies using the UK Biobank, which is widely used for PGI construction.

      To our reading, this comment is closely related to the “recommendations for the author” 4 by reviewer 2, and we thus address them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) Cited reference 1 also investigated the PRS association with life span; cited reference 8 explains PRS association with healthy lifespan. Can authors be clearer about what is new in the context of these references? Specifically, what are the PGIs studied here that were not analyzed in the cited analyses?

      Although some previous studies on the topic do exist, our analysis arguably has novelty in touching upon several unstudied or scarcely studied themes. These include:

      A set of PGIs focusing on social, psychological, and behavioural phenotypes or PGIs for typically non-fatal health conditions.

      An assessment of direct genetic effects/ confounding with a within-sibship design.

      An assessment of potential heterogeneous effects by several socio-demographic characteristics.

      An analysis of external causes of deaths (which can be hypothesised to be particularly relevant here, given the choice of our PGIs not focusing directly on typical causes of death).

      A detailed assessment of the interplay of the most predictive PGIs with their corresponding phenotypes.

      We have substantially revised the Introduction section focusing on making these novel contributions more explicit.

      (2) In the Methods section, it is not very clear why the authors specifically study the "within-sibship" samples. Is this for avoiding nurturing effects from parental genotypes or for controlling assortative mating? The authors should clarify the rationale behind the design.

      The substance-related rationale behind this approach was briefly discussed in the Introduction section while in the Methods section, we focused more on the technical description of our analyses. However, it is certainly worthwhile to clarify to the reader why within-sibship methods have been used. The revised passage in the methods section now states:

      “In addition to this population sample, we used a within-sibship analysis sample to assess the extent of direct and indirect genetic associations captured by the PGIs, as discussed in the introduction.”

      (3) "Residual correlations of PGIs were no more than 0.050..." As a minor comment, since a PGI is a noisy variable, the correlation would be low; however, I don't think there are better ways to evaluate Cox assumptions, and in many cases, this assumption is not correct for strong predictors.

      Yes, these points are true. Overall, it is often implausible that empirical distributions exactly match distributional assumptions in statistical models. For example, it may not be realistic to expect that the mortality hazards across categories of independent variables stay exactly proportional during long mortality-follow-ups; some deviations from constant proportions are almost inevitable. However, there are reasonable grounds to argue that in case of moderate violations of the proportional hazards assumption, the estimates still remain interpretable for practical uses. They can be read as approximating average relative hazards over the study period (for discussion, see pages 42–47 in Allison P. 2014. Event history and survival analysis: Regression for longitudinal event data (second edition). Thousand Oaks: SAGE).

      (4) "PGI of ADHD (HR=1.08 95%CI 1.04;1.11 among men; HR=1.01 95%CI 0.97;1.05 among women; p=0.012 for difference)." Is this difference significant after multiple testing correction?

      We have presented multiple-testing adjusted p-values together with nominal ones in this and in all other instances where they are mentioned in the text. Additionally, Supplementary tables S5–S6 present multiple-testing adjusted p-values for each PGI studied.

      (5) "Panel D displays that most PGIs had stronger associations with external (accidents, violent, suicide, and alcohol related deaths) than natural causes of death." Similar to the comment above, are there any results that are significantly different between internal and external?

      We have added the p-values of those variables that had larger differences in the revised text. Quoting from the revised article: “The HR differences between external and natural causes of death were nominally significant at the conventional 5% level for cannabis use (p=0.016), drinks per week (p=0.028), left out of social activity (p=0.029), ADHD (p=0.031), BMI (p=0.035) and height (p=0.049), but none of these differences remained significant after adjusting for 35 multiple tests.”
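
      The arithmetic behind this statement can be sketched as follows. The response does not say which multiple-testing adjustment was used; the sketch assumes a simple Bonferroni correction over the 35 PGIs, which is an assumption made here for illustration. The nominal p-values are those quoted above.

```python
# Nominal p-values quoted in the revised article.
nominal_p = {
    "cannabis use": 0.016,
    "drinks per week": 0.028,
    "left out of social activity": 0.029,
    "ADHD": 0.031,
    "BMI": 0.035,
    "height": 0.049,
}

N_TESTS = 35  # one test per PGI
ALPHA = 0.05


def bonferroni(p, m):
    """Bonferroni-adjusted p-value (assumed method), capped at 1."""
    return min(1.0, p * m)


adjusted_p = {trait: bonferroni(p, N_TESTS) for trait, p in nominal_p.items()}
still_significant = [trait for trait, p in adjusted_p.items() if p < ALPHA]

for trait, p in adjusted_p.items():
    print(f"{trait}: nominal {nominal_p[trait]:.3f}, adjusted {p:.3f}")
print("significant after adjustment:", still_significant)
```

      Under this assumption even the smallest nominal p-value is far above the 5% threshold once multiplied by 35, consistent with the quoted conclusion.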

      (6) Table 1: The effect of the phenotype is stronger than the PGI; this is expected as PGI is a weak predictor and can be considered as "noised" measurement of true genetic value (Becker 2021 Nature Human behavior). Is there a way to adjust for the impact of noise in PGI at tagging genetic value and compare if the PGI effect is different from the phenotype effect?

      PGIs are certainly imperfect measures that contain a lot of noise. However, extracting new information from what is unknown is an extremely demanding exercise, further complicated, for example, by the fact that we do not know the exact benchmark of total genetic effect we should be aiming at. Different methods of heritability estimation, for instance, often give dramatically differing results, for reasons that are still under scrutiny.

      We are thus not familiar with a method that could provide a satisfactory answer to this challenging task.

      Reviewer #2 (Recommendations for the authors):

      (3) Justification and Selection of PGIs:

      For several traits, such as BMI, multiple polygenic indices (PGIs) are currently available. The criteria used to select specific PGIs for this study are not clearly described. A more systematic and reproducible approach-for example, leveraging metadata from the PGS Catalog-could strengthen the justification for PGI selection and enhance the study's generalizability.

      There are numerous PGIs developed in the extensive GWAS literature, but a finite set of PGIs always needs to be chosen for any analysis. The rationale behind our decision to include every PGI from the repository of Becker et al. 2021 (full reference in the manuscript, see also https://www.thessgac.org/pgi-repository) that was available for the Finnish data (including the possibility to exclude overlapping samples, see our response to the next comment for more discussion) was to provide rigorous analysis by limiting the researchers' degrees of freedom in arbitrarily choosing PGIs. Although it would have been tempting to omit some PGIs that were not expected to correlate substantially with mortality, we believe that our conservative strategy increases the credibility of the reported p-values; in particular, the multiple-testing adjustment should now work as intended.

      We also mention now this rationale when discussing the chosen PGIs in the methods section: “As the independent variables of main interest, we used 35 different PGIs in the Polygenic Index repository by Becker et al., which were mainly based on GWASes using UK Biobank and 23andMe, Inc. data samples, but also other data collections. They were tailored for the Finnish data, i.e., excluding overlapping individuals between the original GWAS and our analysis and performing linkage-disequilibrium adjustment. We used every single-trait PGI defined in the repository (except for subjective well-being, for which we were unable to obtain a meta-analysis version that excluded the overlapping samples). By limiting the researchers’ freedom in selecting the measures, this conservative strategy should increase the validity of our estimates, particularly with regards to multiple-testing adjusted p-values.”

      (4) Overlap Between PGI Training Data and Study Sample:

      The authors should describe any overlap between the data used to develop the PGIs and the current study sample. If such overlap exists, it may lead to overestimation of effect sizes due to "double-dipping." A discussion of this issue and its potential implications is warranted, as similar concerns have been raised in studies using UK Biobank data.

      This is, fortunately, not a concern of our analysis. Overlapping samples were excluded in creating the PGIs that we used. We have now described this more clearly in the revised methods section.

      (1) Clarify the Methodology for Family-Based Cox Analysis:

      It is unclear what specific method was used to perform Cox regression in the family-based analysis. Please provide additional methodological details.

      We have described the method further and added an additional reference in the revision. The text now stands:

      “We compared these models to the corresponding within-sibship models, using the sibship identifier as the strata variable. This method employs a sibship-specific (instead of a whole-sample-wide baseline hazard in the population models) baseline hazard, and corresponds to a fixed-effects model in some other regression frameworks (e.g., linear model with sibship-specific intercepts)”
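
      The within-sibship model quoted above can be illustrated with a minimal sketch of a stratified Cox partial log-likelihood. This is not the authors' code (the response does not name their software); the function name, the single-covariate form, the assumption of no tied event times within a sibship, and the toy records are all illustrative assumptions.

```python
import math
from collections import defaultdict


def stratified_partial_loglik(beta, records):
    """Cox partial log-likelihood with one baseline hazard per stratum.

    records: iterable of (stratum, time, event, x), where event is 1 for a
    death and 0 for censoring, and x is a single covariate (e.g. a PGI).
    Assumes no tied event times within a stratum.
    """
    by_stratum = defaultdict(list)
    for stratum, time, event, x in records:
        by_stratum[stratum].append((time, event, x))

    total = 0.0
    for members in by_stratum.values():
        for time_i, event_i, x_i in members:
            if not event_i:
                continue
            # Risk set: members of the SAME sibship still under follow-up.
            denom = sum(math.exp(beta * x_j)
                        for time_j, _, x_j in members if time_j >= time_i)
            total += beta * x_i - math.log(denom)
    return total


# Hypothetical toy data: (sibship id, follow-up years, died, standardised PGI)
records = [
    ("A", 5.0, 1, 0.8), ("A", 9.0, 0, -0.2),
    ("B", 3.0, 1, 0.1), ("B", 7.0, 1, 1.2),
    ("C", 4.0, 0, 0.5), ("C", 6.0, 1, -0.4),
]

loglik_at_zero = stratified_partial_loglik(0.0, records)
loglik_at_0_2 = stratified_partial_loglik(0.2, records)
print(loglik_at_zero, loglik_at_0_2)
```

      Because each risk set contains only siblings, a death is compared solely with members of the same sibship, which is why the approach behaves like a fixed-effects model: anything shared by siblings cancels out, and an event whose risk set holds a single person contributes nothing to the likelihood.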

      (2) Clarify Timing of Measured Risk Factors Relative to Follow-Up:

      The main text should provide more detailed information regarding the timing of data collection for directly measured risk factors. Specifically, it should be clarified whether the measurements used correspond to the first available data for each individual after the start of follow-up, or if a different criterion was applied.

      BMI, self-rated health, alcohol consumption and smoking status were measured at the baseline survey of each dataset. Education was registered as the highest completed degree up to the end of 2019. Depression was a composite of survey self-report (at the time of the baseline survey), as well as depression-related medicine purchases and hospitalizations over a two-year period before the start of the individual’s follow-up.

      We have added more comprehensive information on the measurement of the phenotypes of interest in Supplementary table 2, including the timing of the measurement.

    1. eLife Assessment

      This work significantly advances our understanding of chromatin organization within regions of repetitive sequences in the parasitic protozoan Trypanosoma brucei. Using cutting-edge interdisciplinary tools, the authors provide compelling evidence for two discrete types of repetitive DNA element-associated proteins: one set involved in essential centromere function, and the other involved in glycoprotein antigenic variation via homologous recombination. Thus, these fundamental findings have implications for this parasite's biology, and for therapeutic targeting in kinetoplastid diseases. This work will be exciting to those in the centromere/mitosis and parasite immunity fields.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Strengths and Weaknesses:

      The manuscript was previously reviewed through Review Commons. As noted there, the experiments are well controlled, the claims are well supported, and the methods are clearly described. The conclusions are convincing. All concerns I raised have been addressed except one (minor point #8):

      "The way the authors mapped the ChIP-seq data is potentially problematic when analyzing the same repeat type in different genomic regions. Reads with multiple equally good mapping positions were assigned randomly. This is fine when analyzing repeats by type, independent of genomic position, which is what the authors do to reach their main conclusions. However, several figures (Fig. 3B, Fig. 4B, Fig. 5B, Fig. 7) show the same repeat type at specific genomic locations." Due to the random assignment, all of these regions merely show the average signal for the given repeat. I find it misleading that this average is plotted out at "specific" genomic regions. Initially, I suggested a workaround, but the authors clarified why the workaround was not feasible, and their explanation is reasonable to me. That said, the figures still show a signal at positions where they can't be sure it actually exists. If this cannot be corrected analytically, it should at least be noted in the figure legends, Results, or Discussion.

      Importantly, the authors' conclusions do not hinge on this point; they are appropriately cautious, and their interpretations remain valid regardless.
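
      The averaging effect described in this point can be illustrated with a small simulation. It is a sketch under stated assumptions, not the authors' mapping pipeline: the function name, the number of repeat copies and the read counts are hypothetical, and reads are assumed to map equally well to every copy.

```python
import random


def assign_multimapped_reads(true_counts, seed=0):
    """Place each read on a uniformly random copy of an identical repeat.

    true_counts[i] is the number of reads that truly originate from copy i.
    Since the copies are identical in sequence, the origin of a read cannot
    be recovered, so every read is assigned to a random copy.
    """
    rng = random.Random(seed)
    n_copies = len(true_counts)
    observed = [0] * n_copies
    for n_reads in true_counts:
        for _ in range(n_reads):
            observed[rng.randrange(n_copies)] += 1
    return observed


# Hypothetical case: only copy 0 of four identical repeat arrays is bound.
true_counts = [10000, 0, 0, 0]
observed = assign_multimapped_reads(true_counts)
print("true:    ", true_counts)
print("observed:", observed)
```

      In this toy case the observed coverage is close to the mean over all copies, whatever the true distribution was, so a per-locus track is informative about the repeat type but not about any individual locus.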

      Significance:

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with mini-chromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Comments on revised version:

      All my recommendations have been addressed.

    3. Reviewer #2 (Public review):

      The Trypanosoma brucei genome, like that of other eukaryotes, contains diverse repetitive elements. Yet, the chromatin-associated proteome of these regions remains largely unexplored. This study represents a very important conceptual and technical advancement by employing synthetic TALE DNA-binding proteins fused to YFP to selectively capture proteins associated with specific repetitive sequences in T. brucei chromatin. The data presented here are convincing, supported by appropriate controls and a well-validated methodology, aligned with current state-of-the-art approaches.

      The authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the T. brucei genome, including centromere- and telomere-associated repeats and those of a transposon element. The aim was to identify specific proteins that bind to these repetitive sequences in T. brucei chromatin. Validation of the approach was done using a TALE protein designed to target the telomere repeat (TelR-TALE), which detected many of the proteins that were previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and a protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      This study represents a significant conceptual and technical advancement. To the best of our knowledge, it is the first report of employing TALE-YFP for affinity-based detection of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of the organization of these important regions of the trypanosomal chromatin and provides the foundation for investigating the functional roles of associated proteins in parasite biology. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as to scientists investigating the roles of repetitive genomic elements in chromatin structure and their functional role in higher eukaryotes.

      Importantly, any essential or unique interacting partners identified using the approach employed here could serve as potential targets for therapeutic intervention in severe tropical diseases caused by kinetoplastids.

    4. Author response:

      Point-by-point description of the revisions:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      In this article, the authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the Trypanosoma brucei genome, including centromere- and telomere-associated repeats and those of a transposon element. The aim was to detect and identify, using YFP pulldown, specific proteins that bind to these repetitive sequences in T. brucei chromatin. Validation of the approach was done using a TALE protein designed to target the telomere repeat (TelR-TALE), which detected many of the proteins that were previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and the protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      Major comments:

      Are the key conclusions convincing?

      The authors reported that they have successfully used TALE-based affinity selection of proteins associated with repetitive sequences in the T. brucei genome. They claimed that this study has provided new information regarding the relevance of the repetitive regions in the genome to chromosome integrity, telomere biology, chromosomal segregation and immune evasion strategies. These conclusions are based on high-quality research, and the work merits publication, provided that some major concerns, raised below, are addressed before acceptance for publication.

      (1) The authors used a TALE-YFP approach to examine the proteome associated with five different repetitive regions of the T. brucei genome and confirmed the binding of TALE-YFP with ChIP-seq analyses. Ultimately, they obtained the list of proteins that bound to the synthetic proteins by affinity purification and LC-MS analysis and concluded that these proteins bind to different repetitive regions of the genome. Two control proteins, TRF-YFP and KKT2-YFP, were used to confirm the interactions. However, there is no experiment confirming that the analysis gives some insight into the role of any putative or new protein in telomere biology, VSG gene regulation or chromosomal segregation. The proteins that have already been reported by other studies are mentioned. Although the authors discovered many proteins in these repetitive regions, their role is as yet unknown. It is recommended to take one or more of the new putative proteins from the repetitive elements and (1) show whether or not they bind directly to the specific repetitive sequence (e.g., by EMSA); and (2) knock down one or a small sample of the newly discovered proteins, which may shed light on their function at the repetitive region, as a proof of concept.

      The main request from Referee 1 is for individual evaluation of protein-DNA interaction for a few candidates identified in our TALE-YFP affinity purifications, particularly using EMSA to identify binding to the DNA repeats used for the TALE selection. In our opinion, such an approach would not actually provide the validation anticipated by the reviewer. The power of TALE-YFP affinity selection is that it enriches for protein complexes that associate with the chromatin that coats the target DNA repetitive elements rather than only identifying individual proteins or components of a complex that directly bind to DNA assembled in chromatin.

      The referee suggests we express recombinant proteins and perform EMSA for selected candidates, but many of the identified proteins are unlikely to directly bind to DNA – they are more likely to associate with a combination of features present in DNA and/or chromatin (e.g. specific histone variants or histone post-translational modifications). Of course, a positive result would provide some validation but only IF the tested protein can bind DNA in isolation – thus, a negative result would be uninformative.

      In fact, our finding that KKT proteins are enriched using the 177R-TALE (minichromosome repeat sequence) identifies components of the trypanosome kinetochore known (KKT2) or predicted (KKT3) to directly bind DNA (Marciano et al., 2021; PMID: 34081090), and likewise the TelR-TALE identifies the TRF component that is known to directly associate with telomeric (TTAGGG)n repeats (Reis et al 2018; PMID: 29385523). This provides reassurance on the specificity of the selection, as does the lack of cross selectivity between different TALEs used (see later point 3 below). The enrichment of the respective DNA repeats quantitated in Figure 2B (originally Figure S1) also provides strong evidence for TALE selectivity.

      It is very likely that most of the components enriched on the repetitive elements targeted by our TALE-YFP proteins do not bind repetitive DNA directly. The TRF telomere binding protein is an exception – but it is the only obvious DNA binding protein amongst the many proteins identified as being enriched in our TelR-TALE-YFP and TRF-YFP affinity selections.

      The referee also suggests that follow up experiments using knockdown of the identified proteins found to be enriched on repetitive DNA elements would be informative. In our opinion, this manuscript presents the development of a new methodology previously not applied to trypanosomes, and referee 2 highlights the value of this methodological development which will be relevant for a large community of kinetoplastid researchers. In-depth follow-up analyses would be beyond the scope of this current study but of course will be pursued in future. To be meaningful such knockdown analyses would need to be comprehensive in terms of their phenotypic characterisation (e.g. quantitative effects on chromosome biology and cell cycle progression, rates and mechanism of recombination underlying antigenic variation, etc) – simple RNAi knockdowns would provide information on fitness but little more. This information is already publicly available from genome-wide RNAi screens (www.tritrypDB.org), with further information on protein location available from the genome-wide protein localisation resource (Tryptag.org). Hence basic information is available on all targets selected by the TALEs after RNAi knock down but in-depth follow-up functional analysis of several proteins would require specific targeted assays beyond the scope of this study.

      (2) NonR-TALE-YFP does not have a binding site in the genome, but YFP protein should still be expressed by T. brucei clones with NLS. The authors have to explain why there is no signal detected in the nucleus, while a prominent signal was detected near kDNA (see Fig.2). Why is the expression of YFP in NonR-TALE almost not shown compared to other TALE clones?

      The NonR-TALE-YFP immunolocalisation signal indeed is apparently located close to the kDNA and away from the nucleus. We are not sure why this is so, but the construct is sequence validated and correct. However, we note that artefactual localisation of proteins fused to a globular eGFP tag, compared to a short linear epitope V5 tag, near to the kinetoplast has been previously reported (Pyrih et al, 2023; PMID: 37669165).

      The expression of NonR-TALE-YFP is shown in Supplementary Fig. S2 in comparison to other TALE proteins. Although it is evident that NonR-TALE-YFP is expressed at lower levels than other TALEs (the different TALEs have different expression levels), it is likely that in each case the TALE proteins would be in relative excess.

      It is possible that the absence of a target sequence for the NonR-TALE-YFP in the nucleus affects its stability and cellular location. Understanding these differences is tangential to the aim of this study.

      However, importantly, NonR-TALE-YFP is not the only control used for specificity in our affinity purifications. Instead, the lack of cross-selection of the same proteins by different TALEs (e.g. TelR-TALE-YFP, 177R-TALE-YFP) and the lack of enrichment of any proteins of interest by the well-expressed ingiR-TALE-YFP or 147R-TALE-YFP proteins each provide strong evidence for the specificity of the selection using TALEs, as does the enrichment of similar protein sets following affinity purification of the TelR-TALE-YFP and TRF-YFP proteins, which both bind telomeric (TTAGGG)n repeats. Moreover, control affinity purifications to assess background were performed using cells that completely lack an expressed YFP protein, which further supports specificity (Figure 6).

      We have added text to highlight these important points in the revised manuscript:

      Page 8:

      “However, the expression level of NonR-TALE-YFP was lower than other TALE-YFP proteins; this may relate to the lack of DNA binding sites for NonR-TALE-YFP in the nucleus.”

      Page 8:

      “NonR-TALE-YFP displayed a diffuse nuclear and cytoplasmic signal; unexpectedly the cytoplasmic signal appeared to be in the vicinity of the kDNA of the kinetoplast (mitochondrion). We note that artefactual localisation of some proteins fused to an eGFP tag has previously been observed in T. brucei (Pyrih et al, 2023).”

      Page 10:

      “Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4). Thus, the most enriched proteins are specific to TelR-TALE-YFP-associated chromatin rather than to the TALE-YFP synthetic protein module or other chromatin.”

      (3) As a proof of concept, the author showed that the TALE method determined the same interacting partners enrichment in TelR-TALE as compared to TRF-YFP. And they show the same interacting partners for other TALE proteins, whether compared with WT cells or with the NonR-TALE parasites. It may be because NonR-TALE parasites have almost no (or very little) YFP expression (see Fig. S3) as compared to other TALE clones and the TRF-YFP clone. To address this concern, there should be a control included, with proper YFP expression.

      See response to point 2, but we reiterate that the ingiR-TALE-YFP and 147R-TALE-YFP proteins are well expressed (western, originally Fig. S3, now Fig. S2), yet few proteins are detected as being enriched, and few correspond to those enriched in TelR-TALE-YFP or TRF-YFP affinity purifications (see Fig. S9). Therefore, the ingiR-TALE-YFP and 147R-TALE-YFP proteins provide good additional negative controls for specificity as requested. To further reassure the referee we have also included additional volcano plots which compare TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP to the ingiR-TALE-YFP affinity selection (new Figure S8). As with No-YFP or NonR-TALE-YFP controls, the use of ingiR-TALE-YFP as a negative control demonstrates that known telomere-associated proteins are enriched in TelR-TALE-YFP affinity purification, RPA subunits are enriched with 70R-TALE-YFP and kinetochore KKT proteins are enriched with 177R-TALE-YFP. These analyses demonstrate specificity in the proteins enriched following affinity purification of our different TALE-YFPs and provide support to strengthen our original findings.

      We now refer to the use of No-YFP, NonR-TALE-YFP, and ingiR-TALE-YFP as controls for comparison to TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP in several places:

      Page 10:

      “Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4).”

      Page 11:

      “Thus, the nuclear ingiR-TALE-YFP provides an additional chromatin-associated negative control for affinity purifications with the TelR-TALE-YFP, 70R-TALE-YFP and 177R-TALE-YFP proteins (Fig. S8).”

      “Proteins identified as being enriched with 70R-TALE-YFP (Figure 6D) were similar in comparisons with either the No-YFP, NonR-TALE-YFP or ingiR-TALE-YFP as negative controls.”

      Top Page 12:

      “The same kinetochore proteins were enriched regardless of whether the 177R-TALE proteomics data was compared with No-YFP, NonR-TALE or ingiR-TALE-YFP controls.”

      Discussion Page 13:

      “Regardless, the 147R-TALE and ingiR-TALE proteins were well expressed in T. brucei cells, but their affinity selection did not significantly enrich for any relevant proteins. Thus, 147R-TALE and ingiR-TALE provide reassurance for the overall specificity for proteins enriched in TelR-TALE, 70R-TALE and 177R-TALE affinity purifications.”

      (4) After the artificial expression of the five repetitive-sequence-binding TALE proteins, the question is whether there is any competition between the TALE proteins and the corresponding endogenous proteins. Is there any effect on parasite survival or health, compared to the control, after the expression of these five TALE-YFP proteins? It is recommended to add parasite growth curves for all the TALE protein-expressing cultures.

      Growth curves for cells expressing TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP are now included (New Fig S3A). No deficit in growth was evident while passaging 70R-TALE-YFP, 147R-TALE-YFP, NonR-TALE-YFP cell lines (indeed they grew slightly better than controls).

      The following text has been added page 8:

      “Cell lines expressing representative TALE-YFP proteins displayed no fitness deficit (Fig. S3A).”

      (5) Since the experiments were performed using whole-cell extracts without prior nuclear fractionation, the authors should consider the possibility that some identified proteins may have originated from compartments other than the nucleus. Specifically, the detection of certain binding proteins might reflect sequence homology (or partial homology) between mitochondrial DNA (maxicircles and minicircles) and repetitive regions in the nuclear genome. Additionally, the lack of subcellular separation raises the concern that cytoplasmic proteins could have been co-purified due to whole cell lysis, making it challenging to discern whether the observed proteome truly represents the nuclear interactome.

      In our experimental design, we confirmed bioinformatically that the repeat sequences targeted were not represented elsewhere in the nuclear or mitochondrial genome (kDNA). The absence of subcellular fractionation could result in some cytoplasmic protein selection, but this is unlikely since each TALE targets a specific DNA sequence but is otherwise identical such that cross-selection of the same contaminating protein set would be anticipated if there was significant non-specific binding. We have previously successfully affinity selected 15 chromatin modifiers and identified associated proteins without major issues concerning cytoplasmic protein contamination (Staneva et al 2021 and 2022; PMID: 34407985 and 36169304). Of course, the possibility that some proteins are contaminants will need to be borne in mind in any future follow-up analysis of proteins of interest that we identified as being enriched on specific types of repetitive element in T. brucei. Proteins that are also detected in negative control, or negative affinity selections such as No-YFP, NoR-YFP, IngiR-TALE or 147R-TALE must be disregarded.

      (6) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      As mentioned earlier, the authors claimed that this study has provided new information concerning telomere biology, chromosomal segregation mechanisms, and immune evasion strategies. But there are no experiments that provide a role for any unknown or known protein in these processes. Thus, it is suggested to select one or two proteins of choice from the list and validate their direct binding to repetitive region(s), and their role in that region of interaction.

      As highlighted in response to point 1 the suggested validation and follow up experiments may well not be informative and are beyond the scope of the methodological development presented in this manuscript. Referee 2 describes the study in its current form as “a significant conceptual and technical advancement” and “This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology.”

      The Referee’s phrase ‘validate their direct binding to repetitive region(s)’ here may also mean to test if any of the additional proteins that we identified as being enriched with a specific TALE protein actually display enrichment over the repeat regions when examined by an orthogonal method. A key unexpected finding was that kinetochore proteins including KKT2 are enriched in our affinity purifications of the 177R-TALE-YFP that targets 177bp repeats (Figure 6F). By conducting ChIP-seq for the kinetochore specific protein KKT2 using YFP-KKT2 we confirmed that KKT2 is indeed enriched on 177bp repeat DNA but not flanking DNA (Figure 7). Moreover, several known telomere-associated proteins are detected in our affinity selections of TelR-TALE-YFP (Figure 6B, Fig. S6; see also Reis et al, 2018 Nuc. Acids Res. PMID: 29385523; Weisert et al, 2024 Sci. Reports PMID: 39681615).

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The answer to this question depends on what the authors want to present as the achievements of the present study. If the achievement of the paper is the creation of a new tool for discovering new proteins associated with the repeat regions, I recommend that they add proof of direct interactions between a sample of the newly discovered proteins and the relevant repeats, as the proof of concept discussed above. However, if the authors would like to claim that the study achieved new functional insights for these interactions, they will have to expand the study, as mentioned above, to support the proof of concept.

      See our response to point 1 and the point we labelled ‘6’ above.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      I think that they are realistic. If the authors decide to check the capacity of a small sample of proteins (not previously known as repetitive region binding proteins) to interact directly with the repeated sequence, it will substantially add to the study (e.g., by EMSA; estimated time: 1 month). If the authors also decide to check the function of at least one such newly detected protein (e.g., by KD), I estimate this will take 3-6 months.

      As highlighted previously, the proposed EMSA experiment may well be uninformative for protein complex components identified in our study or for isolated proteins that directly bind DNA in the context of a complex and chromatin. RNAi knockdown data and cell location data (as well as developmental expression and orthology data) are already available through tritrypDB.org and tryptag.org.

      Are the data and the methods presented in such a way that they can be reproduced? Yes

      Are the experiments adequately replicated, and statistical analysis adequate?

      The authors did not mention replicates. There is no statistical analysis mentioned.

      The figure legends indicate that all volcano plots of TALE affinity selections were derived from three biological replicates. Cutoffs used for significance: P < 0.05 (Student's t-test).

      For ChIP-seq, two biological replicates were analysed for each cell line expressing the specific YFP tagged protein of interest (TALE or KKT2). This is now stated in the relevant figure legends – apologies for this oversight. The resulting data are available for scrutiny at GEO: GSE295698.
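      The significance criterion quoted above for the volcano plots (three biological replicates, Student's t-test, P < 0.05) can be illustrated with a minimal sketch. The log2 intensity values are hypothetical, and the use of log2-transformed intensities and a pooled-variance two-sample test is an assumption; the critical value 2.776 is the standard two-tailed threshold for 4 degrees of freedom.

```python
import math
from statistics import mean, variance

# Two-tailed critical t value for alpha = 0.05 with 4 degrees of freedom
# (3 + 3 replicates, pooled variance). |t| above this means P < 0.05.
T_CRIT_DF4 = 2.776

def students_t(a, b):
    """Pooled-variance (Student's) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def volcano_point(ip_log2, ctrl_log2):
    """Return (log2 fold change, t statistic, significant?) for one protein."""
    t = students_t(ip_log2, ctrl_log2)
    return mean(ip_log2) - mean(ctrl_log2), t, abs(t) > T_CRIT_DF4

# Hypothetical log2 intensities: TALE-YFP selection vs. No-YFP control.
enriched = volcano_point([24.1, 24.5, 23.9], [20.0, 20.4, 19.8])
unchanged = volcano_point([21.0, 22.5, 20.1], [21.3, 20.2, 22.0])
```

      With only three replicates per condition, the threshold on the t statistic is high, so a protein needs both a large fold change and consistent replicates to pass.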

      Minor comments:

      Specific experimental issues that are easily addressable.

      The following suggestions can be incorporated:

      (1) Page 18, in the materials and methods section the authors mentioned four drugs: Blasticidin, Phleomycin, G418 and hygromycin. It is recommended to mention the purpose of using these selective drugs for the parasite. If clonal selection has been done, then it should also be mentioned.

      We erroneously added information on several drugs used for selection in our laboratory. In fact, all TALE-YFP constructs carry the Bleomycin resistance gene, which we select for using Phleomycin. Also, clones were derived by limiting dilution immediately after transfection. We have amended the text accordingly:

      Page 17/18:

      “Cell cultures were maintained below 3 x 10⁶ cells/ml. Phleomycin 2.5 µg/ml was used to select transformants containing the TALE construct BleoR gene.”

      “Electroporated bloodstream cells were added to 30 ml HMI-9 medium and two 10-fold serial dilutions were performed in order to isolate clonal Phleomycin resistant populations from the transfection. 1 ml of transfected cells were plated per well on 24-well plates (1 plate per serial dilution) and incubated at 37°C and 5% CO2 for a minimum of 6 h before adding 1 ml media containing 2X concentration Phleomycin (5 µg/ml) per well.”

      (2) In the method section the authors mentioned that there is only one site for binding of NonR-TALE in the parasite genome. But in Fig. 1C, the authors showed zero binding site. So, there is one binding site for NonR-TALE-YFP in the genome or zero?

      We thank the reviewer for pointing out this discrepancy. We have checked the latest Tb427v12 genome assembly for predicted NonR-TALE binding sites and there are no exact matches. We have corrected the text accordingly.

      Page 7:

      “A control NonR-TALE protein was also designed which was predicted to have no target sequence in the T. brucei genome.”

      Page 17:

      “A control NonR-TALE predicted to have no recognised target in the T. brucei genome was designed as follows: BLAST searches were used to identify exact matches in the TREU927 reference genome. Candidate sequences with one or more match were discarded.”
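      The filtering rule in the quoted text (discard any candidate with one or more exact match) can be sketched as follows. This is only an illustration: a plain string search on both strands stands in for BLAST, and the toy genome and candidate sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_exact_matches(candidate, genome):
    """Count exact occurrences of candidate on either strand (overlaps allowed)."""
    # A set avoids double counting when the candidate is its own reverse complement.
    queries = {candidate, reverse_complement(candidate)}
    total = 0
    for query in queries:
        start = genome.find(query)
        while start != -1:
            total += 1
            start = genome.find(query, start + 1)
    return total

def select_non_targeting(candidates, genome):
    """Keep only candidates with no exact match anywhere in the genome."""
    return [c for c in candidates if count_exact_matches(c, genome) == 0]

# Hypothetical toy genome and candidate target sequences.
toy_genome = "TTAGGGTTAGGGACGTACGTCCATGA"
candidates = ["TTAGGG", "TCATGG", "GATTACA"]
non_targeting = select_non_targeting(candidates, toy_genome)
```

      Here "TCATGG" is rejected because its reverse complement occurs in the toy genome, which is why both strands are searched.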

      (3) The authors used two different anti-GFP antibodies, one from Roche and the other from Thermo Fisher. Why were two different antibodies used for the same protein?

      We have found that only some anti-GFP antibodies are effective for affinity selection of associated proteins, whereas others are better suited for immunolocalisation. The respective suppliers’ antibodies were optimised for each application.

      (4) Page 6: in the introduction, the authors give the number of total VSG genes as 2,634. Is it known how many of them are pseudogenes?

      This value corresponds to the number reported by Cosentino et al. 2021 (PMID: 34541528) for subtelomeric VSGs, which is similar to the value reported by Muller et al 2018 (PMID: 30333624) (2486), both in the same strain of trypanosomes as used by us. Based on the earlier analysis by Cross et al (PMID: 24992042), 80% of the identified VSGs in their study (2584) are pseudogenes. This approximates to the estimation by Cosentino of 346/2634 (13%) being fully functional VSG genes at subtelomeres, or 17% when considering VSGs at all genomic locations (433/2872).

      (5) I found several typos throughout the manuscript.

      Thank you for raising this. We have read through the manuscript several times and hopefully corrected all outstanding typos.

      (6) Fig. 1C: Table: below TOTAL 2nd line: the number should be 1838 (rather than 1828)

      Corrected- thank you.

      - Are prior studies referenced appropriately? Yes

      - Are the text and figures clear and accurate? Yes

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Suggested above

      Reviewer #1 (Significance):

      Describe the nature and significance of the advance (e.g., conceptual, technical, clinical) for the field:

      This study represents a significant conceptual and technical advancement by employing a synthetic TALE DNA-binding protein tagged with YFP to selectively identify proteins associated with five distinct repetitive regions of T. brucei chromatin. To the best of my knowledge, it is the first report to utilize TALE-YFP for affinity-based isolation of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology. Importantly, any essential or unique interacting partners identified could serve as potential targets for therapeutic intervention.

      - Place the work in the context of the existing literature (provide references, where appropriate). I agree with the information that has already been described in the submitted manuscript regarding the potential contribution of the resulting data and the established technology to the study of VSG expression, kinetochore mechanisms and telomere biology.

      - State what audience might be interested in and influenced by the reported findings. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as scientists investigating chromatin structure and the functional roles of repetitive genomic elements in higher eukaryotes.

      - (1) Define your field of expertise with a few keywords to help the authors contextualize your point of view. Protein-DNA interactions/ chromatin/ DNA replication/ Trypanosomes

      - (2) Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. None

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Major Comments

      None. The experiments are well-controlled, claims well-supported, and methods clearly described. Conclusions are convincing.

      Thank you for these positive comments.

      Minor Comments

      (1) Fig. 2 - I couldn't find an uncropped version showing multiple cells. If it exists, it should be linked in the legend or main text; Otherwise, this should be added to the supplement.

      The images presented represent reproducible analyses, and were independently verified by two of the authors. Although wider field of view images do not provide the resolution to be informative on cell location, as requested we have provided uncropped images in new Fig. S4 for all the cell lines shown in Figure 2A.

      In addition, we have included as supplementary images (Fig. S3B) additional images of TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP localisation to provide additional support for their observed locations presented in Figure 1. The sets of cells and images presented in Figure 2A and in Fig. S3B were prepared and obtained by different authors, independently and reproducibly validating the location of the tagged protein.

      (2) I think Suppl. Fig. 1 is very valuable, as it is a quantification and summary of the ChIP-seq data. I think the authors could consider making this a panel of a main figure. For the main figure, I think the plot could be trimmed down to only show the background and the relevant repeat for each TALE protein, leaving out the non-target repeats. (This relates to minor comment 6.) Also, I believe, it was not explained how background enrichment was calculated.

      We are grateful for the reviewer’s positive view of original Fig. S1 and appreciate the suggestion. We have now moved these analyses to part B of main Figure 2 in the revised manuscript – now Figure 2B. We have also provided additional details in the Methods section on the approaches used to assess background enrichment.

      Page 19:

      “Background enrichment calculation

      The genome was divided into 50 bp sliding windows, and each window was annotated based on overlapping genomic features, including CIR147, 177 bp repeats, 70 bp repeats, and telomeric (TTAGGG)n repeats. Windows that did not overlap with any of these annotated repeat elements were defined as "background" regions and used to establish the baseline ChIP-seq signal. Enrichment for each window was calculated using bamCompare, as log₂(IP/Input). To adjust for background signal amongst all samples, enrichment values for each sample were further normalized against the corresponding No-YFP ChIP-seq dataset.”
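      The steps in the quoted Methods text can be sketched on toy data as follows. This is not the authors' script: non-overlapping 50 bp bins, a pseudocount of 1, and normalisation by subtracting the No-YFP log₂ ratio are all assumptions made for illustration, and the read counts are hypothetical.

```python
import math

BIN = 50  # window size in bp, as stated in the quoted Methods text

def annotate_bins(n_bins, features):
    """Label each bin with the repeat it overlaps, else 'background'.

    features: list of (start, end, name) in 0-based half-open coordinates.
    """
    labels = ["background"] * n_bins
    for start, end, name in features:
        for i in range(start // BIN, min(n_bins, (end - 1) // BIN + 1)):
            labels[i] = name
    return labels

def log2_ratio(ip, inp, pseudocount=1.0):
    """Per-bin log2(IP/Input); the pseudocount avoids division by zero."""
    return [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(ip, inp)]

def normalised_enrichment(ip, inp, ctrl_ip, ctrl_inp):
    """Sample log2(IP/Input) minus the No-YFP control log2(IP/Input)."""
    sample = log2_ratio(ip, inp)
    control = log2_ratio(ctrl_ip, ctrl_inp)
    return [s - c for s, c in zip(sample, control)]

def mean_by_label(values, labels):
    """Average enrichment over all bins sharing the same annotation."""
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

# Toy 200 bp region: a hypothetical 177 bp repeat annotation spans 50-150.
labels = annotate_bins(4, [(50, 150, "177bp")])
enrichment = normalised_enrichment(
    ip=[7, 63, 63, 7], inp=[7, 7, 7, 7],            # TALE-YFP sample
    ctrl_ip=[7, 7, 7, 7], ctrl_inp=[7, 7, 7, 7],    # No-YFP control
)
summary = mean_by_label(enrichment, labels)
```

      In this toy case the repeat bins carry an eight-fold excess of IP over input, so their mean is 3 on the log₂ scale, while the background bins sit at 0.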

      Note: While revising the manuscript we also noticed that the script had a normalization error. We have therefore included a corrected version of these analyses as Figure 2B (old Fig. S1).

      (3) Generally, I would plot enrichment on a log2 axis. This concerns several figures with ChIP-seq data.

      Our ChIP-seq enrichment is calculated by bamCompare. The resulting enrichment values are indeed log2 (IP/Input). We have made this clear in the updated figures/legends.

      (4) Fig. 4C - The violin plots are very hard to interpret, as the plots are very narrow compared to the line thickness, making it hard to judge the actual volume. For example, in Centromere 5, YFP-KKT2 is less enriched than 147R-TALE over most of the centromere with some peaks of much higher enrichment (as visible in panel B), however, in panel C, it is very hard to see this same information. I'm sure there is some way to present this better, either using a different type of plot or by improving the spacing of the existing plot.

      We thank the reviewer for this suggestion; we have elected to provide a Split-Violin plot instead. This improves the presentation of the data for each centromere. The original violin plot in Figure 4C has been replaced with this Split-Violin plot (still Figure 4C).

      (5) Fig. 6 - The panels are missing an x-axis label (although it is obvious from the plot what is displayed).

      Maybe the "WT NO-YFP vs" part that is repeated in all the plot titles could be removed from the title and only be part of the x-axis label?

      In fact, to save space the X axis was labelled inside each volcano plot but we neglected to indicate that values are on a log2 scale indicating enrichment. This has been rectified – see Figure 6, and Fig. S7, S8 and S9.

      (6) Fig. 7 - I would like to have a quantification for the examples shown here. In fact, such a quantification already exists in Suppl. Figure 1. I think the relevant plots of that quantification (YFP-KKT2 over 177bp-repeats and centromere-repeats) with some control could be included in Fig. 7 as panel C. This opportunity could be used to show enrichment separated out for intermediate-sized, mini-, and megabase-chromosomes. (relates to minor comment 2 & 8)

      The CIR147 sequence is found exclusively on megabase-sized chromosomes, while the 177 bp repeats are located on intermediate- and mini-sized chromosomes. Due to limitations in the current genome assembly, it is not possible to reliably classify all chromosomes into intermediate- or mini- sized categories based on their length. Therefore, original Supplementary Fig. S1 presented the YFP-KKT2 enrichment over CIR147 and 177 bp repeats as a representative comparison between megabase chromosomes and the remaining chromosomes (corrected version now presented as main Figure 2B). Additionally, to allow direct comparison of YFP-KKT2 enrichment on CIR147 and 177 bp repeats we have included a new plot in Figure 7C which shows the relative enrichment of YFP-KKT2 on these two repeat types.

      We have added the following text, page 12:

      “Taking into account the relative number of CIR147 and 177 bp repeats in the current T. brucei genome (Cosentino et al., 2021; Rabuffo et al., 2024), comparative analyses demonstrated that YFP-KKT2 is enriched on both CIR147 and 177 bp repeats (Figure 7C).”

      (7) Suppl. Fig. 8 A - I believe there is a mistake here: KKT5 occurs twice in the plot, the one in the overlap region should be KKT1-4 instead, correct?

      Thanks for spotting this. It has been corrected.

      (8) The way that the authors mapped ChIP-seq data is potentially problematic when analyzing the same repeat type in different regions of the genome. The authors assigned reads that had multiple equally good mapping positions to one of these mapping positions, randomly.

      This is perfectly fine when analysing repeats by their type, independent of their position on the genome, which is what the authors did for the main conclusions of the work.

      However, several figures show the same type of repeat at different positions in the genome. Here, the authors risk that enrichment in one region of the genome 'spills' over to all other regions with the same sequence. Particularly, where they show YFP-KKT2 enrichment over intermediate- and mini-chromosomes (Fig. 7) due to the spillover, one cannot be sure to have found KKT2 in both regions.

      Instead, the authors could analyze only uniquely mapping reads / read-pairs where at least one mate is uniquely mapping. I realize that with this strict filtering, data will be much more sparse. Hence, I would suggest keeping the original plots and adding one more quantification where the enrichment over the whole region (e.g., all 177bp repeats on intermediate-/mini-chromosomes) is plotted using the unique reads (this could even be supplementary). This also applies to Fig. 4 B & C.

      We thank the reviewer for their thoughtful comments. Repetitive sequences are indeed challenging to analyze accurately, particularly in the context of short read ChIP-seq data. In our study, we aimed to address YFP-KKT2 enrichment not only over CIR147 repeats but also on 177 bp repeats, using both ChIP-seq and proteomics using synthetic TALE proteins targeted to the different repeat types. We appreciate the referee's suggestion to consider uniquely mapped reads; however, in the updated genome assembly, the 177 bp repeats are frequently immediately followed by long stretches of 70 bp repeats which can span several kilobases. The size and repetitive nature of these regions exceed the resolution limits of ChIP-seq. It is therefore difficult to precisely quantify enrichment across all chromosomes.

      Additionally, the repeat sequences are highly similar, and relying solely on uniquely mapped reads would result in the exclusion of most reads originating from these regions, significantly underestimating the relative signals. To address this, we used Bowtie2 with settings that allow multi-mapping, assigning reads randomly among equivalent mapping positions, but ensuring each read is counted only once. This approach is designed to evenly distribute signal across all repetitive regions and preserve a meaningful average.
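      The trade-off between the two counting strategies discussed above can be shown with a toy model. This sketch is not Bowtie2 and makes no claim about its internals; the locus names and read numbers are hypothetical. Each read lists its equally good mapping loci, and is either assigned at random to one of them (counted once) or dropped when only unique reads are kept.

```python
import random
from collections import Counter

def assign_reads(read_hits, rng, unique_only=False):
    """Count reads per locus.

    read_hits: one list per read, holding its equally good mapping loci.
    unique_only: if True, reads with more than one locus are discarded.
    """
    counts = Counter()
    for hits in read_hits:
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not unique_only:
            counts[rng.choice(hits)] += 1  # each read is counted exactly once
    return counts

# 1000 reads fit two identical repeat copies equally well; 10 reads overlap
# a repeat/unique-sequence boundary and map only to copy A.
reads = [["repeat_A", "repeat_B"]] * 1000 + [["repeat_A"]] * 10
random_counts = assign_reads(reads, random.Random(0))
unique_counts = assign_reads(reads, random.Random(0), unique_only=True)
```

      Random assignment keeps all 1010 reads and spreads the repeat signal roughly evenly over both copies, which preserves the average over a repeat type but cannot say which copy the signal came from. Unique-only counting retains just the 10 boundary reads, which shows how sparse the data become.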

      Single molecule methods such as DiMeLo (Altemose et al. 2022; PMID: 35396487) will need to be developed for T. brucei to allow more accurate and chromosome specific mapping of kinetochore or telomere protein occupancy at repeat-unique sequence boundaries on individual chromosomes.

      Reviewer #2 (Significance):

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with minichromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Thank you for supporting the novelty and broad interest of our manuscript.

      My field of expertise / Point of view:

      I'm a computer scientist by training and am now a postdoctoral bioinformatician in a molecular parasitology laboratory. The laboratory is working on antigenic variation in T. brucei. The focus of my work is on analyzing sequencing data (such as ChIP-seq data) and algorithmically improving bioinformatic tools.

    1. eLife Assessment

      This important study examines the role of map3k1, a MAP3K family member that has both kinase and ubiquitin ligase domains, in the differentiation of progenitors in the flatworm Planaria. The convincing analyses demonstrate that map3k1 acts within progenitors to restrict their premature differentiation and to prevent formation of teratomas. This work would be of interest to researchers in the fields of regeneration, developmental biology, and aging.

    2. Reviewer #1 (Public review):

      Summary:

      The authors assess the role of map3k1 in adult Planaria through whole body RNAi for various periods of time. The authors' prior work has shown that neoblasts (stem cells that can regenerate the entire body) for various tissues are intermingled in the body. Neoblasts divide to produce progenitors that migrate within a "target zone" to the "differentiated target tissues" where they differentiate into a specific cell type. Here the authors show that map3k1-i animals have ectopic eyes that form along the "normal" migration path of eye progenitors, ectopic neurons and glands along the AP axis and pharynx in ectopic anterior positions. The rest of the study shows that positional information is largely unaffected by loss of map3k1. However, loss of map3k1 leads to premature differentiation of progenitors along their normal migratory route. They also show that "long-term" whole body depletion of map3k1 results in mis-specified organs and teratomas. In short, this study convincingly demonstrates that in planaria, map3k1 maintains progenitor cells in an undifferentiated state, preventing premature fate commitment until they encounter the appropriate signals, either positional cues within a designated region or contact-dependent inputs from surrounding tissues.

      Strengths:

      (1) The study has appropriate controls, sample sizes and statistics.

      (2) The work is high-quality.

      (3) The conclusions are supported by the data.

      (4) Planaria is a good system to analyze the function of map3k1, which exists in mammals but not other invertebrates.

      Weaknesses:

      None noted.

    3. Reviewer #2 (Public review):

      Summary:

      The flatworm planarian Schmidtea mediterranea is an excellent model for understanding cell fate specification during tissue regeneration and adult tissue maintenance. Planarian stem cells, known as neoblasts, are continuously deployed to support cellular turnover and repair tissues damaged or lost due to injury. This reparative process requires great precision to recognize the location, timing, and cellular fate of a defined number of neoblast progeny. Understanding the molecular mechanisms driving this process could have important implications for regenerative medicine and enhance our understanding of how form and function are maintained in long-lived organisms such as humans. Unfortunately, the molecular basis guiding cell fate and differentiation remains poorly understood.

      In this manuscript, Canales et al. identified the role of the map3k1 gene in mediating the differentiation of progenitor cells at the proper target tissue. The map3k1 function in planarians appears evolutionarily conserved as it has been implicated in regulating cell proliferation, differentiation, and cell death in mammals. The results show that the downregulation of map3k1 with RNAi leads to spatial patterning defects in different tissue types, including the eye, pharynx, and the nervous system. Intriguingly, long-term map3k1-RNAi resulted in ectopic outgrowths consistent with teratomas in planarians. The findings suggest that map3k1 mediates signaling, regulating the timing and location of cellular progenitors to maintain correct patterning during adult tissue maintenance.

      Strengths:

      The authors provide an entry point to understanding molecular mechanisms regulating progenitor cell differentiation and patterning during adult tissue maintenance.

      The diverse set of approaches and methods applied to characterize map3k1 function strengthens the case for conserved evolutionary mechanisms in a selected number of tissue types. The creativity using transplantation experiments is commendable, and the findings with the teratoma phenotype are intriguing and worth characterizing.

      Weaknesses:

      The authors have satisfactorily addressed our previous concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors assess the role of map3k1 in adult Planaria through whole body RNAi for various periods of time. The authors' prior work has shown that neoblasts (stem cells that can regenerate the entire body) for various tissues are intermingled in the body. Neoblasts divide to produce progenitors that migrate within a "target zone" to the "differentiated target tissues" where they differentiate into a specific cell type. Here the authors show that map3k1-i animals have ectopic eyes that form along the "normal" migration path of eye progenitors (Fig. 1), ectopic neurons and glands along the AP axis (Fig. 2) and pharynx in ectopic anterior positions (Fig. 3). The rest of the study shows that positional information is largely unaffected by loss of map3k1 (Fig. 4,5). However, loss of map3k1 leads to premature differentiation of progenitors along their normal migratory route (Fig. 6). They also show that an ill-defined "long-term" whole body depletion of map3k1 results in mis-specified organs and teratomas.

      Strengths:

      (1) The study has appropriate controls, sample sizes and statistics.

      (2) The work appears to be high-quality.

      (3) The conclusions are supported by the data.

      (4) Planaria is a good system to analyze the function of map3k1, which exists in mammals but not in other invertebrates.

      Weaknesses:

      (1) The paper is largely descriptive with no mechanistic insights. 

      The mechanistic insights we aim to address are primarily at the cellular systems level – how adult progenitor cells produce pattern. Specifically, we uncovered strong evidence that regulation of differentiation is an active process occurring in migratory progenitors and that this regulation is a major component of pattern formation during the adult processes of tissue turnover and regeneration. The map3k1 phenotype provided a tool used to reveal the existence of this regulation, and to understand the patterning abnormalities prevented by this regulatory mechanism. We updated the text in several places to make clearer some of this emphasis. For example, in the Discussion: "We suggest that differentiation is restricted during migratory targeting as an essential component of pattern formation, with the map3k1 RNAi phenotype indicating the existence and purpose of this element of patterning." 

      Naturally, identifying a particular molecule involved in this process is of interest for understanding molecular mechanism; this would allow for comparison to other cellular systems in other organisms and would focus future molecular inquiry. Future molecular studies into the mechanism of Map3k1 regulation and its downstream signaling will be fascinating as next steps towards understanding the process at the molecular level more deeply. We also added some discussion considering the types of upstream activation cues that could potentially be associated with Map3k1 regulation to suppress differentiation. 

      (2) Given the severe phenotypes of long-term depletion of map3k1, it is important that this exact timepoint is provided in the methods, figures, figure legends and results. 

      We removed the use of the term “long-term” and instead added timepoints used to all figure legends. We also added a summary of timepoints used in the methods section and included RNAi timepoint labels in figures where a phenotype progression over time is relevant to interpretation. For timecourses, we also added suitable time information to text in the results. 

      (3) Figure 1C, the ectopic eyes are difficult to see, please add arrows. 

      To improve visualization, we replaced the example animal in the original Figure 1C with one that has a stronger phenotype, including arrows pointing to every ectopic event. Additionally, we included magnified images of optic cup cells and photoreceptor neurons in the trunk and tail region. This is now Figure 1B.

      (4) line 217 - why does the n=2/12 animals not match the values in Figure 3B, which is 11/12 and 12/12. The numbers don't add up. Please correct/explain. 

      Figure 3B in the submitted version (3/18 had cells in the tail) included more scored animals (6 animals from a replicate experiment, of which 1/6 showed cells in the tail) than the total reported in the text (2/12 had cells in the tail), which did not yet include the replicate animals at the time of writing. The results are the same; different sample sizes were simply noted in those two locations, and we have fixed this issue. In the updated Figure 3, the order of presentation has shifted (e.g., prior 3B is now in 3C and Figure 3_figure supplement 1). We made sure to add numbers to all figure panels. 

      (5) Figure panels do not match what is written in the results section. There is no Figure 6E. Please correct.

      Thank you for catching this. We have gone through figures and text after editing to make sure that text callouts are appropriately matched to the figures. 

      Reviewer #2 (Public review):

      Summary:

      The planarian flatworm Schmidtea mediterranea is an excellent model for understanding cell fate specification during tissue regeneration and adult tissue maintenance. Planarian stem cells, known as neoblasts, are continuously deployed to support cellular turnover and repair tissues damaged or lost due to injury. This reparative process requires great precision to recognize the location, timing, and cellular fate of a defined number of neoblast progeny. Understanding the molecular mechanisms driving this process could have important implications for regenerative medicine and enhance our understanding of how form and function are maintained in long-lived organisms such as humans. Unfortunately, the molecular basis guiding cell fate and differentiation remains poorly understood.

      In this manuscript, Canales et al. identified the role of the map3k1 gene in mediating the differentiation of progenitor cells at the proper target tissue. The map3k1 function in planarians appears evolutionarily conserved as it has been implicated in regulating cell proliferation, differentiation, and cell death in mammals. The results show that the downregulation of map3k1 with RNAi leads to spatial patterning defects in different tissue types, including the eye, pharynx, and the nervous system. Intriguingly, long-term map3k1-RNAi resulted in ectopic outgrowths consistent with teratomas in planarians. The findings suggest that map3k1 mediates signaling, regulating the timing and location of cellular progenitors to maintain correct patterning during adult tissue maintenance.

      Strengths:

      The authors provide an entry point to understanding molecular mechanisms regulating progenitor cell differentiation and patterning during adult tissue maintenance.

      The diverse set of approaches and methods applied to characterize map3k1 function strengthens the case for conserved evolutionary mechanisms in a selected number of tissue types. The creativity using transplantation experiments is commendable, and the findings with the teratoma phenotype are intriguing and worth characterizing.

      We thank the reviewer for the positive feedback.

      Weaknesses:

      The article presents a provocative idea related to the importance of positional control for organs and cells, which is at least in part regulated by map3k1. Nonetheless, the treatment of the role of map3k1 and its potential interaction with regulators of the anterior-posterior and mediolateral axes, and with PCGs, is somewhat superficial. The authors could elaborate or even speculate more in the discussion section on the different scenarios incorporating these axial modulators into the map3k1 model presented in Figure 8.

      First, to strengthen the support for our finding that positional information is largely unaffected in map3k1 RNAi animals, we added data on the expression of additional relevant position control genes (PCGs), namely ndl-4, ptk7, sp5, and wnt11-1, to the PCG panel in Figure 5. The expression domain of ndl-4, an FGF receptor-like protein family member that contributes to head patterning and anterior pole maintenance, was normal in map3k1 RNAi animals. wnt11-1, a PCG with expression concentrated in the posterior end of the animal and dependent on general Wnt activity, was also expressed normally in map3k1 RNAi animals. ptk7, RNAi of which can result in supernumerary pharynges, also showed normal expression in map3k1 RNAi animals. Finally, sp5, a Wnt-activated gene with expression in the tail, also showed normal expression in map3k1 RNAi animals. 

      Second, to further support the conclusion that cells are not suitably responding to positional information after map3k1 RNAi, which we argue normally dictates where differentiation should occur, we added examples of differentiated cell types that are ectopically positioned within an atypical PCG expression domain for that cell type (Figure 5C). This underscores that following map3k1 RNAi the PCG expression domains do not change, but the pattern of differentiated cell types relative to these domains does shift. We also added data showing that regenerating tails had a proper wntP-2 gradient, but an anterior regenerating pharynx appeared outside of this wntP-2<sup>+</sup> zone and inside of an ndl-5<sup>+</sup> zone (Figure 5-figure supplement 1E). We added some discussion of these new data in the Figure 5 results section. We also noted that independent recent map3k1 work (Lo, 2025) provides some evidence that a minor posterior shift in ndl-5 expression can occur after map3k1 RNAi.

      Next, we added a new element to the model figure (Figure 8B) depicting that PCG expression domains remain normal after map3k1 RNAi, with ectopic differentiation occurring in an incorrect positional information environment. We refer to this new panel in the discussion: "We suggest that map3k1 is not required for the spatial distribution of progenitor-extrinsic differentiation-promoting cues themselves, but for progenitors to be restricted from differentiating until these cues are received (Figure 8B)."; we then follow this statement with a summary in the Discussion of six pieces of evidence that support this model.

      Finally, we added some additional text to the discussion section about candidate mechanisms by which extrinsic cues could potentially regulate Map3k1, pointing to potential future inquiry directions. We suggest that map3k1 is not involved in regulating PCG activity domains themselves, but instead acts as a brake on differentiation within migratory progenitors through active signaling. This brake is then lifted when the progenitors hit their correct PCG-based migratory target, or when they hit their target tissue. How that occurs mechanistically is unknown. One scenario is that each progenitor is tuned to respond to a particular PCG-regulated environment (such as a particular ECM or signaling environment) to generate a molecular change that inactivates Map3k1 signaling, such as by inactivating or disengaging an RTK signal. Alternatively, the migratory process in progenitors could engage the Map3k1 signal, enabling signal cessation with arrival at a target location. When Map3k1 is active, it could result in a transcriptional state that prevents full expression of differentiated factors required for maturation, tissue incorporation, and cessation of migration. These considerations are now added to the discussion.

      The article can be improved by addressing inconsistencies and adding details to the results, including the main figures and supplements. This represents one of the most significant weaknesses of this otherwise intriguing manuscript. Below are some examples of a few figures, but the authors are expected to pay close attention to the remaining figures in the paper.

      Details associated with the number of animals per experiment, statistical methods used, and detailed descriptions of figures appear inconsistent or lacking in almost all figures. In some instances, the percentage of animals affected by the phenotype is shown without detailing the number of animals in the experiment or the number of repeats. Figures and their legends throughout the paper lack details on what is represented and sometimes are mislabeled or unrelated. 

      We endeavored to ensure that these noted details are present throughout the legends and figures for all figure panels.

      Specifically, the arrows in Figure 1A are different colors. Still, no reasoning is given for this, and in the exact figure, the top side (1A) shows the percentages and the number of animals below. 

      The differently colored arrows were used only for visibility. To avoid confusion, we now use white arrows for all FISH images in Figure 1, and wherever else possible. We also replaced the percentages with n numbers in the bottom left corner of the live images in Figure 1A. 

      Conversely, in Figures 1B, C, and D, no details on the number of animals or percentages are shown, nor an explanation of why opsin was used in Figure 1A but not 1B. 

      The original Figure 1B represented a few different examples of ectopic eye/eye cell patterns in the map3k1 RNAi animals to demonstrate the variable and disorganized nature of the phenotype. To address this, we added further explanation in the legend. We also merged 1A and 1B for simplicity of interpretation. opsin was used in Figure 1A to label cell bodies of photoreceptors. anti-Arrestin was used in the example FISH images to see if these cells were interconnected via projections, which we now clarify in the legend and in the text. 

      Is Figure 1B missing an image for the respective control? Figure 1C needs details regarding what the two smaller boxes underneath are. 

      The control for Figure 1B was in Figure 1A; the merger of Figures 1A/B should address this. Boxes in Figure 1C were labelled with numbers corresponding to the image above them.

      Figure 1C could use an AP labeling map in 10 days (e.g., AP6 has one optic cup present). Figure 1C and F counts do not match. 

      We added a cartoon to 1C to show the region imaged. Note that the 36d trunk image (now Fig. 1B) has now been replaced with a full animal image and magnified boxes that show locations of example ectopic cells. That cell in 1C was categorized as in AP5. Note that additional animals were analyzed and data added to the distribution (now Fig. 1D). 

      In Figure 1C, we do not know the number of animals tested, controls used, the scale bar sizes in the first two images, nor the degree of magnification used despite the pharynx region appearing magnified in the second image.  Figure 1C is also shown out of chronological order; 36 days post RNAi is shown before 10 days post RNAi. Moreover, the legends for Figures 1C and 1D are swapped.

      We have endeavored to ensure sample numbers, control images, and appropriate scale bar notation in legends are present for all images. Figure 1C has now been split into two panels: Figure 1B and Figure 1C. It does not follow a chronological order in presentation for the following logic flow: we uncover and describe the phenotype; then, with knowledge of the defect, we walk back to see how early the phenotype starts after RNAi and what the pattern of ectopic cell distribution is when the phenotype starts to emerge (using the knowledge of which cells are affected from the overt phenotype described in 1A/B). 

      Additionally, Figure 1F and many other figures throughout the paper lack overall statistical considerations. Furthermore, Figure 1F has three components, but only one is labeled. Labeling each of them individually and describing them in the corresponding figure legend may be more appropriate.

      The main point of the graphs in 1F (now 1D) was the overt overall pattern difference with the wild-type, which never has ectopic eye cells in the midbody or tail, and that the ectopic eye cells progress throughout the entire AP axis. However, we concur that a statistical test is a reasonable thing to show here and that is now included in the legend. The 3 components (in Figure 1F, now Figure 1D) were kept together with one figure label (D) for simplicity, but were rearranged (top and bottom) with a cartoon to the side and with modified labeling for extra clarity. 

      Figure 2C shows images of gene expression for two genes, but the counts are shown for only one in Figure 2D. It is challenging to follow the authors' conclusions when no reasoning is given and quantification is displayed for one case but not the other. These inconsistencies are also observed in different figures. 

      In Figure 2C, FISH images of cintillo+ and dd_17258+ neurons are shown to display the specificity of this effect to some neurons and not others. Because cintillo+ cells did not expand at all (n=24/24 animals), the counts for them would all be zero values. We only counted data for dd_17258 cells because it was the neuron that expanded compared to the control animals. We have now added a note in the legend explaining this.

      In Figure 2D, 24/24 animals were reported to show the phenotype, but only eight were counted (is there a reason for this?).

      Eight animals were used to quantitatively characterize the spread of cells along the AP axis, as this was deemed an adequate sample size to capture the change in distribution of dd_17258+ cells from being head-restricted to being present throughout the body. Across multiple cohorts of animals in replicates, a total of 24/24 examined animals showed this expansion phenotype. Double FISH experiments were additionally carried out using dd_17258 and various PCGs; these data are now included in Figure 5C, and these animals were added to the total counts regarding quantitative analysis of the phenotype in Figure 2D. 

      In Figure 2E, the expression for three genes is shown, with some displaying anterior and posterior regions while others only show the anterior picture. Is there a particular reason for this? 

      The original first panel in Figure 2E showed an example of a non-expanding gland cell type, dd_9223, which is very restricted to the head in both control and map3k1 RNAi animals. Because we did not observe a phenotype for this cell type (no cells in the tails of any control or map3k1 RNAi animals), we only included tail images of cell types that showed an abnormal phenotype with clear expansion to the posterior (dd_8476 and dd_7131). However, we have now included tail images of dd_9223 cells and added data for dd_9223 to the graph in Figure 2E. 

      Also, in Figure 2F, the counts are shown for only the posterior region of two genes out of the three displayed in Figure 2E. It is unclear why the authors do not show counts for the anterior areas considered in Figure 2E. Furthermore, the legend for Figure 2D is missing, and the legend for 2F is mislabeled as a description for Figure 2D.

      We now include tail images for dd_9223 in Figure 2E to show that there are no ectopic cells in tails. We did not originally include counts of dd_9223 because there was no phenotype observed. dd_7131 and dd_8476 cell types appeared in the posterior of even control animals at a low frequency, unlike dd_9223 cells. However, we did now add counts for dd_9223 tail regions in the graph. We did not count the anterior regions of the animal because our goal was to show data for the visible phenotype (ectopic cells in the tail) not only with an example image, but also by showing the number of cells in the tail with a graph and statistical test. Legends have been updated with correct details.

      Supplement Figure 1 B reports data up to 6 weeks, but no text in the manuscript or supplement mentions any experiment going up to 6 weeks. There are no statistics for data in Supplement Figure 1E. Any significance between groups is unclear.

      More details about the RNAi feeding schedules have been added to the methods section. All RNAi timepoints are now stated explicitly in the legends. Figure 1F and Figure 1-figure supplement 1E (additional data: ovo<sup>+</sup>; smedwi-1<sup>-</sup> cell counts) and their legends now mention the statistical tests performed, and annotations (ns, not significant) or p values have been added to the graphs. For simplicity, we decided to include all smedwi-1+ counts together rather than splitting them into low and high smedwi-1+ cells, because we make no claims about low versus high cells. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It would be good to acknowledge in the discussion the recent paper from the Petersen lab on map3k1, published in PLoS Genet 2025, especially if the results differ between the two labs.

      We added reference/discussion regarding the recent PLoS Genetics Lo, 2025 map3k1 paper at several suitable points in the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Please pay close attention to the description of experimental details and the consistency throughout the paper. It seems like the reader has to assume or come across information that is not readily available from the text or the legends in the paper. This is an interesting paper with intriguing findings. However, the version presented here appears rushed or put together on the fly.

      Thank you for your thorough feedback. We have endeavored to ensure all appropriate details are present in figures and/or figure legends.

    1. eLife Assessment

      This important study employs a closed-loop, theta-phase-specific optogenetic manipulation of medial septal parvalbumin-expressing neurons in rats and reports that disrupting theta-timescale coordination impairs performance of challenging aspects of spatial behaviors, while sparing hippocampal replay and spatial coding in hippocampal place cells. The findings are expected to advance theoretical understanding of learning and memory operations and to provide practical implications for the application of similar optogenetic approaches. The experiments were viewed as technically rigorous, but the strength of evidence provided in the current version of the manuscript was viewed as incomplete, mostly due to limited analyses and the descriptions of some of the experimental protocols.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Joshi and colleagues demonstrates that the precise theta-phase timing of spikes is causal for CA1 hippocampal theta sequences during locomotion on a linear track and is necessary for learning the cognitively demanding outbound component of a hippocampus-dependent alternation task (W-maze), independently of replay during immobility. To reach these conclusions, the authors developed a theta-phase-specific, closed-loop manipulation that used optogenetic activation of medial septal parvalbumin (PV) interneurons at the ascending phase of theta during locomotion. This protocol preserved immobility periods, allowing a clean and elegant dissociation from SWR-associated replay.

      The manuscript is well written and was a pleasure to read. The work described is of high quality and introduces several notable advances to the field:

      (a) It extends prior studies that manipulated theta oscillations by examining precise temporal structure (specifically theta sequences) rather than only LFP features.

      (b) The closed-loop manipulation enabled dissociation between deficits in theta sequences during a behavioural task and SWR-associated replay activity.

      (c) As controls, the authors included rats with suboptimal viral transduction or optic-fibre placement, and, within subjects, both stimulation-on (stim-on) and stimulation-off (stim-off) trials. Notably, sequence disruption persisted into stim-off periods within the same session.

      Overall, this is a strong manuscript that will provide valuable insights to the field. I have only minor comments:

      (1) As the authors note, it is striking that both behavioural performance and spike patterns are altered during stim-off trials. They propose that "disruption of theta sequences during the initial experience in an environment is sufficient to have lasting effects," implying that rapid, experience-dependent plasticity is driven by sequential firing. Does this imply that if rats were previously trained on the task, subsequent stim-on and stim-off trials would yield different outcomes, with stim-off trials showing improved performance and intact theta sequences? For example, if the sequence of one-third stim-on, one-third stim-off, one-third stim-on were inverted to off-on-off, would theta sequences be expected to emerge, disappear, and potentially re-emerge? While I am not asking for additional experiments, I think the discussion could be extended in this aspect.

      Alternatively, could the number of stim-off trials (one third of the total) be insufficient to support learning/induce plasticity? In the controls, ~50-100 trials appear necessary to achieve high performance.

      (2) In line with the point above, the authors characterise the behavioural changes induced by MS optogenetic stimulation specifically as a "learning deficit," as rats failed to improve across 300 trials in an initially novel environment (W-maze). While they present this as complementary to prior demonstrations of impaired performance on previously learned tasks (Zutshi et al., 2018; Quirk et al., 2021; Etter et al., 2023; Petersen et al., 2020), an alternative interpretation is a working-memory deficit. This would produce the same behavioural pattern, with reference memory (the less cognitively demanding trials) remaining intact despite stimulation and concomitant changes in theta sequences. This interpretation would also be consistent with work in certain disease models, where reduced synaptic plasticity and working-memory deficits co-occur with preserved place coding despite impaired theta sequences (e.g., Viana da Silva et al., 2024; Donahue et al., 2025).

      (3) It was not immediately clear whether SWR-associated activity was derived from the interleaved ~15-min rest sessions in a rest box, or from periods of immobility or reward consumption in the maze (aSWR, as in Jadhav et al., 2012). Regardless, it would be informative to compare aSWR events within the maze to rest-box SWRs that may occur during more prolonged slow-wave episodes (even if not full sleep). This contrasts with Liu et al. (2024), who analysed replay during ~1.5-h sleep sessions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this study developed a closed-loop optogenetic stimulation system with high temporal precision in rats to examine the effect of medial septum (MS) stimulation on the disruption of hippocampal activity at both behavioral and compressed time scales. They found that this manipulation preserved hippocampus single-cell-level spatial coding but affected theta sequences and performance during a spatial alternation task. The performance deficits were observed during the more cognitively demanding component of the task and even persisted after the stimulation was turned off. However, the effects of this disruption were confined to locomotor periods and did not impact waking rest replay, even during the early phase of stimulation-on. Their conclusion is consistent with previous findings from the Pastalkova lab, where MS disruption (using different methods) affected theta sequences and task performance but spared replay (Wang et al., 2015; Wang et al., 2016). However, it differs from a recent study in which optogenetic disruption of EC inputs during running affected both theta sequences and replay (Liu et al., 2023).

      Strengths:

      The experiments were well designed and controlled, and the results were generally well presented.

      Weaknesses:

      Major concerns are primarily technical but also conceptual. To further increase the impact of this study by contrasting findings from different disruptions, it is necessary to better align the analysis and detection methods.

      Major concerns:

      (1) To show that MS disruption does not affect spatial tuning, the authors computed the KL divergence of tuning curves between stimulation-on and stimulation-off conditions. I have two main questions about this analysis:

      (1.1) The authors seem to impose stringent inclusion criteria requiring a large number of spikes and a strong concentration of tuning curves. These criteria may have selected strongly spatially tuned cells, which are typically more stable and potentially less vulnerable to perturbations. Based on the Figure 2 caption, it seems that fewer than 10% of cells were included in the KL divergence analysis, which is lower than the usual proportion of place cells reported in the literature. What is the rationale for using such strict inclusion criteria? What happens to the cells that are not as strongly tuned but are still identified as significant place cells?

      (1.2) The KL divergence was computed between stimulation-on and stimulation-off conditions within the same animal group. However, the authors also showed that MS stimulation had lasting effects on theta sequences and performance even during stimulation-off periods. Would that lasting effect also influence spatial tuning? Based on these questions, the authors should perform additional analyses that directly measure spatial tuning quality and compare results across control and experimental groups - for example, spatial information of spikes (Skaggs et al., 1996), tuning stability, field length, and decoding error during running.
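
      The two tuning measures contrasted in this point can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes binned 1D rate maps, adds a small pseudocount `eps` (an assumed value) so the KL divergence stays finite, and uses the standard occupancy-weighted form of spatial information in bits per spike. All function and parameter names are hypothetical.

```python
import numpy as np

def normalize(rate_map, eps=1e-12):
    """Turn a non-negative tuning curve into a probability distribution.
    eps is an assumed pseudocount that keeps empty bins from producing log(0)."""
    r = np.asarray(rate_map, dtype=float) + eps
    return r / r.sum()

def kl_divergence(rate_a, rate_b):
    """KL(a || b) in bits between two normalized tuning curves
    (e.g. stimulation-on vs stimulation-off rate maps of one cell)."""
    p, q = normalize(rate_a), normalize(rate_b)
    return float(np.sum(p * np.log2(p / q)))

def spatial_information(rate_map, occupancy):
    """Occupancy-weighted spatial information in bits per spike.

    rate_map:  mean firing rate per spatial bin (Hz)
    occupancy: time spent in each bin (s)
    """
    rate = np.asarray(rate_map, dtype=float)
    occ = np.asarray(occupancy, dtype=float)
    p_occ = occ / occ.sum()              # probability of being in each bin
    mean_rate = np.sum(p_occ * rate)     # overall mean firing rate
    if mean_rate <= 0:
        return 0.0                       # silent cell carries no information
    valid = rate > 0                     # bins with zero rate contribute 0
    ratio = rate[valid] / mean_rate
    return float(np.sum(p_occ[valid] * ratio * np.log2(ratio)))
```

      Unlike the within-animal KL divergence, spatial information is computed per cell and per condition, so it can be compared directly between control and targeted groups without pre-selecting strongly tuned cells.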

      (2) The authors compared their results with those from Liu et al. (2023) and proposed that the different outcomes could be explained by different sites of disruption. However, the detection and quantification methods for theta sequences and replay differ substantially between the two studies, emphasizing different aspects of the phenomenon. I am not suggesting that either method is superior, but providing additional analyses using aligned detection methods would better support the authors' interpretations and benefit the field by enabling clearer comparisons across studies. In the current analysis, the power spectrum of the decoded ahead/behind distance only indicates that there is a rhythmic pattern, without specifying the decoding features at different theta phases. Moreover, the continuous non-local representations during ripples could include stationary representations of a location or zigzag representations that do not exhibit a linear sequential trace. Given that, the authors should show averaged decoding results corrected by the animal's actual position within theta cycles and compute a quadrant ratio. For replay analysis, they could use a linear fit (as in Liu et al., 2023) and report the proportion of significant replay events.
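
      A quadrant ratio of the kind requested here could be computed along the following lines. This is a hedged sketch under stated assumptions: the decoded posterior has already been averaged across theta cycles and re-centred on the animal's actual position, rows run from behind to ahead of the animal, columns run from the first to the second half of the analysed phase window, and both axes have an even number of bins. Where the phase window is centred varies between studies and is not specified in this review; `quadrant_ratio` is a hypothetical name.

```python
import numpy as np

def quadrant_ratio(posterior):
    """Quadrant ratio of a position-centred, theta-phase-resolved posterior.

    posterior: 2D array, shape (n_relative_position_bins, n_phase_bins).
               Rows: behind -> ahead of the animal (centre = actual position).
               Cols: early -> late half of the analysed theta phase window.
    Returns a value in [-1, 1]; positive values indicate forward-sweeping
    (behind-to-ahead) structure within the cycle.
    """
    post = np.asarray(posterior, dtype=float)
    n_pos, n_phase = post.shape
    if n_pos % 2 or n_phase % 2:
        raise ValueError("both axes need an even number of bins")
    hp, hf = n_pos // 2, n_phase // 2

    behind_early = post[:hp, :hf].sum()   # consistent with a forward sweep
    ahead_late = post[hp:, hf:].sum()     # consistent with a forward sweep
    behind_late = post[:hp, hf:].sum()    # inconsistent
    ahead_early = post[hp:, :hf].sum()    # inconsistent

    total = post.sum()
    if total == 0:
        return 0.0
    return float((behind_early + ahead_late - behind_late - ahead_early) / total)
```

      Comparing this value between stimulation-on, stimulation-off, and control data would indicate whether the rhythmic ahead/behind modulation reflects ordered sweeps rather than rhythmicity alone.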

      (3) The finding that theta sequences and performance were impaired even during stimulation-off periods is particularly interesting and warrants deeper exploration. In the Discussion, the authors claim that this may arise from "the rapid plasticity engaged during early learning." However, this explanation does not fully account for the observation. Previous studies have shown that theta sequences can develop very rapidly (Feng et al., Foster lab, 2015; Zhou et al., Dragoi lab, 2025). If the authors hypothesize that rapid plasticity during early stimulation-on disrupts the theta sequence, then the plasticity window must also be short and terminate during the subsequent stimulation-off period. Otherwise, why can't animals redevelop theta sequences during stimulation-off? The authors should conduct additional analyses during the stimulation-off periods of the W-maze task. For example:

      (3.1) What is the spike-theta phase relationship? Do the phases return to normal or remain altered as during stimulation-on?

      (3.2) Is there a significant place-field remapping from stimulation-on to stimulation-off? (Supplementary Figure 3F includes only a small subset of cells; what if population vector correlations are computed across all cells, or Bayesian decoding of stimulation-on spikes is performed using stimulation-off tuning curves?)

      (3.3) The authors should also discuss why the stimulation-off epochs were not sufficient to support learning, and if the stimulation-off place cell sequences could have supported replay.
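
      The population vector correlation suggested in (3.2) can be sketched as below. This is an illustrative outline rather than a prescribed analysis: it assumes rate maps for all recorded cells have been computed separately for stimulation-on and stimulation-off trials on the same spatial bins, and the array layout and function name are hypothetical.

```python
import numpy as np

def population_vector_correlation(maps_on, maps_off):
    """Pearson correlation, per spatial bin, between population vectors.

    maps_on, maps_off: arrays of shape (n_cells, n_bins) holding each cell's
    rate map in the stimulation-on and stimulation-off conditions.
    Returns an array of length n_bins; bins where either population vector
    has zero variance across cells are returned as NaN.
    """
    a = np.asarray(maps_on, dtype=float)
    b = np.asarray(maps_off, dtype=float)
    if a.shape != b.shape:
        raise ValueError("rate-map arrays must have the same shape")

    # Centre each population vector (one column = one spatial bin).
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)

    num = (a_c * b_c).sum(axis=0)
    den = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))

    out = np.full(a.shape[1], np.nan)
    ok = den > 0
    out[ok] = num[ok] / den[ok]
    return out
```

      Values near 1 at every bin would argue against remapping between stimulation-on and stimulation-off epochs, and the measure uses all cells rather than a strongly tuned subset.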

      (4) Citations and/or discussion of key studies relevant to the current work are missing: the Wang et al. studies from the Pastalkova lab (2015, 2016), showing that disruption of theta sequences (but not place cell sequences) disrupts learning but not replay; Drieu et al. (2018, Zugaro lab), on disruption of theta sequences affecting sleep replay; and Farooq and Dragoi (2019), on the association between a lack of theta sequences and the presence of waking rest replay during postnatal development, among others. The authors should discuss what is conceptually new in the current study, given these previous findings.

      (5) The assessment of theta sequence is not state-of-the-art:

      (5.1) Detecting the peak of the cross-correlogram (CCG) between neurons captures the behavioral-timescale CCG, not the theta-timescale one; for theta sequences, the local peak closest to zero lag should be used instead.

      (5.2) How did other methods of detecting theta sequences (Bayesian decoding, firing sequences) perform on the stimulation-on/stimulation-off data?

      (5.3) How was phase precession affected during stimulation-on versus stimulation-off?

      (6) It would be important to calculate additional variables in the replay part of the study to compare the quality of replay across the 2 groups:

      (6.1) Proportion of significant replay events out of the detected multiunit events.

      (6.2) The average extent of the trajectory depicted by significant replay events in the targeted group compared to the control group, for stimulation-on and stimulation-off.

    4. Reviewer #3 (Public review):

      Joshi et al. present an elegant and technically rigorous study examining how the temporal structure of hippocampal spiking during locomotion contributes to spatial learning. Using a closed-loop, theta phase-specific optogenetic manipulation of medial septal parvalbumin-expressing neurons in rats, the authors demonstrate that disrupting theta-timescale coordination impairs performance on the cognitively demanding outbound component of a spatial alternation task, while sparing hippocampal replay, place coding, and the simpler inbound learning. The work aims to dissociate the role of theta-associated temporal organization during navigation from sharp-wave ripple-associated replay during subsequent rest periods, providing a mechanistic link between theta sequences and learning. The findings have important implications for models of septo-hippocampal coordination and the functional segregation between online (theta) and offline (SWR) network states. That said, there are a few conceptual and methodological issues that need to be addressed.

      One concern is the overall novelty of this work; the dissociation between online temporal sequences and offline replay events following memory deficits has previously been shown by Wang et al. (2016, eLife). While the authors discuss Liu et al., 2023, which demonstrates that activating MEC inhibitory neurons at gamma frequencies during locomotion disrupts theta sequences, subsequent replay, and learning (line 65-66), they do not reference Wang et al., 2016, who performed a very similar study with MS pharmacological inactivation and reported large decreases in theta power and attenuated theta frequencies together with behavioural deficits, while SWR replay persisted. Given the strong similarities in manipulation and findings, this study should be discussed.

      Along the same lines, it should be noted that Brandon et al. (2014, Neuron) demonstrated that hippocampal place codes can still form in novel environments despite MS inactivation and loss of theta, indicating that spatial representations can emerge without intact septal drive. Referencing this study would strengthen the discussion of how temporal coordination, rather than spatial coding per se, underlies the learning deficits observed here.

      The conclusion that disrupting "theta microstructure" impairs learning relies on the assumption that the observed behavioral deficits arise from altered temporal coding from within hippocampal CA1 only. However, optogenetic modulation of medial septal PV neurons influences multiple downstream regions (entorhinal cortex, retrosplenial cortex) via widespread GABAergic projections. While the authors do touch on this, their discussion should expand to include the network-level consequences of entorhinal grid-cell disruption and how this could affect temporal coding both online and offline.

      The finding that replay content, rate, and duration are unchanged is critical to the paper's claim of dissociation. However, the analysis is restricted to immobility on the track. Given evidence for distinct awake vs. sleep replay, confirming that off-track rest and post-session sleep replays are similarly unaffected would confirm the conclusions of the paper. If these data are unavailable, the limitation should be acknowledged explicitly. Moreover, statistical power for detecting subtle differences in replay organization or spatial bias should be added to the supplement (n of events per animal, variability across sessions).

      The exact protocol for optogenetic stimulation is a bit confusing. For the task, the first and final third (66%) of trials were disrupted and were only stimulated when away from the reward well and only when the animal was moving. What proportion of time within "stimulated" trials remained unstimulated? Why were only 66% of trials stimulated?

    5. Author response:

      We thank all reviewers for their overall assessment, thoughtful comments, and suggestions. We are working to address each reviewer’s comment in detail. In this provisional response, we provide clarifications regarding our experimental approach and the novelty of our work, and include additional analyses that we have performed since the submission of the manuscript. We are also happy to report that we have now shared the raw data, intermediate analysis files, and the complete repository to facilitate replication of the analysis and figures.

      Code repo: github.com/LorenFrankLab/ms_stim_analysis

      Data repo: dandiarchive.org/dandiset/001634

      Docker containers (see GitHub repo for use instructions):

      Database: https://hub.docker.com/r/samuelbray32/spyglass-db-ms_stim_analysis

      Python notebooks: https://hub.docker.com/r/samuelbray32/spyglass-hub-ms_stim_analysis

      (1) Novelty and contrast with earlier manipulations:

      We thank the reviewers for suggesting that we explicitly contrast our results with prior pharmacological (Wang et al., 2016; Wang et al., 2015; Koenig et al., 2011; Brandon et al., 2014), systemic (Robbe & Buzsaki 2009; Petersen and Buzsáki 2020), and behavioral (Drieu et al., 2018) manipulations that also assessed some of the physiological features we evaluated. We will add a discussion of these studies, which will help us emphasize both the insights and discrepancies observed using these prior approaches. We will also more clearly explain the novelty and importance of our specific approach for temporally and physiologically precise manipulation. Specifically, our approach (closed-loop theta-phase stimulation during locomotion) provides a level of physiological specificity that made it possible to dissociate theta-state dynamics from other hippocampal processes. This in turn allowed us to address a question that has remained unresolved across prior studies: Are hippocampal spatial sequences during locomotion (i.e., theta sequences) necessary to learn a novel hippocampal-dependent task?

      (2) Additional analysis on SWRs during rest:

      Since submitting the manuscript, we have conducted additional analysis on the rate and length of SWRs in the rest box and found that their rate and length are also indistinguishable between targeted and control animals (effect of manipulation between control and targeted animals; rSWR rate: p=0.45; rSWR length: p=0.94, mixed effect model). We also find evidence for sequential neural representations in the rest box, when the encoding was performed in the behavioral arena. Example trajectories are shown below. These results are consistent with our observations on SWR rate, length, and content in the behavioral arena. Additionally, we are in the process of evaluating and quantifying the results of decoding the rSWRs and will include those in the next version of the manuscript.

      Author response image 1.

      Sequential replay events observed in the rest box

      (3) Theta sequence measurement in the absence of theta:

      In the next version of the manuscript, we will explicitly explain why our manipulation makes it more appropriate to measure sequential hippocampal representations during locomotion (i.e., theta sequences) without using theta oscillation or an epoch-averaged relatively large sliding window as a reference. The key insight here is that our manipulation suppresses theta and thus makes it difficult or impossible to accurately identify theta phase. We understand that theta-phase based approaches were used in prior work; however, these prior analyses may have confounded the absence of hippocampal theta sequences during locomotion with the inability to detect theta oscillatory phase reliably. We will show that our method of using clusterless Bayesian decoding, in which we estimate the decoded position at every 2 ms timestep, is indeed able to capture endogenous hippocampal sequences even without imposing any requirements of aligning to theta oscillations, thus providing an unbiased estimate of the rhythmicity of hippocampal spatial representations.
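
      One way to picture a phase-free rhythmicity estimate is to take the decoded-minus-actual position at each 2 ms step and ask which frequency carries the most power, without consulting the LFP at all. The following is a minimal sketch under stated assumptions: the function name, the 4-12 Hz scan range, and the synthetic 8 Hz signal are illustrative choices and are not taken from the authors' repository or methods.

```python
import cmath
import math

def dominant_frequency(signal, dt, freqs):
    """Return the frequency (Hz) in `freqs` with the largest Fourier power.

    signal : decoded-minus-actual position (cm), one value per time step
    dt     : time step in seconds (0.002 for 2 ms decoding)
    freqs  : candidate frequencies to scan (hypothetical choice)
    """
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]  # remove the DC offset
    best_freq, best_power = None, -1.0
    for f in freqs:
        # Direct evaluation of the Fourier sum at frequency f
        coeff = sum(x * cmath.exp(-2j * math.pi * f * n * dt)
                    for n, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = f, power
    return best_freq

# Synthetic example (not real data): 2 s of an 8 Hz sweep sampled every 2 ms
dt = 0.002
offset = [10.0 * math.sin(2 * math.pi * 8.0 * n * dt) for n in range(1000)]
scan = [4.0 + 0.5 * k for k in range(17)]  # 4.0, 4.5, ..., 12.0 Hz
peak = dominant_frequency(offset, dt, scan)
```

      Because the scan never references theta phase, the same procedure could in principle be applied to stimulation-on epochs in which theta is suppressed.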

      (4) Additional analysis on place cell stability and tuning:

      We thank the reviewer for this question. For the KL divergence analysis, we have imposed a spike-count criterion (100 spikes for each interval type: stimulation-off, stimulation-on, and the stimulus sub-interval) and a coverage criterion (50% HPD of the units’ spatial firing distribution was contained within 40 cm on the linear track and 100 cm on the w-track). These criteria were chosen to ensure that spatial tuning curves were sufficiently well sampled and localized to allow reliable estimation of KL divergence, which is particularly sensitive to noise arising from low spike counts or diffuse firing. Based on the reviewer’s suggestion, we have relaxed the spike-count and spatial coverage criteria for unit inclusion to include more weakly tuned place cells and replicated our results (p=.146). Further, we have also evaluated the stability of place field order between stimulation-on and stimulation-off conditions using more standard methods (as in Wang et al., 2015; Spearman correlation of place field order, control vs targeted, p = .920, t-test). These results are consistent with our observations about place field stability during stimulation-off and stimulation-on conditions (Fig. 2F).
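
      The two stability measures described above can be sketched as follows. This is an illustrative sketch only: the function names, the bin counts, and the peak positions are hypothetical, the coverage (HPD) criterion is omitted, and the Spearman formula used here assumes no tied values.

```python
import math

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in bits between two binned spatial firing histograms."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    kl = 0.0
    for p_c, q_c in zip(p_counts, q_counts):
        p = p_c / p_total
        q = q_c / q_total + eps  # eps keeps the ratio finite for empty Q bins
        if p > 0:
            kl += p * math.log2(p / q)
    return kl

def passes_spike_criterion(counts_by_interval, min_spikes=100):
    """True if every interval type has at least `min_spikes` spikes."""
    return all(sum(c) >= min_spikes for c in counts_by_interval.values())

def ranks(values):
    """Zero-based ranks of `values` (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for data without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical unit: spike counts per spatial bin in each interval type
unit_counts = {
    "stim_off": [0, 20, 60, 20, 0],
    "stim_on": [0, 25, 55, 20, 0],
}
eligible = passes_spike_criterion(unit_counts)
kl = kl_divergence(unit_counts["stim_off"], unit_counts["stim_on"])

# Hypothetical place-field peak positions (cm) for four units
off_peaks_cm = [10, 40, 70, 100]
on_peaks_cm = [12, 38, 75, 95]
rho = spearman(off_peaks_cm, on_peaks_cm)
```

      A small KL value and a rank correlation near 1 would both indicate that tuning and field order are preserved between conditions; the group-level comparison (control vs targeted) is a separate statistical step not shown here.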

      Author response image 2.

      Spearman correlation of place field order during stimulation-on and stimulation-off conditions.

    1. eLife Assessment

      This is a useful study that investigates the role of the long non-coding RNA Dreg1 for the development, differentiation, or maintenance of group 2 ILC (ILC2). The authors generate Dreg1-/- mice and show a reduction of group 2 innate lymphoid cells (ILC2). However, the strength of evidence supporting the impact of Dreg1 on Gata3 expression, a transcription factor required for ILC2 cell fate decisions, and the cell-intrinsic requirement of Dreg1 for ILC2 remain incomplete. This study will be of interest to immunologists.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines the role of the long non-coding RNA Dreg1 in regulating Gata3 expression and ILC2 development. Using Dreg1-deficient mice, the authors show a selective loss of ILC2s but not T or NK cells, suggesting a lineage-specific requirement for Dreg1. By integrating public chromatin and TF-binding datasets, they propose a Tcf1-Dreg1-Gata3 regulatory axis. The topic is relevant for understanding epigenetic regulation of ILC differentiation.

      Strengths:

      (1) Clear in vivo evidence for a lineage-specific role of Dreg1.

      (2) Comprehensive integration of genomic datasets.

      (3) Cross-species comparison linking mouse and human regulatory regions.

      Weaknesses:

      (1) Mechanistic conclusions remain correlative, relying on public data.

      (2) Lack of direct chromatin or transcriptional validation of Tcf1-mediated regulation.

      (3) Human enhancer function is not experimentally confirmed.

      (4) Insufficient methodological detail and limited mechanistic discussion.

    3. Reviewer #2 (Public review):

      The authors investigate the role of the long non-coding RNA Dreg1 for the development, differentiation, or maintenance of group 2 ILC (ILC2). Dreg1 is encoded close to the Gata3 locus, a transcription factor implicated in the differentiation of T cells and ILC, and in particular of type 2 immune cells (i.e., Th2 cells and ILC2). The center of the paper is the generation of a Dreg1-deficient mouse. While Dreg1-/- mice did not show any profound ab T or gd T cell, ILC1, ILC3, and NK cell phenotypes, ILC2 frequencies were reduced in various organs tested (small intestine, lung, visceral adipose tissue). In the bone marrow, immature ILC2 or ILC2 progenitors were reduced, whereas a common ILC progenitor was overrepresented, suggesting a differentiation block. Using ATAC-seq, the authors find that the promoter of Dreg1 is open in early lymphoid progenitors, and the acquisition of chromatin accessibility downstream correlates with increased Dreg1 expression in ILC2 progenitors. Examining publicly available Tcf1 CUT&Run data, they find that Tcf1 was specifically bound to the accessible sites of the Dreg1 locus in early innate lymphoid progenitors. Finally, the syntenic region in the human genome contains two non-coding RNA genes with an expression pattern resembling mouse Dreg1.

      The topic of the manuscript is interesting. However, there are various limitations that are summarized below.

      (1) The authors generated a new mouse model. The strategy should be better described, including the genetic background of the initially microinjected material. How many generations was the targeted offspring backcrossed to C57BL/6J?

      (2) The data is obtained from mice in which the Dreg1 gene is deleted in all cells. A cell-intrinsic role of Dreg1 in ILC2 has not been demonstrated. It should be shown that Dreg1 is required in ILC2 and their progenitors.

      (3) The data on how Dreg1 contributes to the differentiation and or maintenance of ILC2 is not addressed at a very definitive level. Does Dreg1 affect Gata3 expression, mRNA stability, or turnover in ILC2? Previous work of the authors indicated that knockdown of Dreg1 does not affect Gata3 expression (PMID: 32970351).

      (4) How Dreg1 exactly affects ILC2 differentiation remains unclear.

    1. eLife Assessment

      This study presents a platform to implement closed-loop experiments in mice based on auditory feedback. The authors provide convincing evidence that their platform enables a variety of closed-loop experiments using neural or movement signals, indicating that it will be a valuable resource to the neuroscience community. The paper could be strengthened by additional tutorials, such as on how to run an experiment.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a resource to the systems neuroscience community by offering their Python-based CLoPy platform for closed-loop feedback training. In addition to using neural feedback, as is common in these experiments, they include a capability to use real-time movement extracted from DeepLabCut as the control signal. The methods and repository are detailed for those who wish to use this resource. Furthermore, they demonstrate the efficacy of their system through a series of mesoscale calcium imaging experiments. These experiments use a large number of cortical regions for the control signal in the neural feedback setup, while the movement feedback experiments are analyzed more extensively. The revised preprint has improved substantially upon the previous submission.

      Strengths:

      The primary strength of the paper is the availability of their CLoPy platform. Currently, most closed-loop operant conditioning experiments are custom built by each lab, and carry a relatively large startup cost to get running. This platform lowers the barrier to entry for closed-loop operant conditioning experiments, in addition to making the experiments more accessible to those with less technical expertise.

      Another strength of the paper is the use of many different cortical regions as control signals for the neurofeedback experiments. Rodent operant conditioning experiments typically record from the motor cortex, and maybe one other region. Here, the authors demonstrate that mice can volitionally control many different cortical regions not limited to those previously studied, recording across many regions in the same experiment. This demonstrates the relative flexibility of modulating neural dynamics, including in non-motor regions.

      Finally, adapting the closed-loop platform to use real-time movement as a control signal is a nice addition. Incorporating movement kinematics into operant conditioning experiments has been a challenge due to the increased technical difficulties of extracting real-time kinematic data from video data at a latency where it can be used as a control signal for operant conditioning. In this paper, they demonstrate that the mice can learn the task using their forelimb position, at a rate that is quicker than the neurofeedback experiments.

      Weaknesses:

      Many of the original weaknesses have been addressed in the revised preprint.

      While the dataset contains an impressive number of animals and cortical regions for the neurofeedback experiment, my excitement for these experiments is tempered by the relative incompleteness of the dataset.

      Additionally, adoption of the platform may be hindered by the absence of a tutorial on how to run a session.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Gupta & Murphy present several parallel efforts. On one side, they present the hardware and software they use to build a head-fixed mouse experimental setup that they use to track in "real-time" the calcium activity in one or two spots at the surface of the cortex. On the other side, they present another setup that they use to take advantage of the "real-time" version of DeepLabCut with their mice. The hardware and software that they used/developed are described at length, both in the article and in a companion GitHub repository. Next, they present experimental work that they have done with these two setups, training mice to max out a virtual cursor to obtain a reward, by taking advantage of auditory tone feedback that is provided to the mice as they modulate either (1) their local cortical calcium activity, or (2) their limb position.

      Strengths:

      This work illustrates that body-movement tracking and calcium imaging can be carried out using readily available components, including imaging the brain using an incredibly cheap consumer electronics RGB camera (RGB Raspberry Pi Camera). It is a useful source of information for researchers who may be interested in building a similar setup, given the highly detailed overview of the system. Finally, it further confirms previous findings regarding the operant conditioning of the calcium dynamics at the surface of the cortex (Clancy et al. 2020) and suggests an alternative based on DeepLabCut to the motor tasks that aim to image the brain at the mesoscale during forelimb movements (Quarta et al. 2022).

      Weaknesses:

      This work covers 3 separate research endeavors: (1) The development of two separate setups and their corresponding software. (2) A study that is highly inspired by the Clancy et al. 2021 paper on the modulation of the local cortical activity measured through a mesoscale calcium imaging setup. (3) A study of the mesoscale dynamics of the cortex during forelimb movement learning. Sadly, the analyses of the physiological data appear incomplete, and more generally, the paper shows weaknesses regarding several points:

      The behavioral setups that are presented are representative of the state of the art in the field of mesoscale imaging/head fixed behavior community, rather than a highly innovative design. Still, they definitely have value as a starting point for laboratories interested in implementing such approaches.

      Throughout the paper, there are several statements that point out how important it is to carry out this work in a closed-loop setting with auditory feedback, but sadly there is no "no feedback" control in the cortical conditioning experiments, while there is a no-feedback condition in the forelimb movement study, which shows that learning of the task can be achieved in the absence of feedback.

      The analysis of the closed-loop neuronal data lacks controls. Increased performance can be achieved by actively modulating only one of the two ROIs; this is not really analyzed, although this finding, which does not match previous reports (Clancy et al. 2020), would be important to examine further.

    4. Reviewer #3 (Public review):

      Summary:

      The study demonstrates the effectiveness of a cost-effective closed-loop feedback system for modulating brain activity and behavior in head-fixed mice. The authors tested a real-time closed-loop feedback system in head-fixed mice with two types of graded feedback: 1) closed-loop neurofeedback (CLNF), where feedback is derived from neuronal activity (calcium imaging), and 2) closed-loop movement feedback (CLMF), where feedback is based on observed body movement. It is a Python-based open-source system, which the authors call CLoPy. The authors also claim to provide all software, hardware schematics, and protocols to adapt it to various experimental scenarios. The system is capable and can be adapted to a wide range of use cases.

      The authors have shown that their system can control both positive (water drop) and negative (buzzer-vibrator) reinforcement. This study also shows that, using the closed-loop system, mice performed better, learned arbitrary tasks, and could adapt to changes in the rules. By integrating real-time feedback based on cortical GCaMP imaging and behavior tracking, the authors have provided strong evidence that such closed-loop systems can be instrumental in exploring the dynamic interplay between brain activity and behavior.

      Strengths:

      Simplicity of feedback systems design. Simplicity of implementation and potential adoption.

      Weaknesses:

      Long latencies, due to slow Ca2+ dynamics and slow imaging (15 FPS), may limit the application of the system.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary: 

      The authors provide a resource to the systems neuroscience community, by offering their Python-based CLoPy platform for closed-loop feedback training. In addition to using neural feedback, as is common in these experiments, they include a capability to use real-time movement extracted from DeepLabCut as the control signal. The methods and repository are detailed for those who wish to use this resource. Furthermore, they demonstrate the efficacy of their system through a series of mesoscale calcium imaging experiments. These experiments use a large number of cortical regions for the control signal in the neural feedback setup, while the movement feedback experiments are analyzed more extensively.

      Strengths:

      The primary strength of the paper is the availability of their CLoPy platform. Currently, most closed-loop operant conditioning experiments are custom built by each lab and carry a relatively large startup cost to get running. This platform lowers the barrier to entry for closed-loop operant conditioning experiments, in addition to making the experiments more accessible to those with less technical expertise.

      Another strength of the paper is the use of many different cortical regions as control signals for the neurofeedback experiments. Rodent operant conditioning experiments typically record from the motor cortex and maybe one other region. Here, the authors demonstrate that mice can volitionally control many different cortical regions not limited to those previously studied, recording across many regions in the same experiment. This demonstrates the relative flexibility of modulating neural dynamics, including in non-motor regions.

      Finally, adapting the closed-loop platform to use real-time movement as a control signal is a nice addition. Incorporating movement kinematics into operant conditioning experiments has been a challenge due to the increased technical difficulties of extracting real-time kinematic data from video data at a latency where it can be used as a control signal for operant conditioning. In this paper they demonstrate that the mice can learn the task using their forelimb position, at a rate that is quicker than the neurofeedback experiments.

      Weaknesses:

      There are several weaknesses in the paper that diminish the impact of its strengths. First, the value of the CLoPy platform is not clearly articulated to the systems neuroscience community. Similarly, the resource could be better positioned within the context of the broader open-source neuroscience community. For an example of how to better frame this resource in these contexts, I recommend consulting the pyControl paper. Improving this framing will likely increase the accessibility and interest of this paper to a less technical neuroscience audience, for instance by highlighting the types of experimental questions CLoPy can enable.

      We appreciate the editor’s feedback regarding the clarity of the CLoPy platform's value and its positioning within the broader neuroscience community. We agree and understand the importance of effectively communicating the utility of CLoPy to both the systems neuroscience field and the wider open-source neuroscience community.

      To address this, we have revised the introduction and discussion sections of the manuscript to more clearly articulate the unique contributions of the CLoPy platform. Specifically:

      (1) We have emphasized how CLoPy can address experimental questions in systems neuroscience by highlighting its ability to enable real-time closed-loop experiments, such as investigating neural dynamics during behavior or studying adaptive cortical reorganization after injury. These examples are aimed at demonstrating its practical utility to the neuroscience audience.

      (2) We have positioned CLoPy within the broader open-source neuroscience ecosystem, drawing comparisons to similar resources like pyControl. We describe how CLoPy complements existing tools by focusing on real-time optical feedback and integration with genetically encoded indicators, which are becoming increasingly popular in systems neuroscience. We also emphasize its modularity and ease of adoption in experimental settings with limited resources.

      (3) To make the manuscript more accessible to a less technically inclined audience, we have restructured certain sections to focus on the types of experiments CLoPy enables, rather than the technical details of the implementation.

      We have consulted the pyControl paper, as suggested, and have used it as a reference point to improve the framing of our resource. We believe these changes will increase the accessibility and appeal of the paper to a broader neuroscience audience.

      While the dataset contains an impressive number of animals and cortical regions for the neurofeedback experiment, and an analysis of the movement-feedback experiments, my excitement for these experiments is tempered by the relative incompleteness of the dataset, as well as its description and analysis in the text. For instance, in the neurofeedback experiment, many of these regions only have data from a single mouse, limiting the conclusions that can be drawn. Additionally, there is a lack of reporting of the quantitative results in the text of the document, which is needed to better understand the degree of the results. Finally, the writing of the results section could use some work, as it currently reads more like a methods section.

      Thank you for your thoughtful and constructive feedback on our manuscript. We appreciate the time and effort you took to review our work and provide detailed suggestions for improvement. Below, we address the key points raised in your review:

      (1) Dataset Completeness: We acknowledge that some of the neurofeedback experiments include data from only a single mouse for some cortical regions, while for other cortical regions there are several animals. This was due to practical constraints during the study, and we understand the limitations this poses for drawing broad conclusions. We felt it was still important to include these data sets with smaller sample sizes as they might be useful for others pursuing this direction in the future. To address this, we have revised the text to explicitly acknowledge these limitations and clarify that the results for some regions are exploratory in nature. We believe our flexible tool will provide a means for our lab and others to include more animals representing additional cortical regions in future studies. Importantly, we have included all raw and processed data as well as code for future analysis.

      (2) Quantitative Results: We recognize the importance of reporting quantitative results in the text for better clarity and interpretation. In response, we have added a more detailed description of the quantitative findings from both the neurofeedback and movement-feedback experiments. This includes effect sizes, statistical measures, and key numerical results to provide a clearer understanding of the degree and significance of the observed effects.

      (3) Results Section Writing: We appreciate your observation that parts of the results section read more like a methods section. To improve clarity and focus, we have restructured the results section to present the findings in a more concise and interpretative manner, while moving overly detailed descriptions of experimental procedures to the methods section.

      Suggestions for improved or additional experiments, data or analyses:

      Not necessary for this paper, but it would be interesting to see if the CLNF group could learn without auditory feedback.

      This is a great suggestion and certainly something that could be done in the future.

      There are no quantitative results in the results section. I would add important results to help the reader better interpret the data. For example, in: "Our results indicated that both training paradigms were able to lead mice to obtain a significantly larger number of rewards over time," You could show a number, with an appropriate comparison or statistical test, to demonstrate that learning was observed.

      Thank you for pointing this out. We now report quantification values in the results, in addition to the figure legends, and quote the relevant sentences here: “A ΔF/F0 threshold value was calculated from a baseline session on day 0 that would have allowed 25% performance. Starting from this basal performance of around 25% on day 1, mice (CLNF No-rule-change, N=23, n=60 and CLNF Rule-change, N=17, n=60) were able to discover the task rule and perform above 80% over ten days of training (Figure 4A, RM ANOVA p=2.83e-5), and Rule-change mice even learned a change in ROIs or rule reversal (Figure 4A, RM ANOVA p=8.3e-10, Table 5 for different rule changes). There were no significant differences between male and female mice (Supplementary Figure 3A).”
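
      The idea of choosing a threshold that "would have allowed 25% performance" on the baseline session can be sketched as follows. This is a minimal sketch under assumptions: it treats each baseline trial as summarized by its peak ΔF/F0, the function names are hypothetical, and the numbers are invented for illustration; the authors' actual procedure may differ.

```python
def threshold_for_target_rate(trial_peaks, target_rate=0.25):
    """Pick the threshold that the top `target_rate` fraction of
    baseline trials would have reached (assumes no tied peak values)."""
    n_success = max(1, int(target_rate * len(trial_peaks)))
    # The n_success-th largest peak becomes the threshold
    return sorted(trial_peaks, reverse=True)[n_success - 1]

def success_rate(trial_peaks, threshold):
    """Fraction of trials whose peak reaches the threshold."""
    return sum(p >= threshold for p in trial_peaks) / len(trial_peaks)

# Invented peak dF/F0 values (%) for eight baseline trials
baseline_peaks = [0.5, 1.2, 0.8, 2.0, 0.3, 1.5, 0.9, 0.7]
threshold = threshold_for_target_rate(baseline_peaks)
baseline_rate = success_rate(baseline_peaks, threshold)
```

      Applying the same fixed threshold on later training days then makes any rise in success rate above the baseline fraction attributable to a change in the animal's activity rather than to a change in the criterion.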

      For: "Performing this analysis indicated that the Raspberry Pi system could provide reliable graded feedback within ~63 ± 15 ms for CLNF experiments." The LED test shows the sending of the signal, but the actual delay for the audio generation might be longer. This is also longer than the 50 ms mentioned in the abstract.

      We appreciate the reviewer’s insightful comment. The latency reported (~63 ms) was measured using the LED test, which captures the time from signal detection to output triggering on the Raspberry Pi GPIO. We agree that the total delay for auditory feedback generation could include an additional latency component related to the digital-to-analog conversion and speaker response. In our setup, we employ a fast Audiostream library written in C to generate the audio signal and expect the delay contribution to be negligible compared to the GPIO latency. Although we did not do so, this could be confirmed with an oscilloscope-based pilot measurement of the additional delay. We have updated the manuscript to clarify that the 63 ± 15 ms value reflects the GPIO-triggered output latency, and we have revised the abstract to accurately state the delay as “~63 ms” rather than 50 ms. This ensures consistency and avoids underestimation of the latency. We have corrected the LED latency for CLNF and CLMF experiments in the abstract as well.
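
      A "mean ± SD" latency of the kind quoted above can be summarized from paired timestamps as in the following sketch. The timestamps here are invented for illustration and do not reproduce the authors' measurements; the function name is hypothetical.

```python
import statistics

def latency_summary(event_times_ms, led_times_ms):
    """Mean and sample standard deviation of LED-onset latencies (ms)."""
    latencies = [led - event
                 for event, led in zip(event_times_ms, led_times_ms)]
    return statistics.mean(latencies), statistics.stdev(latencies)

# Invented timestamps: when the signal event occurred vs. when the
# LED driven by the GPIO output was seen to turn on
event_times_ms = [0, 1000, 2000, 3000]
led_times_ms = [40, 1050, 2060, 3070]
mean_ms, sd_ms = latency_summary(event_times_ms, led_times_ms)
```

      As the reviewer notes, a summary of this kind covers only the path up to the GPIO output; any audio-generation or speaker delay would have to be measured separately and added.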

      It could be helpful to visualize an individual trial for each experiment type, for instance how the audio frequency changes as movement speed / calcium activity changes.

      We have added Supplementary Figure 8 that contains this data where you can see the target cortical activity trace, target paw speed, rewards, along with the audio frequency generated.
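
      For readers unfamiliar with graded feedback, the mapping from a control signal to a tone can be pictured as in the sketch below. It is an assumption-laden illustration: the function name, the 1-16 kHz range, and the logarithmic spacing are hypothetical choices and are not taken from the CLoPy code.

```python
def feedback_frequency(value, threshold, f_min=1000.0, f_max=16000.0):
    """Map a control signal onto a tone frequency in Hz.

    value     : current control signal (e.g. dF/F0 or paw speed)
    threshold : signal level at which the reward is delivered
    f_min     : tone at or below zero signal (hypothetical)
    f_max     : tone at or above threshold (hypothetical)
    """
    # Clamp progress toward the threshold to the range [0, 1]
    frac = min(max(value / threshold, 0.0), 1.0)
    # Interpolate on a logarithmic frequency axis
    return f_min * (f_max / f_min) ** frac

low = feedback_frequency(0.0, 1.0)
mid = feedback_frequency(0.5, 1.0)
high = feedback_frequency(1.2, 1.0)
```

      Plotting such a frequency trace alongside the control signal for a single trial gives the kind of per-trial view the reviewer requested.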

      The sample sizes are small (n=1) for a few groups. I am excited by the variety of regions recorded, so it could be beneficial for the authors to collect a few more animals to beef up the sample sizes.

      We've acknowledged that some of the sample sizes are small. Importantly, we have included raw and processed data as well as code for future analysis. We felt it was still important to include these data sets with smaller sample sizes as they might be useful for others pursuing this direction in the future.

      I am curious as to why 60 trials sessions were used. Was it mostly for the convenience of a 30 min session, or were the animals getting satiated? If the former, would learning have occurred more rapidly with longer sessions?

      This is a great observation, and the answer is that it was mostly due to logistical reasons. We tried not to keep animals head-fixed for more than 45 minutes in each session, as they become less engaged during long head-fixed sessions. After head-fixing them, it takes about 15 minutes to get the experiment going, and therefore 30-40-minute recorded sessions seemed appropriate before they stopped being engaged or became satiated in the task. We provided supplemental water after the sessions and observed that they consumed it, so they were not fully satiated during the sessions even when they performed well in the task and got maximum rewards. We also had inter-trial rest periods of 10 s that lengthened the session duration. We think it would be interesting to explore the relationship between session duration (number of trials) and task learning progression over the days in a separate study.

      Figure 4E is interesting, it seems like the changes in the distribution of deltaF was in both positive and negative directions, instead of just positive. I'd be curious as to the author's thoughts as to why this is the case. Relatedly, I don't see Figure 4E, and a few other subplots, mentioned in the text. As a general comment, I would address each subplot in the text.

      We have split Figure 4 into two to keep the figures more readable. Previous Figure 4E-H are now Figure 5A-D in the revised manuscript. The online real-time CLNF sessions used a moving-window average to calculate ΔF/F<sub>0</sub>, and the figures were generated by averaging over the whole recorded sessions. We have added text in Methods under the “Online ΔF/F<sub>0</sub> calculation” and “Offline ΔF/F<sub>0</sub> calculation” sections clarifying how we perform ΔF/F<sub>0</sub> normalization based on average fluorescence over the entire session. This method of normalization does increase the baseline, so that some peaks appear to be below zero. Additionally, it is unclear what strategy animals employ to achieve the rule-specific target activity. The task did not constrain them to a specific strategy for cortical activation: they were rewarded as long as they crossed the threshold in the target ROI(s). For example, in 2-ROI experiments, to increase ROI1-ROI2 target activity, they could increase the activity of ROI1 relative to ROI2 or decrease the activity of ROI2 relative to ROI1; both would lead to a reward as long as the result crossed the threshold.
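
      The point about unconstrained strategies in the 2-ROI rule can be illustrated with a minimal sketch. The function name, the threshold, and the ΔF/F<sub>0</sub> values below are hypothetical and are not taken from the manuscript or the CLoPy code; the sketch only assumes that the reward condition is "ROI1 - ROI2 exceeds a threshold", as described above.

```python
def crosses_threshold(roi1_dff, roi2_dff, threshold):
    """Return True when the ROI1 - ROI2 difference exceeds the reward threshold."""
    return (roi1_dff - roi2_dff) > threshold


# Hypothetical values, expressed as fractional dF/F0 (not data from the study).
THRESHOLD = 0.05

# No difference between the two ROIs: the rule is not satisfied.
baseline = crosses_threshold(0.02, 0.02, THRESHOLD)

# Strategy A: raise ROI1 while ROI2 stays put.
raise_roi1 = crosses_threshold(0.08, 0.02, THRESHOLD)

# Strategy B: leave ROI1 unchanged and lower ROI2.
lower_roi2 = crosses_threshold(0.02, -0.04, THRESHOLD)

print(baseline, raise_roi1, lower_roi2)
```

      Under these assumed values, both strategies satisfy the same rule, which is why the rule alone cannot indicate which ROI an animal is modulating.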

      We have now addressed and added reference to the figures in the text in Results under “Mice can explore and learn an arbitrary task, rule, and target conditions” and “Mice can rapidly adapt to changes in the task rule” sections - thanks for pointing this out.

      For: "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time," I would provide a visual summary showing the learning curves for the different types of regions.

      We have rewritten this section to emphasize that these conclusions were based on pooled data from multiple regions of interest. The sample sizes for each type of region are different and some are missing. We believe it would be incomplete and not comparable to present this as a regular analysis since the sample sizes were not balanced. We would be happy to dive deeper into this and point to the raw and processed dataset if anyone would like to explore this further via GitHub or other queries.

      Relatedly, I would further explain the fast vs slow learners, and if they mapped onto certain regions.

      Mice were categorized into fast or slow learners based on the slope of learning over days (reward progression over the days) as shown in Supplementary Figure 3C,D. Our initial aim was not to probe cortical regions that led to fast vs slow learning but this was a grouping we did afterwards. Based on the analysis we did, the fast learners included the sensory (V1), somatosensory (BC, HL), and motor (M1, M2) areas, while the slow learners included the motor (M1, M2), and higher order (TR, RL) cortical areas. Testing all dorsal cortical areas would be prudent to establish their role in fast or slow learning and it is an interesting future direction.
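
      As an illustration of slope-based grouping, the sketch below fits an ordinary least-squares line to rewards per day and labels an animal by comparing the slope with a cutoff. The function names, the cutoff value, and the example reward counts are assumptions made for illustration; the manuscript's actual criterion is the one shown in Supplementary Figure 3C,D.

```python
def learning_slope(rewards_per_day):
    """Least-squares slope of rewards against training day (days numbered from 1)."""
    n = len(rewards_per_day)
    days = range(1, n + 1)
    mean_x = sum(days) / n
    mean_y = sum(rewards_per_day) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, rewards_per_day))
    var = sum((x - mean_x) ** 2 for x in days)
    return cov / var


def classify_learner(rewards_per_day, slope_cutoff):
    """Label an animal 'fast' or 'slow'; slope_cutoff is a hypothetical parameter."""
    return "fast" if learning_slope(rewards_per_day) >= slope_cutoff else "slow"


# Hypothetical reward counts over four training days.
steep = [10, 20, 30, 40]
shallow = [10, 12, 11, 14]
print(classify_learner(steep, slope_cutoff=5.0))
print(classify_learner(shallow, slope_cutoff=5.0))
```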

      Also I would make the labels for these plots (e.g. Supp Fig3) more intuitive, versus the acronyms currently used.

      We have made more expressive labels and explained the acronyms below the Supplementary Figure 3.

      The CLMF animals showed a decrease in latency across learning, what about the CLNF animals? There is currently no mention in the text or figures.

      We have now incorporated the CLNF task latency data into both the Results text and Figure 4C. Briefly, task latency decreased as performance improved, increased following a rule change, and then decreased again as the animals relearned the task. The previous Figure 4C has been updated to Figure 4D, and the former Figure 4D has been moved to Supplementary Figure 4E.

      Reviewer #2 (Public review):

      Summary:

      In this work, Gupta & Murphy present several parallel efforts. On one side, they present the hardware and software they use to build a head-fixed mouse experimental setup that they use to track in "real-time" the calcium activity in one or two spots at the surface of the cortex. On the other side, they present another setup that they use to take advantage of the "real-time" version of DeepLabCut with their mice. The hardware and software that they used/developed are described at length, both in the article and in a companion GitHub repository. Next, they present experimental work that they have done with these two setups, training mice to max out a virtual cursor to obtain a reward, by taking advantage of auditory tone feedback that is provided to the mice as they modulate either (1) their local cortical calcium activity, or (2) their limb position.

      Strengths:

      This work illustrates the fact that, thanks to readily available experimental building blocks, body movement tracking and calcium imaging can be carried out using readily available components, including imaging the brain using an incredibly cheap consumer electronics RGB camera (RGB Raspberry Pi Camera). It is a useful source of information for researchers who may be interested in building a similar setup, given the highly detailed overview of the system. Finally, it further confirms previous findings regarding the operant conditioning of the calcium dynamics at the surface of the cortex (Clancy et al. 2020) and suggests an alternative based on DeepLabCut to the motor tasks that aim to image the brain at the mesoscale during forelimb movements (Quarta et al. 2022).

      Weaknesses:

      This work covers 3 separate research endeavors: (1) The development of two separate setups and their corresponding software. (2) A study that is highly inspired by the Clancy et al. 2020 paper on the modulation of the local cortical activity measured through a mesoscale calcium imaging setup. (3) A study of the mesoscale dynamics of the cortex during forelimb movement learning. Sadly, the analyses of the physiological data appear incomplete, and more generally the paper tends to offer overstatements regarding several points:

      In contrast to the introductory statements of the article, closed-loop physiology in rodents is a well-established research topic. Beyond auditory feedback, this includes optogenetic feedback (O'Connor et al. 2013, Abbasi et al. 2018, 2023), electrical feedback in hippocampus (Girardeau et al. 2009), and much more.

      We have included and referenced these papers in our introduction section (quoted below) and rephrased the part where our previous text indicated there are fewer studies involving closed-loop physiology.

      “Some related studies have demonstrated the feasibility of closed-loop feedback in rodents, including hippocampal electrical feedback to disrupt memory consolidation (Girardeau et al. 2009), optogenetic perturbations of somatosensory circuits during behavior (O'Connor et al. 2013), and more recent advances employing targeted optogenetic interventions to guide behavior (Abbasi et al. 2023).”

      The behavioral setups that are presented are representative of the state of the art in the field of mesoscale imaging/head fixed behavior community, rather than a highly innovative design. In particular, the closed-loop latency that they achieve (>60 ms) may be perceived by the mice. This is in contrast with other available closed-loop setups.

      We thank the reviewer for this thoughtful comment and fully agree that our closed-loop latency is larger than that achieved in some other contemporary setups. Our primary aim in presenting this work, however, is not to compete with the lowest possible latencies, but to provide an open-source, accessible, and flexible platform that can be readily adopted by a broad range of laboratories. By building on widely available and lower-cost components, our design lowers the barrier of entry for groups that wish to implement closed-loop imaging and behavioral experiments, while still achieving latencies well within the range that can support many biologically meaningful applications.

      For example, our latency (~60 ms) remains compatible with experimental paradigms such as:

      Motor learning and skill acquisition, where sensorimotor feedback on the scale of tens to hundreds of milliseconds is sufficient to modulate performance.

      Operant conditioning and reward-based learning, in which reinforcement timing windows are typically broader and not critically dependent on sub-20 ms latencies.

      Cortical state dependent modulation, where feedback linked to slower fluctuations in brain activity (hundreds of milliseconds to seconds) can provide valuable insight.

      Studies of perception and decision-making, in which stimulus response associations often unfold on behavioral timescales longer than tens of milliseconds.

      We believe that emphasizing openness, affordability, and flexibility will encourage widespread adoption and adaptation of our setup across laboratories with different research foci. In this way, our contribution complements rather than competes with ultra-low-latency closed-loop systems, providing a practical option for diverse experimental needs.

      Through the paper, there are several statements that point out how important it is to carry out this work in a closed-loop setting with an auditory feedback, but sadly there is no "no feedback" control in cortical conditioning experiments, while there is a no-feedback condition in the forelimb movement study, which shows that learning of the task can be achieved in the absence of feedback.

      We fully agree that such a control would provide valuable insight into the contribution of feedback to learning in the CLNF paradigm. In designing our initial experiments, we envisioned multiple potential control conditions, including No-feedback and Random-feedback. However, our first and primary objective was to establish whether mice could indeed learn to modulate cortical ROI activation through auditory feedback, and to further investigate this across multiple cortical regions. For this reason, we focused on implementing the CLNF paradigm directly, without the inclusion of these additional control groups. To broaden the applicability of the system, we subsequently adapted the platform to the CLMF experiments, where we did incorporate a No-feedback group. These results, as the reviewer notes, strengthen the evidence for the role of feedback in shaping task performance. We agree that the inclusion of a No-feedback control group in the CLNF paradigm will be crucial in future studies to further dissect the specific contribution of feedback to cortical conditioning.

      The analysis of the closed-loop neuronal data behavior lacks controls. Increased performance can be achieved by actively modulating only one of the two ROIs; this is not clearly analyzed (for instance, by looking at the timing of the calcium signal modulation across the two ROIs). It seems that overall ROIs 1 and 2 covary, in contrast to Clancy et al. 2020. How can this be explained?

      We agree that the possibility of increased performance being driven by modulation of a single ROI is an important consideration. Our study indeed began with 1-ROI closed-loop experiments. In those early experiments, while we did observe animals improving performance across days, we realized that daily variability in ongoing cortical GCaMP activity could lead to fluctuations in threshold-crossing events. The 2-ROI design was subsequently introduced to reduce this variability, as the target activity was defined as the relative activity between the two ROIs (e.g., ROI1 - ROI2). This approach offered a more stable signal by normalizing ongoing fluctuations. In our analysis of the early 2-ROI experiments, we observed that animals adopted diverging strategies to achieve threshold crossings. Specifically, some animals increased activity in ROI1 relative to ROI2, while others decreased activity in ROI2 to accomplish the same effect. Once discovered, each animal consistently adhered to its chosen strategy throughout subsequent training sessions. This was an early and intriguing observation, but as the experiments were not originally designed to systematically test this effect, we limited our presentation to the analysis of a small number of animals (shown in Figure 11). We have also added details about this observation to our Results section, quoted below:

      “In the 2-ROI experiment where the task rule required “ROI1 - ROI2” activity to cross a threshold for reward delivery, mice displayed divergent strategies. Some animals predominantly increased ROI1 activity, whereas others reduced ROI2 activity, both approaches leading to successful threshold crossing (Figure 11)”.

      We hope this clarifies how the use of two ROIs helps explain the apparent covariation of the signals, and why some divergence from the observations of Clancy et al. (2020) may be expected.

      Reviewer #3 (Public review):

      Summary:

      The study demonstrates the effectiveness of a cost-effective closed-loop feedback system for modulating brain activity and behavior in head-fixed mice. The authors tested a real-time closed-loop feedback system in head-fixed mice with two types of graded feedback: 1) closed-loop neurofeedback (CLNF), where feedback is derived from neuronal activity (calcium imaging), and 2) closed-loop movement feedback (CLMF), where feedback is based on observed body movement. It is a Python-based open-source system that the authors call CLoPy. The authors also claim to provide all software, hardware schematics, and protocols to adapt it to various experimental scenarios. The system is capable and can be adapted to a wide range of use cases.

      The authors have shown that their system can control both positive (water drop) and negative (buzzer-vibrator) reinforcement. The study also shows that, using the closed-loop system, mice performed better, learned an arbitrary task, and could adapt to a change in the rule. By integrating real-time feedback based on cortical GCaMP imaging and behavior tracking, the authors have provided strong evidence that such closed-loop systems can be instrumental in exploring the dynamic interplay between brain activity and behavior.

      Strengths:

      Simplicity of feedback systems designed. Simplicity of implementation and potential adoption.

      Weaknesses:

      Long latencies, due to slow Ca2+ dynamics and slow imaging (15 FPS), may limit the application of the system.

      We appreciate the reviewer’s comment and agree that latency is an important factor in our setup. The latency arises partly from the inherent slow kinetics of calcium signaling and GCaMP6s, and partly from the imaging rate of 15 FPS (every 66 ms). These limitations can be addressed in several ways: for example, using faster calcium indicators such as GCaMP8f, or adapting the system to electrophysiological signals, which would require additional processing capacity. In our implementation, image acquisition was fixed at 15 FPS to enable real-time frame processing (256 × 256 resolution) on Raspberry Pi 4B devices. With newer hardware, such as the Raspberry Pi 5, substantially higher acquisition and processing rates are feasible (although we have not yet benchmarked this extensively). More powerful platforms such as Nvidia Jetson or conventional PCs would further support much faster data acquisition and processing.

      Major comments:

      (1) Page 5 paragraph 1: "We tested our CLNF system on Raspberry Pi for its compactness, general-purpose input/output (GPIO) programmability, and wide community support, while the CLMF system was tested on an Nvidia Jetson GPU device." Can these programs and hardware be integrated with a Windows-based system and a microcontroller (Arduino/Teensy)? For broad adaptability, that is what many labs would already have (please comment/discuss).

      While we tested our CLNF system on a Raspberry Pi (chosen for its compactness, GPIO programmability, and large user community) and our CLMF system on an Nvidia Jetson GPU device (to leverage real-time GPU-based inference), the underlying software is fully written in Python. This design choice makes the system broadly adaptable: it can be run on any device capable of executing Python scripts, including Windows-based PCs, Linux machines, and macOS systems. For hardware integration, we have confirmed that the framework works seamlessly with microcontrollers such as Arduino or Teensy, requiring only minor modifications to the main script to enable sending and receiving of GPIO signals through those boards. In fact, we are already using the same system in an in-house project on a Linux-based PC where an Arduino is connected to the computer to provide GPIO functionality. Furthermore, the system is not limited to Raspberry Pi or Arduino boards; it can be interfaced with any GPIO-capable devices, including those from Adafruit and other microcontroller platforms, depending on what is readily available in individual labs. Since many neuroscience and engineering laboratories already possess such hardware, we believe this design ensures broad accessibility and ease of integration across diverse experimental setups.

      (2) Hardware Constraints: The reliance on Raspberry Pi and Nvidia Jetson (which is expensive) for real-time processing could introduce latency issues (~63 ms for CLNF and ~67 ms for CLMF). This latency might limit precision for faster or more complex behaviors, which the authors should address in the Discussion section.

      In our system, we measured latencies of approximately 63 ms for CLNF and 67 ms for CLMF. While such latencies indeed limit applications requiring millisecond precision, such as fast whisker movements, saccades, or fine-reaching kinematics, we emphasize that many relevant behaviors, including postural adjustments, limb movements, locomotion, and sustained cortical state changes, occur on timescales that are well within the capture range of our system. Thus, our platform is appropriate for a range of mesoscale behavioral studies, a point that merits further discussion. It is also important to note that these latencies are not solely dictated by hardware constraints. A significant component arises from the inherent biological dynamics of the calcium indicator (GCaMP6s) and calcium signaling itself, which introduce slower temporal kinetics independent of processing delays. Newer variants, such as GCaMP8f, offer faster response times and could further reduce effective biological latency in future implementations.

      With respect to hardware, we acknowledge that Raspberry Pi provides a low-cost solution but contributes to modest computational delays, while Nvidia Jetson offers faster inference at higher cost. Our choice reflects a balance between accessibility, cost-effectiveness, and performance, making the system deployable in many laboratories. Importantly, the modular and open-source design means the pipeline can readily be adapted to higher-performance GPUs or integrated with electrophysiological recordings, which provide higher temporal resolution. Finally, we agree with the reviewer that the issue of latency highlights deeper and interesting questions regarding the temporal requirements of behavior classification. Specifically, how much data (in time) is required to reliably identify a behavior, and what is the minimum feedback delay necessary to alter neural or behavioral trajectories? These are critical questions for the design of future closed-loop systems and ones that our work helps frame.

      We have added a slightly modified version of our response above in the discussion section under “Experimental applications and implications”.

      (3) Neurofeedback Specificity: The task focuses on mesoscale imaging and ignores finer spatiotemporal details. Sub-second events might be significant in more nuanced behaviors. Can this be discussed in the discussion section?

      This is a great point, and we have added the following to the discussion section: “In the case of CLNF we have focused on regional cortical GCaMP signals that are relatively slow in kinetics. While such changes are well suited for transcranial mesoscale imaging assessment, it is possible that cellular 2-photon imaging (Yu et al. 2021) or preparations that employ cleared crystal skulls (Kim et al. 2016) could resolve more localized and higher frequency kinetic signatures.”

      (4) The activity over 6s is being averaged to determine if the threshold is being crossed before the reward is delivered. This is a rather long duration of time during which the mice may be exhibiting stereotyped behaviors that may result in the changes in DFF that are being observed. It would be interesting for the authors to compare (if data is available) the behavior of the mice in trials where they successfully crossed the threshold for reward delivery and in those trials where the threshold was not breached. How is this different from spontaneous behavior and behaviors exhibited when they are performing the test with CLNF? 

      We would like to emphasize that we are not directly averaging activity over 6 s to compare against the reward threshold. Instead, the preceding 6 s of activity is used solely to compute a dynamic baseline for ΔF/F<sub>0</sub> (ΔF/F<sub>0</sub> = (F − F<sub>0</sub>)/F<sub>0</sub>). Here, F<sub>0</sub> is calculated as the mean fluorescence intensity over the prior 6 s window and is updated continuously throughout the session. This baseline is then subtracted from the instantaneous fluorescence signal to detect relative changes in activity. The reward threshold is therefore evaluated against these baseline-corrected ΔF/F<sub>0</sub> values at the current time point, not against an average over 6 s. This moving-window baseline correction is a standard approach in calcium imaging analyses, as it helps control for slow drifts in signal intensity, bleaching effects, or ongoing fluctuations unrelated to the behavior of interest. Thus, the 6 s window is not introducing a temporal lag in reward assignment but is instead providing a reference to detect rapid increases in cortical activity. We have added the term "dynamic baseline" to the Methods to clarify.
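
      The dynamic-baseline computation described above can be sketched as follows. This is a minimal illustration and not the CLoPy implementation: the class name, the 0.05 threshold, and the fluorescence values are hypothetical. It assumes that the 6 s window at 15 FPS corresponds to 90 frames, that the current frame is excluded from its own baseline, and that the mean is taken over however many frames are available until the window has filled.

```python
from collections import deque

FPS = 15                          # imaging rate stated in the response
WINDOW_S = 6                      # baseline window length in seconds
WINDOW_FRAMES = FPS * WINDOW_S    # 90 frames


class DynamicBaseline:
    """Moving-window F0 estimate with per-frame dF/F0 (illustrative sketch only)."""

    def __init__(self, window_frames=WINDOW_FRAMES):
        # Oldest samples are dropped automatically once the window is full.
        self.history = deque(maxlen=window_frames)

    def update(self, f):
        """Return (F - F0) / F0 for the new sample, or None for the very first frame."""
        dff = None
        if self.history:
            f0 = sum(self.history) / len(self.history)  # mean of prior frames
            dff = (f - f0) / f0  # assumes positive fluorescence, so f0 != 0
        self.history.append(f)
        return dff


# Hypothetical trace: 6 s of steady fluorescence followed by a 10% increase.
tracker = DynamicBaseline()
for _ in range(WINDOW_FRAMES):
    tracker.update(100.0)
dff_now = tracker.update(110.0)
rewarded = dff_now > 0.05  # hypothetical reward threshold on the current frame
```

      In this sketch the threshold is compared against the ΔF/F<sub>0</sub> value of the current frame only, so the 6 s window shapes the reference level and not the timing of the comparison.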

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      Additional suggestions for improved or additional experiments, data or analyses.

      For: "Looking closely at their reward rate on day 5 (day of rule change), they had a higher reward rate in the second half of the session as compared to the first half, indicating they were adapting to the rule change within one session." It would be helpful to see this data, and it would be good to see within-session learning on the rule change day.

      Thank you for pointing this out. We had missed referencing the figure in the text, and have now added a citation to Supplementary Figure 4A, which shows the cumulative rewards for each day of training. As seen in the plot for day 5, the cumulative rewards are comparable to those on day 1, with most rewards occurring during the second half of the session.
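
      A within-session comparison of this kind can be sketched as below. The function name and the example outcomes are hypothetical and do not reproduce data from Supplementary Figure 4A; the sketch assumes a 60-trial session with one success/failure outcome per trial.

```python
def half_session_reward_rates(outcomes):
    """Return (first-half rate, second-half rate) for a list of trial outcomes."""
    mid = len(outcomes) // 2
    first, second = outcomes[:mid], outcomes[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Hypothetical 60-trial session: few rewards early, most rewards late.
outcomes = [False] * 20 + [True] * 10 + [True] * 25 + [False] * 5
first_rate, second_rate = half_session_reward_rates(outcomes)
adapting_within_session = second_rate > first_rate
```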

      For: "These results suggest that motor learning led to less cortical activation across multiple regions, which may reflect more efficient processing of movement-related activity," it could also be the case that the behaviour became more stereotyped over learning, which would lead to more concentrated, correlated activity. To test this, it would be good to look at the limb variability across sessions. Similarly, if it is movement-related, there should be good decoding of limb kinematics.

      Indeed, we observed that behavior became more stereotyped over the course of learning, as shown in Supplementary Figure 4C, 4D. One plausible explanation for the reduction in cortical activation across multiple regions is that behavior itself became more stereotyped, a possibility we have explored in the manuscript. Specifically, forelimb movements during the trial became increasingly correlated as mice improved on the task, particularly in the groups that received auditory feedback (Rule-change and No-rule-change groups; Figure 8). As movements became more correlated, overall body movements during trials decreased and aligned more closely with the task rule (Figure 9D). This suggests that reduced cortical activity may in part reflect changes in behavior. Importantly, however, in the Rule-change group, we observed that on the day of the rule switch (day 5), when the target shifted from the left to the right forelimb, cortical activity increased bilaterally (Figure 9A–C). This finding highlights our central point: groups that received feedback (Rule-change and No-rule-change) were able to identify the task rule more effectively, and both their behavior and cortical activity became more specifically aligned with the rule compared to the No-feedback group. We agree with the reviewers that additional analyses along these lines would be valuable future directions. To facilitate this, we have included the movement data for readers who may wish to pursue further analyses; details can be found under “Data and code availability” in the Methods section. However, given the limited sample sizes in our dataset and the need to keep the manuscript focused on the central message, we felt that including these additional analyses here would risk obscuring the main findings.

      For: "We believe the decrease in ΔF/F0peak is unlikely to be driven by changes in movement, as movement amplitudes did not decrease significantly during these periods (Figure 7D CLMF Rule-change)." I would formally compare the two conditions. This is an important control. Also, another way to see if the change in deltaF is related to movement would be to see if you can predict movement from the deltaF.

      Figure 7D in the previous version is Figure 9D in the current revision of the manuscript. We have assessed this for the examples shown by graphing the movement data. Unfortunately, there is not enough of that data to do a group analysis of movement magnitude. We suggest that this would be an excellent future direction that would take advantage of the flexible open-source nature of our tool.

      Recommendations for improving the writing and presentation.

      In the abstract there is no mention of the rationale for the project, or the resulting significance. I would modify this to increase readership by the behavioral neuroscience community. Similarly, the introduction also doesn't highlight the value of this resource for the field. Again, I think the pyControl paper does a good job of this. For readability, I would add more subheadings earlier in the results, to separate the different technical aspects of the system.

      We have revised the introduction to include the rationale for the project, its potential implications, and its relevance for translational research. We have also framed the work within the broader context of the behavioral and systems neuroscience community. We greatly appreciate this suggestion, as we believe it enhances the clarity and accessibility of the manuscript for the community.

      For: "While brain activity can be controlled through feedback, other variables such as movements have been less studied, in part because their analysis in real time is more challenging." I would highlight research that has studied the control of behavior through feedback, such as the Mathis paper where mice learn to pull a joystick to a virtual box, and adapt this motion to a force perturbation.

      We have added a citation to the Mathis paper and describe this as an additional form of feedback. The text is quoted below:

      “Opportunities also exist in extending real time pose classification (Forys et al. 2020; Kane et al. 2020) and movement perturbation (Mathis et al. 2017) to shape aspects of an animal’s motor repertoire.”

      Some of the results content would be better suited for the methods, one example: "A previous version of the CLNF system was found to have non-linear audio generation above 10 kHz, partly due to problems in the audio generation library and partly due to the consumer-grade speaker hardware we were employing. This was fixed by switching to the Audiostream (https://github.com/kivy/audiostream) library for audio generation and testing the speakers to make sure they could output the commanded frequencies"

      This is now moved to the Methods section.

      For: "There are reports of cortical plasticity during motor learning tasks, both at cellular and mesoscopic scales (17-19), supporting the idea that neural efficiency could improve with learning," not sure I agree with this, the studies on cortical plasticity are usually to show a neural basis for the learning observed, efficiency is separate from this.

      We have modified this statement to remove the concept of efficiency "There are reports of cortical plasticity during motor learning tasks, both at cellular and mesoscopic scales (17-19).”

      The paragraph that opens "Distinct task- and reward-related cortical dynamics" that describes the experiment should appear in the previous section, as the data is introduced there.

      We have moved the mentioned paragraphs into the previous section, where we present the data and other experimental details. This makes the text more readable and contextual.

      I would present the different ROI rules with better descriptors and visualization to improve the readability.

      We have added Supplementary Figure 7, which provides visualizations of the ROIs across all task rules used in the CLNF experiments.

      Minor corrections to the text and figures.

      Figure 1 is a little crowded, combining the CLNF and CLMF experiments, I would turn this into a 2 panel figure, one for each, similar to how you did figure 2.

      We have revised Figure 1 to include two panels, one for CLNF and one for CLMF. The colored components indicate elements specific to each setup, while the uncolored components represent elements shared between CLNF and CLMF. Relevant text in the manuscript is updated to refer to these figures.

      For Figure 2, the organization of the CLMF section is not intuitive for the reader. I would reorder it so it has a similar flow as the CLNF experiment.

      We have revised the figure by updating the layout of panel B (CLMF) to align with panel A (CLNF), thereby creating a more intuitive and consistent flow between the panels. We appreciate this helpful suggestion, which we believe has substantially improved the clarity of the figure. The corresponding text in the manuscript has also been updated to reflect these changes.

      For Figure 3, highlight that C and E are examples. They also seem a little out of place, so they could even be removed.

      We have now explicitly labeled Figures 3C and 3E as representative examples (figure legend and on figure itself). We believe including these panels provides helpful context for readers: Figure 3C illustrates how the ROIs align on the dorsal cortical brain map with segmented cortical regions, while Figure 3E shows example paw trajectories in three dimensions, allowing visualization of the movement patterns observed during the trials.

      In the plots, I would add sample sizes, for instance, in CLNF learning curve in Figure 4A, how many animals are in each group? 

      We have labeled Figure 4 with number of animals used in CLNF (No-rule-change, N=23; Rule-change, N=17), and CLMF (Rule-change, N=8; No-rule-change, N=4; No-feedback, N=4).

      Also, in Figure 7 for example, which panels are from single sessions and which are across animals? For Figure 7C, what time bin is the data taken from?

      We have now clarified this in all the figures. Figure 7 in the previous version is Figure 9 in the updated manuscript. Figure 9A shows individual sessions on different days from the same mouse. Figure 9B shows the group-average, reward-centered ΔF/F<sub>0</sub> activity in different cortical regions (Rule-change, N=8; No-rule-change, N=4; No-feedback, N=4). Figure 9C shows average ΔF/F<sub>0</sub> peak values obtained within a window from -1 s to +1 s around the reward point (N=8).
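
      As an illustration of the windowed peak measure described for Figure 9C, the following is a minimal sketch, not the authors' analysis code. The sampling rate, the synthetic trace, and the function name `peak_around_event` are assumptions made for the example.

```python
import numpy as np

def peak_around_event(trace, event_idx, fs_hz, half_window_s=1.0):
    """Return the maximum of `trace` within +/- half_window_s of event_idx."""
    half = int(round(half_window_s * fs_hz))
    start = max(0, event_idx - half)               # clip at start of recording
    stop = min(len(trace), event_idx + half + 1)   # inclusive upper edge
    return float(np.max(trace[start:stop]))

# Synthetic dF/F0 trace (assumed 30 Hz sampling, reward at t = 5 s).
fs_hz = 30.0
t = np.arange(0, 10, 1 / fs_hz)
trace = 0.02 * np.exp(-((t - 5.3) ** 2) / (2 * 0.2 ** 2))  # transient after reward
reward_idx = int(round(5.0 * fs_hz))

peak = peak_around_event(trace, reward_idx, fs_hz)
print(f"peak dF/F0 within +/-1 s of reward: {peak:.4f}")
```

      Averaging such per-trial peaks across animals would give one value per region, which is one plausible reading of the panel.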

      It says "punish" in Figure 3, but there is no punishment?

      Yes, the task did not involve punishment. Each trial resulted in either a success, which is followed by a reward, or a failure, which is followed by a buzzer sound. To better reflect these outcomes, we have updated Figure 3 and replaced the labels “Reward” with “Success” and “Punish” with “Failure.”

      The regression on 5c doesn't look quite right, also this panel is not mentioned in the text.

      The figure referred to by the reviewer as Figure 5 is now presented as Figure 6 in the revised manuscript. Regarding the reviewer’s observation about the regression line in the left panel of Figure 5C, the apparent misalignment arises because the majority of the data points are densely clustered at the center of the scatter plot, where they overlap substantially. The regression line accurately reflects this concentration of overlapping data. To improve clarity, we have updated the figure and ensured that it is now appropriately referenced in the Results section.

      Reviewer #2 (Recommendations for the authors):

      (1) There would be many interesting observations and links between the peripheral and cortical studies if there was a body video available during the cortical study. Is there any such data available?

      We agree that a detailed analysis of behavior during the CLNF task would be necessary to explore any behavioral correlates of success in the task. Unfortunately, we do not have sufficient whole-body video to perform such an analysis.

      (2) The text (p. 24) states: [intracortical GCAMP transients measured over days became more stereotyped in kinetics and were more correlated (to each other) as the task performance increased over the sessions (Figure 7E).] But I cannot find this quantification in the figures or text?

      Figure 7 in the previous version of the manuscript now appears as Figure 9. In this figure, we present cortical activity across selected regions during trials, and in Figure 9E we highlight that this activity becomes more correlated. Since we did not formally quantify variability, we have removed the previous claim that the activity became stereotyped and revised the text in the updated manuscript accordingly.

      Typos:

      10-serest c (page 13)

      Inverted color codes in figure 4E vs F

      Reviewer #3 (Recommendations for the authors):

      We have mostly attempted to limit the feedback to suggestions and posed a few questions that might be interesting to explore given the dataset the authors have collected.

      Comments:

      In closed-loop systems, latency is a primary concern, and the authors have successfully tested the latency of the system (Delay): the time from detection of an event to the reaction was less than 67 ms.

      We have commented on the issues and limitations caused by latency, and potential future directions to overcome these challenges in responses to some of the previous comments.

      Additional major comments:

      "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time (Figure 4A, Animation 1)." Fig 4A is merely showing change in task performance over time and does not have information regarding the changes observed specific to CLNF for each ROI.

      We acknowledge that the sample size for individual ROI rules was not sufficient for meaningful comparisons. To address this limitation, we pooled the data across all the rules tested. The manuscript includes a detailed list of the rules along with their corresponding sample sizes for transparency.

      A ΔF/F<sub>0</sub> threshold value was calculated from a baseline session on day 0 that would have allowed 25% performance. Starting from this basal performance of around 25% on day 1, mice (CLNF No-rule-change, n=28 and CLNF Rule-change, n=13). It is unclear what the replicates here are. Trials or mice? The corresponding Figure legend has a much smaller n value.

      Thank you for pointing this out. We realized that we had not indicated the sample replicates in the figure, and the use of n instead of N for the number of animals may have been misleading. We have now corrected the notation and clarified this information in the figure to resolve the discrepancy.
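
      To make the baseline-derived threshold in the comment above concrete, here is a hedged sketch of one way such a value could be chosen. It assumes the threshold is the quantile of per-trial peak ΔF/F<sub>0</sub> values from the baseline session that 25% of trials would exceed. The manuscript's exact procedure is not given in this exchange, and the data below are synthetic.

```python
import numpy as np

def threshold_for_target_rate(baseline_peaks, target_rate=0.25):
    """Threshold that a fraction `target_rate` of baseline trials would exceed."""
    return float(np.quantile(baseline_peaks, 1.0 - target_rate))

# Synthetic per-trial peak dF/F0 values from a hypothetical day-0 session.
rng = np.random.default_rng(0)
baseline_peaks = rng.normal(loc=0.01, scale=0.005, size=400)

thr = threshold_for_target_rate(baseline_peaks)
success_rate = float(np.mean(baseline_peaks > thr))
print(f"threshold = {thr:.4f}, baseline success rate = {success_rate:.2f}")
```

      Because the threshold is set per animal from its own baseline, the starting performance is comparable across mice even when absolute signal amplitudes differ.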

      What were the replicates for each ROI pairs evaluated?

      Each ROI rule and number of mice and trials are listed in Table 5 and Table 6.

      Our analysis revealed that certain ROI rules (see description in methods) lead to a greater increase in success rate over time than others (Supplementary Figure 3D). The Supplementary figures 3C and 3D are blurry and could use higher resolution images. 

      We have increased the font size of the text that was previously difficult to read and re-exported the figure at a higher resolution (300 DPI). We believe these changes will resolve the issue.

      Also, it will help the reader if a visual representation of the ROI pairs is provided instead of the text view. One interesting question is whether there are anatomical biases to fast vs slow learning pairs (directionality, i.e. anterior/posterior; distance between the selected ROIs; etc.). This could be interesting to tease apart.

      We have added Supplementary Figure 7, which provides visualizations of the ROIs across all task rules used in the CLNF experiments. While a detailed investigation of the anatomical basis of fast versus slow learning cortical ROIs is beyond the scope of the present study, we agree that this represents an exciting future direction for further research.

      How distant should the ROIs be to achieve increased task performance?

      We appreciate this insightful question. We did not specifically test this scenario. In our study, we selected 0.3 × 0.3 mm ROIs centered on the standard AIBS mouse brain atlas (CCF). At this resolution, ROIs do not overlap, regardless of their placement in a two-ROI experiment. Furthermore, because our threshold calculations are based on baseline recordings, we expect the system would function for any combination of ROI placements. Nonetheless, exploring this systematically would be an interesting avenue for future experiments.
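
      The non-overlap statement above can be checked with a small geometric test. This is only a sketch: the ROI centers below are hypothetical coordinates in mm, not values from the atlas or the manuscript, and the ROIs are assumed to be axis-aligned squares.

```python
def rois_overlap(center_a, center_b, side_mm=0.3):
    """True if two axis-aligned square ROIs of equal side length overlap."""
    dx = abs(center_a[0] - center_b[0])
    dy = abs(center_a[1] - center_b[1])
    # Squares overlap only if their centers are closer than one side on both axes.
    return dx < side_mm and dy < side_mm

# Hypothetical ROI centers (mm, arbitrary origin).
roi_1 = (1.50, 1.00)
roi_2 = (1.50, -0.50)
roi_3 = (1.60, 1.10)  # deliberately placed close to roi_1

print(rois_overlap(roi_1, roi_2))
print(rois_overlap(roi_1, roi_3))
```

      Under this test, any pair of region centers separated by at least 0.3 mm along one axis yields non-overlapping ROIs.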

      Figures:

      I would leave some of the methodological details, such as the protocol for water restriction (Fig. 3), out of the legend. This will help with readability.

      We have removed some of the methodological details, including those mentioned above, from the legend of Figure 3 in the updated manuscript.

      Fig 1 and Fig 2: In my opinion, it would be easier for the reader if the current Fig. 2, which provides a high-level description of CLNF and CLBF, were presented as Fig. 1. The current Fig. 1 goes into a lot of methodological implementation detail and introduces programming jargon that is hard to digest this early in the paper's narrative.

      Thank you for the suggestion. In the new manuscript, Figure 1 and Figure 2 have been swapped.

      Higher-resolution images/ plots are needed in many instances. Unsure if this is the pdf compression done by the manuscript portal that is causing this.

      All figures were prepared in vector graphics format using the open-source software Inkscape. For this manuscript, we exported the images at 300 DPI, which is generally sufficient for publication-quality documents. The submission portal may apply additional processing, which could have resulted in a reduction in image quality. We will carefully review the final submission files and ensure that all figures are clear and of high quality.

      The authors repeatedly show ROI specific analysis M1_L, F1_R etc. It will be helpful to provide a key, even if redundant in all figures to help the reader.

      We have now included keys to all such abbreviations in all the figures.

      There are also instances of editorialization and interpretation e.g., "Surprisingly, the "Rule-change" mice were able to discover the change in rule and started performing above 70% within a day of the rule change, on day 6" that would be more appropriate in the main body of the paper.

      Thank you for pointing this out. We have removed this sentence from the figure legend, since it is already discussed in the Results.

      Minor comments

      (1) The description of Figure 1 is hard to follow and could be improved by following how the information is processed and executed in the system, from source to processing and back. Using separate colors (instead of shades of grey) for the neurofeedback and movement feedback would help as well. Common components could have a different color. Specifications such as the description of the config file should come later.

      Figure 1 in the previous version is Figure 2 in the updated version. Following suggestions from the other reviewers, we have made the figure easier to understand and split it into two panels with color coding: green for CLNF-specific parts and pink for CLMF-specific parts, while shared parts are left uncolored.

      (2) Page 20 last paragraph:

      The authors are neglecting that the rule change was made one day earlier. The results seen in the second half of the 6th day are therefore not due to the first half of the 6th day alone, but to the combined training on the 5th day (rule change) and the first half of the 6th day. Rephrasing this observation is essential.

      We have revised the text for clarity to indicate that the performance increase observed on day 6 is not necessarily attributable to training on that day. In fact, we noted and mentioned that mice began to perform the task better during the second half of the session on day 5 itself.

      (3) The Methods description of the CLMF setup (page 39, first paragraph) is quite detailed; a diagram of this setup would make it easier to follow and a better read.

      We have made changes to the CLMF setup (Figure 1B) and CLMF schematic (Figure 2B) to make it easier to understand parts of the setup and flow of control.

    1. eLife Assessment

      This is a valuable study that integrates behavioral and molecular approaches to identify neuromodulators influencing blood-feeding behavior in the disease vector Anopheles stephensi. Through gene expression analyses across blood-seeking life stages and RNA interference experiments, the authors present solid evidence that co-knockdown of the neuromodulators short Neuropeptide F and RYamide affects blood-seeking states in A. stephensi. However, evidence demonstrating that these neuropeptides are sufficient to promote host-seeking is lacking.

    2. Reviewer #1 (Public review):

      Summary:

      Here Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle to identify a list of 9 candidates which have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think how many crucial details I might have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF, starting from first-principles behavioral experiments, is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al. examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both of the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the authors' labelling of the figure stats in panel 4D.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!
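
      The disagreement above turns on how the calibrator is chosen in the delta-delta Ct calculation. The sketch below uses hypothetical Ct values and is not the authors' pipeline. It contrasts two conventions: a pooled calibrator (mean control ΔCt), which leaves replicate-to-replicate spread in the control group, and a per-replicate calibrator, which forces every control value to exactly 1.

```python
import numpy as np

def relative_expression(ct_target, ct_ref, calibrator_dct):
    """Relative expression by the delta-delta Ct method: 2 ** -(dCt - calibrator)."""
    dct = np.asarray(ct_target) - np.asarray(ct_ref)
    return 2.0 ** (-(dct - calibrator_dct))

# Hypothetical Ct values for three biological replicates.
ctrl_target = np.array([24.1, 24.6, 23.9])  # neuropeptide gene, control injection
ctrl_ref = np.array([18.0, 18.2, 18.1])     # reference gene, control injection
kd_target = np.array([25.3, 25.9, 25.0])    # neuropeptide gene, dsRNA injection
kd_ref = np.array([18.1, 18.0, 18.2])       # reference gene, dsRNA injection

ctrl_dct = ctrl_target - ctrl_ref

# Convention 1: one pooled calibrator (mean control dCt); controls keep their spread.
pooled = ctrl_dct.mean()
ctrl_rel_pooled = relative_expression(ctrl_target, ctrl_ref, pooled)
kd_rel_pooled = relative_expression(kd_target, kd_ref, pooled)

# Convention 2: each replicate's own control is the calibrator; controls become 1.
ctrl_rel_paired = relative_expression(ctrl_target, ctrl_ref, ctrl_dct)
kd_rel_paired = relative_expression(kd_target, kd_ref, ctrl_dct)

print("pooled control:", np.round(ctrl_rel_pooled, 3))
print("paired control:", ctrl_rel_paired)
```

      Under the second convention the control group has zero variance by construction, so a test that compares group variances is no longer meaningful; the pooled convention keeps that information.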

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies, which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals, which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle to identify a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      We appreciate the reviewer’s detailed summary of our work. We thank them for their positive comments and agree with them on the shortcomings of our approach.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details could potentially have been missed if this had not been the approach. The attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      We really appreciate that the reviewer has recognised the attention to detail we have tried to bring to this work. Thank you!

      Weaknesses:

      There are a few elements of data visualizations and methodological reporting that I found confusing on a first few read-throughs. Figure 1F, for example, was initially confusing as it made it seem as though there were multiple 2-choice assays for each of the conditions. I would recommend removing the "X" marker from the x-axis to indicate the mosquitoes did not feed from either nectar, blood, or neither in order to make it clear that there was one assay in which mosquitoes had access to both food sources, and the data quantify if they took both meals, one meal, or no meals.

      We thank the reviewer for flagging the schematic in figure 1F. As suggested, we have removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose in the assay. For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data, as it does not capture the variability in the data.

      I would also like to know more about how the authors achieved tissue-specific knockdown for RNAi experiments. I think this is an intriguing methodology, but I could not figure out from the methods why injections either had whole-body or abdomen-specific knockdown.

      The tissue-specific knockdown (abdomen only or abdomen+head) emerged from initial standardisations where we were unable to achieve knockdown in the head unless we used higher concentrations of dsRNA and did the injections in older females. We realised that this gave us the opportunity to isolate the neuronal contribution of these neuropeptides to the phenotype produced. Further optimisations revealed that injecting dsRNA into 0-10 h old females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 4-day-old females resulted in knockdowns in both tissues. Moreover, head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts.

      We had mentioned the knockdown conditions (time of injection and amount of dsRNA injected) for tissue-specific knockdowns in the methods, but realise now that this did not explain the approach well enough. We have now edited it to state our methodology more clearly (see lines 932-948).

      I also found some interpretations of the transcriptomic data to be overly broad for what transcriptomes can actually tell us about the organism's state. For example, the authors mention, "Interestingly, we found that after a blood meal, glucose is neither spent nor stored, and that the female brain goes into a state of metabolic 'sugar rest', while actively processing proteins (Figure S2B, S3)".

      This would require a physiological measurement to actually know. It certainly suggests that there are changes in carbohydrate metabolism, but there are too many alternative interpretations to make this broad claim from transcriptomic data alone.

      We thank the reviewer for pointing this out and agree with them. We have now edited our statement to read:

      “Instead, our data suggests altered carbohydrate metabolism after a blood meal, with the female brain potentially entering a state of metabolic 'sugar rest' while actively processing proteins (Figure S2B, S3). However, physiological measurements of carbohydrate and protein metabolism will be required to confirm whether glucose is indeed neither spent nor stored during this period.” See lines 271-277.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al. examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      Central brain, or mid brain, is a commonly used term to refer to brain structures/neuropils without the optic lobes (For example: https://www.nature.com/articles/s41586-024-07686-5). In this study we have focused our analysis on the central brain circuits involved in modulating blood-feeding behaviour and have therefore excluded the optic lobes. As optic lobes account for nearly half of all the neurons in the mosquito brain (https://pmc.ncbi.nlm.nih.gov/articles/PMC8121336/), including them would have disproportionately skewed our transcriptomic data toward visual processing pathways. 

      We have indicated this in figure 3A and in the methods (see lines 800-801, 812). We have now also clarified it in the results section for neurotranscriptomics to avoid confusion (see lines 236-237).

      (2) The abstract states that two neuropeptides, sNPF and RYamide are working together, but no evidence is summarized for the latter in this section.

      We thank the reviewer for pointing this out. We have now added the statement “This occurs in the context of the action of RYa in the brain” to the end of the abstract, for a complete summary of our proposed model.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Our data suggest that mating can occur at any time between eclosion and oviposition in An. stephensi and between eclosion and blood feeding in Ae. aegypti. Adding these events to the (already busy) Figure 1A would cloud the purpose of the schematic, which is to indicate the time points used in the behavioural assays and transcriptomics.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      We apologise for the confusion. The experiment is indeed a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. The x-axis indicates the choice made by the mosquitoes, not the choice provided in the assay, and the y-axis indicates the percentage of males or females that made each particular choice. We have now removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      In this assay, we scored females only for the presence or absence of each meal type (blood or sugar) and are therefore unable to comment on whether sugar-starved females consumed more sugar than sugar-sated females. However, when sugar-starved, a higher proportion of females consumed both blood and sugar, while fewer fed on blood alone.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data as it does not capture the variability in the data.

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      This is indeed correct. We reasoned that since blood feeding is exclusive to females, we should focus our analysis on genes that were specifically upregulated in them. As the reviewer points out, it is very likely that genes commonly upregulated in males and females may also promote blood feeding and we will miss out on any such candidates based on our selection criteria. 

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer’s point or there has been a misunderstanding. In figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF. 

      Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding?

      We realise this concern stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens. 4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomens. We have now added a schematic in the plots to make this clearer.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,…

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.
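
      The arithmetic behind this normalisation can be illustrated with a minimal sketch. It assumes the standard 2^-ΔΔCt formulation with a single reference gene and equal amplification efficiencies; the function names and Ct values are hypothetical and are not taken from our data.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target transcript by the 2^-ddCt method.

    All arguments are Ct values (hypothetical here): the target and the
    reference gene in the treated sample, then the same two genes in the
    control sample.
    """
    delta_ct_sample = ct_target - ct_reference             # normalise sample to reference gene
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl  # normalise control to reference gene
    delta_delta_ct = delta_ct_sample - delta_ct_control    # compare sample against control
    return 2.0 ** (-delta_delta_ct)


def percent_knockdown(fold_change):
    """Convert a fold change relative to control into a percentage knockdown."""
    return (1.0 - fold_change) * 100.0


# Comparing the control with itself gives ddCt = 0, so the baseline is 1 by construction.
control_fold = relative_expression(24.0, 18.0, 24.0, 18.0)

# A target that crosses threshold one cycle later than in the control is at half the level.
knockdown_fold = relative_expression(25.0, 18.0, 24.0, 18.0)

print(control_fold, knockdown_fold, percent_knockdown(knockdown_fold))
```

      Because the control is compared with itself, its value is fixed at 1 and shows no spread, which is why the control bar appears without variation.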

      …and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear).

      The reviewer is correct in pointing out that we have not clarified this relationship in our current presentation. While we have not performed absolute mRNA quantifications, we extracted relative mRNA levels from qPCR data of 96h old unmanipulated control females. We observed that both sNPF and RYa transcripts are expressed at much lower levels in the abdomens, as compared to those in the heads, as shown in Author response Image 1 below. 

      Author response image 1.

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      We thank the reviewer for flagging this and have now edited the legends to remove redundancy.  

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      We agree with the reviewer that not all neuropeptides regulate feeding behaviours. Our statement refers to the screening approach we used: in our shortlist of candidates, we chose to validate all neuropeptides.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      Thank you for pointing this out. We were referring to an unchanged proportion of the blood fed females. We have now edited the text to the following: 

      “Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels in the heads but the proportion of females that took blood meals remained unchanged”. See lines 338-340.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand-promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      We agree with the reviewer and apologise for the mistake. We have now removed the statement.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      We have now edited the section to reflect the amount of dsRNA injected per target. Please see lines 921-931.

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain/support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible?

      To achieve tissue-specific knockdowns of sNPF and RYa, we optimised both the time of injection as well as the dsRNA concentration to be injected. Injecting dsRNA into 0-10h females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 96h old females resulted in knockdowns in both tissues. Head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts, reflecting the lower baseline expression of sNPF in abdomens compared to heads and the age-dependent increase in head expression (as confirmed by qPCR). It is possible that the blood-brain barrier also limits the dsRNA entering the brain, thereby requiring higher amounts to be injected for head knockdowns. 

      We have now edited this section to state our methodology more clearly (see lines 932-948).

      For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. Comparatively, the data appear to show stronger abdominal knockdown, mostly for the RYa transcript (>90%), while knockdown of the sNPF transcript is still significant (>60%).

      As we explained earlier, this concern likely stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens.  4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomen. We have now added a schematic in the plots to make this clearer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated (for example, with peptide injection or overexpression experiments).

      Demonstrating sufficiency would require injecting sNPF peptide or its agonist. To date, no small-molecule agonists (or antagonists) that selectively mimic sNPF or RYa neuropeptides have been identified in insects. An NPY analogue, TM30335, has been reported to activate the Aedes aegypti NPY-like receptor 7 (NPYLR7; Duvall et al., 2019), which is also activated by sNPF peptides at higher doses (Liesch et al., 2013). Unfortunately, the compound is no longer available because its manufacturer, 7TM Pharma, has ceased operations. Synthesising the peptides is a possibility that we will explore in the future.

      (2) The proposed model regarding central versus peripheral (gut) peptide action is inconsistently presented and lacks strong experimental support.

      The best way to address this would be to conduct tissue-specific manipulations, the tools for which are not available in this species. Our approach to achieve head+abdomen and abdomen only knockdown was the closest we could get to achieving tissue specificity and allowed us to confirm that knockdown in the head was necessary for the phenotype. However, as the reviewer points out, this did not allow us to rule out any involvement of the abdomen. This point has been addressed in lines 364-371.

      (3) Some conclusions appear premature based on the current data and would benefit from additional functional validation.

      The most definitive way of demonstrating the necessity of sNPF and RYa in blood feeding would be to generate mutant lines. While we are pursuing this line of experiments, they lie beyond the scope of a revision. In the absence of mutants, we relied on the knockdown of the genes using dsRNA. We would like to posit that, despite only partial knockdown, mosquitoes do display defects in blood-feeding behaviour, without any effect on sugar feeding. We think this reflects the importance of sNPF in promoting blood feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I found this manuscript to be well-prepared, visually the figures are great and clearly were carefully thought out and curated, and the research is impactful. It was a wonderful read from start to finish. I have the following recommendations:

      Thank you very much, we are very pleased to hear that you enjoyed reading our manuscript!

      (1) For future manuscripts, it would make things significantly easier on the reviewer side to submit a format that uses line numbers.

      We sincerely apologise for the oversight. We have now incorporated line numbers in the revised manuscript.

      (2) There are a few statements in the text that I think may need clarification or might be outside the bounds of what was actually studied here. For example, in the introduction "However, mating is dispensable in Anophelines even under conditions of nutritional satiety". I am uncertain what is meant by this statement - please clarify.

      We apologise for the lack of clarity in the statement and have now deleted it since we felt it was not necessary.

      (3) Typo/Grammatical minutiae:

      (a) A small idiosyncrasy of using hyphens in compound words should also be fixed throughout. Typically, you don't hyphenate if the words are being used as a noun, as in the case: e.g. "Age affects blood feeding.". However, you would hyphenate if the two words are used as a compound adjective "Age affects blood-feeding behavior". This may not be an all-inclusive list, but here are some examples where hyphens need to either be removed or added. Some examples:

      "Nutritional state also influences other internal state outputs on blood-feeding": blood-feeding -> blood feeding

      "... the modulation of blood-feeding": blood-feeding -> blood feeding

      "For example, whether virgin females take blood-meals...": blood-meals -> blood meals

      ".... how internal and external cues shape meal-choice"-> meal choice

      "blood-meal" is often used throughout the text, but is correctly "blood meal" in the figures.

      There are many more examples throughout.

      We apologise for these errors and appreciate the reviewer’s keen eye. We have now fixed them throughout the manuscript.  

      (b) Figure 1 Caption has a typo: "co-housed males were accessed for sugar-feeding" should be "co-housed males were assessed for sugar feeding"

      We apologise for the typo and thank the reviewer for spotting it. We have now corrected this.  

      (c) It would be helpful in some other figure captions to more clearly label which statement is relevant to which part of the text. For example, in Figure 4's caption.

      "C,D. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head (C). Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR (D)."

      I found re-referencing C and D at the end of their statements makes it look as though C precedes the "Relative mRNA expression" and on a first read-through, I thought the figure captions were backwards. I'd recommend reformatting here and throughout consistently to only have the figure letter precede its relevant caption information, e.g.:

      "C. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head. D. Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR."

      We have now edited the legends as suggested.

      Reviewer #2 (Recommendations for the authors):

      Separately from the clarifications and limitations listed above, the authors could strengthen their study and the conclusions drawn if they could rescue the behavioural phenotype observed following knockdown of sNPF and RYamide. This could be achieved by injection of either sNPF or RYa peptide independently or combined following knockdown to validate the role of these peptides in promoting blood-feeding in An. stephensi. Additionally, the apparent (but unclear) regionalized (or tissue-specific) knockdown of sNPF and RYamide transcripts could be visualized and verified by implementing HCR in situ hyb in knockdown animals (or immunohistochemistry using antibodies specific for these two neuropeptides). 

      In a follow up of this work, we are generating mutants and peptides for these candidates and are planning to conduct exactly the experiments the reviewer suggests.

      Reviewer #3 (Recommendations for the authors):

      The loss-of-function data suggest necessity but not sufficiency. Synthetic peptide injection in non-host-seeking (blood-fed mated or juvenile) mosquitoes would provide direct evidence for peptide-induced behavioral activation. The lack of these experiments weakens the central claim of the paper that these neuropeptides directly promote blood feeding.

      As noted above, we plan to synthesise the peptide to test rescue in a mutant background and sufficiency.  

      Some of the claims about knockdown efficiency and interpretation are conflicting; the authors dismiss Hairy and Prp as candidates due to 30-35% knockdown, yet base major conclusions on sNPF and RYamide knockdowns with comparable efficiencies (25-40%). This inconsistency should be addressed, or the justification for different thresholds should be clearly stated.

      We have not defined any specific knockdown efficacy thresholds in the manuscript, as these can vary considerably between genes, and in some cases, even modest reductions can be sufficient to produce detectable phenotypes. For example, knockdown efficiencies of even as low as about 25% - 40% gave us observable phenotypes for sNPF and RYa RNAi (Figure S9B-G).

      No such phenotypes were observed for Hairy (30%) or Prp (35%) knockdowns. Either these genes are not involved in blood feeding, or the knockdown was not sufficient for these specific genes to induce phenotypes. We cannot distinguish between these scenarios. 

      The observation that knockdown animals take smaller blood meals is interesting and could reflect a downstream effect of altered host-seeking or an independent physiological change. The relationship between meal size and host-seeking behavior should be clarified.

      We agree with the reviewer that the reduced meal size observed in sNPF and RYa knockdown animals could result either from their inability to seek a host or from an independent effect on blood meal intake. Unfortunately, we did not measure host-seeking in these animals. We plan to distinguish between these possibilities using mutants in future work.

      Several figures are difficult to interpret due to cluttered labeling and poorly distinguishable color schemes. Simplifying these and improving contrast (especially for co-housed vs. virgin conditions) would enhance readability. 

      We regret that the reviewer found the figures difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition). Wherever mated females were used, we have now appended “(m)” to the annotations and consistently depicted these females with striped abdomens in all the schematics. We believe these changes will improve clarity and readability.

      The manuscript does not clearly justify the use of whole-brain RNA sequencing to identify peptides involved in metabolic or peripheral processes. Given that anticipatory feeding signals are often peripheral, the logic for brain transcriptomics should be explained.

      The reviewer is correct in pointing out that feeding signals could also emerge from peripheral tissues. Signals from these tissues, in response to both changing nutritional and reproductive states, are then integrated by the central brain to modulate feeding choices. For example, in Drosophila, increased protein intake is mediated by central brain circuitry including those in the SEZ and central complex (Munch et al., 2022; Liu et al., 2017; Goldschmidt et al., 202ti). In the context of mating, male-derived sex peptide further increases protein feeding by acting on a dedicated central brain circuitry (Walker et al., 2015). We therefore focused on the central brain for our studies.

      The proposed model suggests brain-derived peptides initiate feeding, while gut peptides provide feedback. However, gut-specific knockdowns had no effect, undermining this hypothesis. Conversely, the authors also suggest abdominal involvement based on RNAi results. These contradictions need to be resolved into a consistent model.

      We thank the reviewer for raising this point and recognise their concern. Our reasons for invoking an involvement of the gut were two-fold:

      (1) We find increased sNPF transcript expression in the entero-endocrine cells of the midgut in blood-hungry females, which returns to baseline after a blood-meal (Fig. 4L, M).

      (2) While the abdomen-only knockdowns did not affect blood feeding, every effective head knockdown that affected blood feeding also abolished abdominal transcript levels (Fig. S9C, F). (Achieving a head-only reduction proved impossible because (i) systemic dsRNA delivery inevitably reaches the abdomen and (ii) abdominal expression of both peptides is low, leaving little dynamic range for selective manipulation.) Consequently, we can only conclude the following: 1) that brain expression is required for the behaviour, 2) that we cannot exclude a contributory role for gut-derived sNPF. We have discussed this in lines 364-371.

      The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      We agree that functional validation of the receptors would strengthen the evidence for sNPF and RYa-mediated control of blood feeding in An. stephensi. We selected these receptors based on sequence homology. A possibility remains that sNPF neuropeptides activate more than one receptor, each modulating a distinct circuit, as shown in the case of Drosophila Tachykinin (https://pmc.ncbi.nlm.nih.gov/articles/PMC10184743/). This will mean a systematic characterisation and knockdown of each of them to confirm their role. We are planning these experiments in the future.  

      The authors compared the percentage changes in sugar-fed and blood-fed animals under sugar-sated or sugar-starved conditions. Figure 1F should reflect what was discussed in the results.

      Perhaps this concern stems from our representation of the data in figure 1F? We have now edited the x-axis and revised its label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data because it does not capture the variability in the data.

      Minor issues:

      (1) The authors used mosquitoes with belly stripes to indicate mated females. To be consistent, the post-oviposition females should also have belly stripes.

      We thank the reviewer for pointing this out. We have now edited all the figures as suggested.

      (2) In the first paragraph on the right column of the second page, the authors state, "Since females took blood-meals regardless of their prior sugar-feeding status and only sugar-feeding was selectively suppressed by prior sugar access." Just because the well-fed animals ate less than the starved animals does not mean their feeding behavior was suppressed.

      Perhaps there has been a misunderstanding in the experimental setup of figure 1F, probably stemming from our data representation. The experiment is a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. We scored females only for the presence or absence of each meal type (blood or sugar) and did not quantify the amount consumed.

      (3) The figure legend for Figure 1A and the naming convention for different experimental groups are difficult to follow. A simplified or consistently abbreviated scheme would help readers navigate the figures and text.

      We regret that the reviewer found the figure difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition).

      (4) In the last paragraph of the Y-maze olfactory assay for host-seeking behaviour in An. stephensi in Methods, the authors state, "When testing blood-fed females, aged-matched sugar-fed females (blood-hungry) were included as positive controls where ever possible, with satisfactory results." The authors should explicitly describe what the criteria are for "satisfactory results".

      We apologise for the lack of clarity. We have now edited the statement to read:

      “When testing blood-fed females, age-matched sugar-fed females (blood-hungry) were included wherever possible as positive controls. These females consistently showed attraction to host cues, as expected.” See lines 786-790.

      (5) In the first paragraph of the dsRNA-mediated gene knockdown section in Methods, dsRNA against GFP is used as a negative control for the injection itself, but not for the potential off-target effect.

      We agree with the reviewer that dsGFP injections act as controls only for injection-related behavioural changes, and not for off-target effects of RNAi. We have now corrected the statement. See lines 919-920.

      To control for off-target effects, we could have designed multiple dsRNAs targeting different parts of a given gene. We regret not including these controls for potential off-target effects of dsRNAs injected. 

      (6) References numbers 48, 89, and 90 are not complete citations.

      We thank the reviewer for spotting these. We have now corrected these citations.

    1. eLife Assessment

      This paper provides a useful new theory of the hallucinatory effects of 5-HT2A psychedelics. The authors present convincing evidence that a computational model trained with the Wake-Sleep algorithm can reproduce some features of hallucinations by varying the strength of top-down connections in the model, though it is not clear that this model applies to 5-HT2A hallucinogens in particular. The work will be of interest to researchers studying hallucinations or offline activity and plasticity more broadly.

    2. Reviewer #1 (Public review):

      Bredenberg et al. aim to model some of the visual and neural effects of psychedelics via the Wake-Sleep algorithm. This is an interesting study with findings that challenge certain mainstream ideas in psychedelic neuroscience.

      While some of my concerns have been addressed in revision, I am still not convinced that this model applies to 5-HT2A hallucinogens, as opposed to a pharmacologically distinct hallucinogen. I think it is important to justify which class of hallucinogens this model applies to and distinguish it from other hallucinogens. While some researchers tend to group several hallucinogens together (e.g., 5-HT2A agonists, NMDA antagonists, kappa-opioid agonists), I'm not convinced this is warranted, when they have distinct subjective and cognitive effects (including quite different visual distortions, and again I point out that the kappa-opioid agonist salvinorin A, which is referred to as an "oneirogen," has been described as particularly dream-like, perhaps more so than 5-HT2A hallucinogens), as well as some differences in therapeutic outcomes (the therapeutic effects of ketamine do not seem to persist as long, and kappa-opioid agonists have yet to be shown to be therapeutic). Their use patterns highlight this (e.g., 5-HT2A drugs are used less in non-festival/rave social settings compared to NMDA drugs like ketamine, which can be used frequently enough to the point of abuse; kappa-opioid agonists have quite mixed effects in terms of pleasurable outcomes, thereby rarely being used/abused and almost never to my knowledge being used recreationally).

      In sum, more is needed to justify the claim that this work applies to 5-HT2A drugs in particular.

    3. Reviewer #2 (Public review):

      This work is a nice contribution to the literature in articulating a specific, testable theory of how psychedelics act to generate hallucinations and plasticity.

      I believe my concerns from the first round of review have been addressed in this version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      First, we thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article has been considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we have made to the text.

      Common Concerns (R1 & R2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep which is related to classical psychedelic drug mechanisms of action.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrach, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), in which case a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022).

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical both for synaptic consolidation of monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in our previous text; we have added them, along with a more nuanced description of the evidence connecting our model to NREM and REM replay.

      Relevant modifications: Page 4, 1st paragraph; Page 11, 1st paragraph.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.
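
      The phase-dependent plasticity described above can be illustrated with a minimal single-layer sketch. This is not the model from the manuscript: the linear units, the delta-rule updates, the noise scale, and the names `wake_update`, `sleep_update`, `W_basal`, and `W_apical` are all assumptions made for illustration. The only element taken from the text is the mapping in which apical (top-down) synapses learn during Wake and basal (bottom-up) synapses learn during Sleep.

```python
import numpy as np

def wake_update(x, W_basal, W_apical, lr, rng):
    """Wake: bottom-up (basal) input drives activity; apical weights learn.

    Hypothetical linear sketch: the top-down prediction of the input is
    nudged toward the actual input with a delta rule.
    """
    z = W_basal @ x + 0.1 * rng.normal(size=W_basal.shape[0])
    prediction = W_apical @ z
    return W_apical + lr * np.outer(x - prediction, z)

def sleep_update(W_basal, W_apical, lr, rng):
    """Sleep: top-down (apical) input generates activity; basal weights learn.

    A latent sample is 'dreamed' into an input pattern, and the bottom-up
    inference of that latent is nudged toward the sample that produced it.
    """
    z = rng.normal(size=W_basal.shape[0])
    x = W_apical @ z + 0.1 * rng.normal(size=W_apical.shape[0])
    inference = W_basal @ x
    return W_basal + lr * np.outer(z - inference, x)

rng = np.random.default_rng(0)
W_basal = 0.1 * rng.normal(size=(3, 8))   # input (8 units) -> latent (3 units)
W_apical = 0.1 * rng.normal(size=(8, 3))  # latent (3 units) -> input (8 units)
x = rng.normal(size=8)                    # one stand-in sensory pattern

new_apical = wake_update(x, W_basal, W_apical, lr=0.01, rng=rng)
new_basal = sleep_update(W_basal, W_apical, lr=0.01, rng=rng)
```

      In this sketch each phase changes only one set of weights, which is why applying both kinds of update while activity is in an aberrant mixed state carries no guarantee of improving either pathway.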

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in and of themselves without subsequent re-learning and compensation.
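
      The simulated annealing analogy can be made concrete with a toy one-dimensional example. Everything here is an assumption chosen for illustration (the tilted double-well `loss`, the noise schedule, the clipping range, and the function names); it is not drawn from the model in the manuscript. Plain gradient descent settles in the shallow minimum, whereas a period of decaying random perturbation followed by ordinary descent may or may not reach the deeper one, depending on the noise drawn.

```python
import numpy as np

def loss(x):
    # Tilted double well: shallow minimum near x = 0.93, deeper one near x = -1.06.
    return x**4 - 2.0 * x**2 + 0.5 * x

def grad(x):
    return 4.0 * x**3 - 4.0 * x + 0.5

def descend(x, steps=2000, lr=0.01):
    """Ordinary noiseless gradient descent (stands in for 'subsequent learning')."""
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def perturb_then_relearn(x, rng, noise_steps=200, sigma=0.3, lr=0.01):
    """Inject noise that decays to zero, then run noiseless descent again."""
    for t in range(noise_steps):
        scale = sigma * (1.0 - t / noise_steps)
        x = x - lr * grad(x) + scale * rng.normal()
        x = float(np.clip(x, -2.0, 2.0))  # keep the toy problem bounded
    return descend(x)

x_stuck = descend(1.5)                      # converges to the shallow minimum
rng = np.random.default_rng(0)
x_after = perturb_then_relearn(x_stuck, rng)
print(f"stuck: x={x_stuck:.3f}, loss={loss(x_stuck):.3f}")
print(f"after: x={x_after:.3f}, loss={loss(x_after):.3f}")
```

      The noise itself carries no information about which minimum is better, so a run can end in either well; this mirrors the point that such plasticity is only useful in combination with subsequent re-learning.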

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We have provided a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Relevant modifications: Page 9, final paragraph; Page 12, final paragraph.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We have clarified that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
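
      As a rough illustration of the scalar simplification described above, the sketch below compares a per-neuron ‘alpha’ with its population mean. The convex-combination form `(1 - alpha) * basal + alpha * apical`, the function name `mixed_activity`, and all numerical values are assumptions made for illustration rather than the equations used in the manuscript.

```python
import numpy as np

def mixed_activity(basal_drive, apical_drive, alpha):
    """Blend basal (bottom-up) and apical (top-down) drive.

    alpha = 0 gives purely basal-driven activity, alpha = 1 purely apical.
    alpha may be a scalar or one value per neuron (hypothetical form).
    """
    alpha = np.clip(alpha, 0.0, 1.0)
    return (1.0 - alpha) * basal_drive + alpha * apical_drive

basal = np.array([1.0, 0.5, -0.2, 0.0])
apical = np.array([0.0, 1.0, 0.8, -1.0])

per_neuron_alpha = np.array([0.1, 0.3, 0.2, 0.4])  # heterogeneous drug effect
scalar_alpha = per_neuron_alpha.mean()             # single mean parameter

heterogeneous = mixed_activity(basal, apical, per_neuron_alpha)
homogeneous = mixed_activity(basal, apical, scalar_alpha)
```

      The two results differ neuron by neuron, which is the heterogeneity that the scalar parameter deliberately disregards.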

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Relevant modifications: Page 4, first paragraph; Page 13, first paragraph.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We have elaborated on this point, and moved the discussion earlier in the text.

      Relevant modifications: Page 1, 1st paragraph; Page 4, 2nd paragraph.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We ultimately decided to remove these discussions from the main text, as they had little bearing on the content of our work. Within the Ethics Declarations section we softened our claims from “millennia” to “centuries,” as indigenous psychedelic use over this latter period of time is well-substantiated.

      Relevant modifications: removed from introduction; modified Ethics Declarations

      You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. There are two possible additional factors that could contribute to this phenomenon: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We have provided an extended discussion of these nuances in our revision.

      Relevant modifications: Page 1, paragraph 2.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      Two experimental results that appear to contradict our hypothesis deserve special consideration. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher-order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action and the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), and we believe it would be best explained within our framework by a topographically connected recurrent network architecture trained on video data, a potentially fruitful direction for future research.

      Relevant modifications: Page 9, paragraph 1; Page 10, final paragraph; Page 11, final paragraph.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our Wake-Sleep-trained models and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b).

      To demonstrate that our proposed hallucination mechanism is capable of producing more complex hallucinations in larger, more powerful models, we employed our same hallucination generation mechanism in a pretrained Very Deep Variational Autoencoder (VDVAE) (Child et al., 2021), which is a hierarchical variational autoencoder with a structure nearly identical to that of our Wake-Sleep-trained networks, with both a bottom-up inference pathway and a top-down generative pathway that maps cleanly onto our multicompartmental neuron model. VDVAEs are trained on the same objective function as our Wake-Sleep-trained networks, but using the backpropagation algorithm. The VDVAE models were able to generate much more complex hallucinations (emergence of complex geometric patterns, smooth deformations of objects and faces), whose complexity arguably exceeds that of the hallucinations produced by the DeepDream algorithm. Therefore, while the VDVAEs are less biologically realistic (they do not learn via local synaptic plasticity), they function as a valuable high-level model of hallucination generation that complements our Wake-Sleep-trained approach. As further validation, we were also able to replicate our key results and testable predictions with these models.

      Relevant modifications: Results section “Modeling hallucinations in large-scale pretrained networks”; Figure 6, S7, S8; Page 12, paragraph 3; Methods section “Generating hallucinations in hierarchical variational autoencoders.”

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an oversimplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We have added a discussion of this in our ‘Model Limitations’ section.

      Relevant modifications: Page 12, paragraph 4.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which, like sample entropy, is a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our strongest explanation for the variance reduction produced by the calculation in (Carhart-Harris et al., 2016) is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
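
      For readers unfamiliar with the last two measures, the sketch below gives generic textbook-style implementations of sample entropy and Lempel-Ziv (LZ76) complexity. It is not the analysis pipeline of any of the cited studies: the embedding length `m=2`, the tolerance of 0.2 standard deviations, and the median-threshold binarization are common conventions assumed here for illustration.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy: -log of the chance that sequences matching for m
    points (within tolerance r) also match for m + 1 points."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def count_matches(length):
        # Use the same n - m template start points for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        upper = np.triu_indices(len(templates), k=1)  # exclude self-matches
        return int(np.sum(dist[upper] <= r))

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return float(-np.log(a / b))

def binarize(x):
    """Threshold a signal at its median (an assumed convention)."""
    x = np.asarray(x, dtype=float)
    return (x > np.median(x)).astype(int)

def lempel_ziv_complexity(bits):
    """Number of phrases in the LZ76 parsing of a binary sequence."""
    s = "".join("1" if b else "0" for b in bits)
    n, i, phrases = len(s), 0, 0
    while i < n:
        k = 1
        # Grow the phrase while it can be copied from earlier in the sequence.
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        phrases += 1
        i += k
    return phrases

t = np.arange(300)
regular = np.sin(2 * np.pi * t / 25)
irregular = np.random.default_rng(0).normal(size=300)
print(sample_entropy(regular), sample_entropy(irregular))
print(lempel_ziv_complexity(binarize(regular)),
      lempel_ziv_complexity(binarize(irregular)))
```

      Both measures operate on the temporal structure of a single signal, whereas a variance computed across units at one time point does not, which is one reason the measures need not move together.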

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much more minimal relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory from the REBUS model. The oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.

      Relevant modifications: Page 10, paragraph 1.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5-HT2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      In our revised submission, ‘ripple’ phenomena are now visible in two places: Fig 2c-d, and Fig 6 (rows 2 and 3). Because the VDVAE models used to generate Figure 6 produce higher quality generated images, the ripples appearing in these plots are likely more prototypical, but it is not easy to evaluate the quality of these visualizations relative to subjective hallucination phenomena.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), or 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our Wake-Sleep-trained model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results. In fact, the pretrained VDVAE models that we worked with do include top-down influence during the Wake-stage inference process, and these models recapitulated our key results and testable predictions (Fig. S8).

      Relevant modifications: Fig. S8; Page 12, paragraph 4.

    1. eLife Assessment

      This valuable study highlights the key role of NK cells and PD-L1+ neutrophils in worsening sepsis responses in the context of MASH (metabolic dysfunction-associated steatohepatitis). It focused on the role of neutrophils in mediating this effect, which is based on a choline-deficient high-fat diet model of various knockouts or selective ablation of immune cell types. While the data presented are of great interest, there are concerns around the reliability of the strength of the evidence provided, which is currently considered incomplete. The study may be of interest to researchers in immunopathological disease mechanisms once confirmatory studies have been completed.

      [Editors' note: the authors no longer have access to the original flow cytometry data and plan to compile new datasets for further consideration.]

    2. Reviewer #1 (Public review):

      Summary:

      By using an established NAFLD model, choline-deficient high-fat diet, Barros et al. show that LPS challenge causes excessive IFN-γ production by hepatic NK cells which further induces recruitment and polarization of a PD-L1 positive neutrophil subset leading to massive TNFα production and increased host mortality. Genetic inhibition of IFN-γ or pharmacological blockade of PD-L1 decreases recruitment of these neutrophils and TNFα release, consequently preventing liver damage and decreasing host death.

      Since NAFLD is often accompanied by chronic, low-grade inflammation, it can lead to an overactive but dysfunctional immune response and increase the body's overall susceptibility to infections; this is therefore a very important research question.

      Strengths:

      The biggest strength of the manuscript is the vast number of mouse strains used.

      Weaknesses:

      After the review, there are still some open questions from my side:

      (1) I would like the authors to defend their choice of diet type since this has not been done in the review/response to authors. In case they cannot, we need additional proof (HFD or WD model).

      (2) Since the authors used the same control groups (chow and HFCD), as required by the animal ethics committee, they must provide a power analysis showing that the number of controls (and of animals in the other groups) is sufficient to detect the effect.

    3. Reviewer #2 (Public review):

      Summary:

      This is an extremely interesting mouse study, trying to understand how sepsis is tolerated during obesity/NAFLD. The researchers combine a well-established model of NASH (choline deficiency with high-fat diet) with a sepsis model (IP injection of 10 mg/kg LPS), leading to dramatic mortality in mice. Using this model, they characterize the complex contributions of immune cells. Specifically, they find that NK cells and neutrophils contribute the most to mortality in this model due to IFNG and PD-L1+ neutrophils.

      Strengths:

      The biggest strength of the manuscript is how clear the primary phenotypes/endpoints of their model are. Within 6 hours of LPS injection, there is a stark elevation of liver inflammation and damage, which is exacerbated by a High Fat/Choline-Deficient diet (HFCD). And after 1 day, almost all of the mice die. Using these endpoints, the authors were able to identify which cells were critical for mortality in the model and the specific mediators involved.

      Comments on revisions:

      I have no further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their constructive questions, valuable feedback, and for approving our manuscript. We truly appreciate the opportunity to improve our work based on their insightful comments. Before addressing the editor’s and each referee’s remarks individually, we provide below a point-by-point response summarizing the revisions made.

      Duplication of control groups across experiments

      We appreciate the reviewers’ concern regarding the potential duplication of control groups. In the revised manuscript, we have explicitly clarified that independent groups of control mice were used for each experiment. These details are now clearly indicated in the Materials and Methods section to avoid any ambiguity and to reinforce the rigor of our experimental design (Page 15, Line 453-455): “Furthermore, knockout animals and those treated with pharmacological inhibitors or neutralizing antibodies shared the same control groups (chow and HFCD), as required by the animal ethics committee.”

      Validation of the MASLD model

      To strengthen the metabolic characterization of our MASLD model, we have now included additional parameters, including liver weight, Picrosirius staining and blood glucose measurements. These data are presented as new graphs in the revised manuscript and support the metabolic relevance of the HFCD diet model (Supplementary Figure S1). The corresponding description has been added to the Results section (Page 5, Lines 116-117) as follows: “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”

      Assessment of liver injury in RagKO and anti-NK1.1 mice

      We fully agree that assessment of liver injury is essential for these models. For mice treated with anti-NK1.1, ALT levels are shown in Figure 4G, confirming increased liver injury after treatment. Regarding Rag⁻/⁻ mice, the animals exhibit exacerbation of liver injury when fed an HFCD diet and challenged with LPS (Page 7, Lines 183–184). The corresponding description has been added to the Results section (Page 7, Lines 175-176) as follows: “Interestingly, Rag1-deficient animals under the HFCD remained susceptible to the LPS challenge (Fig. 4C) with exacerbation of liver injury (Fig. 4D)”

      Discussion of limitations

      We have expanded the Discussion section to provide a more comprehensive and balanced perspective on the limitations of our model and experimental approach (Page 13-14, Lines 401–414): “Our study presents several limitations that should be acknowledged and discussed. First, we cannot entirely rule out the possibility that our mice deficient in pro-inflammatory components exhibit reduced responsiveness to LPS. However, our ex vivo analyses using splenocytes from these animals revealed a preserved cytokine production following LPS stimulation. These results suggest that the in vivo differences observed are primarily driven by the MAFLD condition rather than by intrinsic defects in LPS sensitivity. Second, the absence of publicly available single-cell RNA-seq datasets from MAFLD subjects under endotoxemic or septic conditions limited our ability to perform direct translational comparisons. To overcome this, we analyzed existing MAFLD patients and experimental MAFLD datasets, which consistently demonstrated upregulation of IFN-γ and TNF-α inflammatory pathways in MAFLD. In line with these findings, our murine model revealed TNF-α⁺ myeloid and IFN-γ⁺ NK cell populations, thereby reinforcing the validity and translational relevance of our results.” This revision highlights the constraints of the MASLD model, the inherent variability among in vivo experiments, and the interpretative limitations related to immunodeficient mouse strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 4 the authors are showing the number of IFN+ positive CD4, CD8, and NK 1.1+ cells. Could they show from total IFNg production, how much it goes specifically on NK cells and how much on other cell populations since NK1.1 is NK but also NKT and gamma delta T cell marker? Also, in Figure 2E the authors see a substantial increase in IFNg signal in T cells.

      While we did not specifically assess IFNγ production in NKT cells or other minor populations, our data indicate that the NK1.1+CD3+ cells (NKT cells) cited in Page 7, Lines 188-192 were essentially absent in the liver tissue of LPS-challenged animals, as shown in Supplementary Figures 3C and S10. The corresponding description has been added to the Results section (Page 7, Lines 188-192) as follows: “We observed that the number of NK cells increased in the liver tissue of PBS-treated MAFLD mice compared with mice fed a control diet (Fig. 4E). LPS challenge increased the accumulation of NK1.1+CD3− NK cells in the liver tissue of MAFLD mice and the absence of NK1.1+CD3+ NKT cells (Fig. S3C and 4E)”.

      This absence was consistent across all experimental conditions, corroborating our focus on NK1.1+CD3− cells as the primary source of NK1.1-associated IFNγ production. Furthermore, data demonstrated in Figure 2E illustrate the presence of IFNγ primarily in NK cells. Therefore, the observed IFNγ signal, attributed to NK1.1+ cells, predominantly reflects conventional NK cells, with minimal contribution from NKT or γδ T cells.

      (2) In Figure 4C, the authors state that the results suggest that T and B cells do not contribute to susceptibility to LPS challenge. However, they observe a drop in survival compared to chow+LPS. Are the authors certain there is no statistical significance there?

      The observed decrease in survival is consistent with our expectations, as T and B cells are not the primary source of interferon-gamma (IFNγ) in this context. Even in their absence, animals remain susceptible to LPS challenge due to the presence of other IFNγ-producing cells that drive the observed lethality. We have carefully re-examined the statistical analysis and confirm that it was correctly performed.  

      (3) Since the survival curve and rate are exactly the same (60%) in Figures 3F, 3G, 4C, 4F, 5G, and 5H I would just like to double-check that the authors used different controls for each experiment.

      The number of mice used in each experiment was carefully determined to ensure sufficient statistical power while fully complying with the limits established by our institutional Animal Ethics Committee. To minimize animal use, the same control group was shared across multiple survival experiments. Despite using shared controls, the total number of animals per experimental group was adequate to produce robust and reproducible survival outcomes. All groups were properly randomized, and the shared control data were rigorously incorporated into statistical analyses. This strategy allowed us to maintain both ethical standards and the scientific rigor of our findings.

      (4) In Figure 5 the authors say that neutrophils, but not monocytes, mediate the susceptibility of animals with NAFLD to endotoxemia. However, CXCR2i treatment and CCR2 knockout affect both monocytes/macrophages and neutrophils. And in Figures 5E, 5G, and 5H they see that a) LPS+CXCR2i decreases liver damage more than LPS+anti-Ly6G, b) treatment of LPS-challenged HFCD mice with anti-Ly6G does not rescue survival to the levels of chow LPS mice, and c) anti-Ly6G treatment helps less than CXCR2i. Therefore, from both the knockout mice and the depletion experiments, the authors can conclude that most likely monocytes (but potentially also other cells), together with neutrophils, are important for the development of endotoxemic shock in the choline-deficient high-fat diet model.

      While neutrophils express CCR2, our data clearly show that CCR2 deficiency does not impair neutrophil migration, as demonstrated in Supplemental Figures 5A and 5B (added to the manuscript, page 8, lines 213–217). The corresponding description has been added to the Results section (Page 8, Lines 213-217) as follows: “Interestingly, animals deficient in monocyte migration (CCR2-/-) showed a high mortality rate compared to wild type after LPS challenge and neutrophil migration is not altered (Fig. 5SA and Fig. 5SB)”. In contrast, CCR2 deficiency primarily affects monocyte recruitment, yet in our experimental conditions, monocyte depletion or CCR2 knockout did not significantly alter the severity of endotoxemic shock, indicating that monocytes play a minimal role in mediating susceptibility in HFCD-fed mice.

      To specifically investigate neutrophils, we used pharmacological blockade of CXCR2 to inhibit migration and antibody-mediated neutrophil depletion. Both approaches have consistently demonstrated that neutrophils are critical factors in endotoxemic shock.

      These findings support our conclusion that neutrophils are the primary cellular contributors to susceptibility in HFCD-fed mice during endotoxemia, with monocytes making a negligible contribution under the tested conditions.

      (6) In Figure 6A (but also others with PD-L1) did the authors do isotype control? And can they show how much of PD1+ population goes on neutrophils, and how much on all the other populations?

      To address this issue, we performed additional analyses to assess the distribution of PD-L1 expression on CD45+CD11B+ leukocytes. These new results, detailed on Page 9, lines 245-250, and now presented in Supplemental Figure 6, demonstrate that PD-L1 expression is predominantly enriched in neutrophils compared to other immune subsets. This observation further reinforces our conclusion that neutrophils represent a major source of PD-L1 in our experimental model.

      To ensure the robustness of these findings, we also included FMO controls for PD-L1 staining in the newly added Supplemental Figure S6. These controls validate the specificity of our gating strategy and confirm the reliability of the detected PD-L1 signal. The corresponding description has been added to the Results section (Page 9, Lines 245-250) as follows: “First, we observed that only the MAFLD diet caused a significant increase in PD-L1 expression in CD45+CD11b+ leukocytes after LPS challenge (Fig. S6C). We observed that within this population, neutrophils predominate in their expression when compared to monocytes (Fig. 6SA, Fig. 6SB, and Fig. 6SD). Furthermore, PD-L1+ neutrophils showed an exacerbated migration of PD-L1+ neutrophils towards the liver (Fig. 6A and 6B)”

      (7) In Figure 6D it is interesting that there is not an increase in PD-L1+ neutrophils in LPS HFCD IFNg+/+ mice in comparison to LPS chow IFNg+/+ mice, since those should be like WT mice (Figure 6A going from 50% to 97%) and so an increase should be seen?

      The apparent difference between Figures 6A and 6D likely reflects inter-experimental variability rather than a biological discrepancy. Although the absolute percentages of PD-L1⁺ neutrophils varied slightly among independent experiments, the overall phenotype and trend were consistently maintained, namely that PD-L1 expression on neutrophils is enhanced in response to LPS stimulation and modulated by IFNγ signaling. Thus, the data shown in Figure 6D are representative of this consistent phenotype despite minor quantitative variation.

      (8) In Figure 7, do the authors have an isotype control for TNFa? The gating seems a bit random, so an isotype control graph as supplementary information would help a lot to make the figure more persuasive.

      To address the concern regarding gating in Figure 7, we have included the FMO control showing TNFα as a histogram in Supplementary Figure 8G. These controls reaffirm the accuracy and reliability of our gating strategy for TNFα, further supporting the robustness of our data. The corresponding description has been added to the Results section (Page 9, Lines 272-274) as follows: “We observed an exacerbated TNF-α expression by PD-L1+ neutrophils from MAFLD when compared to control chow animals (Fig. 7A, Fig. 7B, Fig. 7D, and Fig. 8SG).”

      (9) Figure 6C IFNg+/+ mice on CHOW +LPS is same as Figure 8E mice chow +LPS but just with different numbers. Can the authors explain this?

      Although the data points in Figures 6C and 8E may appear similar, we confirm that they originate from entirely independent experiments and represent distinct datasets. To enhance clarity and avoid any potential confusion, we have adjusted the figure presentation and sizing in the revised manuscript. These changes make it clear that the datasets, while comparable, are derived from separate experimental replicates.

      (10) Figure 1E chow B6+LPS is the same as Figure 5D B6+LPS but should they be different since those should be two different experiments?

      We confirm that Figures 1E and 5D correspond to data obtained from independent experiments. Although the experimental conditions were similar, each dataset was generated and analyzed separately to ensure the reproducibility and robustness of our results.

      Reviewer #2 (Recommendations for the authors):

      (1) Why did you look at kidney injury in Figure 1D? I think this should be explained a little.

      We assessed kidney injury alongside ALT, a marker of liver damage, because both the liver and kidneys are among the primary organs affected during sepsis and endotoxemia. This rationale has been added to the manuscript (page 5, lines 129–131): “Remarkably, compared to the Chow group, HFCD mice exposed to LPS did not show greater changes in other organs commonly affected by endotoxemia, such as the kidneys (Figure 1D).” By evaluating markers of injury in both organs, we aimed to determine whether our physiopathological condition was liver-specific or indicative of broader systemic injury.

      (2) I know Figure 2C isn't your data, but why are there so few NK cells, considering NK cells are a resident liver cell type? Doesn't that also bring into question some of your data if there are so few NK cells? And the IFNG expression (2E) looks to mostly come from T-cells (CD8?).

      The data shown in Figure 2C were reanalyzed from a separate NAFLD model based on a 60% high-fat diet. Although this model differs from ours, the observed low number of NK cells is consistent with expectations for animals subjected solely to a hyperlipidic diet, which primarily provides an inflammatory stimulus that promotes recruitment rather than maintaining high baseline NK cell numbers.

      In our experimental model, these observations align with published data. Specifically, liver tissue from NAFLD animals typically exhibits low baseline NK cell numbers, but upon LPS challenge, there is a marked increase in NK cell recruitment to the liver. This dynamic illustrates the interplay between diet-induced inflammation and immune cell recruitment in our experimental context and supports the interpretation of our IFNγ data.

      (3) In your methods, I think you didn't explain something. You said LPS was administered to 5-6 week old mice, but that HFCD diet was started in 5-6 week old mice and lasted 2 weeks, then LPS was administered. So LPS administration happened when the mice were 7-8 weeks old, right?

      We thank the reviewer for pointing out this inconsistency in our Methods section. The reviewer is correct: the HFCD diet was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, such that LPS challenge occurred when the mice were 7–8 weeks old.

      We have revised the Methods section (page 15-16, lines 474–480) to clarify this timeline and ensure it is accurately described in the manuscript. The corresponding description has been added to the Materials and Methods section (Page 14, Lines 436-442) as follows: “Lipopolysaccharide (LPS; Escherichia coli (O111:B4), L2630, Sigma-Aldrich, St. Louis, MO, USA) was administered intraperitoneally (i.p.; 10 mg/kg) in C57BL/6, CCR2 -/-, IFN-/-, and TNFR1R2 -/- mice. The HFCD was initiated in 5–6 week-old mice, and LPS was administered after 2 weeks on the diet, meaning that LPS administration occurred when the mice were 7–8 weeks old, with body weights ranging from 22 to 26 g. LPS was previously solubilized in sterile saline and frozen at -70°C. The animals were euthanized 6 hours after LPS administration”.

      (4) Throughout the manuscript, I would consider changing the term NAFLD to something else. I think HFCD diet is a closer model to NASH, so there needs to be some discussion on that. And the field is changing these terms, so NAFLD is now MASLD and NASH is now MASH.

      We appreciate the reviewer’s comment regarding the terminology and disease classification. In our experimental conditions, the animals were subjected to a high-fat, choline-deficient (HFCD) diet for only two weeks, a period considered very early in the progression of diet-induced liver disease. At this stage, histological analysis revealed lipid accumulation in hepatocytes without evidence of hepatocellular injury, inflammation, or fibrosis. Therefore, our model more closely resembles the metabolic-associated fatty liver disease (MAFLD, formerly NAFLD) stage rather than the more advanced metabolic-associated steatohepatitis (MASH, formerly NASH).

      Indeed, prolonged exposure to HFCD diets, typically 8 to 16 weeks, is required to induce the inflammatory and fibrotic features characteristic of MASH. Since our objective was to study the initial metabolic and immune alterations preceding overt liver injury, we believe that using the term MAFLD more accurately reflects the pathological stage represented in our model. Accordingly, we have revised the text to align with the updated nomenclature and disease context.

      (6) I am concerned about over interpretation of the publicly available RNA-seq data in Figure 2. This data comes from human NAFLD patients with unknown endotoxemia and mouse models using a traditional high-fat diet model. So it is hard to compare these very disparate datasets to yours. Also, if these datasets have elevated IFNG, why does your model require LPS injection?

      We thank the reviewer for their thoughtful comments regarding the interpretation of the RNA-seq data presented in Figure 2. We would like to clarify that the human NAFLD datasets referenced in our study do not specifically include patients with endotoxemia; rather, they focus on individuals with NAFLD alone.

      Comparing data from human and murine MAFLD models, we observed that NK cells, T cells, and neutrophils are present and contribute to the hepatic inflammatory environment. Our reanalysis indicates that the elevations of IFNγ and TNF in NAFLD are primarily derived from NK cells, T cells, and myeloid cells, respectively.

      In our experimental model, LPS administration was used to evaluate whether these immune populations, particularly NK cells, are further potentiated under a hyperinflammatory state, leading to exacerbated IFNγ production. This approach allows us to determine whether increased IFNγ contributes to worsening outcomes in NAFLD, providing mechanistic insights that cannot be obtained from static human or traditional mouse datasets alone.

      (7) The zoom-ins for the histology (for example, Figure 1E) don't look right compared to the dotted square. The shape and area expanded don't match. And the cells in the zoom-in don't look exactly the same either.

      We have thoroughly re-examined the histological sections and the corresponding zoom-ins, including the example in Figure 1E. Upon verification, we confirm that the zoom-ins accurately represent the highlighted areas indicated by the dotted squares. The apparent discrepancies in shape or cellular appearance are likely due to minor differences in orientation or cropping during figure preparation. Nevertheless, the content and regions depicted are consistent with the original sections.  

      (8) Did the authors measure myeloid infiltration in the CCR2-/- mice? Did you measure Neutrophil infiltration in the TNF-Receptor KO mice?

      Analysis of CD45+ cell migration in CCR2 knockout mice, as shown in Supplemental Figure 5C and 5D, demonstrates that the absence of CCR2 does not impair overall leukocyte migration. Similarly, assessment of neutrophil migration in TNF receptor (TNFR1/2) knockout mice, presented in Supplemental Figure 8A, shows that neutrophil trafficking is not affected in these animals. These results indicate that the respective knockouts do not compromise the migration of the analyzed immune populations, supporting the interpretations presented in our study.

      (9) Regarding Methods for RNA-seq Analysis. Was the Mitochondrial percentage cutoff 0.8%, because that seems low. And was there not a Padj or FDR cutoff for the differential expression?

      The mitochondrial percentage in our scRNA-seq analysis reflects the proportion of mitochondrial gene expression per cell, which serves as a quality control metric. A low mitochondrial gene expression percentage, such as the 0.8% cutoff used here, is indicative of highly viable cells.
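
      For readers unfamiliar with this metric, the per-cell mitochondrial percentage can be sketched as follows. This is a minimal illustration, assuming per-cell counts are available as a gene-to-count mapping and that mitochondrial genes carry the usual `mt-` (mouse) or `MT-` (human) prefix. The function names `percent_mito` and `passes_qc`, and the example counts, are hypothetical and do not reproduce the authors' Seurat workflow.

```python
def percent_mito(counts, mito_prefix="mt-"):
    """Percentage of a cell's total counts that come from mitochondrial genes.

    `counts` maps gene symbol -> raw count for a single cell. The prefix
    match is case-insensitive so that both "mt-" and "MT-" genes are caught.
    """
    total = sum(counts.values())
    mito = sum(
        n for gene, n in counts.items()
        if gene.lower().startswith(mito_prefix.lower())
    )
    return 100.0 * mito / total if total else 0.0


def passes_qc(counts, max_percent_mito):
    """Keep a cell only if its mitochondrial percentage is at or below the cutoff."""
    return percent_mito(counts) <= max_percent_mito


# Hypothetical cell: 5 mitochondrial counts out of 500 total.
cell = {"mt-Co1": 5, "Actb": 495}
print(percent_mito(cell))
print(passes_qc(cell, max_percent_mito=0.8))
```

      The stricter the cutoff, the more cells are discarded, which is the trade-off behind the reviewer's question about whether 0.8% is unusually low.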

      For differential gene expression analysis, we employed the FindMarkers function in Seurat with standard parameters: adjusted p-value (Padj) < 0.05 and log2 fold change > 0.25 for upregulated genes, and adjusted p-value < 0.05 with log2 fold change < -0.25 for downregulated genes. These thresholds ensure robust identification of differentially expressed genes while balancing sensitivity and specificity.
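
      The thresholds stated above can be expressed as a simple classification rule. The sketch below is illustrative only: it assumes a differential-expression table has already been computed and exported, it does not call Seurat, and the function name `classify_gene` and the placeholder values are hypothetical.

```python
def classify_gene(log2_fc, padj, fc_cutoff=0.25, padj_cutoff=0.05):
    """Label a gene using the thresholds quoted in the response:
    padj < 0.05 with log2FC > 0.25 (up) or log2FC < -0.25 (down)."""
    if padj >= padj_cutoff:
        return "not significant"
    if log2_fc > fc_cutoff:
        return "up"
    if log2_fc < -fc_cutoff:
        return "down"
    return "not significant"


# Placeholder rows (gene label, log2 fold change, adjusted p-value).
rows = [
    ("gene_a", 1.20, 0.001),
    ("gene_b", -0.80, 0.010),
    ("gene_c", 0.10, 0.001),
    ("gene_d", 2.00, 0.200),
]
for name, log2_fc, padj in rows:
    print(name, classify_gene(log2_fc, padj))
```

      Note that a gene with a large fold change but an adjusted p-value above the cutoff is still treated as not significant, which addresses the reviewer's question about a Padj or FDR filter.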

      (10) Regarding Methods for Flow Cytometry. How were IFNG and TNF staining performed? Was this an intracellular stain? Did you need to block secretion? TNF and IFNG antibodies have the same fluorophore (PE), so were these stainings and analyses performed separately?

      Six hours after LPS challenge, non-parenchymal liver cells were isolated using Percoll gradient centrifugation. Because the animals were in a hyperinflammatory state induced by LPS, no in vitro stimulation was performed; all staining was carried out immediately after cell isolation. Detection of IFNγ and TNF was performed via intracellular staining using the Foxp3 staining kit (eBioscience). Due to both antibodies being conjugated to PE, IFN-γ and TNF-α staining and analyses were conducted in separate experiments. These distinct staining protocols and analyses are detailed in Supplemental Figures 10 and 11. The corresponding description has been added to the Materials and Methods section (Page 16, Lines 490-493) as follows: “As animals were already in a hyperinflammatory state, no additional in vitro stimulation was required. Intracellular detection of IFN-γ and TNF-α was conducted using the Foxp3 staining kit (eBioscience). Since both antibodies were conjugated to PE, staining and analyses were performed in separate experiments.”

      Reviewer #3 (Recommendations for the authors):

      (1) Achieving an NAFLD model/disease is the starting point of this study. I understand that a two-week HFCD diet period was applied due to the decrease in lymphocyte numbers. Was it enough to initiate NAFLD then? Or is it a milder metabolic disease? Which parameters have been evaluated to accept this model as a NAFLD model?

      Indeed, the two-week HFCD diet induces an early-stage form of NAFLD, characterized by initial fat accumulation in the liver without significant hepatic injury. While this represents a milder metabolic phenotype, it is sufficient to study the inflammatory and immune responses associated with NAFLD. To validate this model, we assessed multiple parameters: liver weight, blood glucose levels, and collagen deposition. These measurements confirmed the presence of early-stage NAFLD features in the animals, providing a relevant and reliable context for investigating susceptibility to endotoxemia and immune cell dynamics. They are shown in Supplementary Figure 1 and the text was included in the manuscript (Page 5, Lines 116-117): “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”.

      (2) It is true that the CD274 gene (encoding PD-L1) and the IFNGR2 gene, corresponding to the IFNγ receptor, are among the upregulated genes when the authors analyzed the publicly available RNA-seq data, but they are not the most significantly elevated genes. What is the reasoning behind this cherry-picking? Why are other high DEGs not analyzed but these two are analyzed?

      We highlighted the expression of the IFN-γ receptor (IFNGR2) and CD274 (encoding PD-L1) in the publicly available RNA-seq data to align and corroborate these findings with the key results observed later in our study. To avoid redundancy, we chose to present these genes in the initial figures as they are directly relevant to the subsequent analyses. Regarding the broader analysis of human RNA-seq data, our primary objective was to identify enriched biological processes and pathways, which served as a foundation for the focus and direction of this study.

      (3) Figures 3C-3G: I understand that IFNg-/- and TNFR1R2a-/- mice are not showing elevated liver damage, but this may simply be because of non-responsiveness to the LPS challenge. I suggest using a different challenge or recovery experiments with the cytokines to show that the challenge is successful and that the results are truly caused by NAFLD. The same goes for Figure 6: looking at Figure 6D, one may think that IFNg deficiency alters the LPS response independent of the diet condition (or NAFLD condition).

      We appreciate the reviewer’s insightful comment and fully understand the concern regarding the potential non-responsiveness of IFN-γ⁻/⁻ and TNFR1R2a⁻/⁻ mice to the LPS challenge. To address this point and confirm that these knockout animals are indeed responsive to LPS stimulation, we conducted an additional set of ex vivo experiments.

      Specifically, WT and cytokine-deficient (IFN-γ⁻/⁻) mice were fed either Chow or HFCD for two weeks, after which spleens were collected, and splenocytes were challenged in vitro with LPS. We then quantified TNF, IFN, and IL-6 production to confirm that these mice are capable of mounting cytokine responses upon LPS stimulation.

      Due to current breeding limitations and a temporary issue in colony maintenance of TNF-deficient mice, we were unable to include TNFR1R2a⁻/⁻ animals in this additional experiment. Nevertheless, we prioritized performing the analysis with the available knockout line to avoid leaving this important point unaddressed.

      These additional data demonstrate that IFN-γ-deficient mice remain responsive to LPS, reinforcing that the differences observed in vivo are related to the NAFLD condition rather than a lack of LPS responsiveness.

      (4) Figure 1 vs Figure 4: Rag-/- mice seem more susceptible to LPS-derived death even after normal conditions. But If I compare the survival data between Figure 1 and Figure 4, Rag-/- HFCD diet mice seem to be doing better than wt mice after LPS treatment. (1 day survival vs 2 days survival). How do you explain these different outcomes?

      We thank the reviewer for this insightful question regarding the survival data in Figures 1 and 4. Although there is a one-day difference in survival outcomes, Rag-/- mice consistently exhibit increased susceptibility to LPS-induced mortality, and variability between independent experiments can influence the exact survival timing. Nonetheless, across all experiments, Rag-/- mice display a reproducible phenotype of heightened sensitivity to LPS challenge, which is supported by multiple independent observations in our study.

      (5) How do you explain Figure 4J in connection to the observation presented with Figure 7: TNFa tissue levels, even though significant, seem very similar between the conditions?

      We would like to clarify that the animals in this study are in a metabolic syndrome state, with early-stage NAFLD characterized by hepatic fat accumulation without significant tissue injury, as shown in Figure 1C.

      Under these conditions, the LPS challenge triggers an exacerbated inflammatory response, leading to increased secretion of IFN-γ and TNF-α, primarily from NK cells and neutrophils. While TNFα levels may appear visually similar across conditions, the HFCD mice exhibit a heightened predisposition for an amplified immune response compared to chow-fed mice. This difference is consistent with the functional outcomes observed in our study and highlights the diet-specific sensitization of the immune system.

    1. eLife Assessment

      This work describes the establishment of an image analysis pipeline for signal correction, segmentation and quantitative data analysis of multilayered organoid and tumoroid systems. The revised study is important for the field to address many practical challenges in deep-tissue visualization. The image analysis pipeline is well-designed and compelling.

    2. Reviewer #1 (Public review):

      Summary:

      The image analysis pipeline is tested on microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole-mount sample preparation using an optimal refractive-index-matched mounting medium, opposing dual-side imaging with two-photon microscopy for enhanced laser penetration, and dual-view registration and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.

      In the image analysis pipeline, different pre-treatments are applied depending on the analysis to be performed (for nuclear segmentation: contrast enhancement and normalisation; for quantitative analysis of gene expression: corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses in toto properties of gastruloid nuclear density, patterns of cell division, morphology, deformation and gene expression.
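
      The spectral unmixing step mentioned in the summary can be illustrated with a minimal sketch, assuming a linear mixing model with two fluorophores and fixed reference spectra solved by least squares. The function name `unmix_two` and the example spectra are hypothetical; the authors' pipeline is described as using depth-dependent spectral profiles and may differ substantially from this simplification.

```python
def unmix_two(measured, spec_a, spec_b):
    """Least-squares abundances (a, b) such that
    a * spec_a + b * spec_b approximates `measured` across channels.

    Solves the 2x2 normal equations in closed form (illustrative only).
    """
    saa = sum(x * x for x in spec_a)
    sbb = sum(x * x for x in spec_b)
    sab = sum(x * y for x, y in zip(spec_a, spec_b))
    sam = sum(x * m for x, m in zip(spec_a, measured))
    sbm = sum(x * m for x, m in zip(spec_b, measured))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("reference spectra are collinear")
    a = (sbb * sam - sab * sbm) / det
    b = (saa * sbm - sab * sam) / det
    return a, b


# Hypothetical reference spectra over three detection channels.
spec_a = [0.7, 0.2, 0.1]
spec_b = [0.1, 0.3, 0.6]
# One pixel synthesised as a mixture of 2 parts A and 3 parts B.
pixel = [2 * x + 3 * y for x, y in zip(spec_a, spec_b)]
a, b = unmix_two(pixel, spec_a, spec_b)
print(round(a, 6), round(b, 6))
```

      Applied per pixel, this recovers the contribution of each fluorophore even when their emission overlaps in the detection channels, which is the cross-talk problem the summary refers to.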

      Strengths:

      The methods developed are sound, well described and well validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research and would be of interest to the wide scientific community.

      Comments on revisions:

      I am happy with the job the authors have done with the revision. No further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques, enabling improved deep-tissue visualization compared with conventional methods. This advanced approach allows comprehensive 3D imaging of whole-mount immunostained gastruloids, capturing both tissue-scale architecture and single-cell-level information.

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85 ± 3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.
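
      To make the reported metric concrete, the following minimal sketch shows how an F1 score at an IoU threshold is commonly computed for instance segmentation. It assumes each object is represented as a set of pixel coordinates and uses greedy one-to-one matching by descending IoU; the function names `iou` and `f1_at_iou` and the toy objects are hypothetical and are not taken from the authors' code.

```python
def iou(a, b):
    """Intersection over union of two pixel-coordinate sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(gt_objects, pred_objects, threshold=0.5):
    """F1 score where a predicted object is a true positive if it is matched
    one-to-one to a ground-truth object with IoU >= threshold."""
    candidates = sorted(
        (
            (iou(g, p), gi, pi)
            for gi, g in enumerate(gt_objects)
            for pi, p in enumerate(pred_objects)
        ),
        reverse=True,
    )
    matched_gt, matched_pred = set(), set()
    for score, gi, pi in candidates:
        if score < threshold:
            break  # remaining pairs overlap too little to count as matches
        if gi not in matched_gt and pi not in matched_pred:
            matched_gt.add(gi)
            matched_pred.add(pi)
    tp = len(matched_gt)
    fp = len(pred_objects) - tp  # predictions with no ground-truth match
    fn = len(gt_objects) - tp    # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: one good match (IoU 0.75), one missed nucleus, one spurious one.
gt = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
print(f1_at_iou(gt, pred))  # 0.5
```

      In the toy example there is one true positive, one false positive and one false negative, so F1 = 2 / (2 + 1 + 1) = 0.5.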

      All computational tools developed in this study are released as open-source, Python-based software.

      Strengths:

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.

      The image analysis method for nuclei segmentation was thoroughly benchmarked against existing methods, demonstrating advantages over conventional approaches, and its applicability across diverse datasets was convincingly established. The authors also evaluated the state-of-the-art Cellpose-SAM framework, showing that it performs well on their data and that the authors' preprocessing strategy can further enhance Cellpose-SAM's segmentation performance in deep tissues.

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven napari platform, facilitating interactive exploration and analysis.

      Weaknesses:

      In my initial review, I noted that the developed image analysis pipeline lacked benchmarking against existing methods and provided only a limited demonstration of its applicability to other datasets. These points have been appropriately addressed in the revised manuscript, and I have no further weaknesses to note.

      Appraisal:

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim was compellingly achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.

      Impact and utility:

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community. Given that suitable datasets for developing advanced 3D cell segmentation methods remain scarce in biological image analysis, the public release of these data is significant and is expected to stimulate further advances in the development of sophisticated computational approaches.

      Comments on revisions:

      The authors have addressed the previous revision thoroughly and appropriately. I have no further suggestions or additional recommendations at this time.

    4. Reviewer #3 (Public review):

      Summary

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.

      Strengths

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and the wavelength-dependent intensity normalization approach using the nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      Comments on revisions:

      The minor issues that I originally raised in my first review have been fully resolved in the revised version.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration, and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.  

      In the image analysis pipeline, different pre-treatments are done depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses properties of gastruloid nuclear density, patterns of cell division, morphology, deformation, and gene expression.

      Strengths:  

      The methods developed are sound, well described, and well-validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

      We thank the reviewer for this positive feedback.

      Weaknesses:  

      A recommendation should be added on when or under which conditions to use this pipeline. 

      We thank the reviewer for this valuable feedback. We added the following text in the revised version, lines 418 to 474: “In general, the pipeline is applicable to any tissue, but it is particularly useful for large and dense 3D samples, such as organoids, embryos, explants, spheroids, or tumors, that are typically composed of multiple cell layers and have a thickness greater than 50 µm”.

      “The processing and analysis pipeline are compatible with any type of 3D imaging data (e.g. confocal, 2 photon, light-sheet, live or fixed)”.

      “Spectral unmixing to remove signal cross-talk of multiple fluorescent targets is typically more relevant in two-photon imaging due to the broader excitation spectra of fluorophores compared to single-photon imaging. In confocal or light-sheet microscopy, alternating excitation wavelengths often circumvents the need for unmixing. Spectral decomposition performs even better with true spectral detectors; however, these are usually not non-descanned detectors, which are more appropriate for deep tissue imaging. Our approach demonstrates that simultaneous cross-talk-free four-color two-photon imaging can be achieved in dense 3D specimen with four non-descanned detectors and co-excitation by just two laser lines. Depending on the dispersion in optically dense samples, depth-dependent apparent emission spectra need to be considered”.
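
      As a rough illustration of the idea in the paragraph above, depth-dependent unmixing can be sketched as a per-plane least-squares problem. This is a generic linear-unmixing sketch, not the Tapenade implementation: the function names, the array layout, and the assumption that one calibrated mixing matrix is available for each z-plane are all hypothetical.

```python
import numpy as np


def unmix_plane(measured, mixing):
    """Least-squares linear unmixing of one z-plane.

    measured: array (n_detectors, ny, nx) of detector intensities.
    mixing:   array (n_detectors, n_fluorophores); column f is the assumed
              apparent emission pattern of fluorophore f at this depth.
    """
    n_det, ny, nx = measured.shape
    flat = measured.reshape(n_det, -1)
    # Solve mixing @ abundances = flat for every pixel at once.
    abundances, *_ = np.linalg.lstsq(mixing, flat, rcond=None)
    return abundances.reshape(mixing.shape[1], ny, nx)


def unmix_stack(stack, mixing_per_depth):
    """Unmix a (nz, n_detectors, ny, nx) stack, one mixing matrix per z-plane."""
    return np.stack([unmix_plane(p, m) for p, m in zip(stack, mixing_per_depth)])


# Synthetic check with two fluorophores, two detectors and three depths.
rng = np.random.default_rng(0)
true = rng.random((3, 2, 4, 5))
mixing = np.array([
    [[1.0, 0.2], [0.1, 1.0]],   # shallow plane: little cross-talk
    [[0.9, 0.3], [0.2, 0.8]],
    [[0.8, 0.4], [0.3, 0.7]],   # deep plane: stronger cross-talk
])
stack = np.stack([(m @ t.reshape(2, -1)).reshape(2, 4, 5) for m, t in zip(mixing, true)])
recovered = unmix_stack(stack, mixing)
print(np.allclose(recovered, true))
```

      In practice the mixing matrices would come from the calibration described in the Methods, and a non-negativity constraint may be preferable to plain least squares on noisy data.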

      “Nuclei segmentation using our trained StarDist3D model is applicable to any system under two conditions: (1) the nuclei exhibit a star-convex shape, as required by the StarDist architecture, and (2) the image resolution is sufficient in XYZ to allow resampling. The exact sampling required is object- and system-dependent, but the goal is to achieve nearly isotropic objects with diameters of approximately 15 pixels while maintaining image quality. In practice, images containing objects that are natively close to or larger than 15 pixels in diameter should segment well after resampling. Conversely, images with objects that are significantly smaller along one or more dimensions will require careful inspection of the segmentation results”.
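
      The resampling criterion above can be turned into a small calculation. The sketch below is only an illustration under stated assumptions: the helper name, the example voxel size, and the 9 µm nucleus diameter are hypothetical, and only the target of roughly 15 pixels per object comes from the text.

```python
def isotropic_zoom_factors(voxel_size_zyx, object_diameter, target_diameter_px=15.0):
    """Return per-axis zoom factors that make objects ~target_diameter_px wide.

    voxel_size_zyx: physical voxel size along (z, y, x), e.g. in micrometers.
    object_diameter: typical nucleus diameter in the same physical unit.
    """
    if object_diameter <= 0 or target_diameter_px <= 0:
        raise ValueError("diameters must be positive")
    # Physical size of one voxel after resampling (identical along all axes).
    target_voxel = object_diameter / target_diameter_px
    # A factor > 1 upsamples an axis, a factor < 1 downsamples it.
    return tuple(size / target_voxel for size in voxel_size_zyx)


# Hypothetical acquisition: 2.0 um z-step, 0.6 um pixels, nuclei of about 9 um.
factors = isotropic_zoom_factors((2.0, 0.6, 0.6), object_diameter=9.0)
print(factors)
```

      A large factor along one axis (here z) signals that the data are strongly anisotropic, which is the case where the text advises careful inspection of the segmentation results.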

      “Normalization is broadly applicable to multicolor data when at least one channel is expected to be ubiquitously expressed within its domain. Wavelength-dependent correction requires experimental calibration using a ubiquitous signal at each wavelength. Importantly, this calibration only needs to be performed once for a given set of experimental conditions (e.g., fluorophores, tissue type, mounting medium)”.

      “Multi-scale analysis of gene expression and morphometrics is applicable to any 3D multicolor image. This includes both the 3D visualization tools (Napari plugins) and the various analytical plots (e.g., correlation plots, radial analysis). Multi-scale analysis can be performed even with imperfect segmentation, as long as segmentation errors tend to cancel out when averaged locally at the relevant spatial scale. However, systematic errors—such as segmentation uncertainty along the Z-axis due to strong anisotropy—may accumulate and introduce bias in downstream analyses. Caution is advised when analyzing hollow structures (e.g., curved epithelial monolayers with large cavities), as the pipeline was developed primarily for 3D bulk tissues, and appropriate masking of cavities would be needed”.

      Reviewer #2 (Public review):  

      Summary:  

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques to image whole-mount immunostained gastruloids. This approach enables the acquisition of comprehensive 3D images that capture both tissue-scale and single-cell level information.  

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85+/-3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.  

      All computational tools developed in this study are released as open-source, Python-based software.  

      Strengths:  

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.  

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven Napari platform, facilitating interactive exploration and analysis.

      We thank the reviewer for this positive feedback.

      Weaknesses:  

      The computational module appears promising. However, the analysis pipeline has not been validated on datasets beyond those generated by the authors, making it difficult to assess its general applicability.

      We agree that applying our analysis pipeline to published datasets—particularly those acquired with different imaging systems—would be valuable. However, only a few high-resolution datasets of large organoid samples are publicly available, and most of these either lack multiple fluorescence channels or represent 3D hollow structures. Our computational pipeline consists of several independent modules: spectral filtering, dual-view registration, local contrast enhancement, 3D nuclei segmentation, image normalization based on a ubiquitous marker, and multiscale analysis of gene expression and morphometrics. We added the following sentences to the Discussion, lines 418 to 474, and completed the discussion on applicability with a table showing the purpose, requirements, applicability and limitations of each step of the processing and analysis pipeline.

      “Spectral filtering has already been applied in other systems (e.g. [7] and [8]), but is here extended to account for imaging depth-dependent apparent emission spectra of the different fluorophores. In our pipeline, we provide code to run spectral filtering on multichannel images, integrated in Python. In order to apply the spectral filtering algorithm utilized here, spectral patterns of each fluorophore need to be calibrated as a function of imaging depth, which depend on the specific emission windows and detector settings of the microscope”.

      “Image normalization using a wavelength-dependent correction also requires calibration on a given imaging setup to measure the difference in signal decay among the different fluorophore species. To our knowledge, the calibration procedures for spectral filtering and our image-normalization approach have not been performed previously in 3D samples, which is why validation on published datasets is not readily possible. Nevertheless, they are described in detail in the Methods section, and the code used, from the calibration measurements to the corrected images, is available open-source at the Zenodo link in the manuscript”.

      Dual-view registration, local contrast enhancement, and multiscale analysis of gene expression and morphometrics are not limited to organoid data or our specific imaging modalities. To evaluate our 3D nuclei segmentation model, we tested it on diverse systems, including gastruloids stained with the nuclear marker Draq5 from Moos et al. [1]; breast cancer spheroids; primary ductal adenocarcinoma organoids; human colon organoids and HCT116 monolayers from Ong et al. [2]; and zebrafish tissues imaged by confocal microscopy from Li et al. [3]. These datasets were acquired using either light-sheet or confocal microscopy, with varying imaging parameters (e.g., objective lens, pixel size, staining method). The results have been added to the manuscript, Fig. S9b.

      Besides, the nuclei segmentation component lacks benchmarking against existing methods.  

      We agree with the reviewer that a benchmark against existing segmentation methods would be very useful. We tried different pre-trained models:

      CellPose, which we tested in a previous paper ([4]) and which performed poorly compared to our trained StarDist3D model.

      DeepStar3D ([2]) is only available in the software 3DCellScope. We could not benchmark the model on our data, because the free and accessible version of the software is limited to small datasets. An image of a single whole-mount gastruloid with one channel, having dimensions (347,467,477) was too large to be processed, see screenshot below. The segmentation model could not be extracted from the source code and tested externally because the trained DeepStar3D weights are encrypted.

      Author response image 1.

      Screenshot of the 3DCellScope software. We could not perform 3D nuclei segmentation of a whole-mount gastruloid because the image size was too large to be processed.

      AnyStar ([5]), which is a model trained from the StarDist3D architecture, did not perform well on our data because of the heterogeneous staining. Basic pre-processing such as median and Gaussian filtering did not improve the results and led to wrong segmentation of touching nuclei. AnyStar was shown to segment colon organoids well in Ong et al., 2025 ([2]), but the nuclei were more homogeneously stained. Our Hoechst staining displays bright chromatin spots that are incorrectly labeled as individual nuclei.

      Cellos ([6]), another model trained from StarDist3D, also did not perform well. The objects used for training and for validating the results are sparse and not touching, so the predicted segmentation has many false negatives even when lowering the probability threshold to detect more objects. Additionally, the network was trained with an anisotropy of (9,1,1), based on images with low z resolution, so it performed poorly on almost isotropic images. Adapting our images to the network’s anisotropy results in an imprecise segmentation that cannot be used to measure 3D nuclei deformations.

      We ran both Cellos and AnyStar on a gastruloid image from Fig. S2 of our main manuscript. The results have been added to the manuscript, Fig. S9b. Fig. 3 displays the results qualitatively compared to our trained model Stardist-tapenade.

      Author response image 2.

      Qualitative comparison of two published segmentation models versus our model. We show one slice from the XY plane for simplicity. Segmentations are displayed with their contours only. (Top left) Gastruloid stained with Hoechst, image extracted from Fig. S2 of our manuscript. (Top right) Same image overlaid with the prediction from the Cellos model, showing many false negatives. (Bottom left) Same image overlaid with the prediction from our Stardist-tapenade model. (Bottom right) Same image overlaid with the prediction from the AnyStar model; false positives are indicated with a red arrow.

      CellPose-SAM, which is a recent model building on the CellPose framework. The pre-trained model performs well on gastruloids imaged using our pipeline, and performs better than StarDist3D at segmenting elongated objects such as deformed nuclei. The performances are qualitatively compared in Fig. S9a and S10. We also demonstrate how using local contrast enhancement improves the results of CellPose-SAM (Fig. S10a), showing the versatility of the Tapenade pre-processing module. Tissue-scale, packing-related metrics from CellPose-SAM labels qualitatively match those from stardist-tapenade, as shown in Fig. 10c and d.

      Appraisal:  

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim is largely achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.  

      Impact and utility:  

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community.  

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary  

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.  

      Strengths  

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      We thank the reviewer for his positive feedback and appreciation of our work.

      Weaknesses  

      I don't see any major weaknesses, and I would only have two issues that I think should be addressed in a revision:  

      (1) The demonstration notebooks lack accompanying sample datasets, preventing users from running them immediately and limiting the pipeline's accessibility. I would suggest including a (selective) demo dataset that can be used to run the notebooks (e.g. for spectral unmixing) and/or providing easily accessible demo input sample data for the napari plugins (I saw that there is some sample data for the processing plugin, so this maybe could already be used for the notebooks?).

      We thank the reviewer for this relevant suggestion. The 7 notebooks were updated to automatically download sample tests. The different parts of the pipeline can now be run immediately:

      https://github.com/GuignardLab/tapenade/tree/chekcs_on_notebooks/src/tapenade/notebooks

      (2) The results for the morphometric analysis (Figure 4) seem to be only shown in lateral (xy) views without the corresponding axial (z) views. I would suggest adding this to the figure and showing the density/strain/angle distributions for those axial views as well.

      A morphometric analysis based on the axial views was added as Fig. S6a of the manuscript, complementary to the XY views.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      In lines 64 and 65, it is mentioned that confocal and light-sheet microscopy remain limited to samples under 100μm in diameter. I would recommend revising this sentence. In the paper of Moos and colleagues (also cited in this manuscript; PMID: 38509326), gastruloid samples larger than 100μm are imaged in toto with an open-top dual-view and dual-illumination light-sheet microscope, and live cell behaviour is analysed. Another example, if considering also multi-angle systems, is the impressive work of McDole and colleagues (PMID: 30318151), in which one of the authors of this manuscript is a corresponding author. There, multi-angle light sheet microscopy is used for in toto imaging and reconstruction of post-implantation mouse development (samples much larger than 100μm). Some multi-sample imaging strategies have been developed for this type of imaging system, though not to the sample number extent allowed by the Viventis LS2 system or the Bruker TruLive3D imager, which have higher image quality limitations.

      We thank the reviewer for this remark. As reported in their paper, Moos et al. used dual-view light-sheet microscopy to image gastruloids, which are particularly dense and challenging tissues, with whole-mount samples of approximately 250 µm in diameter. Nevertheless, their image quality metric (DCT) shows a rapid twofold decrease within 50 µm depth (Extended Fig 5.h), whereas with two-photon microscopy, our image quality metric (FRC-QE) decreases by a factor of two over 150 µm in non-cleared samples (PBS) (see Fig. 2 c). While these two measurements (FRC-QE versus DCT) are not directly comparable, the observed difference reflects the superior depth performance of two-photon microscopy, owing in part to the use of non-descanned detectors. In our case, imaging was performed with Hoechst, a blue fluorophore suboptimal for deep imaging, whereas in the Moos dataset (Draq5, far-red), the configuration was more favorable for imaging in depth, which further supports our conclusion.

      In McDole et al., tissues reaching 250 µm were imaged from four views but, to our knowledge, the deeper layers do not reach the cellular-scale resolution required for cell segmentation.

      We replaced the sentence “However, light-sheet and confocal imaging approaches remain limited to relatively small organoids typically under 100 micrometers in diameter” with the following (line 64):

      “While advances in light-sheet microscopy have extended imaging depth in organoids, maintaining high image quality throughout thick samples remains challenging. In practice, quantitative analyses are still largely restricted to organoids under roughly 100 µm in diameter”.

      It is worth mentioning that two-photon microscopes are much more widely available than light sheet microscopes, and light sheet systems with 2-photon excitation are even less accessible, which makes the described workflow of Gros and colleagues have a wide community interest.  

      We thank the reviewer for this remark, and added the following sentence at line 74:

      “Finally, two-photon microscopes are typically more accessible than light-sheet systems and allow for straightforward sample mounting, as they rely on procedures comparable to standard confocal imaging”.

      Reviewer #2 (Recommendations for the authors):  

      Suggestions:  

      A comparison with established pre-trained models for 3D organoid image segmentation (e.g., Cellos[1], AnyStar[2], and DeepStar3D[3], all based on StarDist3D) would help highlight the advantages of the authors' custom StarDist3D model, which has been specifically optimized for two-photon microscopy images.  

      (1)  Cellos: https://doi.org/10.1038/s41467-023-44162-6

      (2)  AnyStar: https://doi.org/10.1109/WACV57701.2024.00742

      (3)  DeepStar3D: https://doi.org/10.1038/s41592-025-02685-4

      We agree with the reviewer that a benchmark against existing segmentation methods is very useful. This is addressed in the revised version, as detailed above (Figure 3).

      Recommendations:  

      Please clarify the following point. In line 195, the authors state, "This allowed us to detect all mitotic nuclei in whole-mount samples for any stage and size." Does this mean that the custom-trained StarDist3D model can detect 100% of mitotic nuclei? It was not clear from the manuscript, figures, or videos how this was validated. Given the reported performance scores of the StarDist3D model for detecting all nuclei, claiming 100% detection of mitotic nuclei seems surprisingly high.

      We thank the reviewer for this comment. As detailed in the Methods section, the detection score reaches 82%, and only the complete pipeline (detection followed by minimal manual curation) allows us to detect all mitotic nuclei. To make this clearer, the following clarification was added to the Results section:

      “To detect division events, we stained gastruloids with phosphohistone H3 (ph3) and trained a separate custom Stardist3D model using 3D annotations of nuclei expressing ph3 (see Methods III H). This model allowed us to detect nearly all mitotic nuclei in whole-mount samples for any stage and size (Fig.3f and Suppl.Movie 4), and we used minimal manual curation to correct remaining errors.”

      Minor corrections:  

      It appears that Figures 4-6 are missing from the submitted version, but they can be found in the manuscript available on bioRxiv.

      We thank the reviewer for this remark. Figures 4 to 6 were added immediately.

      In line 185, is the intended phrase "by comparing the 2D predictions and the 2D sliced annotated segments..."? 

      For clarity, we replaced the initial sentence:

      “The f1 score obtained by comparing the 3D prediction and the 3D ground-truth is well approximated by the f1 score obtained by comparing the 2D annotations and the 2D sliced annotated segments, with at most a 5% difference between the two scores.” by

      “The f1 score obtained in 3D (3D prediction compared with the 3D ground-truth) is well approximated by the f1 score obtained in 2D (2D predictions compared with the 2D sliced annotated segments). The difference between the 2 scores was at most 5%.”
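
      For readers less familiar with the metric, the sketch below shows one common way to compute an F1 score at an IoU threshold. It is a minimal pure-Python illustration, not the evaluation code used in the manuscript: objects are represented as sets of pixel or voxel coordinates, matching is greedy and one-to-one, and all names are hypothetical. The same function applies to 2D slices and 3D volumes, since only the coordinate tuples change.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(ground_truth, predicted, threshold=0.5):
    """F1 score where a pair counts as a true positive if IoU >= threshold.

    Matching is greedy by decreasing IoU and strictly one-to-one.
    """
    pairs = sorted(
        ((iou(g, p), i, j)
         for i, g in enumerate(ground_truth)
         for j, p in enumerate(predicted)),
        reverse=True,
    )
    used_gt, used_pred = set(), set()
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if i not in used_gt and j not in used_pred:
            used_gt.add(i)
            used_pred.add(j)
    tp = len(used_gt)
    fp = len(predicted) - tp      # predictions without a matching annotation
    fn = len(ground_truth) - tp   # annotations that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy 2D example: one good match, one missed object, one spurious detection.
gt = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
print(f1_at_iou(gt, pred, threshold=0.5))
```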

      Reviewer #3 (Recommendations for the authors):

      (1) How is the "local neighborhood volume" defined, and how was it computed?

      The reviewer is referring to the term “local neighborhood volume” in this paragraph:

      “To probe quantities related to the tissue structure at multiple scales, we smooth their signal with a Gaussian kernel of width σ, with σ defined as the spatial scale of interest. From the segmented nuclei instances, we compute 3D fields of cell density (number of cells per unit volume), nuclear volume fraction (ratio of nuclear volume to local neighborhood volume), and nuclear volume at multiple scales.”

      To improve clarity, the phrasing has been revised: the term local neighborhood volume has been replaced by local averaging volume, and a reference to the Methods section has been added.

      “From the segmented nuclei instances, we compute 3D fields of cell density (number of cells per unit volume), nuclear volume fraction (ratio of space occupied by nuclear volume within the local averaging volume, as defined in the Methods III I), and nuclear volume at multiple scales.”
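
      One way to read these definitions is as Gaussian-weighted sums over segmented nuclei. The sketch below is an assumption-laden illustration rather than the Methods III I definition: each nucleus is treated as a point at its centroid, the kernel is a normalized 3D Gaussian of width σ, and the function names are hypothetical. The point approximation is only reasonable when σ is larger than a nucleus.

```python
import math


def gaussian_weight(point, center, sigma):
    """Normalized 3D Gaussian kernel (integrates to 1 over space)."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    norm = (2.0 * math.pi * sigma ** 2) ** 1.5
    return math.exp(-d2 / (2.0 * sigma ** 2)) / norm


def local_fields(point, centroids, volumes, sigma):
    """Return (cell density, nuclear volume fraction, mean nuclear volume) at point."""
    weights = [gaussian_weight(point, c, sigma) for c in centroids]
    density = sum(weights)  # cells per unit volume
    # Nuclear volume per unit of local averaging volume (dimensionless).
    volume_fraction = sum(w * v for w, v in zip(weights, volumes))
    mean_volume = volume_fraction / density if density > 0 else float("nan")
    return density, volume_fraction, mean_volume


# Hypothetical nuclei: centroids in micrometers, volumes in cubic micrometers.
centroids = [(5.0, 0.0, 0.0), (-5.0, 0.0, 0.0), (0.0, 8.0, 0.0)]
volumes = [400.0, 600.0, 500.0]
print(local_fields((0.0, 0.0, 0.0), centroids, volumes, sigma=10.0))
```

      Repeating the evaluation with several values of σ gives the same quantities at different spatial scales.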

      (2) In the definition of inertia tensor (18), isn't the inner part normally defined in the reversed way (delta_i,j - ...)?

      We thank the reviewer for noticing this error, which we fixed in the manuscript.
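
      Equation (18) is not reproduced in this response, so the sketch below only illustrates the textbook convention the reviewer alludes to, with the Kronecker delta term first: I_ij = Σ_k (|r_k|² δ_ij − r_k,i r_k,j), where r_k is the position of voxel k relative to the centroid. Unit voxel weights and the function name are assumptions made for the example.

```python
def inertia_tensor(points):
    """Inertia tensor of unit-mass points about their centroid.

    I[i][j] = sum_k ( |r_k|^2 * delta_ij - r_k[i] * r_k[j] )
    """
    if not points:
        raise ValueError("at least one point is required")
    n = len(points)
    centroid = [sum(p[a] for p in points) / n for a in range(3)]
    tensor = [[0.0] * 3 for _ in range(3)]
    for p in points:
        r = [p[a] - centroid[a] for a in range(3)]
        r2 = sum(x * x for x in r)
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                tensor[i][j] += r2 * delta - r[i] * r[j]
    return tensor


# Two voxels aligned along x: no inertia about x, equal inertia about y and z.
print(inertia_tensor([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

      With this sign convention the smallest eigenvalue corresponds to the long axis of an elongated object, whereas the reversed form yields a matrix whose eigenvalues are non-positive.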

      (3) For intensity normalization, the paper uses the Hoechst signal density as a proxy for a ubiquitous nuclei signal. I would assume that this is problematic for, e.g., dividing cells (which would overestimate it). Would using the average Hoechst signal per nucleus mask (as segmentation is available) be a better proxy?

      We agree that this idea is appealing if one assumes a clear relationship between nuclear volume and Hoechst intensity. However, since cell and nuclear volumes vary substantially with differentiation state (see Fig. 4), such a normalization approach would introduce additional biases at large spatial scales. We believe that the most robust improvement would instead consist in masking dividing cells during the normalization procedure, as these events could be detected and excluded from the computation.

      Nonetheless, we believe the method proposed by the reviewer could prove relevant for other types of data, so we will implement this recommendation in the code available in the Tapenade package.
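
      The reviewer's suggestion, averaging the Hoechst signal inside each nucleus mask, could be sketched as below. This is only an illustration under simple assumptions (an integer label image with 0 as background and an intensity image of the same shape); the function name is hypothetical and it is not the code of the Tapenade package.

```python
import numpy as np


def mean_intensity_per_nucleus(labels, intensity):
    """Return an array whose entry i is the mean intensity inside nucleus label i."""
    flat_labels = np.asarray(labels).ravel()
    flat_intensity = np.asarray(intensity, dtype=float).ravel()
    # Sum of intensities and number of voxels for every label value.
    sums = np.bincount(flat_labels, weights=flat_intensity)
    counts = np.bincount(flat_labels)
    means = np.zeros_like(sums)
    np.divide(sums, counts, out=means, where=counts > 0)
    means[0] = 0.0  # label 0 is background, not a nucleus
    return means
```

      The resulting per-nucleus values could then be smoothed into a reference field in place of the raw Hoechst density; dividing nuclei would still need to be masked separately, as noted in the response.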

      (4) Figures 4-6 were part of the Supplementary Material, but should be included in the main text?

      We thank the reviewer for this remark. This was corrected immediately by adding Figures 4-6 to the main text.

      We also noticed a missing reference to Fig. S3 in the main text, so we added lines 302 to 307 to comment on the wavelength-dependency of the normalization method. We improved the description of Fig. 6, which lacked clarity (lines 316 to 321, line 327).

      (1) Moos, F., Suppinger, S., de Medeiros, G., Oost, K.C., Boni, A., Rémy, C., Weevers, S.L., Tsiairis, C., Strnad, P. and Liberali, P., 2024. Open-top multisample dual-view light-sheet microscope for live imaging of large multicellular systems. Nature Methods, 21(5), pp.798-803.

      (2) Ong, H. T., Karatas, E., Poquillon, T., Grenci, G., Furlan, A., Dilasser, F., Mohamad Raffi, S. B., Blanc, D., Drimaracci, E., Mikec, D., Galisot, G., Johnson, B. A., Liu, A. Z., Thiel, C., Ullrich, O., OrgaRES Consortium, Racine, V., & Beghin, A. (2025). Digitalized organoids: integrated pipeline for high-speed 3D analysis of organoid structures using multilevel segmentation and cellular topology. Nature Methods, 22(6), pp.1343-1354.

      (3) Li, L., Wu, L., Chen, A., Delp, E.J. and Umulis, D.M., 2023. 3D nuclei segmentation for multi-cellular quantification of zebrafish embryos using NISNet3D. Electronic Imaging, 35, pp.1-9.

      (4) Vanaret, J., Dupuis, V., Lenne, P. F., Richard, F., Tlili, S., & Roudot, P. (2023). A detector-independent quality score for cell segmentation without ground truth in 3D live fluorescence microscopy. IEEE Journal of Selected Topics in Quantum Electronics, 29(4:Biophotonics), 1-12.

      (5) Dey, N., Abulnaga, M., Billot, B., Turk, E. A., Grant, E., Dalca, A. V., & Golland, P. (2024). AnyStar: Domain randomized universal star-convex 3D instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 7593-7603).

      (6) Mukashyaka, P., Kumar, P., Mellert, D. J., Nicholas, S., Noorbakhsh, J., Brugiolo, M., ... & Chuang, J. H. (2023). High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology with Cellos. Nature Communications, 14(1), 8406.

      (7) Rakhymzhan, A., Leben, R., Zimmermann, H., Günther, R., Mex, P., Reismann, D., ... & Niesner, R. A. (2017). Synergistic strategy for multicolor two-photon microscopy: application to the analysis of germinal center reactions in vivo. Scientific reports, 7(1), 7101.

      (8) Dunsing, V., Petrich, A., & Chiantia, S. (2021). Multicolor fluorescence fluctuation spectroscopy in living cells via spectral detection. Elife, 10, e69687.

  2. Jan 2026
    1. eLife Assessment

      This important work compares the size of two brain areas, the amygdala and the hippocampus, across 12 species belonging to the Macaca genus. The authors find, using a convincing methodological approach, that amygdala - but not hippocampal - volume varies with social tolerance grade, with high tolerance species showing larger amygdala than low tolerance species of macaques. Interestingly, their findings also suggest an inverted developmental effect, with intolerant species showing an increase in amygdala volume across the lifespan, compared to tolerant species exhibiting the opposite trend. Overall, this paper offers new insights into the neural basis of social and emotional processing.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new, important evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.

      (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old.

      (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.

      (3) The methodology and supplemental figures for acquiring brain MRI images are nicely detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.

      (4) The following comments were brought up during the review. In their revision, the authors have sufficiently addressed all of these comments by providing detailed responses and updating their manuscript. First, the revision clarified how much one could draw conclusions about "nature vs. nurture" from this study. Second, the revision also clarified the contributions of very young and very old animals in their correlations. Third, in their revision, the authors expanded on how their results could be interpreted in the context of multiple behavioral traits by Thierry (2021) by providing more detailed descriptions. Finally, during the revision, the authors clarified that both intolerant and tolerant species experience complex socio-cognitive demands and highlighted that socio-cognitive challenges arise across the tolerance spectrum under different behavioral demands.

    3. Reviewer #2 (Public review):

      Summary:

      This comparative study of macaque species and type of social interaction is both ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.

      To achieve any sort of power they have combined data from 4 centres - that have all used different scanning methods and there are some resolution differences. The authors have also had to group species into 4 classifications - again to assist with any generalisations and power. They have focussed on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next) - but as the variables would only increase further along with the number of potential comparisons, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.

      This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: 1) that more intolerant species have relatively larger amygdalae, and 2) that with development there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable.

      It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, to which the authors indulge. I suspect I would end up doing the same but it feels a bit like 'heads I win, tails you lose'. In the case of Grade 1 species, the individuals have a lot to learn especially if they are not top of the hierarchy, but at the same time there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite to how I read them, in which case the Table and preceding text needs to align.)

      Comments on revisions:

      I am happy with all of the revisions and the care shown by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus that remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The modifications brought up between the two versions of the article have answered my remarks regarding age/grade/brain area differences.

      As such, I think the results are holding strong, but maybe more work is needed with respect to interpretation.

      Classification of the social grade, as well as the issue of nature vs nurture have been addressed by the authors, I thank them for this.

      I still feel the integration of the amygdala as a common cognitive & emotional center could be possibly more pushed in the discussion, although I acknowledge that it would be complicated to do without knowing how the emotional and social lives of these animals impacted the growth of their amygdala...

      Strengths:

      Methods & breadth of species tested

      Weaknesses:

      Interpretations, which, although softened, could still be more integrated with the literature on emotion

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review):

      We thank Reviewer #1 for their thoughtful and constructive feedback. We found the suggestions particularly helpful in refining the conceptual framework and clarifying key aspects of our interpretations.

      Summary:

      This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades, such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.

      Strengths:

      (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old - an age that is rare in the wild but more common in captive settings. 

      (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.

      (3) The methodology and supplemental figures for acquiring brain MRI images are well detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.

      Weaknesses:

      (1) The nature vs. nurture distinction is an important one, but it may be difficult to draw conclusions about "nature" in this case, given that only two data points (from grades 3 and 4) come from animals under one year of age (Method Figure 1D). Most brains were collected after substantial social exposure - typically post age 1 or 1.5 - so the data may better reflect developmental changes due to early life experience rather than innate wiring. It might be helpful to frame the findings more clearly in terms of how early experiences shape development over time, rather than as a nature vs. nurture dichotomy.

      We agree with the reviewer that presenting our findings through a strict nature vs. nurture dichotomy was potentially misleading. We have revised the introduction and the discussion (e.g. lines 85-95 and 363-365) to clarify that we examined how neurodevelopmental trajectories differ across social grades, with the caveat that very young individuals are absent from our sample. We now explicitly mention that our results may reflect both early species-typical biases and experience-dependent maturation.

      We positioned our study on social tolerance in a comparative neuroscience framework and introduced a tentative working model that articulates behavioral traits, cognitive dimensions, and their potential subcortical neural substrates.

      Drawing upon 18 behavioral traits identified in Thierry’s comparative analyses (Thierry, 2021, 2007), we organize these traits into three core dimensions: socio-cognitive demands, behavioral inhibition, and the predictability of the social environment (Table 1). This conceptualization does not aim to redefine social tolerance itself, but rather to provide a structured basis for testing neuroanatomical hypotheses related to social style variability. It echoes recent efforts to bridge behavioral ecology and cognitive neuroscience by linking specific mental abilities – such as executive functions or metacognition – with distinct prefrontal regions shaped by social and ecological pressures (Bouret et al., 2024).

      “Cross-fostering experiments (De Waal and Johanowicz, 1993), along with our own results, suggest that social tolerance grades reflect both early, possibly innate predispositions and later environmental shaping”.

      (2) It would be valuable to clarify how the older individuals, especially those 20+ years old, may have influenced the observed age-related correlations (e.g., positive in grades 1-2, negative in grades 3-4). Since primates show well-documented signs of aging, some discussion of the potential contribution of advanced age to the results could strengthen the interpretation.

      We thank the reviewer for highlighting this important point. In our dataset, younger and older subjects are underrepresented, but they are distributed across all subgroups. Therefore, we do not think that they could drive the interaction effect we are reporting. In our sample, amygdala volume tended to increase with age in intolerant species and decrease in tolerant species. We included a new analysis (Figure 4) that provides a clearer assessment of when social grades 1 vs 4 differed in terms of amygdala and hippocampus volume. While our model accounts for age continuously, we agree that age-related variation deserves cautious interpretation and requires longitudinal designs in future studies.

      We also added the following statements in the discussion (lines 386-391):

      “Due to a limited sample size of our study, this crossing trend, already accounted for by our continuous age model, should be further investigated. These results call for cautious interpretation of age-related variation and further emphasize the importance of longitudinal studies integrating both behavioral, cognitive and anatomical data in non-human primates, which would help to better understand the link between social environment and brain development (Song et al., 2021)”.

      (3) The authors categorize the behavioral traits previously described in Thierry (2021) into 3 self-defined cognitive requirements; however, they do not discuss under what conditions specific traits were assigned to categories or justify why these cognitive requirements were chosen. It is not fully clear from Thierry (2021) alone how each trait would align with the authors' categories. Given that these traits/categories are drawn on for their neuroanatomical hypotheses, it is important that the authors clarify this. It would be helpful to include a table with all behavioral traits with their respective categories, and explain their reasoning for selecting each cognitive requirement category.

      Thank you for this important suggestion. We have extensively revised the introduction to explain how we derived the three cognitive dimensions (socio-cognitive demands, behavioral inhibition, and predictability of the social environment) from the scientific literature. We now provide a complete overview of the 18 behavioral traits described in Thierry’s framework and their cognitive classification in a dedicated table, along with hypothesized neural correlates. We have also mentioned traits that were not classified in our framework, along with a short justification of this classification. We believe this addition significantly improves the transparency and intelligibility of our conceptual approach.

      “The concept of social tolerance, central to this comparative approach, has sometimes been used in a vague or unidimensional way. As Bernard Thierry (2021) pointed out, the notion was initially constructed around variations in agonistic relationships – dominance, aggressiveness, appeasement or reconciliation behaviors – before being expanded to include affiliative behaviors, allomaternal care or male–male interactions (Thierry, 2021). These traits do not necessarily align along a single hierarchical axis but rather reflect a multidimensional complexity of social style, in which each trait may have co-evolved with others (Thierry, 2021, 2000; Thierry et al., 2004). Moreover, the lack of a standardized scientific definition has sometimes led to labeling species as “tolerant” or “intolerant” without explicit criteria (Gumert and Ho, 2008; Patzelt et al., 2014). These behavioral differences are characterized by different styles of dominance (Balasubramaniam et al., 2012), severity of agonistic interactions (Duboscq et al., 2014), nepotism (Berman and Thierry, 2010; Duboscq et al., 2013; Sueur et al., 2011) and submission signals (De Waal and Luttrell, 1985; Rincon et al., 2023), among the 18 covariant behavioral traits described in Thierry's classification of social tolerance (Thierry, 2021, 2017, 2000)”.

      “To ground the investigation of social tolerance in a comparative neuroanatomical framework, we introduce a tentative working model that articulates behavioral traits, cognitive dimensions, and their potential subcortical neural substrates. Drawing upon 18 behavioral traits identified in Thierry’s comparative analyses (Thierry, 2021, 2007), we organized these traits into three core dimensions: socio-cognitive demands, behavioral inhibition, and the predictability of the social environment (Table 1). This conceptualization does not aim to redefine social tolerance itself, but rather to provide a structured basis for testing neuroanatomical hypotheses related to social style variability. It echoes recent efforts to bridge behavioral ecology and cognitive neuroscience by linking specific mental abilities – such as executive functions or metacognition – with distinct prefrontal regions shaped by social and ecological pressures (Bouret et al., 2024; Testard 2022)”.

      (4) One of the main distinctions the authors make between high social tolerance species and low tolerance species is the level of complex socio-cognitive demands, with more tolerant species experiencing the highest demands. However, socio-cognitive demands can also be very complex for less tolerant species because they need to strategically balance behaviors in the presence of others. The relationships between socio-cognitive demands and social tolerance grades should be viewed in a more nuanced and context-specific manner. 

      We fully agree. We did not mean that intolerant species live in a ‘simple’ social environment, but that the social environment of more tolerant species is markedly more demanding. Evidence supporting this statement includes their more efficient social networks (Sueur et al., 2011) and more complex communicative skills (e.g. tolerant macaques displayed higher levels of vocal diversity and flexibility than intolerant macaques in social situations with high uncertainty; Rebout et al., 2020).

      In the revised version (lines 106-122), we now highlight that socio-cognitive challenges arise across the tolerance spectrum, including in less tolerant species where strategic navigation of rigid hierarchies and risk-prone interactions is required. We hope that this addition offers a more balanced and nuanced framing of socio-cognitive demands across macaque societies.

      “The first category, socio-cognitive demands, refers to the cognitive resources needed to process, monitor, and flexibly adapt to complex social environments. Linking those parameters to neurological data is at the core of the social brain theory to explain the expansion of the neocortex in primates (Dunbar). Macaques social systems require advanced abilities in social memory, perspective-taking, and partner evaluation (Freeberg et al., 2012). This is particularly true in tolerant species, where the increased frequency and diversity of interactions may amplify the demands on cognitive tracking and flexibility. Tolerant macaque species typically live in larger groups with high interaction frequencies, low nepotism, and a wider range of affiliative and cooperative behaviors, including reconciliation, coalition-building, and signal flexibility (REF). Tolerant macaque species also exhibit a more diverse and flexible vocal and facial repertoire than intolerants ones which may help reduce ambiguity and facilitate coordination in dense social networks (Rincon et al., 2023; Scopa and Palagi, 2016; Rebout 2020). Experimental studies further show that macaques can use facial expressions to anticipate the likely outcomes of social interactions, suggesting a predictive function of facial signals in managing uncertainty (Micheletta et al., 2012; Waller et al., 2016). Even within less tolerant species, like M. mulatta, individual variation in facial expressivity has been linked to increased centrality in social networks and greater group cohesion, pointing to the adaptive value of expressive signaling across social styles (Whitehouse et al., 2024)”.

      (5) While the limitations section touches on species-related considerations, the issue of individual variability within species remains important. Given that amygdala volume can be influenced by factors such as social rank and broader life experience, it might be useful to further emphasize that these factors could introduce meaningful variation across individuals. This doesn't detract from the current findings but highlights the importance of considering life history and context when interpreting subcortical volumes - particularly in future studies.

      We have now emphasized this point in the limitations section (lines 441-456). While our current dataset does not allow us to fully control for individual-level variables across all collection centers, we recognize that factors such as rank, social exposure, and individual life history may influence subcortical volumes.

      “Although we explained some interspecies variability, adding subjects to our database will increase statistical power and will help addressing potential confounding factors such as age or sex in future studies. One will benefit from additional information about each subject. While considered in our modelling, the social living and husbandry conditions of the individuals in our dataset remain poorly documented. The living environment has been considered, and the size of social groups for certain individuals, particularly for individuals from the CdP, have been recorded. However, these social characteristics have not been determined for all individuals in the dataset. As previously stated, the social environment has a significant impact on the volumetry of certain regions. Furthermore, there is a lack of data regarding the hierarchy of the subjects under study and the stress they experience in accordance with their hierarchical rank and predictability of social outcomes position (McCowan et al., 2022)”. 

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their thoughtful remarks and for acknowledging the value of our comparative approach despite its inherent constraints.

      Summary:

      This comparative study of macaque species and the type of social interaction is both ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.

      To achieve any sort of power, they have combined data from 4 centres, which have all used different scanning methods, and there are some resolution differences. The authors have also had to group species into 4 classifications - again to assist with any generalisations and power. They have focused on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next) - but as the variables would only increase further along with the number of potential comparisons, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.

      This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: (1) that more intolerant species have relatively larger amygdalae, and (2) that with development, there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case, otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable. 

      We thank the reviewer for this important observation. In the original version, Table 1 presented simplified direct predictions linking social tolerance grades to amygdala and hippocampus volumes. We recognize that this formulation may have created confusion. In the revised manuscript, we have thoroughly restructured the table and its accompanying rationale. Table 1 now better reflects our conceptual framework grounded in three cognitive dimensions (socio-cognitive demands, behavioral inhibition, and social predictability), each linked to behavioral traits and associated neural hypotheses based on published literature. This updated framework, detailed in lines 144-169 of the introduction, provides a more nuanced basis for interpreting our results and avoids the inconsistencies previously noted. The Discussion was also revised accordingly (lines 329-255) to clarify where our findings diverge from the original predictions and to explore alternative explanations based on social complexity. Rather than directly predicting amygdala size from social tolerance grades, we propose that variation in volume emerges from differing combinations of cognitive pressures across species.

      It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, to which the authors indulge. In the case of Grade 1 species, the individuals have a lot to learn, especially if they are not top of the hierarchy, but at the same time, there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite of how I read them, in which case the Table and preceding text need to align.)

      To facilitate the interpretation of our Bayesian modelling, we have selected a more focused ROI in our automatic segmentation procedure of the Hippocampus (from Hippocampal Formation to Hippocampus) and have added a new analysis (Figure 4) that helps to properly test whether the hippocampus significantly differs between species from social grade 1 vs 4. The present analysis found that this is the case in adult monkeys. This is therefore consistent with our hypothesis that amygdala volumes are principally explained by heightened socio-cognitive demands in more tolerant species.

      We also acknowledge the reviewer’s concerns about the limited generalizability due to our sample. The challenges of comparative neuroimaging in non-human primates—especially when using post-mortem datasets—are substantial. Given the ethical constraints and the rarity of available specimens, increasing the number of individuals or species is not feasible in the short term. However, we have made all data and code publicly available and clearly stated the limitations of our sample in the manuscript. Despite these constraints, we believe our dataset offers an unprecedented comparative perspective, particularly due to the inclusion of rare and tolerant species such as M. tonkeana, M. nigra, and M. thibetana, which have never been included in structural MRI studies before. We hope this effort will serve as a foundation for future collaborative initiatives in primate comparative neuroscience.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for their thoughtful and detailed review. Their comments helped us refine both the conceptual and interpretative aspects of the manuscript. We respond point by point below.

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species. 

      25 brains were extracted by the authors themselves, who are highly experienced with this procedure. Overall, we believe that dissection protocols did not alter the total brain volume. Despite our expertise, we had some difficulty avoiding damage to the cerebellum. Therefore, this region was not included in our analysis. We also noted that this brain region was damaged or absent in the Prime-DE dataset.

      Several protocols were used to prepare and store tissue. It could have impacted the total brain volume.

      We agree that differences in tissue preparation and storage could potentially affect total brain volume. Therefore, we explicitly included the main sample preparation variable — whether brains had been previously frozen — as a covariate in our model. This factor did not explain our results. Moreover, Figures 1D and 1I display the frozen status and its correlation with the amygdala and hippocampus ratios, respectively. Figure 2 shows the parameters of the model and the posterior distributions for the frozen status and total brain volume effects.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus, which remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The results look quite strong, although the authors could bring up some more clarity in their replies regarding the data they are working with. From one figure to the other, we switch from model-calculated ratio to model-predicted volume. Note that if one was to sample a brain at age 20 in all the grades according to the model-predicted volumes, it would not seem that the difference for amygdala would differ much across grades, mostly driven with Grade 1 being smaller (in line with the main result), but then with Grade 2 bigger than Grade 3, and then Grade 4 bigger once again, but not that different from Grade 2.

      Overall, despite this, I think the results are pretty strong, the correlations are not to be contested, but I also wonder about their real meaning and implications. This can be seen under 3 possible aspects:

      (1)  Classification of the social grade

      While it may be familiar to readers of Thierry and collaborators, or to researchers of the macaque world, there is no list included of the 18 behavioral traits used to define the three main cognitive requirements (socio-cognitive demands, predictability of the environment, inhibitory control). It would be important to know which of the different traits correspond to what, whether they overlap, and crucially, how they are realized in the 12 study species, as there could be drastic differences from one species to the next. For now, we can only see from Table S1 where the species align to, but it would be a good addition to have them individually matched to, if not the 18 behavioral traits, at least the 3 different broad categories of cognitive requirements.

      We fully agree with this observation. In the revised version of the manuscript, we now include a detailed conceptual table listing all 18 behavioral traits from Thierry’s framework. For each trait, we provide its underlying social implications, its associated cognitive dimension (when applicable), and the hypothesized neural correlate. 

      While some traits could arguably have been classified in several cognitive dimensions (e.g. reconciliation rate), we preferred to assign each to a unique dimension for clarity. Additionally, the introduction (lines 95-169 + Table 1) now explains how each trait was evaluated based on existing literature and assigned to one of the three proposed cognitive categories: socio-cognitive demands, behavioral inhibition, or social unpredictability. This structure offers a clearer and more transparent basis for the neuroanatomical hypotheses tested in the study.

      “Navigating social life in primate societies requires substantial cognitive resources: individuals must not only track multiple relationships, but also regulate their own behavior, anticipate others’ reactions, and adapt flexibly to changing social contexts. Taking advantage of databases of magnetic resonance imaging (MRI) structural scans, we conducted the first comparative study integrating neuroanatomical data and social behavioral data from closely related primate species of the same genus to address the following questions: To what extent can differences in volumes of subcortical brain structures be correlated with varying degrees of social tolerance? Additionally, we explored whether these dispositions reflect primarily innate features, shaped by evolutionary processes, or acquired through socialization within more or less tolerant social environments”.

      “The first category, socio-cognitive demands, refers to the cognitive resources needed to process, monitor, and flexibly adapt to complex social environments. Linking those parameters to neurological data is at the core of the social brain theory to explain the expansion of the neocortex in primates (Dunbar). Macaque social systems require advanced abilities in social memory, perspective-taking, and partner evaluation (Freeberg et al., 2012). This is particularly true in tolerant species, where the increased frequency and diversity of interactions may amplify the demands on cognitive tracking and flexibility. Tolerant macaque species typically live in larger groups with high interaction frequencies, low nepotism, and a wider range of affiliative and cooperative behaviors, including reconciliation, coalition-building, and signal flexibility (REF). Tolerant macaque species also exhibit a more diverse and flexible vocal and facial repertoire than intolerant ones, which may help reduce ambiguity and facilitate coordination in dense social networks (Rincon et al., 2023; Scopa and Palagi, 2016; Rebout 2020). Experimental studies further show that macaques can use facial expressions to anticipate the likely outcomes of social interactions, suggesting a predictive function of facial signals in managing uncertainty (Micheletta et al., 2012; Waller et al., 2016). Even within less tolerant species, like M. mulatta, individual variation in facial expressivity has been linked to increased centrality in social networks and greater group cohesion, pointing to the adaptive value of expressive signaling across social styles (Whitehouse et al., 2024)”.

      “The second category, inhibitory control, includes traits that involve regulating impulsivity, aggression, or inappropriate responses during social interactions. Tolerant macaques have been shown to perform better in tasks requiring behavioral inhibition and also express lower aggression and emotional reactivity in both experimental and natural contexts (Joly et al., 2017; Loyant et al., 2023). These features point to stronger self-regulation capacities in species with egalitarian or less rigid hierarchies. More broadly, inhibition – especially in its strategic form (self-control) – has been proposed to play a key role in the cohesion of stable social groups. Comparative analyses across mammals suggest that this capacity has evolved primarily in anthropoid primates, where social bonds require individuals to suppress immediate impulses in favour of longer-term group stability (Dunbar and Shultz, 2025). This view echoes the conjecture of Passingham and Wise (2012), who proposed that the emergence of prefrontal area BA10 in anthropoids enabled the kind of behavioural flexibility needed to navigate complex social environments (Passingham et al., 2012)”.

      “The third category, social environment predictability, reflects how structured and foreseeable social interactions are within a given society. In tolerant species, social interactions are more fluid and less kin-biased, leading to greater contextual variation and role flexibility, which likely imply a sustained level of social awareness. In fact, as suggested by recent research, such social uncertainty and prolonged incentives are reflected by stress-related physiology: tolerant macaques such as M. tonkeana display higher basal cortisol levels, which may be indicative of a chronic mobilization of attentional and regulatory resources to navigate less predictable social environments (Sadoughi et al., 2021)”.

      “Each behavioral trait was individually evaluated based on existing empirical literature regarding the types of cognitive operations it likely involves. When a primary cognitive dimension could be identified, the trait was assigned accordingly. However, some behaviors – such as maternal protection, allomaternal care, or delayed male dispersal – do not map neatly onto a single cognitive process. These traits likely emerge from complex configurations of affective and social-motivational systems, and may be better understood through frameworks such as attachment theory (Suomi, 2008), which emphasizes the integration of social bonding, emotional regulation, and contextual plasticity. While these dimensions fall beyond the scope of the present framework, they offer promising directions for future research, particularly in relation to the hypothalamic and limbic substrates of social and reproductive behavior”.

      “Rather than forcing these traits into potentially misleading categories, we chose to leave them unclassified within our current cognitive framework. This decision reflects both a commitment to conceptual clarity and the recognition that some behaviors emerge from a convergence of cognitive demands that cannot be neatly isolated. This tripartite framework, leaving aside reproductive-related traits, provides a structured lens through which to link behavioral diversity to specific cognitive processes and generate neuroanatomical predictions”.

      (2) Issue of nature vs nurture

      Another way to look at the debate between nature vs nurture is to look at phylogeny. For now, there is no phylogenetic tree that shows where the different grades are realized. For example, it would be illuminating to know whether more related species, independently of grades, have similar amygdala or hippocampus sizes. Then the question will go to the details, and whether the grades are realized in particular phylogenetic subdivisions. This would go in line with the general point of the authors that there could be general species differences.

      As pointed out by Thierry and collaborators, the social tolerance concept is already grounded in a phylogenetic framework, as social tolerance matches the phylogenetic tree of these macaque species, suggesting a biological basis for these behavioral observations. Given the modest sample size and uneven species representation, we opted not to adopt tools such as Phylogenetic Generalized Least Squares (PGLS) in our analysis. Our primary aim in this study was to explore neuroanatomical variation as a function of social traits, not to perform a phylogenetic comparative analysis per se. That said, we now explicitly acknowledge this limitation in the Discussion and indicate that future work using larger datasets and phylogenetic methods will be essential to disentangle social effects from evolutionary relatedness. We hope that making our dataset openly available will facilitate such future analyses.

      With respect to nurture, it is likely more complicated: one needs to take into account the idiosyncrasies of the life of the individual. For example, some of the cited literature in humans or macaques suggests that the bigger the social network, the bigger the brain structure considered. Right, but this finding is at the individual level with a documented life history. Do we have any of this information for any of the individuals considered (this is likely out of the scope of this paper to look at this, especially for individuals that did not originate from CdP)?

      We appreciate this insightful observation. Indeed, findings from studies in humans and nonhuman primates showing associations between brain structure and social network size typically rely on detailed life history and behavioral data at the individual level. Unfortunately, such fine-grained information was not consistently available across our entire sample. While some individuals from the Centre de Primatologie (CdP) were housed in known group compositions and social settings, we did not have access to longitudinal social data (such as rank, grooming rates, or network centrality) that would allow for robust individual-level analyses. We now acknowledge this limitation more clearly in the Discussion (lines 436-443), and we fully agree that future work combining neuroimaging with systematic behavioral monitoring will be necessary to explore how species-level effects interact with individual social experience.

      (3) Issue of the discussion of the amygdala's function

      The entire discussion/goal of the paper, states that the amygdala is connected to social life. Yet, before being a "social center", the amygdala has been connected to the emotional life of humans and non-humans alike. The authors state L333/34 that "These findings challenge conventional expectations of the amygdala's primary involvement in emotional processes and highlight the complexity of the amygdala's role in social cognition". First, there is no dichotomy between social cognition and emotion. Emotion is part of social cognition (unless we and macaques are robots). Second, there is nowhere in the paper a demonstration that the differences highlighted here are connected to social cognition differences per se. For example, the authors have not tested, say, if grade 4 species are more afraid of snakes than grade 1 species. If so, one could predict they would also have a bigger amygdala, and they would probably also find it in the model. My point is not that the authors should try to correlate any kind of potential aspect that has been connected to the amygdala in the literature with their data (see for example the nice review by Domínguez-Borràs and Vuilleumier, https://doi.org/10.1016/B978-0-12-823493-8.00015-8), but they should refrain from saying they have challenged a particular aspect if they have not even tested it. I would rather engage the authors to try and discuss the amygdala as a multipurpose center, that includes social cognition and emotion.

      We thank the reviewer for this important and nuanced point. We have revised the manuscript to adopt a more cautious and integrative tone regarding the function of the amygdala. In the revised Discussion (lines 341-355), we now explicitly state that the amygdala is involved in a broad range of processes—emotional, social, and affective—and that these domains are deeply intertwined. Rather than proposing a strict dissociation, we now suggest that the amygdala supports integrated socio-emotional functions that are mobilized differently across social tolerance styles. We also cite recent relevant literature (e.g., Domínguez-Borràs & Vuilleumier, 2021) to support this view and have removed any claim suggesting we challenge the emotional function of the amygdala per se. Our aim is to contribute to a richer understanding of how affective and social processes co-construct structural variation in this region.

      Strengths:

      Methods & breadth of species tested.

      Weaknesses:

      Interpretation, which can be described as 'oriented' and should rather offer additional views.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Private Comments:

      (1) Table 1 should be formatted for clarity i.e., bolded table headers, text realignment, and spacing. It was not clear at first glance how information was organized. It may also be helpful to place behavioral traits as the first column, seeing that these traits feed into the author's defined cognitive requirements.

      We have reformatted Table 1 to improve clarity and readability. Behavioral traits now appear in the first column, followed by cognitive dimensions and hypothesized neural correlates. Column headers have been bolded and alignment has been standardized.

      (2) Figures could include more detail to help with interpretations. For example, Figure 3 should define values included on the x-axis in the figure caption, and Figure 4 should explain the use of line, light color, and dark color. Figure 1 does not have a y-axis title.

      The figures have been revised and legends completed to ensure more clarity.

      (3) Please proofread for typos throughout.

      The manuscript has been carefully proofread, and all typographical and grammatical errors have been corrected. These changes are visible in the tracked version.

      Reviewer #2 (Recommendations for the authors):

      Specific comments:

      (1) Given all of the variability, would it not be a good idea to just compare (e.g., in the supplemental) the macaque data from just the Strasbourg centre for M. mulatta and M. tonkeana? I appreciate the ns will be lower, but other matters are more standardized.

      We fully understand the reviewer’s suggestion to restrict the comparison to data collected at a single site in order to minimize inter-site variability. However, as noted, such an analysis would come at the cost of statistical power, as the number of individuals per species within a single center is small. For example, while M. tonkeana is well represented at the Strasbourg centre, only one individual of M. mulatta is available from the same site. Thus, a restricted comparison would severely limit the interpretability of results, particularly for age-related trajectories. To address variability, we included acquisition site and brain preservation method as covariates or predictors where appropriate, and we have been cautious in our interpretations. We also now emphasize in the Methods and Discussion the value of future datasets with more standardized acquisition protocols across species and centers. We hope that by openly sharing our data and workflow, we can contribute to this broader goal.

      (2) I have various minor edits:

      (a) L 25 abstract - Specify what is meant by 'opposite trend'; the reader cannot infer what this is.

      Modified in line 25-28: “Unexpectedly, tolerant species exhibited a decrease in relative amygdala volume across the lifespan, contrasting with the age-related increase observed in intolerant species—a developmental pattern previously undescribed in primates.”

      (b) L67 - The reference 'Manyprimates' needs fixing as it does in the references section.

      After double-checking, we confirm that Manyprimates studies are international collaborative efforts that are supposed to be cited this way (https://manyprimates.github.io/#pubs).

      (c) L74 - Taking not Taken.

      This typo has been corrected.

      (d) L129 - It says 'total volume', but this is corrected total volume?

      We have clarified in the figure legends that the “total brain volume” used in our analyses excludes the cerebellum and the myelencephalon, as specified in our image preprocessing protocol. This ensures consistency across individuals and institutions.

      (e) L138 - Suddenly mentions 'frozen condition' without any prior explanation - this needs explaining in the legend - also L144.

      We have added an explanation of the ‘frozen condition’ variable in the relevant figure legend.

      (f) L166 - Results - it would be helpful to remind readers what Grade 1 signifies, ie intolerant species.

      We now include a brief reminder in the Results section that Grade 1 corresponds to socially intolerant species, to help readers unfamiliar with the classification (Lines 240-251).

      (g) Figure 4 - Provide the ns for each of the 4 grades to help appreciate the meaningfulness of the curves, etc.

      The number of subjects has been added to the figure, and a novel analysis in the revised manuscript helps readers appreciate the meaningfulness of some of these curves.

      (h) L235 - 'we had assumed that species of high social tolerance grade would have presented a smaller amygdala in size compared to grade 1'. But surely this is the exact opposite of what is predicted in Table 1 - ie, the authors did not predict this as I read the paper (Unless Table 1 is misleading/ambiguous and needs clarification).

      As discussed in our responses to Reviewers #2 and #3, we have restructured both Table 1 and the Discussion to ensure consistency. We now explicitly state that the findings diverge from our initial inhibitory-control-based prediction and propose alternative interpretations based on sociocognitive demands.

      (i) L270 - 'This observation' which?? Specify.

      We have replaced ‘this observation’ with a precise reference to the observed developmental decrease in amygdala volume in tolerant species.

      (j) L327 - 'groundbreaking' is just hype given that there are so many caveats - I personally do not like the word - novel is good enough.

      We have replaced the word ‘groundbreaking’ with ‘novel’ to adopt a more measured and appropriate tone in the discussion.

      (3) I might add that I am happy with the ethics regarding this study. 

      Thanks, we are also happy that we were able to study macaque brains from different species using opportunistic sampling along with already available data. We are collectively making progress on this!

      (4) Finally, I should commend the authors on all the additional information that they provide re gender/age/species. Given that there are 2x as many females as males, it would be good to know if this affects the findings. I am not a primatologist, so I don't know, for example, if the females in Grade 1 monkeys are just as intolerant as the males?

      We thank the reviewer for this thoughtful comment. We now explicitly mention the female-biased sex ratio in the Methods section and report in the Results (Figure 2, Figure 3) that sex was included as a covariate in our Bayesian models. While a small effect of sex was found for hippocampal volume, no effect was observed for the amygdala. Given the strong imbalance in our dataset (2:1 female-to-male ratio), we refrained from drawing any conclusion about sex-specific patterns, as these would require larger and more balanced samples. Although we did not test for sex-by-grade interactions, we agree that this question—especially regarding whether females and males express social style differences similarly across grades—represents an important direction for future comparative work.

      Reviewer #3 (Recommendations for the authors):

      I found the article well-written, and very easy to follow, so I have little ways to propose improvements to the article to the authors, besides addressing the various major points when it comes to interpretation of the data.

      One list I found myself wanting was in fact the list of the social tolerance grades, and the process by which they got selected into 3 main bags of socio-cognitive skills. Then it would become interesting to see how each of the 12 species compares within both the 18 grades (maybe once again out of the scope of this paper, there are likely reviews out there that already do that, but then the authors should explicitly mention so in the paper: X, 19XX have compared 15 out of 18 traits in YY number of macaque species); and within the 3 major subcognitive requirements delineated by the authors, maybe as an annex?

      We thank the reviewer for this thoughtful suggestion. In the revised manuscript, we now include a detailed table (Table 1) that lists the 18 behavioral traits derived from Thierry’s framework, along with their associated cognitive dimension and hypothesized neuroanatomical correlate. While we did not create a matrix mapping each of the 12 species across all 18 traits due to space and data availability constraints, we agree this is an important direction that should be tackled by primatologists. We now include a sentence (lines 87-90) in the manuscript to guide readers to previous comparative reviews (e.g., Thierry, 2000; Thierry et al., 2004, 2021) that document the expression of these traits across macaque species. We also clarify that our three cognitive categories are conceptual tools intended to structure neuroanatomical predictions, and not formal clusters derived from quantitative analyses.

      In the annex, it would also be good to have a general summarizing excel/R file for the raw data, with important information like age, sex, and the relevant calculated volumes for each individual. The folders available following the links do not make it an easy task for a reader to find the raw data in one place.

      We fully agree with the reviewer on the importance of data accessibility. We have now uploaded an additional supplementary file in .csv format on our OSF repository, which includes individual-level metadata for all 42 macaques: species, sex, age, social grade, total brain volume, amygdala volume, and hippocampus volume. The link to this file is now explicitly mentioned in the Data Availability section. We hope this will facilitate comparisons with other datasets and improve usability for the community. In addition, we provide in a supplementary table the raw data that were used for our Bayesian modelling (see below).

      The availability of the raw data would also clear up one issue, which I believe results from the modelling process: it looks odd on Figure 2, that volume ratios, defined as the given brain area volume divided by the total brain volume, give values above 1 (especially for the hippocampus). As such, the authors should either modify the legend or the figure. In general, it would be nicer to have the "real values" somewhere easily accessible, so that they can be compared more broadly with: 1) other macaques species to address questions relevant to the species; 2) other primates to address other questions that are surely going to arise from this very interesting work!

      We thank the reviewer for pointing this out. The ratio values in Figure 1 correspond to the proportion of the regional volume (amygdala or hippocampus) relative to the total brain volume, excluding the cerebellum and myelencephalon. As such, values above 0.01 (i.e., above 1% of the brain volume) are expected for these structures and do not indicate an error. We have updated the figure legend to clarify this point explicitly. In addition, we have now made a cleaned .csv file available via OSF, containing all raw volumetric data and metadata in a format that facilitates cross-species or cross-study comparisons. This replaces the previous folder-based structure, which may have been less accessible.
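
As a hedged illustration of the scale issue discussed in this exchange (this is not the authors' code; the helper name and the volumes are hypothetical and are not taken from the dataset), the same regional ratio can be read either as a proportion or as a percentage:

```python
def volume_ratio(region_mm3: float, brain_mm3: float) -> float:
    """Proportion of the total brain volume occupied by one region.

    Hypothetical helper for illustration only; brain_mm3 is assumed to be
    the total volume with cerebellum and myelencephalon already excluded.
    """
    if brain_mm3 <= 0:
        raise ValueError("total brain volume must be positive")
    return region_mm3 / brain_mm3


# Hypothetical volumes in mm^3, chosen only to illustrate the scale.
ratio = volume_ratio(900.0, 80_000.0)
print(f"proportion: {ratio:.5f}")         # a proportion slightly above 0.01
print(f"percentage: {ratio * 100:.3f}")   # the same quantity read as a percentage
```

An axis that shows the percentage form displays values above 1 even though the underlying proportion is close to 0.01, so the legend needs to state which form is plotted.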

      Typos:

      L233: delete 'in'

      L430: insert space in 'NMT template(Jung et al., 2021).'

    1. eLife Assessment

      The current work uses DNA-tethered motor trapping to reduce vertical forces and improve datasets for kinesin-1 motility under load. The evidence is compelling and the significance is important to the kinesin field. Kinesin-1 is more robust and less prone to premature detachment than previously indicated. This represents a significant advancement in the field and is generally applicable to work with optical tweezers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Hensley and Yildiz studies the mechanical behavior of kinesin under conditions where the z-component of the applied force is minimized. This is accomplished by tethering the kinesin to the trapped bead with a long double-stranded DNA segment as opposed to directly binding the kinesin to the large bead. It complements several recent studies that have used different approaches to looking at the mechanical properties of kinesin under low z-force loads. The study shows that much of the mechanical information gleaned from the traditional "one bead" with attached kinesin approach was probably profoundly influenced by the direction of the applied force. The authors speculate that when moving small vesicle cargos (particularly membrane bound ones) the direction of resisting force on the motor has much less of a z-component than might be experienced if the motor were moving large organelles like mitochondria.

      Strengths:

      The approach is sound and provides an alternative method to examine the mechanics of kinesin under conditions where the z-component of the force is lessened. The data show that kinesin has very different mechanical properties compared to those extensively reported using the "single-bead" assay where the molecule is directly coupled to a large bead which is then trapped.

      Weaknesses:

      The substoichiometric binding of kinesins to the multivalent DNA complicates the interpretation of the data.

      Comments on revisions:

      The authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      This short report by Hensley and Yildiz explores kinesin-1 motility under more physiological load geometries than previous studies. Large Z-direction (or radial) forces are a consequence of certain optical trap experimental geometries, and likely do not occur in the cell. Use of a long DNA tether between the motor and the bead can alleviate Z-component forces. The authors perform three experiments. In the first, they use two assay geometries - one with kinesin attached directly to a bead and the other with kinesin attached via a 2 kbp DNA tether - with a constant-position trap to determine that reducing the Z component of force leads to a difference in stall time but not stall force. In the second, they use the same two assay geometries with a constant-force trap to replicate the asymmetric slip bond of kinesin-1; reducing the Z component of force leads to a small but uniform change in the run lengths and detachment rates under hindering forces but not assisting forces. In the third, they connect two or three kinesin molecules to each DNA, and measure a stronger scaling in stall force and time when the Z component of force is reduced. They conclude that kinesin-1 is a more robust motor than previously envisaged, where much of its weakness came from the application of axial force. If forces are instead along the direction of transport, kinesin can hold on longer and work well in teams. The experiments are rigorous, and the data quality is very high. There is little to critique or discuss. The improved dataset will be useful for modeling and understanding multi-motor transport. The conclusions complement other recent works that used different approaches to low-Z component kinesin force spectroscopy, and provide strong value to the kinesin field.

      Comments on revisions:

      The authors have satisfied all of my comments. I commend them on an excellent paper.

    4. Reviewer #3 (Public review):

      Hensley et al. present an important study into the force-detachment behaviour of kinesin-1, using a newly adapted methodological approach. This new method of DNA-tethered motor trapping is effective in reducing vertical forces and can be easily optimised for other motors and protein characterisation. The major strength of the paper is characterising kinesin-1 under low z-forces, which is likely to reflect the physiological scenario. They find kinesin-1 is more robust and less prone to premature detachment. The motors exhibit higher stall rates and times. Under hindering and assisting loads, kinesin-1 detachment is more asymmetric and sensitive, and the low z-force data show that slip-behaviour kinetics prevail. Another achievement of this paper is the demonstration of the multi-motor kinesin-1 assay using their low-z force method, showing that multiple kinesin-1 motors are capable of generating higher forces (up to 15 pN, and nearly proportional to motor number), thus opening an avenue to study multiple motor coordination. Overall, the data have been collected in a rigorous manner, the new technique is sound and effective, and the results presented are compelling.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) My primary concern is that in some of the studies, there are not enough data points to be totally convincing. This is particularly apparent in the low z-force condition of Figure 1C.

      We agree that adequate sampling is essential for drawing robust conclusions. To address this concern, we performed a post hoc sensitivity analysis to assess the statistical power of our dataset. Given our sample sizes (N = 85 and 45) and observed variability, the experiment had 80% power (α = 0.05) to detect a difference in stall force of approximately 0.36 pN (Cohen’s d ≈ 0.38). The actual difference observed between conditions was 0.25 pN (d ≈ 0.26), which lies below the minimum detectable effect size. Thus, the non-significant result (p = 0.16) likely reflects that any true difference, if present, is smaller than the experimental sensitivity, rather than a lack of sufficient sampling.

      Importantly, both measured stall forces fall within the reported range for kinesin-1 in the literature, supporting that the dataset is representative and the measurements are reliable.

      (2) I'm also concerned about Figure 2B. Does each data point in the three graphs represent only a single event? If so, this should probably be repeated several more times to ensure that the data are robust.

      Each data point shown corresponds to the average of many processive runs, ranging from 32 to 167. This has been updated in the figure caption accordingly.

      (3) Figure 3. I'm surprised that the authors could not obtain a higher occupancy of the multivalent DNA tether with kinesin motors. They were adding up to a 30X higher concentration of kinesin, but still did not achieve stoichiometric labeling. The reasons for this should be discussed. This makes interpretation of the mechanical data much tougher. For instance, only 6-7% of the beads would be driven by three kinesins. Unless the movement of hundreds of beads were studied, I think it would be difficult to draw any meaningful insight, since most of the events would be reflective of beads with only one or sometimes two kinesins bound. I think more discussion is required to describe how these data were treated.

      The mass-photometry data in Figure 3B were acquired in the presence of a 3-fold molar excess of kinesin (Supplemental Figure 4) relative to the DNA chassis. In comparison, optical trapping studies were performed at a 10-20-fold molar excess of kinesin, resulting in a substantially higher percentage of chassis with multiple motors. The reason why we had to perform mass photometry measurements at lower molar excess than the optical trap is that at higher kinesin concentrations, the “kinesin-only” peak dominated and obscured 2- or 3-kinesin-bound species, preventing reliable fitting of the mass photometry data. 

      We have now used the mass photometry measurements to extrapolate occupancies under trapping conditions. We estimate 76-93% of 2-motor chassis are bound to two kinesins and ~70% of 3-motor chassis are bound to three kinesins under our trapping conditions. Moreover, the mean forces in Figures 3C–D exceed those expected for a single kinesin, consistent with occupancy substantially greater than one motor per chassis.

      We wrote: “To estimate the percentage of chassis with two and three motors bound, we performed mass photometry measurements at a 3-fold molar excess of kinesin to the chassis, as higher ratios would obscure the distinction of complexes from the kinesin-only population. Assuming there is no cooperativity among the binding sites, we modeled motor occupancy using a Binomial distribution (Figure 3_figure supplement 2). We observed 17-29% of particles corresponded to the two-motor species on the 2-motor chassis in mass photometry, indicating that 45-78% of the 2-motor chassis was bound to two kinesins. Similarly, 15% and 40% of the 3-motor chassis were bound to two and three kinesins, respectively.

      In optical trapping assays, we used 10-fold and 20-fold molar excess of kinesin for 2-motor and 3-motor chassis, respectively, to substantially increase the percentage of the chassis carried by multiple kinesins. Under these conditions, we estimate 76-93% of the 2-motor chassis were bound to two kinesins, and 30% and 70% of 3-motor chassis were bound to two and three kinesins, respectively.”

      “Multi-motor trapping assays were performed similarly using 10x and 20x kinesin for 2- and 3-motor chassis, respectively. To estimate the percentage of chassis with multiple motors, we used the probability of kinesin binding to a site on a chassis from mass photometry in 3x excess condition to compute an effective dissociation constant where r is the molar ratio of kinesin to chassis. Single-site occupancy at higher molar excesses of kinesin was calculated using this parameter.”

      We also added Figure 3_figure supplement 2 to explain our Binomial model.
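
      The occupancy estimate described above can be illustrated with a minimal sketch. It assumes independent binding sites (as the response states) and, as a purely hypothetical stand-in for the authors' expression that is not reproduced in this text, a simple saturation form p = r / (r + K_eff) for single-site occupancy. The function names, the saturation form, and any numbers fed to it are illustrative assumptions, not the authors' model or values.

```python
from math import comb


def site_occupancy(molar_ratio, k_eff):
    """Single-site occupancy under an ASSUMED saturation form p = r / (r + K_eff).

    This form is a hypothetical placeholder, not the authors' equation.
    """
    return molar_ratio / (molar_ratio + k_eff)


def k_eff_from_occupancy(p, molar_ratio):
    """Invert the assumed saturation form to get the effective constant."""
    return molar_ratio * (1.0 - p) / p


def motor_distribution(n_sites, p):
    """Binomial probabilities of k = 0..n_sites motors bound (no cooperativity)."""
    return [
        comb(n_sites, k) * p**k * (1.0 - p) ** (n_sites - k)
        for k in range(n_sites + 1)
    ]


if __name__ == "__main__":
    # Illustrative only: pretend the 3x measurement gave p = 0.5 per site.
    k_eff = k_eff_from_occupancy(0.5, molar_ratio=3.0)
    for ratio in (3.0, 10.0, 20.0):
        p = site_occupancy(ratio, k_eff)
        print(ratio, [round(x, 3) for x in motor_distribution(2, p)])
```

      Under these assumptions, raising the molar excess increases single-site occupancy and shifts the Binomial weight toward fully loaded chassis, which is the qualitative trend the response relies on.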

      (4) Page 5, 1st paragraph. Here, the authors are comparing time constants from stall experiments to data obtained with dynein from Ezber et al. This study used the traditional "one bead" trapping approach with dynein bound directly to the bead under conditions where it would experience high z-forces. Thus, the comparison between the behavior of kinesin at low z-forces is not necessarily appropriate. Has anyone studied dynein's mechanics under low z-force regimes?

      We thank the reviewer for catching a citation error. The text has been corrected to reference Elshenawy et al. 2020, which reported stall time constants for mammalian dynein. 

      To our knowledge, dynein’s mechanics under explicitly low z-force conditions have not yet been reported; however, given the more robust stalling behavior of dynein and greater collective force generation, the cited paper was chosen to compare low z-force kinesin to a motor that appears comparatively unencumbered by z-forces. Our study adds to growing evidence that high z-forces disproportionately limit kinesin performance. 

      For clarification, we modified that sentence as follows: “These time constants are comparable to those reported for minus-end-directed dynein under high z-forces”.

      Reviewer #2 (Recommendations for the authors):

      (1) P3 pp2, a DNA tensiometer cannot control the force, but it can measure it; get the distance between the two ends of the tensiometer, and apply WLC.

      The text has been updated to more accurately reflect the differences between optical trapping and kinesin motility against a DNA tensiometer with a fixed lattice position.
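
      The reviewer's suggestion (take the end-to-end distance of the tensiometer and apply the worm-like chain model) can be sketched with the standard Marko-Siggia interpolation formula. The persistence length (50 nm), rise per base pair (0.34 nm), thermal energy (4.11 pN nm), and the 2 kbp contour length used in the example are textbook or illustrative values assumed here, not parameters taken from the manuscript.

```python
KT_PN_NM = 4.11          # thermal energy near room temperature (assumed), pN*nm
RISE_PER_BP_NM = 0.34    # B-DNA rise per base pair (assumed textbook value)


def wlc_force(extension_nm, contour_nm, persistence_nm=50.0, kt=KT_PN_NM):
    """Marko-Siggia worm-like chain force (pN) at a given end-to-end extension."""
    x = extension_nm / contour_nm
    if not 0.0 <= x < 1.0:
        raise ValueError("extension must be in [0, contour length)")
    return (kt / persistence_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


if __name__ == "__main__":
    contour = 2000 * RISE_PER_BP_NM  # hypothetical 2 kbp tether, 680 nm
    for frac in (0.5, 0.8, 0.9, 0.95):
        print(frac, round(wlc_force(frac * contour, contour), 3))
```

      The force rises steeply only as the extension approaches the contour length, which is why such an assay can report force from a measured distance but cannot set it.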

      (2) Fig. 2b, SEM is a poor estimate of error for exponentially distributed run lengths. Other methods, like bootstrapping an exponential distribution fit, may provide a more realistic estimate.

      Run lengths were plotted as an inverse cumulative distribution function and fitted to a single exponential decay (Supplementary Figure S3). The plotted value represents the fitted decay constant (characteristic run length) ± SE (standard error of the fit), not the arithmetic mean ± SEM. Velocity values are reported as mean ± SEM. Detachment rate was computed as velocity divided by run length, except at 6 and 10 pN hindering loads, where minimal forward displacement necessitated fitting run-time decays directly. In those cases, the plotted detachment rate equals the inverse of the fitted time constant. The figure caption has been updated accordingly.
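
      As an illustration of the bootstrapping approach the reviewer mentions, the sketch below resamples a set of run lengths with replacement and refits the exponential decay constant each time. It substitutes the maximum-likelihood estimate (the sample mean) for the authors' fit to the inverse cumulative distribution, ignores truncation of short or long runs, and uses synthetic data; all names and numbers are assumptions made for the example.

```python
import random
import statistics


def fit_exponential_mean(samples):
    """Maximum-likelihood decay constant of an exponential: the sample mean."""
    return sum(samples) / len(samples)


def bootstrap_se(samples, n_boot=2000, seed=0):
    """Standard error of the fitted decay constant from bootstrap resampling."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = []
    for _ in range(n_boot):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        estimates.append(fit_exponential_mean(resample))
    return statistics.stdev(estimates)


def detachment_rate(velocity_nm_s, run_length_nm):
    """Detachment rate (1/s) computed as velocity divided by run length."""
    return velocity_nm_s / run_length_nm


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic run lengths (nm) with a hypothetical 800 nm characteristic length.
    runs = [rng.expovariate(1.0 / 800.0) for _ in range(100)]
    print(fit_exponential_mean(runs), bootstrap_se(runs))
```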

      (3) Kinesin-1 is covalently bound to a DNA oligo, which then attaches to the DNA chassis by hybridization. This oligo is 21 nt with a relatively low GC%. At what force does this oligo unhybridize? Can the authors verify that their stall force measurements are not cut short by the oligo detaching from the chassis?

      The 21-nt attachment oligo (38% GC) is predicted to have a ΔG at 37 °C of approximately -25 kcal/mol, or approximately 42 kT. If we assume this is the approximate amount of work required to unhybridize the oligo, we would expect the rupture force to be >15 pN. This significantly exceeds the stall force of a single kinesin. Since the stalling events rarely exceed a few seconds, it is unlikely that our oligos quickly detach from the chassis under such low forces.

      Furthermore, optical trapping experiments are tuned such that no more than 30% of beads display motion within several minutes after they are brought near microtubules. After stalling events, the motor dissociates from the MT, and the bead snaps back to the trap center. Most beads robustly reengage with the microtubule, typically within 10 s, suggesting that the same motor chassis reengages with the microtubule after microtubule detachment. Successive runs of the same bead typically have similar stall forces, suggesting that the motors do not disengage from the chassis under resistive forces exerted by the trap.
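
      The unit conversion behind the oligo estimate can be checked with a short calculation. The conversion from kcal/mol to kT uses physical constants only; the force estimate additionally assumes that the work is done over a distance equal to the oligo length (21 nt at 0.34 nm per nucleotide), which is a hypothetical choice made here for illustration and not a value stated by the authors. It also ignores loading-rate effects.

```python
K_B = 1.380649e-23       # Boltzmann constant, J/K
N_A = 6.02214076e23      # Avogadro constant, 1/mol
J_PER_KCAL = 4184.0      # joules per kilocalorie
PN_NM_PER_J = 1e21       # 1 J = 1e21 pN*nm


def kcal_per_mol_to_kt(dg_kcal_mol, temp_k=298.15):
    """Express a molar free energy magnitude in units of kT per molecule."""
    return dg_kcal_mol * J_PER_KCAL / N_A / (K_B * temp_k)


def mean_force_pn(dg_kcal_mol, distance_nm):
    """Average force if the free energy is spent as work over distance_nm (assumed)."""
    work_pn_nm = dg_kcal_mol * J_PER_KCAL / N_A * PN_NM_PER_J
    return work_pn_nm / distance_nm


if __name__ == "__main__":
    print(kcal_per_mol_to_kt(25.0))            # about 42 kT at room temperature
    print(mean_force_pn(25.0, 21 * 0.34))      # distance is an assumption
```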

      (4) Figure 1, a justification or explanation should be provided for why events lower than 1.5 pN were excluded. It appears arbitrary.

      Single-motor stall-force measurements used a trap stiffness of 0.08–0.10 pN/nm. At this stiffness, a 1.5 pN force corresponds to 15–19 nm bead displacement, roughly two kinesin steps, and events below this threshold could not be reliably distinguished from Brownian noise. For this reason, forces < 1.5 pN were excluded.

      In Methods, we wrote “Only peak forces above 1.5 pN (corresponding to a 15-19 nm bead displacement) were analyzed to clearly distinguish runs from the tracking noise.”
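
      The correspondence between the 1.5 pN cutoff and bead displacement follows from Hooke's law for the trap, displacement = force / stiffness. A minimal check, assuming the commonly cited 8 nm kinesin step size (not stated in the response) to express the result in steps:

```python
KINESIN_STEP_NM = 8.0  # commonly cited step size, assumed here


def bead_displacement_nm(force_pn, stiffness_pn_per_nm):
    """Bead displacement from the trap centre for a linear (Hookean) trap."""
    return force_pn / stiffness_pn_per_nm


if __name__ == "__main__":
    for k in (0.08, 0.10):
        d = bead_displacement_nm(1.5, k)
        print(k, d, d / KINESIN_STEP_NM)
```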

      (5) Figure 2b, is the difference in velocity statistically significant?

      The difference in velocity is statistically significant for most conditions. We did not compare velocities for -10 and -6 pN as these conditions resulted in little forward displacement. However, the p-values for all of the other conditions are -4 pN: 0.0026, -2 pN: 0.0001, -1 pN: 0.0446, +0.5 pN: 0.3148, +2 pN: 0.0001, +3 pN: 0.1191, +4 pN: 0.0004.

      (6) The number of measurements for each experimental datapoint in the corresponding figure caption should be provided. SEM is used, but N is not reported in the caption.

      Figure captions have now been updated to report the number of trajectories (N) for each data point.

      Reviewer #3 (Recommendations for the authors):  

      (1) The method of DNA-tethered motor trapping to enable low z-force is not entirely novel, but adapted from Urbanska (2021) for use in conventional optical trapping laboratories without reliance on microfluidics. However, I appreciate that they have fully established it here to share with the community. The authors could strengthen their methods section by being transparent about protein weight, protein labelling, and DNA ladders shown in the supplementary information. What organism is the protein from? Presumably human, but this should be specified in the methods. While the figures show beautiful data and exemplary traces, the total number of molecules analysed or events is not consistently reported. Overall, certain methodological details should be made sufficient for reproducibility.

      We appreciate the reviewer’s attention to methodological clarity. The constructs used are indeed human kinesin-1, KIF5B. The Methods now specify protein origin, molecular weights, and labeling details, and all figure captions report the number of trajectories analyzed to ensure reproducibility.

      (2) The major limitation the study presents is overarching generalisability, starting with the title. I recommend that the title be specific to kinesin-1. 

      The title has been revised to specify kinesin-1. 

      The study uses two constructs: a truncated K560 for conventional high-force assays, and full-length Kif5b for the low z-force method. However, for the multi-motor assay, the authors use K560 with the rationale of preventing autoinhibition due to binding with DNA, but that would also have limited characterisation in the single-molecule assay. Overall, the data generated are clear, high-quality, and exciting in the low z-force conditions. But why have they not compared or validated their findings with the truncated construct K560? This is especially important in the force-feedback experiments and in comparison with Andreasson et al. and Carter et al., who use Drosophila kinesin-1. Could kinesin-1 across organisms exhibit different force-detachment kinetics? It is quite possible. 

      Construct choice was guided by physiological relevance and considerations of autoinhibition: K560 was used for high z-force single-motor assays. The results of these assays are consistent with conventional bead assays performed by Andreasson et al. and Carter et al. using kinesin from a different organism. Therefore, we do not believe there are major differences between force properties of Drosophila and human kinesin-1.

      For low z-force assays, we used full-length KIF5B, which has nearly identical velocity and stall force to K560 in standard bead assays. We used this construct for low z force assays because it has a longer and more flexible stalk than K560 and better represents the force behavior of kinesin under physiological conditions. We then used constitutively-active K560 motors for multi-motor experiments to avoid potential complications from autoinhibition of full-length kinesin.

      Similarly, the authors test backward slipping of Kif5b and K560 and measure dwell times in multi-motor assays. Why not detail the backward slippage kinetics of Kif5b and any step-size impact under low z-forces? For instance, with the traces they already have, the authors could determine slip times, distances, and frequency in horizontal force experiments. Overall, the manuscript could be strengthened by analysing both constructs more fully.

      Slip or backstep analyses were not performed on single-motor data because such events were rare; kinesin typically detached rather than slipped. In contrast, multi-motor assays exhibited frequent slip events corresponding to the detachment of individual motors, which were analyzed in detail.

      We wrote “In comparison, slipping events were rarely observed in beads driven by a single motor, suggesting that kinesin typically detaches rather than slipping back on the microtubule under hindering loads.”

      Appraisal and impact:

      This study contributes to important and debated evidence on kinesin-1 force-detachment kinetics. The authors conclude that kinesin-1 exhibits a slip-bond interaction with the microtubule under increasing forces, while other recent studies (Noell et al. and Kuo et al.), which also use low z-force setups, conclude catch-bond behaviour under hindering loads. I find the results not fully aligned with their interpretation. The first comparison of low z-forces in their setup with Noell et al. (2024), based on stall times, does not hold, because it is an apples-to-oranges comparison. Their data show a stall time constant of 2.52 s, which is comparable to the 3 s reported by Noell et al., but the comparison is made with a weighted average of 1.49 s. The authors do report that detachment rates are lower in low z-force conditions under unloaded scenarios. So, to completely rule out catch-bond-like behaviour is unfair. That said, their data quality is good and does show that higher hindering forces lead to higher detachment rates. However, on closer inspection, the range of 0-5 pN shows either a decrease or no change in detachment rate, which suggests that under a hindering force threshold, catch-bond-like or ideal-bond-like behaviour is possible, followed by slip-bond behaviour, which is amazing resolution. Under assisting loads, the slip-bond character is consistent, as expected. Overall, the study contributes to an important discussion in the biophysical community and is needed, but requires cautious framing, particularly without evidence of motor trapping in a high microtubule-affinity state rather than genuine bond strengthening.

      We are not completely ruling out the catch bond behavior in our manuscript. As the reviewer pointed out, our results are consistent with the asymmetric slip bond model, whereas DNA tensiometer assays are more consistent with the catch bond behavior. The advantage of our approach is the capability to directly control the magnitude and direction of load exerted on the motor in the horizontal axis and measure the rate at which the motor detaches from the microtubule as it walks under constant load. In comparison, DNA tensiometer assays cannot control the force, but measure the time it takes the motor to fall off from the microtubule after a brief stall. The extension of the DNA tether is used to estimate the force exerted on the motor during a stall in those assays. The slight disadvantage of our method is the presence of low z-forces, whereas DNA tensiometer assays are expected to have little to no z-force. We wrote that the discrepancy between our results can be attributed to the presence of low z-forces in our DNA-tethered trapping assembly, which may result in a higher-than-normal detachment rate under high hindering loads, thereby resulting in less asymmetry in the force-detachment kinetics. We also added that this discrepancy can be addressed by future studies that directly control and measure horizontal force and measure the motor detachment rate in the absence of z-forces. Optical trapping assays with small nanoparticles (Sudhakar et al. Science 2021) may be well suited to conclusively reveal the bond characteristics of kinesin under hindering loads.

      Reviewing Editor Comments:

      The reviewers are in agreement with the importance of the findings and the quality of the results. The use of the DNA tether reduces the z-force on the motor and provides biologically relevant insight into the behavior of the motor under load. The reviewers' suggestions are constructive and focus on bolstering some of the data points and clarifying some of the methodological approaches. My major suggestion would be to clarify the rationale for concluding that kinesin-1 exhibits slip-bond behavior with increasing force in light of the work of Noell (10.1101/2024.12.03.626575) and Kuo et al (2022 10.1038/s41467-022-31069-x), both of which take advantage of DNA tethers.

      Please see our response to the previous comment. In the revised manuscript, we first clarified that our results are in agreement with previous theoretical (Khataee & Howard, 2019) and experimental studies (Kuo et al., 2022; Noell et al., 2024; Pyrpassopoulos et al., 2020) that kinesin exhibits slower detachment under hindering load. This asymmetry became clear when the z-force was reduced or eliminated. 

      We clarified the differences between our results and DNA tensiometer assays and provided a potential explanation for these discrepancies. We also proposed that future studies might be required to fully distinguish between asymmetric slip, ideal, or catch bonding of kinesin under hindering loads.

      We wrote:

      “Our results agree with the theoretical prediction that kinesin exhibits higher asymmetry in force-detachment kinetics without z-forces (Khataee & Howard, 2019), and are consistent with optical trapping and DNA tensiometer assays that reported more persistent stalling of kinesin in the absence of z-forces (Kuo et al., 2022; Noell et al., 2024; Pyrpassopoulos et al., 2020).

      Force-detachment kinetics of protein-protein interactions have been modeled as either a slip, ideal, or catch bond, which exhibit an increase, no change, or a decrease in detachment rate, respectively, under increasing force (Thomas et al., 2008). Slip bonds are most commonly observed in biomolecules, but studies on cell adhesion proteins reported a catch bond behavior (Marshall et al., 2003). Although previous trapping studies of kinesin reported a slip bond behavior (Andreasson et al., 2015; Carter & Cross, 2005), recent DNA tensiometer studies that eliminated the z-force showed that the detachment rate of the motor under hindering forces is lower than that of an unloaded motor walking on the microtubule (Kuo et al., 2022; Noell et al., 2024), consistent with the catch bond behavior. Unlike these reports, we observed that the stall duration of kinesin is shorter than the motor run time under unloaded conditions, and the detachment rate of kinesin increases with the magnitude of the hindering force. Therefore, our results are more consistent with the asymmetric slip bond behavior. The difference between our results and the DNA tensiometer assays (Kuo et al., 2022; Noell et al., 2024) can be attributed to the presence of low z-forces in our DNA-tethered optical trapping assays, which may increase the detachment rate under high hindering forces. Future studies that could directly control hindering forces and measure the motor detachment rate in the absence of z-forces would be required to conclusively reveal the bond characteristics of kinesin under hindering loads.”
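
      The slip, ideal, and catch bond definitions quoted above can be made concrete with a Bell-type rate expression, k(F) = k0 exp(F δ / kT), where the sign of the distance parameter δ sets whether the detachment rate rises, stays flat, or falls with load. This is a deliberately simplified sketch: a single negative δ is only a caricature of a catch bond (published catch-bond models typically use two pathways), and k0, δ, and kT below are hypothetical values, not fits to the authors' data.

```python
import math

KT_PN_NM = 4.11  # thermal energy near room temperature (assumed), pN*nm


def detachment_rate(force_pn, k0, delta_nm, kt=KT_PN_NM):
    """Bell-type detachment rate (1/s) under a load of magnitude |force_pn|."""
    return k0 * math.exp(abs(force_pn) * delta_nm / kt)


def classify_bond(delta_nm):
    """Label the bond type implied by the sign of the distance parameter."""
    if delta_nm > 0:
        return "slip"
    if delta_nm < 0:
        return "catch"
    return "ideal"


if __name__ == "__main__":
    k0 = 1.0  # hypothetical unloaded detachment rate, 1/s
    for delta in (0.5, 0.0, -0.5):
        rates = [round(detachment_rate(f, k0, delta), 3) for f in (0, 2, 4, 6)]
        print(classify_bond(delta), rates)
```

      An asymmetric slip bond, as described in the response, would correspond in this picture to a positive δ for both load directions, with a smaller value under hindering load than under assisting load.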

    1. eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. The experimental dataset is unique, the coupled experimental and computational analyses comprehensive, and the effect is strong. However, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

    2. Reviewer #1 (Public review):

      The authors have conducted substantial additional analyses to address the reviewers' comments. However, several key points still require attention. I was unable to see the correspondence between the model predictions and the data in the added quantitative analysis. In the rebuttal letter, the delta peak speed time displays values in the range of [20, 30] ms, whereas the data were negative for the 45° direction. Should the reader directly compare panel B of Figure 6 with Figure 1E? The correspondence between the model and the data should be made more apparent in Figure 6. Furthermore, the rebuttal states that a quantitative prediction was not expected, yet it subsequently argues that there was a quantitative match. Overall, this response remains unclear.

      A follow-up question concerns the argument about strategic slowing. The authors argue that this explanation can be rejected because the timing of peak speed should be delayed, contrary to the data. However, there appears to be a sign difference between the model and the data for the 45° direction, which means that it was delayed in this case. Did I understand correctly? In that regard, I believe that the hypothesis of strategic slowing cannot yet be firmly rejected and the discussion should more clearly indicate that this argument is based on some, but not all, directions. I agree with the authors on the importance of the mass underestimation hypothesis, and I am not particularly committed to the strategic slowing explanation, but I do not see a strong argument against it. If the conclusion relies on the sign of the delta peak speed, then the authors' claims are not valid across all directions, and greater caution in the interpretation and discussion is warranted. Regarding the peak acceleration time, I would be hesitant to draw firm conclusions based on differences smaller than 10 ms (Figures R3 and 6D).

      The authors state in the rebuttal that the two hypotheses are competing. This is not accurate, as they are not mutually exclusive and could even vary as a function of movement direction. The abstract also claims that the data "refutes" strategic slowing, which I believe is too strong. The main issue is that, based on the authors' revised manuscript, the lack of quantitative agreement between the model and the data for the mass underestimation hypothesis is considered acceptable because a precise quantitative match is not expected, and the predictions overall agree for some (though not all) directions and phases (excluding post-in). That is reasonable, but by the same logic, the small differences between the model prediction and the strategic slowing hypothesis should not be taken as firm evidence against it, as the authors seem to suggest. In practice, I recommend a more transparent and cautious interpretation to avoid giving readers the false impression that the evidence is decisive. The mass underestimation hypothesis is clearly supported, but the remaining aspects are less clear, and several features of the data remain unexplained.

    3. Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model adds confidence to the proposed conclusions.

      Compared to the previous version, the authors have thoroughly addressed my concerns. The model is now clear and well-articulated, and alternative hypotheses have been ruled out convincingly. The paper is improved and suitable for publication in my opinion, making a significant contribution to the field.

      Strengths:

      - Comprehensive analysis of a unique data set of reaching movement in microgravity
      - Use of a sensible and well-thought experimental approach
      - State-of-the-art analyses of main kinematic parameters
      - Computational model simulations of arm reaching to test alternative hypotheses and support the mass underestimation one

      This work has no major weakness as it stands, and the discussion provides a fair evaluation of the findings and conclusions.

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude in favor of the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited and the manuscript is well written.

      Weaknesses:

      I nevertheless am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      To strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treating the arm as a second-order low pass filter (Eq. 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limb's damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for un-adapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      The authors attempt to differentiate their study from previous studies, in which limb neuromechanical impedance was shown to be modified in weightlessness, by emphasizing that in the current study the movements were rapid and the initial movement is "feedforward". But this incorrectly implies that the limb's mechanical response to the motor command is determined only by active feedback mechanisms. In fact:

      (a) All commands to the muscle pass through the motor neurons. These neurons receive descending activations related not only to the volitional movement, but also to the dynamic state of the body and the influence of other sensory inputs, including the vestibular system. A decrease in descending influences from the vestibular organs will lower the background sensitivity to all other neural influences on the motor neuron. Thus, the motor neuron may be less sensitive to the other volitional and reflexive synaptic inputs that it may receive.

      (b) Muscle tone plays a significant role in determining the force and the time course of the muscle contraction. In a weightless environment, where tonic muscle activity is likely to be reduced, there is the distinct possibility that muscles will react more slowly and with lower amplitude to an otherwise equivalent descending motor command, particularly in the initial moments before spinal reflexes come into play. These, and other neuronal mechanisms could lead to the "under-actuation" effect observed in the current study, without necessarily being reflective of an underestimation of mass per se.

      (2) The subject's body in weightlessness is much more sensitive to reaction forces in interactions with the environment in the absence of the anchoring effect of gravity pushing the body into the floor and in the absence of anticipatory postural adjustments that typically accompany upper-limb motions in Earth gravity in order to maintain an upright posture. The authors dismiss this possibility because the taikonauts were asked to stabilize their bodies with the contralateral hand. But the authors present no evidence that this was sufficient to maintain the shoulder and trunk at a strictly constant position, as is supposed by the simplified biomechanical model used in their optimal control framework. Indeed, a small backward motion of the shoulder would result in a smaller acceleration of the fingertip and a smaller extent of the initial ballistic motion of the hand with respect to the measurement device (the tablet), consistent with the observations reported in the study. Note that stability of the base might explain why 45º movements were apparently less affected in weightlessness, according to many of the reported analyses, including those related to corrective movements (Fig. 5 B, C, F; Fig. 6D), than the other two directions. If the trunk is being stabilized by the left arm, the same reaction forces on the trunk due to the acceleration of the hand will result in less effective torque on the trunk, given that the reaction forces act with a much smaller moment arm with respect to the left shoulder (the hand movement axis passes approximately through the left shoulder for the 45º target) compared to either the forward or rightward motions of the hand.

      (3) The above is exacerbated by potential changes in the frictional forces between the fingertip and the tablet. The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact can be expected to be quite different than on the ground. While these forces may be low on Earth, the fact is that we do not know what forces the taikonauts used on orbit. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. Indeed, given the increased instability of the body and the increased uncertainty of movement direction of the hand, taikonauts may have been induced to apply greater forces against the tablet in order to maintain contact in weightlessness, which would in turn slow the motion of the finger on the tablet and increase the reaction forces acting on the trunk. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors.

      I feel that the authors have done an admirable job of exploring how to explain the modifications to movement kinematics that they observed on orbit within the constraints of the optimal control theory applied to a simplified model of the human motor system. While I fully appreciate the value of such models to provide insights into questions of human sensorimotor behaviour, to draw firm conclusions on what humans are actually experiencing based only on manipulations of the computational model, without testing the model's implicit assumptions and without considering the actual neurophysiological and biomechanical mechanisms, can be misleading. One way to do this could be to examine these questions through extensions to the model used in the simulations (changing activation dynamics of the torque generators, allowing for potential backward motion of the shoulder and trunk, etc.). A better solution would be to emulate the physiological and biomechanical conditions on Earth (supporting the arm against gravity to reduce muscle tone, placing the subject on a moveable base that requires that the body be stabilized with the other hand) in order to distinguish the hypothesis of an underestimation of mass vs. other potential sources of under-actuation and other potential effects of weightlessness on the body.

      In sum, my opinion is that the authors are relying too much on a theoretical model as a ground truth and thus overstate their conclusions. But to provide a convincing argument that humans truly underestimate mass in weightlessness, they should consider more judiciously the neurophysiology and biomechanics that fall outside the purview of the simplified model that they have chosen. If a more thorough assessment of this nature is not possible, then I would argue that a more measured conclusion of the paper should be 1) that the authors observed modifications to movement kinematics in weightlessness consistent with an under-actuation for the intended motion, 2) that a simplified model of human physiology and biomechanics that incorporates principles of optimal control suggests that the source of this under-actuation might be an underestimation of mass in the computation of an appropriate feedforward motor command, and 3) that other potential neurophysiological or biomechanical effects cannot be excluded due to limitations of the computational model.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. While the experimental dataset is unique and the coupled experimental and computational analyses comprehensive, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

      We thank the editor and reviewers for the thoughtful and constructive comments, which helped us substantially improve the manuscript. In this revised version, we have made the following key changes:

      - Directly presented the differential effect of microgravity in different movement directions, showing its quantitative match with model predictions.

      - Showed that changing cost function with the idea of conservative strategy is not a viable alternative.

      - Showed our model predictions remain largely the same after adding Coriolis and centripetal torques.

      - Discussed alternative explanations including neuromuscular deconditioning, friction, body stability, etc.

      - Detailed the model description and moved it to the main text, as suggested.

      Our point-to-point response is numbered to facilitate cross-referencing.

      We believe the revisions and the responses adequately address the reviewers’ concerns, and the new analysis results strengthen our conclusion that mass underestimation is the major contributor to movement slowing in microgravity.

      Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slow down implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      Response (1): Thank you for raising this point. The basic premise of this concern is that changing the cost function to implement strategic slowing can reproduce our empirical findings, so the alternative hypothesis that we aimed to refute in the paper remains possible. At the least, it could co-exist with our hypothesis of mass underestimation. In the revision, we show that changing the cost function alone, as suggested here, cannot produce the behavioral patterns observed in microgravity.

      As suggested, we modified the relative weighting of the state and control cost matrices (i.e., Q and R in the cost function Eq 15) without considering mass underestimation. While this cost function scaling can decrease peak velocity – a hallmark of strategic slowing – it also inevitably leads to later peak timings. This is opposite to our robust findings: the taikonauts consistently “advanced” their peak velocity and peak acceleration in time. Note, these model simulation patterns have also been shown in Crevecoeur et al. (2010), the paper mentioned by the reviewer (see their Figure 7B).

      We systematically changed the ratio between the state and control weight matrices in the simulation, as suggested. We divided Q and multiplied R by the same factor α, the cost function scaling parameter defined in Crevecoeur et al. (2010). This adjustment models a shift in movement strategy in microgravity, and we tested a wide range of α to cover a reasonable parameter space. Simulation results for α = 3 and α = 0.3 are shown in Figure 1—figure supplement 2 and Figure 1—figure supplement 3, respectively. As expected, with α = 3 (higher control effort penalty), peak velocities and accelerations are reduced, but their timing is delayed. Conversely, with α = 0.3, peak amplitudes increase and the peaks occur earlier. Hence, changing the cost function to implement a conservative strategy cannot produce the kinematic pattern observed in microgravity, which is a combination of movement slowing and peak timing advance.
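
      The coupling between amplitude and timing under cost scaling can be illustrated with a much simpler system than the one used in the paper. The sketch below is an assumption-laden toy: a 1-D point mass under infinite-horizon LQR with closed-form gains, not the authors' two-link, finite-horizon model. The function names and all parameter values (mass, q, r, reach distance) are hypothetical and chosen only for illustration.

```python
import math


def lqr_gains(mass, q, r):
    """Closed-form infinite-horizon LQR gains for a 1-D point mass.

    Cost: integral of q * e^2 + r * u^2, where e is position error and u is force.
    """
    k_pos = math.sqrt(q / r)
    k_vel = math.sqrt(2.0 * mass * k_pos)
    return k_pos, k_vel


def simulate_reach(alpha, mass=1.0, q=1.0, r=1e-4, dist=0.12, dt=1e-4, t_end=2.0):
    """Return (peak_speed, time_of_peak_speed) when Q is divided and R multiplied by alpha."""
    k_pos, k_vel = lqr_gains(mass, q / alpha, r * alpha)
    pos, vel = 0.0, 0.0
    peak_speed, peak_time = 0.0, 0.0
    for i in range(int(t_end / dt)):
        force = -k_pos * (pos - dist) - k_vel * vel  # state-feedback command
        vel += dt * force / mass                     # semi-implicit Euler step
        pos += dt * vel
        if vel > peak_speed:
            peak_speed, peak_time = vel, (i + 1) * dt
    return peak_speed, peak_time


for alpha in (0.3, 1.0, 3.0):
    speed, t_peak = simulate_reach(alpha)
    print(f"alpha={alpha}: peak speed {speed:.3f} m/s at {t_peak * 1000:.0f} ms")
```

      In this toy system the closed loop keeps the same damping ratio for every α and only its natural frequency changes, so peak speed and peak time always move in opposite directions: slower and later, or faster and earlier. It therefore cannot produce a slower movement with an earlier peak, which is the qualitative point made above.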

      Therefore, we conclude that a change in optimal control strategy alone is insufficient to explain our empirical findings. Logically speaking, we cannot refute the possibility of strategic slowing, which can still exist on top of the mass underestimation we proposed here. However, our data does not support its role in explaining the slowing of goal-directed hand reaching in microgravity. We have added these analyses to the Supplementary Materials and expanded the Discussion to address this point.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      Response (2): First, we have to clarify that our study does not aim to quantitatively fit the observed hand trajectories. The two-link arm model simulates an ideal case of moving a point mass (effective mass) on a horizontal plane without friction (Todorov, 2004; 2005). In contrast, in the experiment, participants moved their hand on a tabletop without vertical arm support, so the movement was not strictly planar and was affected by friction. Thus, this kind of model can only illustrate qualitative differences between conditions, as in the majority of similar modeling studies (e.g., Shadmehr et al., 2016). In our study, qualitative simulation means the model is intended to reproduce the directional differences between conditions, not exact numeric values, in key kinematic measures. Specifically, it should capture how the peak velocity and acceleration amplitudes and their timings differ between normal gravity and microgravity (particularly under the mass-underestimation assumption).

      Second, the reviewer rightfully pointed out that the directional effect is essential for our theorization of the importance of mass underestimation. However, the directional effect has two aspects, which were not clearly presented in our original manuscript. We now clarify both here and in the revision. The first aspect is that key kinematic variables (peak velocity/acceleration and their timing) are affected by movement direction, even before any potential microgravity effect. This is shown by the ranking order of directions for these variables (Figure 1C-H). The direction-dependent ranking, confirmed by pre-flight data, indicates that effective mass is a determining factor for reaching kinematics, which motivated us to study its role in eliciting movement slowing in space. This was what our original manuscript emphasized and clearly presented.

      The second aspect is that the hypothetical mass underestimation might also differentially affect movements in different directions. This was not clearly presented in the original manuscript. However, we would not expect a quantitative match between model predictions and empirical data, for the reasons mentioned above. We now show this directional ranking in microgravity-elicited kinematic changes in both model simulations and empirical data. The overall trend is that the microgravity effect indeed differs between directions, and the model predictions and the data showed a reasonable qualitative match (Author response image 1 below).

      As shown in Author response image 1, we found that for amplitude changes (Δ peak speed, Δ peak acceleration) both the model and the mean of empirical data show the same directional ordering (45° > 90° > 135°) in pre-in and post-in comparisons. For timing (Δ peak-speed time, Δ peak-acceleration time), which we consider the most diagnostic, the same directional ranking was observed. We found only one deviation: the predicted sign (earlier peaks) was confirmed at 90° and 135°, but not at 45°. As discussed in Response (6), the absence of timing advance at 45° may reflect limitations of our simplified model, which did not consider that the 45° direction is essentially a single-joint reach. Taken together, the directional pattern is largely consistent with the model predictions based on mass underestimation. The model successfully reproduces the directional ordering of amplitude measures -- peak velocity and peak acceleration. It also captures the sign of the timing changes in two out of the three directions. We added these new analysis results in the revision and expanded the Discussion accordingly.

      The details of our analysis on directional effects: We compared the model predictions (Author response image 1, left) with the experimental data (Author response image 1, right) across the three tested directions (45°, 90°, 135°). In the experimental data panels, both Δ(pre-in) (solid bars) and Δ(post-in) (semi-transparent bars) with standard error are shown. The directional trends are remarkably similar between model prediction and actual data. The post-in comparison is less aligned with model prediction; we postulate that the incomplete after-flight recovery (i.e., post data had not returned to pre-flight baselines) might obscure the microgravity effect. Incomplete recovery has also been shown in our original manuscript: peak speed and peak acceleration did not fully recover in post-flight sessions when compared to pre-flight sessions. To further quantify the correspondence between model and data, we performed repeated-measures correlation (rm-corr) analyses. We found significant within-subject correlations for three of the four metrics. For pre–in, Δ peak speed time (r<sub>rm</sub> = 0.627, t(23) = 3.858, p < 0.001), Δ peak acceleration time (r<sub>rm</sub> = 0.591, t(23) = 3.513, p = 0.002), and Δ peak acceleration (r<sub>rm</sub> = 0.573, t(23) = 3.351, p = 0.003) were significant, whereas Δ peak speed was not (r<sub>rm</sub> = 0.334, t(23) = 1.696, p = 0.103). These results thus show that the directional effect, as predicted by our model, is observed both before spaceflight and in spaceflight (the pre-in comparison).
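
      As a consistency check on the statistics quoted above, the t values can be recomputed from the correlation coefficients. The sketch assumes the standard relation between a correlation coefficient and its t statistic, t = r·sqrt(df) / sqrt(1 − r²), with df = 23 error degrees of freedom; the function name `t_from_r` is hypothetical, and the numbers are copied from the response.

```python
import math


def t_from_r(r, df):
    """t statistic for a correlation coefficient r with df error degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


# (metric, r_rm, reported t) as quoted in the response, all with df = 23
reported = [
    ("delta peak speed time", 0.627, 3.858),
    ("delta peak acceleration time", 0.591, 3.513),
    ("delta peak acceleration", 0.573, 3.351),
    ("delta peak speed", 0.334, 1.696),
]

for name, r, t_reported in reported:
    print(f"{name}: t = {t_from_r(r, 23):.3f} (reported {t_reported})")
```

      The recomputed values agree with the reported ones to within rounding of the three-decimal correlation coefficients.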

      Author response image 1.

      Directional comparison between model predictions and experimental data across the three reach directions (45°, 90°, 135°). Left: model outputs. Right: experimental data shown as Δ relative to the in-flight session; solid bars = Δ(in − pre) and semi-transparent bars = Δ(in − post). Colors encode direction consistently across panels (e.g., 45° = darker hue, 90° = medium, 135° = lighter/orange). Panels (clockwise from top-left): Δ peak speed (cm/s), Δ peak speed time (ms), Δ peak acceleration time (ms), and Δ peak acceleration (cm/s²). Bars are group means; error bars denote standard error across participants.

      Citations:

      Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907.

      Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation, 17(5), 1084–1108.

      Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A Representation of Effort in Decision-Making and Motor Control. Current Biology: CB, 26(14), 1929–1934.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      We agree that both hypotheses have been put forward before; however, they are competing hypotheses that have not been resolved. Furthermore, the mass underestimation hypothesis has remained a conjecture without solid evidence; previous reports on mass underestimation of objects cannot directly translate to underestimation of the body. As detailed in our responses above, we have shown that a conservative strategy implemented via a different cost function cannot reproduce the key findings in our dataset, thereby supporting the alternative hypothesis of mass underestimation. Moreover, we found qualitative agreement between the model predictions and the experimental data in terms of directional effects, which further strengthens our interpretation.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      Response (3): We are happy to include exemplary speed and acceleration trajectories. Kinematic profiles from one example participant are shown in Figure 2—figure supplement 6.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      Response (4): Great suggestion. In the revision, we have moved the model into the main text and added further justification for using this simple model.

      We initially omitted the nonlinear Coriolis and centripetal terms in order to start with a minimal model. Importantly, excluding these terms does not affect the model’s main conclusions. In the revision we added simulations that explicitly include these terms. The full explanation and simulations are provided in Supplementary Note 2 (placed in the Supplementary Materials to limit the amount of text devoted to the model). Further explanation can also be found in our response to Reviewer 2 (Response (6)). The results indicate that, although these velocity-dependent forces show some directional anisotropy, their contribution is substantially smaller than that of the included inertial component; specifically, they have only a negligible impact on the predicted peak amplitudes and peak times.

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

      Response (5): Thank you for your thoughtful comment. You are correct that the increase in the percentage of trials with submovements is modest, but a more critical change was observed in the timing between submovement peaks—specifically, the inter-peak interval (IPI). These intervals became longer during flight. Taken together with the percentage increase, the submovement changes significantly predicted the increase in movement duration, as shown by our linear mixed-effects model, which indicated that IPI increased.

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally shows the expected changes but for the 45° condition, no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the authors' view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      Response (6): Thank you for raising these important questions. We unpacked the paragraph into two concerns: 1) the possibility that misestimation of Coriolis and centripetal torques might lead to corrective submovements, and 2) the under-exploited weak effect in the 45° direction. These two concerns are valid but addressable, and they do not change our general conclusions based on our empirical findings (see Supplementary note 2. Coriolis and centripetal torques have minimal impact).

      Possible explanation for the 45° discrepancy

      We agree with the reviewer that the 45° direction likely involves more single-joint (elbow-dominant) movement, whereas the 90° and 135° directions require greater multi-joint (elbow + shoulder) coordination. This is particularly relevant when the workspace is near the body midline (e.g., Haggard et al., 1995), as is the case in our experimental setup. To demonstrate this, we examined the curvature of the hand trajectories across directions. Using cumulative curvature (positive = counterclockwise), we obtained average values of 6.484° ± 0.841°, 1.539° ± 0.462°, and 2.819° ± 0.538° for the 45°, 90°, and 135° directions, respectively. The significantly larger curvature in the 45° condition suggests that these movements deviate more from a straight-line path, a hallmark of more elbow-dominant movements.

      Importantly, this curvature pattern was present in both the pre-flight and in-flight phases, indicating that it is a general movement characteristic rather than a microgravity-induced effect. Thus, the 45° reaches are less suitable for modeling with a simplified two-link arm model compared to the other two directions. We believe this is the main reason why the model predictions based on effective mass become less consistent with the empirical data for the 45° direction.

      We have now incorporated this new analysis in the Results and discussed it in the revised Discussion.

      Citation: Haggard, P., Hutchinson, K., & Stein, J. (1995). Patterns of coordinated multi-joint movement. Experimental Brain Research, 107(2), 254-266.

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could the observed results (slowing down + more submovements) be explained by some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques)?

      Response (7): Neuromuscular deconditioning is indeed a known effect of spaceflight; thanks for bringing this up, as we omitted discussion of this confound in our original manuscript. A prolonged stay in microgravity can lead to a reduction of muscle strength, but this is mostly limited to the lower limbs. For example, a recent well-designed large-sample study has shown that while lower leg muscles showed significant strength reductions, no change in mean upper body strength was found (Scott et al., 2023), consistent with previous propositions that muscle weakness is less pronounced for upper-limb muscles than for postural and lower-limb muscles (Tesch et al., 2005). Furthermore, muscle weakness is unlikely to play a major role here since our reaching task involves small movements (~12 cm) with joint torques of ~2 N·m in magnitude. Of course, we cannot completely rule out the contribution of muscle weakness; we can only postulate, based on the task itself (12 cm reaching) and the systematic microgravity effects (the increase in submovements, the increase in inter-submovement intervals, and their significant prediction of movement slowing), that muscle weakness is unlikely to be a major contributor to the movement slowing.

      The reviewer suggests that poor coordination in microgravity might contribute to slowing down + more submovements. This is also a possibility, but we did not find evidence to support it. First, there is no clear evidence or reports about poor coordination for simple upper-limb movements like reaching investigated here. Note that reaching or aiming movement is one of the most studied tasks among astronauts. Second, we further analyzed our reaching trajectories and found no sign of curvature increase, a hallmark of poor coordination of Coriolis/centripetal torques, in our large collection of reaching movements. We probably have the largest dataset of reaching movements collected in microgravity thus far, given that we had 12 taikonauts and each of them performed about 480 to 840 reaching trials during their spaceflight. We believe the probability of Type II error is quite low here.

      Citation: Tesch, P. A., Berg, H. E., Bring, D., Evans, H. J., & LeBlanc, A. D. (2005). Effects of 17-day spaceflight on knee extensor muscle function and size. European journal of applied physiology, 93(4), 463-468.

      Scott, J., Feiveson, A., English, K., et al. (2023). Effects of exercise countermeasures on multisystem function in long duration spaceflight astronauts. npj Microgravity, 9(11).

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      Response (8): We thank the reviewer for raising these important and technically insightful points regarding our modeling framework. We first clarify the structure of the model and key assumptions, and then address the specific questions in points (a)–(c) below.

      We used Todorov’s (2005) stochastic optimal control method to compute a finite-horizon LQG policy under sensory noise and signal-dependent motor noise (state noise set to zero); the cost function is detailed in the updated Methods. The resulting time-varying gains {L<sub>k</sub>, K<sub>k</sub>} correspond to the feedforward mapping and the feedback correction gain, respectively. In the control law, u<sub>k</sub> is the control input, L<sub>k</sub> is the feedforward (nominal) control associated with the planned trajectory, and K<sub>k</sub> is the time-varying feedback gain that corrects deviations of the estimated state from the nominal planned state.

      To define the motor plan for comparison with behavior, we simulate the deterministic open-loop trajectory by turning off noise and disabling feedback corrections. In this framework, “feedforward” refers to this nominal motor plan. Thus, sensory and signal-dependent noise influence the computed policy (via the gains), but are not injected when generating the nominal trajectory. This mirrors the minimum-jerk practice used to obtain nominal kinematics in prior utility-based work (Shadmehr, 2016), while optimal control provides a more physiologically grounded nominal plan. In the revision, we have updated the equations, provided more modeling details, and moved the model description to the main text to reduce possible confusion.

      In the implementation of the “mass underestimation” condition, the mass used to compute the policy is the underestimated mass, whereas the actual mass is used when simulating the feedforward trajectories. Corrective submovements are analyzed separately and are not required for the planning-deficit findings reported here.

      Answers to the three specific questions:

      a) We mistakenly wrote a continuous-time infinite-horizon cost function in our original manuscript, whereas our controller is actually implemented as a discrete-time finite-horizon LQG with a terminal cost, over a horizon set by the utility-based optimal movement duration T<sub>opt</sub>. The underestimated mass is used in both the utility model (to determine T<sub>opt</sub>) and in the control computation (i.e., internal model), while the true mass is used when simulating the movement. This mismatch captures the central idea of feedforward planning based on an incorrect internal model.

      b) As described, our model includes signal-dependent motor noise and sensory noise, following Todorov (2005). We also evaluated whether increased noise levels in microgravity could account for the observed behavioral changes. Simulation results showed that increasing either source of noise did not alter the main conclusions or reverse the trends in our key metrics. Moreover, our experimental data showed no significant increase in endpoint variability in microgravity (see analyses and results in Figure 2—figure supplement 3 & 4), making it unlikely that increased sensorimotor noise alone accounts for the observed slowing and submovement changes.

      c) In our framework, the time-varying gains {L<sub>k</sub>, K<sub>k</sub>} define the feedforward and feedback components of the control policy. While both gains are computed based on a stochastic optimal control formulation (including noise), for comparison with behavior we simulate only the nominal feedforward plan, with both noise and feedback turned off. This defines a deterministic open-loop trajectory, which we use to capture planning-level effects such as peak timing shifts under mass underestimation. Feedback corrections via the feedback gains exist in the full model but are not involved in these specific analyses. We clarified this modeling choice and its behavioral relevance in the revised text.

      We have updated the equations and moved the model description into the main text in the revised manuscript to avoid confusion.
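
      The mass mismatch described in this response can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a one-dimensional point mass, a deterministic finite-horizon LQR in place of the full LQG controller, and hypothetical values for the masses, reach distance, horizon, and cost weights. The force sequence is planned with the underestimated mass `m_hat` and then replayed open loop on the true mass `m_true`.

```python
import numpy as np

def lqr_gains(m, dt, n_steps, r=1e-4, qf=(1e3, 1e1)):
    """Finite-horizon LQR gains for a 1-D point mass; state = [position error, velocity]."""
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.0], [dt / m]])
    R = np.array([[r]])
    P = np.diag(qf)                      # terminal cost only (no running state cost)
    gains = []
    for _ in range(n_steps):             # backward Riccati recursion
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = A.T @ P @ (A - B @ K)
        gains.append(K)
    return A, B, gains[::-1]             # reorder gains as K_0 ... K_{N-1}

def plan_forces(m_hat, dist, dt, n_steps):
    """Roll the policy out on the internal model to obtain the planned force sequence."""
    A, B, gains = lqr_gains(m_hat, dt, n_steps)
    e = np.array([[-dist], [0.0]])       # start `dist` short of the target, at rest
    forces = []
    for K in gains:
        u = -K @ e
        forces.append(u.item())
        e = A @ e + B @ u
    return np.array(forces)

def replay(forces, m, dt):
    """Apply a fixed force sequence open loop (no feedback, no noise) to a mass m."""
    p, v = 0.0, 0.0
    pos, vel = [p], [v]
    for u in forces:
        p, v = p + dt * v, v + dt * u / m
        pos.append(p)
        vel.append(v)
    return np.array(pos), np.array(vel)

# Hypothetical values chosen only for illustration.
m_true, dt, n_steps, dist = 2.0, 0.01, 35, 0.12
m_hat = 0.7 * m_true                     # assumed 30% underestimation

forces = plan_forces(m_hat, dist, dt, n_steps)
plan_pos, plan_vel = replay(forces, m_hat, dt)    # what the internal model expects
true_pos, true_vel = replay(forces, m_true, dt)   # what the true mass produces
```

      In this linear sketch the replayed velocity is the planned velocity scaled by `m_hat / m_true`, so the hand undershoots the planned endpoint and peaks at a lower speed. The simplification does not attempt to reproduce the timing shifts reported in the manuscript.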

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may, however, wonder if the same results would hold for slower self-paced movements (are they also performed with reduced speed compared to Earth performance?). Moreover, a few other important questions might need to be addressed for completeness: how was it ensured that the astronauts remembered this instruction during the flight? (Could the control group move faster because they better remembered the instruction?) Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      Response (9): Thanks for highlighting the brevity of movements in our experiment. Our intention in emphasizing fast movements is to rigorously test whether movement is indeed slowed down in microgravity. The observed prolonged movement duration clearly shows that microgravity affects people’s movement duration, even when they are pushed to move fast. The second reason for using fast movement is to highlight that feedforward control is affected in microgravity. Mass underestimation specifically affects feedforward control in the first place, shown by the microgravity-related changes in peak velocity/acceleration. Slow movement would inevitably have online corrections that might obscure the effect of mass underestimation. Note that movement slowing is not only observed in our speed-emphasized reaching task, but also in whole-arm pointing in other astronauts’ studies (Berger, 1997; Sangals, 1999), which have been quoted in our paper. We thus believe these findings are generalizable.

      Regarding the consistency of instructions: all our experiments conducted in the Tiangong space station were monitored in real time by experimenters in the control center located in Beijing. The task instructions were presented on the initial display of the data acquisition application and ample reading time was allowed. All the pre-, in-, and post-flight test sessions were administered by the same group of personnel with the same instruction. It is common for astronauts to serve as both participants and experimenters at the same time, and they were well trained for this type of role on the ground. Note that we had multiple pre-flight test sessions to familiarize them with the task. All these rigorous measures were in place to obtain high-quality data. In the revision, we included these experimental details for readers who are not familiar with space studies, and provided the rationale for emphasizing fast movements.

      Citations:

      Berger, M., Mescheriakov, S., Molokanova, E., Lechner-Steinleitner, S., Seguer, N., & Kozlovskaya, I. (1997). Pointing arm movements in short- and long-term spaceflights. Aviation, Space, and Environmental Medicine, 68(9), 781–787.

      Sangals, J., Heuer, H., Manzey, D., & Lorenz, B. (1999). Changed visuomotor transformations during and after prolonged microgravity. Experimental Brain Research, 129(3), 378–390.

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

      Response (10): We believe that the presence or absence of adaptation between our study and Gaveau et al.’s study cannot be simply attributed to single-joint versus multi-joint movements. Their adaptation concerned incorporating microgravity into movement control to minimize effort, whereas ours concerned accurately perceiving body mass. Gaveau et al.’s task involved large-amplitude vertical reaching, a scenario in which gravity strongly affects joint torques and movement execution. Thus, adaptation to microgravity can lead to better execution, providing a strong incentive for learning. By contrast, our task consisted of small-amplitude horizontal movements, where the gravitational influence on biomechanics is minimal.

      More importantly, we believe the lack of adaptation for mass underestimation is not totally surprising. When an inertial change is perceived (such as an extra weight attached to the forearm, as in previous motor adaptation studies), people can adapt their reaching within tens of trials. In that case, sensory cues are veridical, as they correctly signal the inertial perturbation. However, in microgravity, reduced gravitational pull and proprioceptive inputs constantly inform the controller that the body mass is less than its actual magnitude. In other words, sensory cues in space are misleading for estimating body mass. The resulting sensory bias prevents the sensorimotor system from adapting. Our initial explanation on this matter was too brief; we expanded it in the revised Discussion.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90º midway between predictions for 45º and 135º. The effective mass at 90º appears to be much closer to that of 45º than to that of 135º (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90º and 135º are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45º.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90º than for 135º, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90º and 135º as between 90º and 45º? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the authors' conclusions and should be addressed.

      Response (11): Indeed, the model predicts an almost equal separation between 45° and 90° and between 90° and 135°, while the data indicate that the spacing between 45° and 90° is much smaller than between 90° and 135°. We do not regard the divergence as evidence undermining our main conclusion, for two reasons. 1) The model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques. 2) Our study does not make quantitative predictions of all the key kinematic measures; that would require model fitting, parameter estimation, and posture-constrained reaching experiments. Instead, our study uses well-established (though simplified) models to qualitatively predict the overall behavioral pattern we would observe. For this purpose, our results are well in line with our expectations: though we did not find equal spacing between direction conditions, we do confirm that the key kinematic measures (Figure 2 and Figure 3 as questioned) show consistent directional trends between model predictions and empirical data. We added new analysis results on this matter: the directional effect we observed (how the key measures changed in microgravity across direction conditions) is significantly correlated with our model predictions in most cases. Please see our detailed Response (2) above. These results are also added in the revision.

      We also highlight in the revision that our modeling is not intended to quantitatively predict reaching behaviors in space, but to show qualitatively how mass underestimation and the conservative control strategy lead to divergent predictions about key kinematic measures of fast reaching.

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al. showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      Response (12): We agree that muscle properties, tonic excitation level, and proprioception-mediated reflexes all contribute to reaching control. The Fisk et al. (1993) study indeed showed that arm movement kinematics change, possibly owing to lower muscle tone and/or damping. However, reduced muscle damping and reduced spindle activity are more likely to affect feedback-based movements. In Fisk et al.’s study, people performed continuous arm movements with eyes closed; thus their movements largely relied on proprioceptive control. Our major findings are about feedforward control, i.e., the reduced and “advanced” peak velocity/acceleration in discrete and ballistic reaching movements. Note that the peak acceleration happens as early as approximately 90-100 ms into the movements, clearly showing that feedforward control is affected, which is a different effect from Fisk et al.’s findings. It is unlikely that people “advanced” their peak velocity/acceleration because they felt the need for more corrective movements later. Thus, underestimation of body mass remains the most plausible explanation.

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      Response (13): We agree that friction might play a role here, but normal interaction with a touch screen typically involves friction between 0.1 N and 0.5 N (e.g., Ayyildiz et al., 2018). We believe that the directional variation of the friction is even smaller than 0.1 N. It is very small compared to the force used to accelerate the arm for the reaching movement (10-15 N). Thus, friction anisotropy is unlikely to explain our data. Because our readers might have the same concern, we have added some discussion of the possible effect of friction.

      Citation: Ayyildiz M, Scaraggi M, Sirin O, Basdogan C, Persson BNJ. Contact mechanics between the human finger and a touchscreen under electroadhesion. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12668-12673.

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      Response (14): Body stabilization is always a challenge for human movement studies in space. We minimized its potential confounding effects by using left-hand grasping and foot straps for postural support throughout the experiment. We think shoulder stability is an unlikely explanation because unexpected shoulder instability should not affect the feedforward (early) part of the ballistic reaching movement: the reduced peak acceleration and its early peak were observed at about 90-100 ms after movement initiation. This effect is too early to be explained by an unexpected stability issue. This argument is now mentioned in the revised Discussion.

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

      Recommendations for the authors:

      Reviewing Editor Comments:

      General recommendation

      Overall, the reviewers agreed this is an interesting study with an original and strong approach. Nonetheless, there were significant weaknesses identified. The main criticism is that there is insufficient evidence for the claim that the movement slowing is due to mass underestimation, rather than other explanations for the increased feedback corrections. To bolster this claim, the reviewers have requested a deeper quantitative analysis of the directional effect and comparison to model predictions. They have also suggested that a 2-dof arm model could be used to predict how mass underestimation would influence multi-joint kinematics, and this should be compared to the data. Alternatively, or additionally, a control experiment could be performed (described in the reviews). We do realize that some of these options may not be feasible or practical. Ultimately, we leave it to you to determine how best to strengthen and solidify the argument for mass underestimation, rather than other causes.

      As an alternative approach, you could consider tempering the claim regarding mass underestimation and focus more on the result that slower movements in microgravity are not simply a feedforward, rescaling of the movement trajectories, but rather, have greater feedback corrections. In this case, the reviewers feel it would still be critical to explain and discuss potential reasons for the corrections beyond mass underestimation.

      We hope that these points are addressable, either with new analyses, experiments, or with a tempering of the claims. Addressing these points would help improve the eLife assessment.

      Reviewer #1 (Recommendations for the authors):

      (1) Move model descriptions to the main text to present modelling choices in more detail

      Response (15): Thank you for the suggestion. We have moved the model descriptions to the main text to present the modeling choices in more detail and to allow readers to better cross-reference the analyses.

      (2) Perform quantitative comparisons of the directional effect with the model's predictions, and add raw kinematic traces to illustrate the effect in more detail.

      Response (16): Thanks for the suggestion. We have added a figure showing the raw kinematics from a representative participant; please refer to Response (2) above for the comparisons of the directional effect.

      (3) Explore the effect of varying cost parameters in addition to mass estimation error to estimate the proportion of data explained by the underestimation hypothesis.

      Response (17): Thank you for the suggestion. This has already been done—please see Response (1) above.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) It must be justified early on why reaction times are being analyzed in this work. I understood later that it is to rule out any global slowing down of behavioral responses in microgravity.

      Response (18): Exactly, RT results are informative about the absence of a global slowing down. Contrary to the conservative-strategy hypothesis, taikonauts did not show generalized slowing; they actually had faster reaction times during spaceflight, incompatible with a generalized slowing strategy. Thanks for pointing this out; we now justify this early in the text.

      (2) Since the results are presented before the methods, I suggest stressing from the beginning that the reaching task is performed on a tablet and mentioning the instructions given to the participants, to improve the reading experience. The "beep" and "no beep" conditions also arise without obvious justification while reading the paper.

      Response (19): Great suggestions. We now provide some experimental details and rationales at the beginning of Results.

      (3) Figure 1C: The velocity profiles are not returning to 0 at the end, why? Is it because the feedback gain is computed based on the underestimated mass or because a feedforward controller is applied here? Is it compatible with the experimental velocity traces?

      Response (20): Figure 1C shows the forward simulation under the optimal control policy. In our LQG formulation, the terminal velocity is softly penalized (finite weight) rather than hard-constrained to zero; with a fixed horizon, the optimal solution can therefore end with a small residual velocity.

      In the behavioral data, the hand does come to rest: this is achieved by corrective submovements during the homing phase.
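
      The effect of a soft terminal penalty can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' model: a one-dimensional point mass, a deterministic finite-horizon LQR, and hypothetical values for the mass, distance, horizon, and weights. The function `terminal_velocity` returns the velocity left over at the end of the fixed horizon for a given terminal velocity weight `w_vel`.

```python
import numpy as np

def terminal_velocity(w_vel, m=2.0, dist=0.12, dt=0.01, n_steps=35,
                      r=1e-4, w_pos=1e3):
    """Velocity at the end of a fixed horizon under finite-horizon LQR.

    All default values are hypothetical and chosen only for illustration.
    """
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.0], [dt / m]])
    R = np.array([[r]])
    P = np.diag([w_pos, w_vel])          # soft terminal penalty on position and velocity
    gains = []
    for _ in range(n_steps):             # backward Riccati recursion
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = A.T @ P @ (A - B @ K)
        gains.append(K)
    e = np.array([[-dist], [0.0]])       # state = [position error, velocity]
    for K in reversed(gains):            # forward pass with gains K_0 ... K_{N-1}
        e = (A - B @ K) @ e
    return e[1, 0]

v_soft = terminal_velocity(w_vel=1.0)    # finite (soft) terminal velocity weight
v_stiff = terminal_velocity(w_vel=1e4)   # much larger weight, approaching a hard constraint
```

      Because braking costs effort, the optimal solution with a finite weight trades some residual velocity against control cost; increasing the weight shrinks that residual toward zero.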

      (4) Left-skewed -> I believe this is right-skewed since the peak velocity is earlier.

      Response (21): Yes, it should be right-skewed; thanks for pointing that out.

      (5) What was the acquisition frequency of the positional data points? (on the tablet).

      Response (22): The sampling frequency is 100 Hz. Thanks for pointing that out; we’ve added this information to the Methods.

      (6) Figure S1. The planned duration seems to be longer than in the experiment (it is more around 500 ms for the 135-degree direction in simulation versus less than 400 ms in the experiment). Why?

      Response (23): We apologize for a coding error that inadvertently multiplied the body-mass parameter by an extra factor, making the simulated mass too high. We have corrected the code, rerun the simulations, and updated Figures 1 and S1; all qualitative trends remain unchanged, and the revised movement durations (≈300–400 ms) are closer to the experimental values.

      (7) After Equation 13: "The control law is given by". This is not the control law, which should have a feedback form u=K*x in the LQ framework. This is just the dynamic equations for the auxiliary state and the force. Please double-check the model description.

      Response (24): Thank you for pointing this out. We have updated and refined all model equations and descriptions, and moved the model description from the Supplementary Materials to the main text; please see the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) I have a concern about the interpretation of the anisotropic "equivalent mass". From my understanding, the equivalent mass would be what an external actor would feel as an equivalent inertia if pushing on the end effector from the outside. But the CNS does not push on the arm with a pure force generator acting at the hand to effectuate movement. It applies torque around the joints by applying forces across joints with muscles, causing the links of the arm to rotate around the joints. If the analysis is carried out in joint space, is the effective rotational inertia of the arm also anisotropic with respect to the direction of the movement of the hand? In other words, can the authors reassure me that the simulations are equivalent to an underestimation of the rotational inertia of the links when applied to the joints of the limb? It could be that these are mathematically the same; I have not delved into the mathematics to convince myself either way. But I would appreciate it if the authors could reassure me on this point.

      Response (25): Thank you for raising this point. In our work, “equivalent mass” denotes the operational-space inertia projected along the hand-movement direction u.

      This formulation describes the effective mass perceived at the end effector along a given direction, and is standard in operational-space control.

      Although the motor command may be coded as either torque or force in the CNS, the executed movement is the same whether it is specified as endpoint forces or joint torques, since the two are related through the Jacobian transpose (τ = J<sup>T</sup>F). For small excursions as investigated here, this makes the directional anisotropy in endpoint inertia consistent with the anisotropy of the effective joint-space inertia required to produce the same endpoint motion. Conceptually, therefore, our “mass underestimation” manipulation in operational space corresponds to underestimating the required joint-space inertia mapped through the Jacobian. Since our behavioral data are hand positions, using the operational-space representation is the most direct and appropriate way to model them.
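
      For readers who want to see the joint-space and endpoint views side by side, the following Python sketch computes a direction-dependent effective mass for a two-link planar arm using the textbook operational-space expression 1 / (uᵀ J M⁻¹ Jᵀ u). The exact expression used in the manuscript is not reproduced in this response, so this form is an assumption, and the segment parameters and posture are hypothetical.

```python
import numpy as np

# Hypothetical two-link planar arm parameters (illustrative, not from the manuscript).
MASS1, MASS2 = 1.9, 1.5        # segment masses (kg)
LEN1, LEN2 = 0.30, 0.33        # segment lengths (m)
COM1, COM2 = 0.13, 0.15        # centre-of-mass distances from the proximal joint (m)
INERTIA1, INERTIA2 = 0.02, 0.02  # segment moments of inertia about the COM (kg m^2)

def inertia_matrix(q2):
    """Joint-space inertia matrix M(q) of a two-link planar arm."""
    c2 = np.cos(q2)
    m11 = (INERTIA1 + INERTIA2 + MASS1 * COM1**2
           + MASS2 * (LEN1**2 + COM2**2 + 2 * LEN1 * COM2 * c2))
    m12 = INERTIA2 + MASS2 * (COM2**2 + LEN1 * COM2 * c2)
    m22 = INERTIA2 + MASS2 * COM2**2
    return np.array([[m11, m12], [m12, m22]])

def jacobian(q1, q2):
    """Endpoint Jacobian J(q) mapping joint velocities to hand velocity."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-LEN1 * s1 - LEN2 * s12, -LEN2 * s12],
                     [LEN1 * c1 + LEN2 * c12, LEN2 * c12]])

def effective_mass(q, direction_deg):
    """Effective endpoint mass along a unit direction u: 1 / (u^T J M^-1 J^T u)."""
    q1, q2 = q
    J, M = jacobian(q1, q2), inertia_matrix(q2)
    u = np.array([np.cos(np.radians(direction_deg)), np.sin(np.radians(direction_deg))])
    mobility = J @ np.linalg.solve(M, J.T)   # inverse operational-space inertia
    return 1.0 / (u @ mobility @ u)

q = (np.radians(45.0), np.radians(90.0))     # hypothetical shoulder and elbow angles
masses = {d: effective_mass(q, d) for d in (45, 90, 135)}
```

      The same quantity follows from the joint-space route: a unit endpoint force along u maps to joint torques Jᵀu, which produce joint accelerations M⁻¹Jᵀu and, from rest, an endpoint acceleration whose component along u is the reciprocal of the effective mass.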

      (2) I would also like to suggest one more level of analysis to test their hypothesis. The authors decomposed the movements into submovements and measured the prevalence of corrective submovements in weightlessness vs. normal gravity. The increase in corrective submovements is consistent with the hypothesis of a misestimation of limb mass, leading to an unexpectedly smaller displacement due to the initial feedforward command, leading to the need for corrections, leading to an increased overall movement duration. According to this hypothesis, however, the initial submovement, while resulting in a smaller than expected displacement, should have the same duration as the analogous movements performed on Earth. The authors could check this by analyzing the duration of the extracted initial submovements.

      Response (26): We appreciate the reviewer’s suggestion regarding the analysis of the initial submovement duration. In our decomposition framework, each submovement is modeled as a symmetric log-normal (bell-shaped) component, such that the time to peak speed is always half of the component duration. Thus, the initial submovement duration is directly reflected in the initial submovement peak-speed time already reported in our original manuscript (Figure 5F).

      However, we respectfully disagree with the assumption that mass underestimation would necessarily yield the same submovement duration as on Earth. Under mass underestimation, the movement is effectively under-actuated, and the initial submovement can terminate prematurely, leading to a shorter duration. This is indeed what we observed in the data. Therefore, our reported metrics already address the reviewer’s proposal and support the conclusion that mass underestimation reduces the initial submovement duration in microgravity. Per your suggestion, we have now added one more sentence to explain to the reader that the initial submovement peak-speed time reflects the duration of the initial submovement.
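
      The link between peak-speed time and component duration can be checked numerically for any symmetric bell-shaped profile. The Python sketch below uses a minimum-jerk speed profile as a stand-in, which is an assumption made for illustration and not the submovement model used in the decomposition; the duration and amplitude are hypothetical.

```python
import numpy as np

def min_jerk_speed(t, duration, amplitude):
    """Speed of a minimum-jerk movement: symmetric and bell-shaped in time."""
    tau = np.clip(t / duration, 0.0, 1.0)
    return amplitude / duration * 30.0 * tau**2 * (1.0 - tau)**2

duration = 0.20                              # hypothetical component duration (s)
t = np.linspace(0.0, duration, 201)          # 1 ms sampling grid
speed = min_jerk_speed(t, duration, amplitude=0.08)
t_peak = t[np.argmax(speed)]                 # time of peak speed
```

      For a symmetric component, `t_peak` falls at half of `duration`, so a shorter peak-speed time implies a proportionally shorter component.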

      Some additional minor suggestions:

      (1) I believe that it is important to include the data from the control subjects, in some form, in the main article. Perhaps shading behind the main data from the taikonauts to show similarities or differences between groups. It is inconvenient to have to go to the supplementary material to compare the two groups, which is the main test of the experiment.

      Response (27): Thank you for the suggestion. For all the core performance variables, the control group showed flat patterns, with no changes across test sessions at all. Thus, including these figures (together with null statistical results) in the main text would obscure our central message, especially given the expanded length of the revised manuscript (we added model details and new analysis results). Instead, following eLife’s format, we have reorganized the Supplementary Material so that each experimental figure has a corresponding supplementary figure showing the control data. This way, readers can quickly locate the control results and directly compare them with the experimental data, while keeping the main text focused.

      (2) "Importantly, sensory estimate of bodily property in microgravity is biased but evaded from sensorimotor adaptation, calling for an extension of existing theories of motor learning." Perhaps "immune from" would be a better choice of words.

      Response (28): Thanks for the suggestion, we edited our text accordingly.

      (3) "First, typical reaching movement exhibits a symmetrical bell-shaped speed profile, which minimizes energy expenditure while maximizing accuracy according to optimal control principles (Todorov, 2004)." While Todorov's analysis is interesting and well accepted, it might be worthwhile citing the original source on the phenomenon of bell-shaped velocity profiles that minimize jerk (derivative of acceleration) and therefore, in some sense, maximize smoothness. Flash and Hogan, 1985.

      Response (29): Thanks for the suggestion, we added the citation of minimum jerk.
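
      The bell-shaped profile referred to in this exchange can be illustrated with the standard minimum-jerk solution for a point-to-point movement. The following is a minimal sketch, not the authors' model: the function name, the movement duration, and the distance are hypothetical values chosen only for illustration.

```python
def min_jerk_speed(t, duration, distance):
    """Speed of a minimum-jerk point-to-point movement at time t.

    Derived from x(tau) = distance * (10*tau**3 - 15*tau**4 + 6*tau**5),
    with tau = t / duration, so v = (distance / duration) * 30 * tau**2 * (1 - tau)**2.
    """
    tau = t / duration
    return (distance / duration) * 30.0 * tau ** 2 * (1.0 - tau) ** 2


# Hypothetical movement: 0.2 m covered in 1.0 s, sampled at 101 time points.
T, D = 1.0, 0.2
samples = [min_jerk_speed(i * T / 100, T, D) for i in range(101)]

# Index of the speed peak; for a symmetric bell it sits at mid-movement.
peak_index = max(range(101), key=lambda i: samples[i])
```

      In this sketch the speed peaks at the movement midpoint with a value of 1.875 times the average speed, and the profile is mirror-symmetric around that point.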

      (4) "Post-hoc analyses revealed slower reaction times for the 45° direction compared to both 90° (p < 0.001, d = 0.293) and 135° (p = 0.003, d = 0.284). Notably, reactions were faster during the in-flight phase compared to pre-flight (p = 0.037, d = 0.333), with no significant difference between in-flight and post-flight phases (p = 0.127)." What can one conclude from this?

      Response (30): Although these decreases reached statistical significance, their magnitudes were small. The parallel pattern across groups suggests the effect is not driven by microgravity, but is more plausibly a mild learning/practice effect. We now mention this in the Discussion.

      (5) "In line with predictions, peak acceleration appeared significantly earlier in the 45° direction than other directions (45° vs. 90°, p < 0.001, d = 0.304; 45° vs. 135°, p < 0.001, d = 0.271)." Which predictions? Because the effective mass is greater at 45º? Could you clarify the prediction?

      Response (31): We should be more specific here; thank you for raising this. The predictions are the ones about peak acceleration timing (shown in Fig. 1H). We have now modified this sentence to read:

      “In line with model predictions (Figure 1H), ….”.

      (6) Figure 2: Why do 45º movements have longer reaction times but shorter movement durations?

      Response (32): We appreciate your careful reading of the results. We believe this is possibly due to flexible motor control across conditions and trials, i.e., people tend to move faster when they react more slowly. This is reflected in the across-direction comparisons (as spotted by the reviewer here), and it also holds within and across participants: for both groups, we found a significant negative correlation between movement duration (MD) and reaction time (RT), both across and within individuals (Figure 2—figure supplement 5). This finding indicates that participants moved faster when their RT was longer, and vice versa. This flexible motor adjustment, likely due to the task requirement for rapid movements, remained consistent during spaceflight.
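
      The negative MD-RT relationship described in this response is the kind of result a Pearson correlation would quantify. The sketch below is illustrative only: the reaction times and movement durations are made-up numbers, not data from the study, and `pearson_r` is a hypothetical helper rather than the authors' analysis code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up trial values (ms): longer reaction times paired with shorter movements.
rt_ms = [300.0, 320.0, 340.0, 360.0, 380.0]
md_ms = [520.0, 505.0, 495.0, 470.0, 460.0]
r = pearson_r(rt_ms, md_ms)  # negative for this illustrative data
```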

    1. eLife Assessment

      In this useful study, the authors conducted an impressive amount of atomistic simulations with a realistic asymmetric lipid bilayer to probe how the HIV-1 envelope glycoprotein (Env) transmembrane domain, cytoplasmic tail, and membrane environment influence ectodomain orientation and antibody epitope exposure. The simulations convincingly show that ectodomain motion is dominated by tilting relative to the membrane and explicitly demonstrate the role of membrane asymmetry in modulating the protein conformation and orientation. However, due to the qualitative nature of the conducted analyses, the evidence for the coupling between membrane-proximal regions and the antigenic surface is considered incomplete. With stronger integration of prior experimental and computational literature, this work has the potential to serve as a reference for how Env behaves in a realistic, glycosylated, membrane-embedded context.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Conformational Variability of HIV-1 Env Trimer and Viral Vulnerability", the authors study the fully glycosylated HIV-1 Env protein using an all-atom forcefield. The work combines long all-atom simulations of Env in a realistic asymmetric bilayer with careful data analysis. It clarifies how the CT domain modulates the overall conformation of the Env ectodomain and characterizes different MPER-TMD conformations. The authors also carefully analyze the accessibility of the Env protein to different antibodies.

      Strengths:

      This paper is state-of-the-art, given the scale of the system and the sophistication of the methods. The biological question is important, the methodology is rigorous, and the results will interest a broad audience.

      Weaknesses:

      The manuscript lacks a discussion of previous studies. The authors should consider addressing or comparing their work with the following points:

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work:

      https://doi.org/10.1101/2025.03.26.645577

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain:

      https://doi.org/10.1021/jacs.5c15421

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state:

      https://doi.org/10.1073/pnas.1807259115

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study:

      https://doi.org/10.1126/science.aaa9804

    3. Reviewer #2 (Public review):

      (1) Summary

      In this work, the authors aim to elucidate how a viral surface protein behaves in a membrane environment and how its large-scale motions influence the exposure of antibody-binding sites. Using long-timescale, all-atom molecular dynamics simulations of a fully glycosylated, full-length protein embedded in a virus-like membrane, the study systematically examines the coupling between ectodomain motion, transmembrane orientation, membrane interactions, and epitope accessibility. By comparing multiple model variants that differ in cleavage state, initial transmembrane configuration, and presence of the cytoplasmic tail, the authors aim to identify general features of protein-membrane dynamics relevant to antibody recognition.

      (2) Strengths

      A major strength of this study is the scope and ambition of the simulations. The authors perform multiple microsecond-scale simulations of a highly complex, biologically realistic system that includes the full ectodomain, transmembrane region, cytoplasmic tail, glycans, and a heterogeneous membrane. Such simulations remain technically challenging, and the work represents a substantial computational and methodological effort.

      The analysis provides a clear and intuitive description of large-scale protein motions relative to the membrane, including ectodomain tilting and transmembrane orientation. The finding that the ectodomain explores a wide range of tilt angles while the transmembrane region remains more constrained, with limited correlation between the two, offers useful conceptual insight into how global motions may be accommodated without large rearrangements at the membrane anchor.

      Another strength is the explicit consideration of membrane and glycan steric effects on antibody accessibility. By evaluating multiple classes of antibodies targeting distinct regions of the protein, the study highlights how membrane proximity and glycan dynamics can differentially influence access to different epitopes. This comparative approach helps place the results in a broader immunological context and may be useful for readers interested in antibody recognition or vaccine design.

      Overall, the results are internally consistent across multiple simulations and model variants, and the conclusions are generally well aligned with the data presented.

      (3) Weaknesses

      The main limitations of the study relate to sampling and model dependence, which are inherent challenges for simulations of this size and complexity. Although the simulations are long by current standards, individual trajectories explore only portions of the available conformational space, and several conclusions rely on pooling data across a limited number of replicas. This makes it difficult to fully assess the robustness of some quantitative trends, particularly for rare events such as specific epitope accessibility states.

      In addition, several aspects of the model construction, including the treatment of missing regions, loop rebuilding, and initial configuration choices, are necessarily approximate. While these approaches are reasonable and well motivated, the extent to which some conclusions depend on these modeling choices is not always fully clear from the current presentation.

      Finally, the analysis of antibody accessibility is based on geometric and steric criteria, which provide a useful first-order approximation but do not capture potential conformational adaptations of antibodies or membrane remodeling during binding. As a result, the accessibility results should be interpreted primarily as model-based predictions rather than definitive statements about binding competence.

      Despite these limitations, the study provides a valuable and carefully executed contribution, and its datasets and analytical framework are likely to be useful to others interested in protein-membrane interactions and antibody recognition.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses large-scale all-atom molecular dynamics simulations to examine the conformational plasticity of the HIV-1 envelope glycoprotein (Env) in a membrane context, with particular emphasis on how the transmembrane domain (TMD), cytoplasmic tail (CT), and membrane environment influence ectodomain orientation and antibody epitope exposure. By comparing Env constructs with and without the CT, explicitly modeling glycosylation, and embedding Env in an asymmetric lipid bilayer, the authors aim to provide an integrated view of how membrane-proximal regions and lipid interactions shape Env antigenicity, including epitopes targeted by MPER-directed antibodies.

      Strengths:

      A key strength of this work is the scope and realism of the simulation systems. The authors construct a very large, nearly complete Env-scale model that includes a glycosylated Env trimer embedded in an asymmetric bilayer, enabling analysis of membrane-protein interactions that are difficult to capture experimentally. The inclusion of specific glycans at reported sites, and the focus on constructs with and without the CT, are well motivated by existing biological and structural data.

      The simulations reveal substantial tilting motions of the ectodomain relative to the membrane, with angles spanning roughly 0-30° (and up to ~50° in some analyses), while the ectodomain itself remains relatively rigid. This framing, that much of Env's conformational variability arises from rigid-body tilting rather than large internal rearrangements, is an important conceptual contribution. The authors also provide interesting observations regarding asymmetric bilayer deformations, including localized thinning and altered lipid headgroup interactions near the TMD and CT, which suggest a reciprocal coupling between Env and the surrounding membrane.

      The analysis of antibody-relevant epitopes across the prefusion state, including the V1/V2 and V3 loops, the CD4 binding site, and the MPER, is another strength. The study makes effective use of existing experimental knowledge in this context, for example, by focusing on specific glycans known to occlude antibody binding, to motivate and interpret the simulations.

      Weaknesses:

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      With stronger integration of prior experimental and computational literature, this work has the potential to serve as a valuable reference for how Env behaves in a realistic, glycosylated, membrane-embedded context. The simulation framework itself is well-suited for future studies incorporating mutations, strain variation, antibodies, inhibitors, or receptor and co-receptor engagement. In its current form, the primary contribution of the study is to consolidate and extend existing observations within a single, large-scale model, providing a useful platform for future mechanistic investigations.

    5. Author response:

      In response to the comments raised, we outline below the revisions we plan to strengthen the manuscript.

      First, we will expand the Introduction and Discussion sections to provide a clearer comparison with prior experimental and computational studies of ectodomain tilting, MPER–TMD conformational heterogeneity, and membrane deformation, and to discuss how our simulations reproduce and extend these earlier observations.

      Second, we plan to add analyses that more directly assess the coupling between ectodomain and TMD motions. We will also revise the text to emphasize the limits imposed by sampling and model dependence and to discuss the potential benefits of enhanced sampling methods.

      Third, we will clarify the rationale for the chosen membrane composition and discuss how differences in lipid content between host plasma membranes and HIV virions may influence bilayer properties and Env dynamics.

      Fourth, we will supplement the Methods section to improve clarity and address issues of citation throughout the manuscript.

      Finally, we intend to deposit MD trajectories to a public research data repository to the extent permitted by available storage capacity.

    1. eLife Assessment

      This valuable study uses NAD(P)H fluorescence lifetime imaging (FLIM) to map metabolic states in the Drosophila brain. The authors reveal subtype-specific metabolic profiles in Kenyon cells and report learning-related changes, supported by solid evidence and careful methodology. However, the FLIM shifts observed after memory formation in α/β neurons are small and only weakly significant, so the ability of FLIM to detect subtle physiological changes still requires further validation. Nevertheless, this work provides a strong starting point and demonstrates the promising potential of FLIM for probing neural metabolism in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a novel usage of fluorescence lifetime imaging microscopy (FLIM) to measure NAD(P)H autofluorescence in the Drosophila brain, as a proxy for cellular metabolic/redox states. This new method relies on the fact that both NADH and NADPH are autofluorescent, with a different excitation lifetime depending on whether they are free (indicating glycolysis) or protein-bound (indicating oxidative phosphorylation). The authors successfully use this method in Drosophila to measure changes in metabolic activity across different areas of the fly brain, with a particular focus on the main center for associative memory: the mushroom body.

      Strengths:

      The authors have made a commendable effort to explain the technical aspects of the method in accessible language. This clarity will benefit both non-experts seeking to understand the methodology and researchers interested in applying FLIM to Drosophila in other contexts.

      Weaknesses:

      Despite being statistically significant, the learning-induced change in f-free in α/β Kenyon cells is minimal (a decrease from 0.76 to 0.73, with a high variability). It is unclear whether this small effect represents a meaningful shift in neuronal metabolic state.

      Whether this method can be valuable to examine the effects of long-term memory (after spaced or massed conditioning) remains to be established.

    3. Reviewer #2 (Public review):

      This revised manuscript presents a valuable application of NAD(P)H fluorescence lifetime imaging (FLIM) to study metabolic activity in the Drosophila brain. The authors reveal regional differences in oxidative and glycolytic metabolism, with particular emphasis on the mushroom body, a key center for associative learning and memory. They also report metabolic shifts in α/β Kenyon cells following classical conditioning, in line with their known role in energy-demanding memory processes.

      The study is well-executed and the authors have added more detailed methodological descriptions in this version, which strengthen the technical contribution. The analysis pipeline is rigorous, with careful curve fitting and appropriate controls. However, the metabolic shifts observed after conditioning are small and only weakly significant, raising questions about the sensitivity of FLIM for detecting subtle physiological changes. The authors acknowledge these limitations in the revised discussion, which helps place the findings in proper context.

      Despite this, the work provides a solid foundation for future applications of label-free FLIM in vivo and serves as a valuable technical resource for researchers interested in neural metabolism. Overall, this study represents a meaningful step toward integrating metabolic imaging with the study of neural activity and cognitive function.

    4. Reviewer #3 (Public review):

      This study investigates the characteristics of the autofluorescence signal excited by 740 nm 2-photon excitation, in the range of 420-500 nm, across the Drosophila brain. The fluorescence lifetime (FL) appears bi-exponential, with a short 0.4 ns time constant followed by a longer decay. The lifetime decay and the resulting parameter fits vary across the brain. The resulting maps reveal anatomical landmarks, which simultaneous imaging of genetically encoded fluorescent proteins helps identify. Past work has shown that the autofluorescence decay time course reflects the balance of the redox coenzyme NAD(P)H vs. its protein-bound form. The ratio of free to bound NADPH is thought to indicate relative glycolysis vs. oxidative phosphorylation, and thus shifts in the free-to-bound ratio may indicate shifts in metabolic pathways. The basics of this measure have been demonstrated in other organisms, and this study is the first to use the FLIM module of the STELLARIS 8 FALCON microscope from Leica to measure autofluorescence lifetime in the brain of the fly. Methods include registering brains of different flies to a common template and masking out anatomical regions of interest using fluorescence proteins.

      The analysis relies on fitting an FL decay model with two free parameters, f_free and T_bound. F_free is the fraction of the normalized curve contributed by a decaying exponential with a time constant of 0.4 ns, thought to represent the FL of free NADPH or NADH, which apparently cannot be distinguished. T_bound is the time constant of the second exponential, with scalar amplitude = (1-f_free). The T_bound fit is thought to represent the decay time constant of protein-bound NADPH, but can differ depending on the protein. The study shows that across the brain, T_bound can range from 0 to >5 ns, whereas f_free can range from 0.5 to 0.9 (Figure 1a). The paper beautifully lays out the analysis pipeline, providing a valuable resource. The full range of fits is reported, including maximum likelihood quality parameters, and can serve as benchmarks for future studies.
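
      The two-parameter model described above can be written out explicitly. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it ignores the instrument response and photon noise, the function names are hypothetical, and it swaps the maximum likelihood fit mentioned in the review for a simple least-squares grid search on a noiseless synthetic curve.

```python
import math

TAU_FREE_NS = 0.4  # fixed short time constant quoted in the review


def biexp_decay(t_ns, f_free, tau_bound_ns, tau_free_ns=TAU_FREE_NS):
    """Normalized two-component decay; equals 1 at t = 0."""
    return (f_free * math.exp(-t_ns / tau_free_ns)
            + (1.0 - f_free) * math.exp(-t_ns / tau_bound_ns))


def grid_fit(times_ns, values, f_grid, tau_grid):
    """Least-squares grid search over (f_free, tau_bound); illustrative only."""
    best = None
    for f in f_grid:
        for tau in tau_grid:
            sse = sum((biexp_decay(t, f, tau) - v) ** 2
                      for t, v in zip(times_ns, values))
            if best is None or sse < best[0]:
                best = (sse, f, tau)
    return best[1], best[2]


# Noiseless synthetic curve with assumed parameters f_free = 0.75, tau_bound = 3 ns.
times = [0.1 * i for i in range(100)]                 # 0 to 9.9 ns
curve = [biexp_decay(t, 0.75, 3.0) for t in times]
f_grid = [0.50 + 0.01 * i for i in range(41)]         # 0.50 to 0.90
tau_grid = [1.0 + 0.1 * i for i in range(41)]         # 1.0 to 5.0 ns
f_hat, tau_hat = grid_fit(times, curve, f_grid, tau_grid)
```

      On noiseless data the search recovers the generating parameters; with real photon counts, noise and the instrument response would make the estimates less certain, which is relevant when judging small shifts in f_free.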

      The authors measure properties of NADPH related autofluorescence of Kenyon Cells (KCs) of the fly mushroom body. The somata and calyx of mushroom bodies have a longer average tau_bound than other regions (Figure 1e); the f_free fit is higher for the calyx (input synapses) region than for KC somata; and the average across flies of average f_free fits in alpha/beta KC somata decreases slightly following paired presentation of odor and shock, compared to unpaired presentation of the same stimuli. Though the change is slight, no comparable change is detected in gamma KCs, suggesting that distributions of f_free derived from FL may be sensitive enough to measure changes in metabolic pathways following conditioning.

      FLIM as a method is not yet widely prevalent in fly neuroscience, but recent demonstrations of its potential are likely to increase its use. Future efforts will benefit from the description of the properties of the autofluorescence signal to evaluate how autofluorescence may impact measures of FL of genetically engineered indicators.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel usage of fluorescence lifetime imaging microscopy (FLIM) to measure NAD(P)H autofluorescence in the Drosophila brain, as a proxy for cellular metabolic/redox states. This new method relies on the fact that both NADH and NADPH are autofluorescent, with a different excitation lifetime depending on whether they are free (indicating glycolysis) or protein-bound (indicating oxidative phosphorylation). The authors successfully use this method in Drosophila to measure changes in metabolic activity across different areas of the fly brain, with a particular focus on the main center for associative memory: the mushroom body.

      Strengths:

      The authors have made a commendable effort to explain the technical aspects of the method in accessible language. This clarity will benefit both non-experts seeking to understand the methodology and researchers interested in applying FLIM to Drosophila in other contexts.

      Weaknesses:

      (1) Despite being statistically significant, the learning-induced change in f-free in α/β Kenyon cells is minimal (a decrease from 0.76 to 0.73, with a high variability). The authors should provide justification for why they believe this small effect represents a meaningful shift in neuronal metabolic state.

      We agree with the reviewer that the observed f_free shift averaged per individual, while statistically significant, is small. However, to our knowledge, this is the first study to investigate a physiological (i.e., not pharmacologically induced) variation in neuronal metabolism using FLIM. As such, there are no established expectations regarding the amplitude of the effect. In the revised manuscript, we have included an additional experiment involving the knockdown of ALAT in α/β Kenyon cells, which further supports our findings. We have also expanded the discussion to set out two potential reasons why this effect may appear modest.

      (2) The lack of experiments examining the effects of long-term memory (after spaced or massed conditioning) seems like a missed opportunity. Such experiments could likely reveal more drastic changes in the metabolic profiles of KCs, as a consequence of memory consolidation processes.

      We agree with the reviewer that investigating the effects of long-term memory on metabolism represents a valuable future path of investigation. An intrinsic challenge of autofluorescence measurement, however, is identifying the cellular origin of the observed changes. In this respect, long-term memory formation is not an ideal case study, as its essential feature is expected to be a metabolic activation localized to Kenyon cells’ axons in the mushroom body vertical lobes (as shown in Comyn et al., 2024), where many different neuron subtypes send intricate processes. This is why we chose to first focus on middle-term memory, where changes at the level of the cell bodies could be expected from our previous work (Rabah et al., 2022). Our pioneering exploration of the applicability of NAD(P)H FLIM to in vivo monitoring of brain metabolism now paves the way to extending it to the effects of other forms of memory.

      (3) The discussion is mostly just a summary of the findings. It would be useful if the authors could discuss potential future applications of their method and new research questions that it could help address.

      The discussion has been expanded by adding interpretations of the findings and remaining challenges.

      Reviewer #2 (Public review):

      This manuscript presents a compelling application of NAD(P)H fluorescence lifetime imaging (FLIM) to study metabolic activity in the Drosophila brain. The authors reveal regional differences in oxidative and glycolytic metabolism, with a particular focus on the mushroom body, a key structure involved in associative learning and memory. In particular, they identify metabolic shifts in α/β Kenyon cells following classical conditioning, consistent with their established role in energy-demanding middle- and long-term memories.

      These results highlight the potential of label-free FLIM for in-vivo neural circuit studies, providing a powerful complement to genetically encoded sensors. This study is well-conducted and employs rigorous analysis, including careful curve fitting and well-designed controls, to ensure the robustness of its findings. It should serve as a valuable technical reference for researchers interested in using FLIM to study neural metabolism in vivo. Overall, this work represents an important step in the application of FLIM to study the interactions between metabolic processes, neural activity, and cognitive function.

      Reviewer #3 (Public review):

      This study investigates the characteristics of the autofluorescence signal excited by 740 nm 2-photon excitation, in the range of 420-500 nm, across the Drosophila brain. The fluorescence lifetime (FL) appears bi-exponential, with a short 0.4 ns time constant followed by a longer decay. The lifetime decay and the resulting parameter fits vary across the brain. The resulting maps reveal anatomical landmarks, which simultaneous imaging of genetically encoded fluorescent proteins helps to identify. Past work has shown that the autofluorescence decay time course reflects the balance of the redox coenzyme NAD(P)H vs. its protein-bound form. The ratio of free-to-bound NADPH is thought to indicate relative glycolysis vs. oxidative phosphorylation, and thus shifts in the free-to-bound ratio may indicate shifts in metabolic pathways. The basics of this measure have been demonstrated in other organisms, and this study is the first to use the FLIM module of the STELLARIS 8 FALCON microscope from Leica to measure autofluorescence lifetime in the brain of the fly. Methods include registering the brains of different flies to a common template and masking out anatomical regions of interest using fluorescence proteins.

      The analysis relies on fitting an FL decay model with two free parameters, f_free and T_bound. F_free is the fraction of the normalized curve contributed by a decaying exponential with a time constant of 0.4 ns, thought to represent the FL of free NADPH or NADH, which apparently cannot be distinguished. T_bound is the time constant of the second exponential, with scalar amplitude = (1-f_free). The T_bound fit is thought to represent the decay time constant of protein-bound NADPH but can differ depending on the protein. The study shows that across the brain, T_bound can range from 0 to >5 ns, whereas f_free can range from 0.5 to 0.9 (Figure 1a). These methods appear to be solid; the full range of fits is reported, including maximum likelihood quality parameters, and can serve as benchmarks for future studies.

      The authors measure the properties of NADPH-related autofluorescence of Kenyon Cells (KCs) of the fly mushroom body. The results from the three main figures are:

      (1) Somata and calyx of mushroom bodies have a longer average tau_bound than other regions (Figure 1e);

      (2) The f_free fit is higher for the calyx (input synapses) region than for KC somata (Figure 2b);

      (3) The average across flies of average f_free fits in alpha/beta KC somata decreases from 0.734 to 0.718. Based on the first two findings, an accurate title would be "Autofluorescence lifetime imaging reveals regional differences in NADPH state in Drosophila mushroom bodies."

      The third finding is the basis for the title of the paper and the support for this claim is unconvincing. First, the difference in alpha/beta f_free (p-value of 4.98E-2) is small compared to the measured difference in f_free between somas and calyces. It's smaller even than the difference in average soma f_free across datasets (Figure 2b vs c). The metric is also quite derived; first, the model is fit to each (binned) voxel, then the distribution across voxels is averaged and then averaged across flies. If the voxel distributions of f_free are similar to those shown in Supplementary Figure 2, then the actual f_free fits could range between 0.6-0.8. A more convincing statistical test might be to compare the distributions across voxels between alpha/beta vs alpha'/beta' vs. gamma KCs, perhaps with bootstrapping and including appropriate controls for multiple comparisons.
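
      The bootstrap comparison suggested here could take roughly the following form. This is a minimal sketch under explicit assumptions: the two groups are synthetic values, not measurements from the study; values are treated as independent draws (which pooled voxels from the same fly would not be); and a Bonferroni adjustment for three pairwise comparisons stands in for whatever multiple-comparison control one might prefer. The function name and its parameters are hypothetical.

```python
import random


def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = [rng.choice(a) for _ in a]  # resample with replacement
        resample_b = [rng.choice(b) for _ in b]
        diffs.append(sum(resample_b) / len(resample_b)
                     - sum(resample_a) / len(resample_a))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic f_free-like values for two groups (illustrative, clearly separated).
group_a = [0.70 + 0.001 * i for i in range(50)]
group_b = [0.80 + 0.001 * i for i in range(50)]

# Bonferroni-adjusted level for three pairwise comparisons (an assumption).
n_comparisons = 3
ci_low, ci_high = bootstrap_mean_diff_ci(group_a, group_b,
                                         alpha=0.05 / n_comparisons)
```

      An interval that excludes zero at the adjusted level would indicate a difference in means; for hierarchical data, resampling flies rather than voxels would better respect the dependence structure.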

      The difference observed is indeed modest relative to the variability of f_free measurements in other contexts. The fact that the difference observed between the somata region and the calyx is larger is not necessarily surprising. Indeed, these areas have different anatomical compositions that may result in different basal metabolic profiles. This is suggested by Figure 1b which shows that the cortex and neuropile have different metabolic signatures. Differences in average f_free values in the somata region can indeed be observed between naive and conditioned flies. However, all comparisons in the article were performed between groups of flies imaged within the same experimental batches, ensuring that external factors were largely controlled for. No such control exists across datasets, which makes it difficult to extract meaningful information from the comparison between naive and conditioned flies.

      We agree with the reviewer that the choice of the metric was indeed not well justified in the first manuscript. In the new manuscript, we have tried to illustrate the reasons for this choice with the example of the comparison of f_free in alpha/beta neurons between unpaired and paired conditioning (Dataset 8). First, the idea of averaging across voxels is supported by the fact that the distributions of decay parameters within a single image are predominantly unimodal. Examples for Dataset 8 are now provided in the new Sup. Figure 14. Second, an interpretable comparison between multiple groups of distributions is, to our knowledge, not straightforward to implement. It is now discussed in Supplementary information. To measure interpretable differences in the shapes of the distributions we computed the first three moments of distributions of f_free for Dataset 8 and compared the values obtained between conditions (see Supplementary information and new Sup. Figure 15). Third, averaging across individuals gives each experimental subject the same weight in the comparisons.
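
      As a rough illustration of the moment-based summary mentioned in the response, the sketch below computes a mean, a population variance, and a standardized skewness for one distribution of values. Which exact definitions the authors used is not stated here, so these choices (and the function name) are assumptions.

```python
import statistics


def first_three_moments(values):
    """Return (mean, population variance, standardized skewness)."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    sd = variance ** 0.5
    if sd == 0.0:
        # A constant sample has no asymmetry to measure.
        return mean, variance, 0.0
    skewness = statistics.fmean([((v - mean) / sd) ** 3 for v in values])
    return mean, variance, skewness
```

      Computing these three numbers per image and then averaging per fly would keep the per-subject weighting described in the response.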

      I recommend the authors address two concerns. First, what degree of fluctuation in autofluorescence decay can we expect over time, e.g. over circadian cycles? That would be helpful in evaluating the magnitude of changes following conditioning. And second, if the authors think that metabolism shifts to OXPHOS over glycolysis, are there further genetic manipulations they could make? They test LDH knockdown in gamma KCs, why not knock it down in alpha/beta neurons? The prediction might be that if it prevents the shift to OXPHOS, the shift in f_free distribution in alpha/beta KCs would be attenuated. The extensive library of genetic reagents is an advantage of working with flies, but it comes with a higher standard for corroborating claims.

      In the present study, we used control groups to account for broad fluctuations induced by external factors such as the circadian cycle. We agree with the reviewer that a detailed characterization of circadian variations in the decay parameters would be valuable for assessing the magnitude of conditioning-induced shifts. We have integrated this relevant suggestion in the Discussion. Conducting such an investigation lies unfortunately beyond the scope and means of the current project.

      In line with the suggestion of the reviewer, we have included a new experiment to test the influence of the knockdown of ALAT on the conditioning-induced shift measured in alpha/beta neurons. This choice is motivated in the new manuscript. The obtained result shows that no shift is detected in the mutant flies, in accordance with our hypothesis.

      FLIM as a method is not yet widely prevalent in fly neuroscience, but recent demonstrations of its potential are likely to increase its use. Future efforts will benefit from the description of the properties of the autofluorescence signal to evaluate how autofluorescence may impact measures of FL of genetically engineered indicators.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      (1) Y axes in Figures 1e, 2c, 3b,c are misleading. They must start at 0.

      Although we agree that making the Y axes start at 0 is preferable, in our case it makes it difficult to observe the dispersion of the data at the same time (your next suggestion). To make it clearer to the reader that the axes do not start at 0, a broken Y-axis is now displayed in every concerned figure.

      (2) These same plots should have individual data points represented, for increased clarity and transparency.

      Individual data points were added on all boxplots.

      Reviewer #2 (Recommendations for the authors):

      I am evaluating this paper as a fly neuroscientist with experience in neurophysiology, including calcium imaging. I have little experience with FLIM but anticipate its use growing as more microscopes and killer apps are developed. From this perspective, I value the opportunity to dig into FLIM and try to understand this autofluorescence signal. I think the effort to show each piece of the analysis pipeline is valuable. The figures are quite beautiful and easy to follow. My main suggestion is to consider moving some of the supplemental data to the main figures. eLife allows unlimited figures; moving key pieces of the pipeline to the main figures would make for smoother reading and emphasize the technical care taken in this study.

      We thank the reviewer for their feedback. Following their advice we have moved panels from the supplementary figures to the main text (see new Figure 2).

      Unfortunately, the scientific questions and biological data do not rise to the typical standard in the field to support the claims in the title, "In vivo autofluorescence lifetime imaging of the Drosophila brain captures metabolic shifts associated with memory formation". The authors also clearly state what the next steps are: "hypothesis-driven approaches that rely on metabolite-specific sensors" (Intro). The advantage of fly neuroscience is the extensive library of genetic reagents that enable perturbations. The key manipulation in this study is the electric shock conditioning paradigm that subtly shifts the distribution of a parameter fit to an exponential decay in the somas of alpha/beta KCs vs others. This feels like an initial finding that deserves follow-up; but is it a large enough result to motivate a future student to pick this project up? The larger effect appears to be the gradients in f_free across KCs overall (Figure 2b). How does this change with conditioning?

      We acknowledge that the observed metabolic shift is modest relative to the variability of f_free and agree that additional corroborating experiments would further strengthen this result. Nevertheless, we believe it remains a valid and valuable finding that will be of interest to researchers in the field. The reviewer is right in pointing out that the gradient across KCs is higher in magnitude. However, the fact that this technique can also report experience-dependent changes, in addition to innate heterogeneities across different cell types, is a major incentive for people who could be interested in applying NAD(P)H FLIM in the future. For this reason, we consider it appropriate to retain mention of the memory-induced shift in the title, while making it less assertive and adding a reference to the structural heterogeneities of f_free revealed in the study. We have also rephrased the abstract to adopt a more cautious tone and expanded the discussion to clarify why a low-magnitude shift in f_free can still carry biological significance in this context. Finally, we have added the results of a new set of data involving the knockdown of ALAT in Kenyon cells, to further support the relevance of our observation relative to memory formation, despite its small magnitude. We believe that these elements together form a good basis for future investigations and that the manuscript merits publication in its present form.

      Together, I would recommend reshaping the paper as a methods paper that asks the question, what are the spatial properties of NADPH FL across the brain? The importance of this question is clear in the context of other work on energy metabolism in the MBs. 2P FLIM will likely always have to account for autofluorescence, so this will be of interest. The careful technical work that is the strength of the manuscript could be featured, and whether conditioning shifts f_free could be a curio that might entice future work.

      By transferring panels of the supplementary figures to the main text (see new Figure 2) as suggested by Reviewer 2, we have reinforced the methodological part of the manuscript. For the reasons explained above, we however still mention the ‘biological’ findings in the title and abstract.

      Minor recommendations on science:

      Figure 2C. Plotting either individual data points or distributions would be more convincing.

      Individual data points were added on all boxplots.

      There are a few mentions of glia. What are the authors' expectations for metabolic pathways in glia vs. neurons? Are glia expected to use one more than the other? The work by Rabah suggests it should be different and perhaps complementary to neurons. Can a glial marker be used in addition to KC markers? This seems crucial to being able to distinguish metabolic changes in KC somata from those in glia.

      Drosophila cortex glia are thought to play a similar role as astrocytes in vertebrates (see Introduction). In that perspective, we expect cortex glia to display a higher level of glycolysis than neurons. The work by Rabah et al. is consistent with this hypothesis. Reviewer 2 is right in pointing out that using a glial marker would be interesting. However, current technical limitations make such experiments challenging. These limitations are now described in the discussion.

      The question of whether KC somata positions are stereotyped can probably be answered in other ways as well. For example, the KCs are in the FAFB connectomic data set and the hemibrain. How do the somata positions compare?

      The reviewer’s suggestion is indeed interesting. However, the FAFB and hemibrain connectomic datasets are based on only two individual flies, which probably limits their suitability for assessing the stereotypy of KC subtype distributions. In addition, aligning our data with the FAFB dataset would represent substantial additional work.

      The free parameter tau_bound is mysterious if it can be influenced by the identity of the protein. Are there candidate NADPH binding partners that have a spatial distribution in confocal images that could explain the difference between somas and calyx?

      There are indeed dozens of NADH- or NADPH-binding proteins. For this reason, in all studies implementing exponential fitting of metabolic FLIM data, tau_bound is considered a complex combination of the contributions from many different proteins. In addition, one should keep in mind that the number of cell types contributing to the autofluorescence signal in the mushroom body calyx (Kenyon cells, astrocyte-like and ensheathing glia, APL neurons, olfactory projection neurons, dopamine neurons) is much higher than in the somas (only Kenyon cells and cortex glia). This could also contribute to the observed difference. Hence, focusing on intracellular heterogeneities of potential NAD(P)H binding partners seems premature at that stage.

      The phrase "noticeable but not statistically significant" is misleading.

      We agree with the reviewer and have removed “noticeable but” from the sentence in the new version of the manuscript.

      Minor recommendations on presentation:

      The Introduction can be streamlined.

      We agree that some parts of the Introduction can seem a bit long for experts of a particular field. However, we think that this level of detail makes the article easily accessible for neuroscientists working on Drosophila and other animal models but not necessarily with FLIM, as well as for experts in energy metabolism that may be familiar with FLIM but not with Drosophila neuroscience.

    1. eLife Assessment

      This study provides a useful application of computational modelling to examine how people with chronic pain learn under uncertainty, contributing to efforts to link pain with motivational processes. However, the evidence supporting the main claims is incomplete, as the modelling differences are not reflected in observable behaviour or pain measures, and the interpretation extends beyond what the data can substantiate. The conclusions would benefit from a clearer explanation of the behavioural differences that underlie the computational findings.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how individuals with chronic temporomandibular disorder (TMD) learn from uncertain rewards, using a probabilistic three-armed bandit task and computational modelling. The authors aim to identify whether people living with chronic pain show altered learning under uncertainty and how such differences might relate to psychological symptoms.

      Strengths:

      The work addresses an important question about how chronic pain may influence cognition and motivation. The task design is appropriate for probing adaptive learning, and the modelling approach is novel. The findings of altered uncertainty updating in the TMD group are interesting.

      Weaknesses:

      Several aspects of the paper limit the strength of the conclusions. The group differences appear only in model-derived parameters, with no corresponding behavioural differences in task performance. Model parameters do not correlate with pain severity, making the proposed mechanistic link between pain and learning speculative. Some of the interpretations extend beyond what the data can directly support.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on a case-control study in which participants with chronic pain (TMD) were compared to controls on performance of a three-option learning task. The authors find no difference in task behavior, but fit a model to this behavior and suggest that differences in the model-derived metrics (specifically, change in learning rate/estimated volatility/model estimated uncertainty) reveal a relevant between-group effect. They report a mediation effect suggesting that group differences on self-report apathy may be partially mediated by this uncertainty adaptation result.

      Strengths:

      The role of sensitivity to uncertainty in pathological states is an interesting question and is the focus of a reasonable amount of research at present. This paper provides a useful assessment of these processes in people with chronic pain.

      Weaknesses:

      (1) The interpretation of the model in the absence of any apparent behavioral effect is not convincing. The model is quite complex with a number of free parameters (what these parameters are is not well explained in the methods, although they seem to be presented in the supplement). These parameters are fitted to participant choice behavior - that is, they explain some sort of group difference in this choice behavior. The authors haven't been able to demonstrate what this difference is. The graphs of learning rate per group (Figure 2) suggest that the control group has a higher initial learning rate and a lower later learning rate. If this were actually the case, you would expect to see it reflected in the choice data (the control group should show higher lose-shift behavior earlier on, with this then declining over time, and the TMD group should show no change). This behavior is not apparent. The absence of a clear effect on behavior suggests that the model results are more likely to be spurious.

      (2) As far as I could see, the actual parameters of the model are not reported. The results (Figure 2) illustrate the trial-level model estimated uncertainty/learning rate, etc, but these differ because the fitted model parameters differ. The graphs look like there are substantial differences in v0 (which was not well recovered), but presumably lambda, at least, also differs. The mean (SD) group values for these parameters should be reported, as should the correlations between them (it looks very much like they will be correlated).

      (3) The task used seems ill-suited to measuring the reported process. The authors report the performance of a restless bandit task and find an effect on uncertainty adaptation. The task does not manipulate uncertainty (there are no periods of high/low uncertainty) and so the only adaptation that occurs in the task is the change from what appears to be the participants' prior beliefs about uncertainty (which appear to be very different between groups - i.e. the lines in Figure 2a,b,c are very different at trial 0). If the authors are interested in measuring adaptation to uncertainty, it would clearly be more useful to present participants with periods of higher or lower uncertainty.

      (4) The main factor driving the better fit of the authors' preferred model over listed alternatives seems to be the inclusion of an additive uncertainty term in the softmax; this differentiates the chosen model from the other two Kalman filter-based models that perform less well. But a similar term is not included in the RW models. Given that the uncertainty of a binary outcome can be estimated as p(1-p), and the RW models are estimating p, this would seem relatively straightforward to do. It would be useful to know if the factor that actually drives better model fit is indeed in the decision stage (rather than the learning stage).
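
      Point (4) can be made concrete with a short sketch of a Rescorla-Wagner learner whose softmax includes an additive uncertainty term based on p(1-p). The parameter names (alpha, beta, phi) and the exact way the bonus enters the decision rule are assumptions for illustration; the form used in the authors' Kalman filter model may differ.

```python
import math


def rw_update(p, reward, alpha):
    """Rescorla-Wagner update of a reward-probability estimate p."""
    return p + alpha * (reward - p)


def softmax_with_uncertainty(p_est, beta, phi):
    """Choice probabilities from value plus an additive uncertainty term.

    p_est : current reward-probability estimates, one per option
    beta  : inverse temperature applied to the estimates
    phi   : weight on the Bernoulli variance p * (1 - p) of each option
    """
    utilities = [beta * p + phi * p * (1.0 - p) for p in p_est]
    peak = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - peak) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]
```

      Setting phi to zero recovers a standard RW softmax, so fitting the model with and without phi would be one way to test whether the decision-stage term is what drives the improvement in fit.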

    4. Reviewer #3 (Public review):

      This paper applies a computational model to behavior in a probabilistic operant reward learning task (a 3-armed bandit) to uncover differences between individuals with temporomandibular disorder (TMD) compared with healthy controls. Integrating computational principles and models into pain research is an important direction, and the findings here suggest that TMD is associated with subtle changes in how uncertainty is represented over time as individuals learn to make choices that maximize reward. There are a number of strengths, including the comparison of a volatile Kalman filter (vKF) model to some standard base models (Rescorla-Wagner with 1 or 2 learning rates) and parameter recovery analyses suggesting that the combination of task and vKF model may be able to capture some properties of learning and decision-making under uncertainty that may be altered in those suffering from chronic pain-related conditions.

      I've focused my comments in four areas: (1) Questions about the patient population, (2) Questions about what the findings here mean in terms of underlying cognitive/motivational processes, (3) Questions about the broader implications for understanding individuals with TMD and other chronic pain-related disorders, and (4) Technical questions about the models and results.

      (1) Patient population

      This is a computational modelling study, so it is light on characterization of the population, but the patient characteristics could matter. The paper suggests they were hospitalized, but this is not a condition that requires hospitalization per se. It would be helpful to connect and compare the patient characteristics with large-scale studies of TMD, such as the OPPERA study led by Maixner, Fillingim, and Slade.

      (2) What cognitive/motivational processes are altered in TMD

      The study finds a pattern of alterations in TMD patients that seems clear in Figure 2. Healthy controls (HC) start the task with high estimates of volatility, uncertainty, and learning rate, which drop over the course of the task session. This is consistent with a learner that is initially uncertain about the structure of the environment (i.e., which options are rewarded and how the contingencies change over time) but learns that there is a fixed or slowly changing mean and stationary variance. The TMD patients start off with much lower volatility, uncertainty, and learning rate - which are actually all near 0 - and they remain stable over the course of learning. This is consistent with a learner who believes they know the structure of the environment and ignores new information.

      What is surprising is that this pattern of changes over time was found in spite of null group differences in a number of aspects of performance: (1) stay rate, (2) switch rate, (3) win-stay/lose-switch behaviors, (4) overall performance (corrected for chance level), (5) response times, (6) autocorrelation, (7) correlations between participants' choice probability and each option's average reward rate, (8) choice consistency (though how operationalized is not described?), (9) win-stay-lose-shift patterns over time. I'm curious about how the patterns in Figure 2 would emerge if standard aspects of performance are essentially similar across groups (though the study cannot provide evidence in favor of the null). It will be important to replicate these patterns in larger, independent samples with preregistered analyses.

      The authors believe that this pattern of findings reveals that TMD patients "maintain a chronically heightened sensitivity to environmental changes" and relate the findings to predictive processing, a hallmark of which (in its simplest form) is precision-weighted updating of priors. They also state that the findings are not related to reduced overall attentiveness or failure to understand the task, but describe them as deficits or impairments in calibrating uncertainty.

      The pattern of differences could, in fact, result from differences in prior beliefs, conceptualization of the task, or learning. Unpacking these will be important steps for future work, along with direct measures of priors, cognitive processes during learning, and precision-weighted updating.

      (3) Implications for understanding chronic pain

      If the findings and conclusions of the paper are correct, individuals with TMD and perhaps other pain-related disorders may have fundamental alterations in the ways in which they make decisions about even simple monetary rewards. The broader questions for the field concern (1) how generalizable such alterations are across tasks, (2) how generalizable they are across patient groups and, conversely, how specific they are to TMD or chronic pain, (3) whether they are the result of neurological dysfunction, as opposed to (e.g.) adaptive strategies or assumptions about the environment/task structure.

      It will be important to understand which features of patients' and/or controls' cognition are driving the changes. For example, could the performance differences observed here be attributable to a reduced or altered understanding of the task instructions, more uncertainty about the rules of the game, different assumptions about environments (i.e., that they are more volatile/uncertain or less so), or reduced attention or interest in optimizing performance? Are the controls OVERconfident in their understanding of the environment?

      This set of questions will not be easy to answer and will be the work of many groups for many years to come. It is a judgment call how far any one paper must go to address them, but my view is that it is a collaborative effort. Start with a finding, replicate it across labs, take the replicable phenomena and work to unpack the underlying questions. The field must determine whether it is this particular task with this model that produces case-control differences (and why), or whether the findings generalize broadly. Would we see the same findings for monetary losses, sounds, and social rewards? Tasks with painful stimuli instead of rewards?

      Another set of questions concerns the space of computational models tested, and whether their parameters are identifiable. An alteration in estimated volatility or learning rate, for example, can come from multiple sources. In one model, it might appear as a learning rate change and in another as a confirmation bias. It would be interesting in this regard to compare the "mechanisms" (parameters) of other models used in pain neuroscience, e.g., models by Seymour, Mancini, Jepma, Petzschner, Smith, Chen, and others (just to name a few).

      One immediate next step here could be to formally compare the performance of both patients and controls to normatively optimal models of performance (e.g., Bayes optimal models under different assumptions). This could also help us understand whether the differences in patients reflect deficits and what further experiments we would need to pin that down.

      In addition, the volatility parameter in the computational model correlated with apathy. This is interesting. Is there a way to distinguish apathy as a particular clinical characteristic and feature of TMD from apathy in the sense of general disinterest in optimal performance that may characterize many groups?

      If we know this, what actionable steps does it lead us to take? Could we take steps to reduce apathy and thus help TMD patients better calibrate to environmental uncertainty in their lives? Or take steps to recalibrate uncertainty (i.e., increase uncertainty adaptation), with benefits on apathy? A hallmark of a finding that the field can build off of is the questions it raises.

      (4) Technical questions about the models and results

      Clarification of some technical points would help interpret the paper and findings further:

      (a) Was the reward probability truly random? Was the random walk different for each person, or constrained?

      (b) When were self-report measures administered, and how?

      (c) Pain assessments: What types of pain? Was a body map assessed? Widespreadness? Pain at the time of the test, or pain in general?

      (d) Parameter recovery: As you point out, r = 0.47 seems very low for recovery of the true quantity, but this depends on noise levels and on how the parameter space is sampled. Is this noise-free recovery, and is it robust to noise? Are the examples of true parameters drawn from the space of participants, or do they otherwise systematically sample the space of true parameters?

      (e) What are the covariances across parameter estimates and resultant confusability of parameter estimates (e.g., confusion matrix)?

      (f) It would be helpful to have a direct statistical comparison of controls and TMD on model parameter estimates.

      (g) Null statistical findings on differences in correlations should not be interpreted as a lack of a true effect. Bayes Factors could help, but an analysis of them will show that hundreds of people are needed before it is possible to say there are no differences with reasonable certainty. Some journals enforce rules around the kinds of language used to describe null statistical findings, and I think it would be helpful to adopt them more broadly.

      (h) What is normatively optimal in this task? Are TMD patients less so, or not? The paper states "aberrant precision (uncertainty) weighting and misestimation of environmental volatility". But: are they misestimates?

      (i) It's not clear how well the choice of prior variance for all parameters (6.25) is informed by previous research, as sensible values may be task- and context-dependent. Are the main findings robust to how priors are specified in the HBI model?
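
      The parameter recovery question in point (d) can be illustrated with a deliberately simplified sketch: simulate agents with known learning rates, refit each one, and correlate true and recovered values. Everything here is an assumption made for illustration (a plain Rescorla-Wagner learner instead of the vKF, fixed reward probabilities instead of a random walk, a known inverse temperature, and a grid search instead of hierarchical Bayesian fitting).

```python
import math
import random


def simulate(alpha, beta, n_trials, reward_probs, rng):
    """Simulate softmax choices of a Rescorla-Wagner learner."""
    q = [0.5] * len(reward_probs)
    choices, rewards = [], []
    for _ in range(n_trials):
        weights = [math.exp(beta * v) for v in q]
        c = rng.choices(range(len(q)), weights=weights)[0]
        r = 1.0 if rng.random() < reward_probs[c] else 0.0
        q[c] += alpha * (r - q[c])
        choices.append(c)
        rewards.append(r)
    return choices, rewards


def neg_log_lik(alpha, beta, choices, rewards, n_arms):
    """Negative log-likelihood of observed choices under the same learner."""
    q = [0.5] * n_arms
    nll = 0.0
    for c, r in zip(choices, rewards):
        peak = max(beta * v for v in q)
        log_z = peak + math.log(sum(math.exp(beta * v - peak) for v in q))
        nll -= beta * q[c] - log_z
        q[c] += alpha * (r - q[c])
    return nll


def fit_alpha(choices, rewards, n_arms, beta, grid):
    """Grid-search maximum-likelihood estimate of the learning rate."""
    return min(grid, key=lambda a: neg_log_lik(a, beta, choices, rewards, n_arms))


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def recovery_correlation(n_agents=20, n_trials=200, seed=0):
    """Correlation between true and recovered learning rates."""
    rng = random.Random(seed)
    grid = [i / 20 for i in range(1, 20)]  # candidate learning rates
    beta = 5.0
    reward_probs = [0.2, 0.5, 0.8]
    true_alphas, fitted = [], []
    for _ in range(n_agents):
        alpha = rng.choice(grid)
        choices, rewards = simulate(alpha, beta, n_trials, reward_probs, rng)
        true_alphas.append(alpha)
        fitted.append(fit_alpha(choices, rewards, len(reward_probs), beta, grid))
    return pearson_r(true_alphas, fitted)
```

      Varying n_trials, the range from which true values are drawn, or the choice noise (beta) in such a simulation would show how strongly a recovery correlation depends on noise level and on how the parameter space is sampled, which is the concern raised in point (d).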

    1. eLife Assessment

      This manuscript proposes a lateralized, lobe-specific brain-liver sympathetic neurocircuit regulating hepatic glucose metabolism and presents anatomical evidence for sympathetic crossover at the porta hepatis using viral tracing and neuromodulation approaches. While the topic is of important significance and the methodologies are, in principle, state-of-the-art, significant concerns regarding experimental design, incomplete methodological reporting, sparse and ambiguous labeling, and overinterpretation of the data substantially weaken support for the study's central conclusions, thereby limiting the study's completeness. The work will be of interest to biologists, clinicians, and physiologists.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. reports the potential involvement of an asymmetric neurocircuit in the sympathetic control of liver glucose metabolism.

      Strengths:

      The concept that the contralateral brain-liver neurocircuit preferentially regulates each liver lobe may be interesting.

      Weaknesses:

      However, the experimental evidence presented did not support the study's central conclusion.

      (1) Pseudorabies virus (PRV) tracing experiment: The liver not only possesses sympathetic innervations but also vagal sensory innervations. The experimental setup failed to distinguish whether the PRV-labeling of LPGi (Lateral Paragigantocellular Nucleus) is derived from sympathetic or vagal sensory inputs to the liver.

      (2) Impact on pancreas: The celiac ganglia not only provide sympathetic innervations to the liver but also to the pancreas, the central endocrine organ for glucose metabolism. The chemogenetic manipulation of LPGi failed to consider a direct impact on the secretion of insulin and glucagon from the pancreas.

      (3) Neuroanatomy of the brain-liver neurocircuit: The current study and its conclusion are based on a speculative brain-liver sympathetic circuit without the necessary anatomical information downstream of LPGi.

      (4) Local manipulation of the celiac ganglia: The left and right ganglia of mice are not separate from each other but rather anatomically connected. The claim that local injection of AAV into the left or right ganglion does not affect the other side contradicts this basic anatomical feature.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Wang and colleagues aims to determine whether the left and right LPGi differentially regulate hepatic glucose metabolism and to reveal decussation of hepatic sympathetic nerves.

      The authors used tissue clearing to identify sympathetic fibers in the liver lobes, then injected PRV into the hepatic lobes. Five days post-injection, PRV-labeled neurons in the LPGi were identified. The results indicated contralateral dominance of premotor neurons and partial innervation of more than one lobe. Then the authors activated each side of the LPGi, resulting in a greater increase in blood glucose levels after right-sided activation than after left-sided activation, as well as changes in protein expression in the liver lobes. These data suggested modulation of HGP (hepatic glucose production) in a lobe-specific manner. Chemical denervation of a particular lobe did not affect glucose levels due to compensation by the other lobes. In addition, nerve bundles decussate in the hepatic portal region.

      Strengths:

      The manuscript is timely and relevant. It is important to understand the sympathetic regulation of the liver and the contribution of each lobe to hepatic glucose production. The authors use state-of-the-art methodology.

      Weaknesses:

      (1) The wording/terminology used in the manuscript is misleading, and it is not used in the proper context. For instance, the goal of the study is "to investigate whether cerebral hemispheres differentially regulate hepatic glucose metabolism..." (see abstract); however, the authors focus on the brainstem (a single structure without hemispheres). Similarly, symmetric is not the best word for the projections.

      (2) Sparse labeling of liver-related neurons was shown in the LPGi (Figure 1). It would be ideal to have lower magnification images to show the area. Higher quality images would be necessary, as it is difficult to identify brainstem areas. The low number of labeled neurons in the LPGi after five days of inoculation is surprising. Previous findings showed extensive labeling in the ventral brainstem at four days post-inoculation (Desmoulins et al., 2025). Unfortunately, it is not possible to compare the injection paradigm/methods because the PRV inoculation is missing from the methods section. If the PRV is different from the previously published viral tracers, time-dependent studies to determine the order of neurons and the time course of infection would be necessary.

      (3) Not all LPGi cells are liver-related. Was the entire LPGi population stimulated, or was it done in a cell-type-specific manner? What was the strain, sex, and age of the mice? What was the rationale for using the particular viral constructs?

      (4) The authors should consider the effect of stimulation of double-labeled neurons (innervating more than one lobe) and potential confounding effects regarding other physiological functions.

      (5) The authors state that "central projections directly descend along the sympathetic chain to the celiac-superior mesenteric ganglia". What they mean is unclear. Do the authors refer to pre-ganglionic neurons or premotor neurons? How does it fit with the previous literature?

      (6) How was the chemical denervation completed for the individual lobes?

      (7) The Western Blot images look like they are from different blots, but there are no details provided regarding protein amount (loading) or housekeeping. What was the reason to switch beta-actin and alpha-tubulin? In Figures 3F-G, the GS expression is not a good representative image. Were chemiluminescence or fluorescence antibodies used? Were the membranes reused?

      (8) Key references using PRV for liver innervation studies are missing (Stanley et al, 2010 [PMID: 20351287]; Torres et al., 2021 [PMID: 34231420]; Desmoulins et al., 2025 [PMID: 39647176]).

    4. Reviewer #3 (Public review):

      Summary:

      This study found a lobe-specific, lateralized control of hepatic glucose metabolism by the brain and provides anatomical evidence for sympathetic crossover at the porta hepatis. The findings are particularly insightful to the researchers in the field of liver metabolism, regeneration, and tumors.

      Strengths:

      Increasing evidence suggests spatial heterogeneity of the liver across many aspects of metabolism and regenerative capacity. The current study has provided interesting findings: neuronal innervation of the liver also shows anatomical differences across lobes. The findings could be particularly useful for understanding liver pathophysiology and treatment, such as metabolic interventions or transplantation.

      Weaknesses:

      Inclusion of detailed method and Discussion:

      (1) Quantitative results for PRV-labeled neurons are presented; please include the specific quantification methods.

      (2) The Discussion can be expanded to include potential biological advantages of this complex lateralized innervation pattern.

    5. Reviewer #4 (Public review):

      Summary:

      The studies here are highly informative in terms of anatomical tracing and sympathetic nerve function in the liver related to glucose levels, but given that they are performed in a single species, it is challenging to translate them to humans, or to determine whether these neural circuits are evolutionarily conserved. Dual-labeling anatomical studies are elegant, and the addition of chemogenetic and optogenetic studies is mechanistically informative. Denervation studies lack appropriate controls, and the role of sensory innervation in the liver is overlooked.

      Specific Weaknesses - Major:

      (1) The species name should be included in the title.

      (2) Tyrosine hydroxylase was used to mark sympathetic fibers in the liver, but this marker also labels a portion of sensory fibers, which need to be ruled out in the whole-mount imaging data.

      (3) Chemogenetic and optogenetic data demonstrating hyperglycemia should be described in the context of prior work demonstrating liver nerve involvement in these processes. There is only a brief mention in the Discussion currently, but comparing methods and observations would be helpful.

      (4) Sympathetic denervation with 6-OHDA can drive compensatory increases to tissue sensory innervation, and this should be measured in the liver denervation studies to implicate potential crosstalk, especially given the increase in LPGi cFOS that may be due to afferent nerve activity. Compensatory sympathetic drive may not be the only culprit, though it is clearly assumed to be. The sensory or parasympathetic/vagal innervation of the liver is altogether ignored in this paper and could be better described in general.

    6. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. reports the potential involvement of an asymmetric neurocircuit in the sympathetic control of liver glucose metabolism.

      Strengths:

      The concept that the contralateral brain-liver neurocircuit preferentially regulates each liver lobe may be interesting.

      Weaknesses:

      However, the experimental evidence presented did not support the study's central conclusion.

      We sincerely thank the reviewer for recognizing the conceptual novelty of our work and for constructive comments aimed at enhancing its rigor and clarity. In response, we will carry out targeted experiments to address the points raised, including: (i) further characterization of LPGi projections to vagal and sympathetic circuits; (ii) evaluation of potential pancreatic involvement; and (iii) validation of the specificity of chemogenetic activation within the proposed circuit. We anticipate completing the revised version within 8 weeks.

      (1) Pseudorabies virus (PRV) tracing experiment:

      The liver not only possesses sympathetic innervations but also vagal sensory innervations. The experimental setup failed to distinguish whether the PRV-labeling of LPGi (Lateral Paragigantocellular Nucleus) is derived from sympathetic or vagal sensory inputs to the liver.

      Thank you for raising this important point. We fully agree that the liver receives both sympathetic and vagal sensory innervation, and we acknowledge that PRV-based tracing alone does not definitively distinguish between these two pathways. This represents a limitation of the original experimental design.

      Based on established anatomical literature as well as our experimental observations, vagal sensory neuron cell bodies reside in the nodose ganglion (NG), and their central projections terminate predominantly in the nucleus of the solitary tract (NTS) (Nature. 2023;623(7986):387-396; Curr Biol. 2020;30(20):3986-3998.e5.), which is located in the dorsomedial medulla. In contrast, the LPGi, together with other sympathetic-related nuclei, is predominantly distributed in the ventral medulla (Cell Metab. 2025;37(11):2264-2279.e10; Nat Commun. 2022;13(1):5079.).

      To directly assess the contribution of vagal sensory pathways, we will perform an additional PRV tracing experiment using two groups of mice: one with bilateral nodose ganglion (NG) removal and a sham-operated control group. Identical PRV injections will be delivered to the liver in both groups, and PRV labeling in the LPGi will be quantitatively compared. Preservation of LPGi labeling following NG ablation would indicate that PRV transmission occurs primarily via sympathetic, rather than vagal sensory, pathways. These data will be incorporated into the revised manuscript and are expected to be completed within 3 weeks.

      (2) Impact on pancreas:

      The celiac ganglia not only provide sympathetic innervations to the liver but also to the pancreas, the central endocrine organ for glucose metabolism. The chemogenetic manipulation of LPGi failed to consider a direct impact on the secretion of insulin and glucagon from the pancreas.

      Thank you for this important comment. We agree that the celiac ganglia (CG) provide sympathetic innervation not only to the liver but also to the pancreas, which plays a central role in glucose homeostasis through the secretion of both insulin and glucagon. Therefore, the potential pancreatic effects of LPGi chemogenetic manipulation warrant careful consideration.

      To address this concern, we examined circulating glucagon levels following chemogenetic manipulation of the LPGi. As shown in the Supplementary Figure below, plasma glucagon (GCG) concentrations were not significantly altered at 30, 60, 90, or 120 minutes compared with control mice (n = 6), indicating that LPGi manipulation does not measurably affect glucagon secretion under our experimental conditions.

      We acknowledge that insulin secretion was not assessed in the study, which represents an important limitation given the pancreatic innervation of the CG. To further strengthen our interpretation, we are performing additional experiments in newly prepared mice to measure circulating insulin levels following LPGi manipulation. These data together with Author response image 1 below will be included in the revised manuscript upon completion.

      Author response image 1.

      Plasma concentrations of GCG in mice following activation of LPGi GABAergic neurons.

      (3) Neuroanatomy of the brain-liver neurocircuit:

      The current study and its conclusion are based on a speculative brain-liver sympathetic circuit without the necessary anatomical information downstream of LPGi.

      Thank you for raising this important point. A clear anatomical definition of the downstream pathways linking the brain to the liver is essential for interpreting the proposed brain-liver sympathetic circuit.

      The present study (Figure 4A) provides direct anatomical evidence supporting the organization of the brain–liver sympathetic neurocircuit. These observations are consistent with our recent detailed characterization of the brain-liver sympathetic circuit published in Cell Metabolism (Cell Metab. 2025;37(11):2264–2279). In that circuit, LPGi GABAergic neurons inhibit GABAergic neurons in the caudal ventrolateral medulla (CVLM). Disinhibition of CVLM reduces GABAergic suppression of rostral ventrolateral medulla (RVLM) neurons, which are key excitatory drivers of sympathetic tone. RVLM neurons project to sympathetic preganglionic neurons in the sympathetic chain (Syc). These neurons synapse with postganglionic sympathetic neurons in ganglia such as the celiac-superior mesenteric ganglion (CG-SMG). Postganglionic sympathetic fibers then innervate the liver, releasing NE to activate hepatic β2-adrenergic receptors and stimulate HGP.

      Together, these data establish a coherent anatomical basis for the proposed brain-liver sympathetic pathway and clarify the downstream organization relevant to the functional experiments presented here.

      Author response image 2.

      Tracing scheme (Left) and whole-mount imaging (Right) of PRV-labeled brain-liver neurocircuit. Scale bars, 3,000 (whole mount) or 1,000 (optical sections) μm.

      (4) Local manipulation of the celiac ganglia:

      The left and right ganglia of mice are not separate from each other but rather anatomically connected. The claim that the local injection of AAV in the left or right ganglion without affecting the other side is against this basic anatomical feature.

      Thank you for raising this important anatomical point. We fully acknowledge that the left and right celiac ganglia (CG) in mice are interconnected, and that unilateral viral injection could theoretically affect the contralateral side. The celiac–superior mesenteric ganglion (CG-SMG) complex serves as a major sympathetic hub that regulates visceral organ functions. Recent transcriptomic, anatomical, and functional studies have revealed that the CG-SMG is not a homogeneous structure but is composed of molecularly and functionally distinct neuronal populations. These populations exhibit specialized projection patterns and regulate different aspects of gastrointestinal physiology, supporting a model of modular sympathetic control. (Nature. 2025 Jan;637(8047):895-902). Therefore, we were aware of this phenomenon during the initial stages of these experiments.

      To minimize unintended spread to the contralateral CG, we took two complementary approaches.

      First, we optimized the injection strategy by using an extremely small injection volume (100 nL per site), with a very slow infusion rate (50 nL/min), and fine glass micropipettes. With these refinements, contralateral viral spread was rarely observed.

      Second, and importantly, all animals included in the final analyses were subjected to post hoc anatomical verification. After completion of the experiments, CG were collected, sectioned, and examined for viral expression. As shown in Supplementary Figure 5F, only mice in which viral expression was strictly confined to the targeted CG, with no detectable infection in the contralateral ganglion, were included in the presented data.

      Together, these measures ensure that the reported effects are attributable to local manipulation of the intended CG. We will ensure that the Methods section more explicitly details these technical precautions and that the legend for Figure S5F clearly states its role in validating injection specificity.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Wang and colleagues aims to determine whether the left and right LPGi differentially regulate hepatic glucose metabolism and to reveal decussation of hepatic sympathetic nerves.

      The authors used tissue clearing to identify sympathetic fibers in the liver lobes, then injected PRV into the hepatic lobes. Five days post-injection, PRV-labeled neurons in the LPGi were identified. The results indicated contralateral dominance of premotor neurons and partial innervation of more than one lobe. Then the authors activated each side of the LPGi, resulting in a greater increase in blood glucose levels after right-sided activation than after left-sided activation, as well as changes in protein expression in the liver lobes. These data suggested modulation of HGP (hepatic glucose production) in a lobe-specific manner. Chemical denervation of a particular lobe did not affect glucose levels due to compensation by the other lobes. In addition, nerve bundles decussate in the hepatic portal region.

      We thank the reviewer for the thorough and constructive evaluation of our manuscript. In direct response, we will undertake comprehensive revisions to enhance the rigor and clarity of the study, including: (i) correcting ambiguous or misleading terminology pertaining to anatomical resolution and sympathetic circuit organization; (ii) expanding the Methods section with complete experimental details, improved image presentation, and explicit justification of our viral and genetic approaches; and (iii) strengthening data interpretation by addressing issues related to sparse PRV labeling, projection heterogeneity, and the functional implications of double-labeled neurons. All revisions are expected to be completed within 8 weeks.

      Strengths:

      The manuscript is timely and relevant. It is important to understand the sympathetic regulation of the liver and the contribution of each lobe to hepatic glucose production. The authors use state-of-the-art methodology.

      Weaknesses:

      (1) The wording/terminology used in the manuscript is misleading, and it is not used in the proper context. For instance, the goal of the study is "to investigate whether cerebral hemispheres differentially regulate hepatic glucose metabolism..." (see abstract); however, the authors focus on the brainstem (a single structure without hemispheres). Similarly, symmetric is not the best word for the projections.

      We thank the reviewer for raising these critical points regarding terminology and conceptual framing. We acknowledge that certain phrases in our original manuscript may have been overly broad or ambiguous, particularly in describing the scope of sympathetic heterogeneity and the specificity of neural projections. Due to practical constraints and the scope of our study, our investigation is focused on the brainstem, which represents the final common pathway for these lateralized commands. We acknowledge that terms referring to the cerebral hemispheres do not accurately describe our study.

      We are revising the manuscript to ensure accurate and consistent terminology and will submit the revised version with these corrections.

      (2) Sparse labeling of liver-related neurons was shown in the LPGi (Figure 1). It would be ideal to have lower magnification images to show the area. Higher quality images would be necessary, as it is difficult to identify brainstem areas. The low number of labeled neurons in the LPGi after five days of inoculation is surprising. Previous findings showed extensive labeling in the ventral brainstem at four days post-inoculation (Desmoulins et al., 2025). Unfortunately, it is not possible to compare the injection paradigm/methods because the PRV inoculation is missing from the methods section. If the PRV is different from the previously published viral tracers, time-dependent studies to determine the order of neurons and the time course of infection would be necessary.

      We sincerely thank the reviewer for these detailed and constructive comments regarding the PRV tracing experiments. We fully agree that careful presentation and interpretation of the anatomical data are essential for ensuring rigor and transparency. We address each point in detail below.

      (1) Image magnification and anatomical context of LPGi labeling

      We agree that the original images did not sufficiently convey the broader anatomical context of the LPGi. In the revised manuscript, we will replace the original panels in Figure 1 with new images that include lower-magnification overviews of the brainstem, alongside higher-magnification views of the LPGi. These images clearly delineate the LPGi with respect to established anatomical landmarks and atlas boundaries. Image contrast and resolution will also be optimized to allow unambiguous identification of PRV-labeled neurons and surrounding structures.

      (2) Sparse LPGi labeling at 5 days post-injection and methodological details

      We apologize for the omission of the detailed PRV injection protocol in the original Methods section. We deliberately used small-volume, focal injections (1 µL per liver lobe) to minimize viral spread and to restrict labeling to circuits specifically connected to the targeted hepatic region. Under these conditions, early-stage or intermediate-order upstream nuclei such as the LPGi are expected to exhibit relatively sparse labeling compared to more proximal autonomic nuclei. The full protocol will be added to the Methods, including the PRV strain, viral titer, injection volume, precise injection coordinates, and surgical procedures.

      (3) Not all LPGi cells are liver-related. Was the entire LPGi population stimulated, or was it done in a cell-type-specific manner? What was the strain, sex, and age of the mice? What was the rationale for using the particular viral constructs?

      We thank the reviewer for this insightful and important question. We agree that not all neurons within the LPGi are liver-related, and we apologize that our rationale was not clearly articulated in the original manuscript.

      (1) Our decision to target GABAergic neurons in the LPGi using Gad1-Cre mice was based on prior experimental evidence rather than an assumption about the entire LPGi population. In our previous study (Cell Metab. 2025;37(11):2264-2279.e10), we performed single-cell RNA sequencing on retrogradely labeled LPGi neurons following liver tracing. These analyses revealed that the majority of liver-projecting LPGi neurons are GABAergic in nature. Based on these findings, we chose to selectively manipulate GABAergic neurons in the LPGi rather than the entire LPGi neuronal population, in order to achieve greater cellular specificity and to minimize potential confounding effects arising from heterogeneous neuron types within this region. We regret that this rationale was not clearly described in the original submission and have now revised the manuscript to explicitly state this reasoning.

      (2) In addition, we apologize for the omission of mouse strain, sex, and age information in the Methods section. These details will be fully added.

      (3) We selected AAV-based viral vectors, specifically the AAV9 serotype, due to their well-established efficiency in transducing neurons in the brainstem, relatively low toxicity, and widespread use in circuit-level chemogenetic and optogenetic studies. When combined with Cre-dependent viral constructs in Gad1-Cre mice, this approach enabled selective and reliable manipulation of LPGi GABAergic neurons.

      (4) The authors should consider the effect of stimulation of double-labeled neurons (innervating more than one lobe) and potential confounding effects regarding other physiological functions.

      We thank the reviewer for raising this important point. We agree that neurons innervating more than one liver lobe could, in principle, introduce potential confounding effects and may reflect higher-order integrative autonomic neurons.

      This consideration is consistent with a key finding of the cited study: the celiac-superior mesenteric ganglion (CG-SMG) contains molecularly distinct sympathetic neuron populations (e.g., RXFP1+ vs. SHOX2+) that exhibit complementary organ projections and separate, non-overlapping functions. Specifically, RXFP1+ neurons innervate secretory organs (pancreas, bile duct) to regulate secretion, while SHOX2+ neurons innervate the gastrointestinal tract to control motility. This functional segregation supports the concept of specialized autonomic modules rather than a uniform "fight or flight" response, reinforcing the need for careful interpretation of circuit-specific manipulations (Nature. 2025;637(8047):895-902; Neuron. Published online December 10, 2025).

      In our PRV tracing experiments, the proportion of double-labeled neurons was relatively small, suggesting that the majority of labeled LPGi neurons preferentially associate with individual hepatic lobes. Nevertheless, we recognize that activation of this minority population could contribute to broader physiological effects beyond strictly lobe-specific regulation. We also acknowledge that the absence of single-cell-level resolution in the current study limits our ability to further dissect the functional heterogeneity of these projection-defined neurons. We will explicitly state both points as limitations in the revised manuscript. We thank the reviewer for highlighting this important conceptual consideration.

      (5) The authors state that "central projections directly descend along the sympathetic chain to the celiac-superior mesenteric ganglia". What they mean is unclear. Do the authors refer to pre-ganglionic neurons or premotor neurons? How does it fit with the previous literature?

      We thank the reviewer for pointing out this imprecise wording. We agree that the original phrasing was anatomically inaccurate and potentially confusing. The pathways we intended to describe involve brainstem premotor neurons that project to sympathetic preganglionic neurons in the spinal cord. These preganglionic neurons then innervate neurons in the celiac–superior mesenteric ganglia, which in turn provide postganglionic input to the liver.

      We are revising the manuscript to clearly distinguish premotor from preganglionic neurons and to describe this pathway in a manner consistent with the established organization of sympathetic autonomic circuits reported in the previous literature. The revised wording will explicitly reflect this hierarchical relay structure.

      (6) How was the chemical denervation completed for the individual lobes?

      We thank the reviewer for raising this important methodological concern. We agree that potential diffusion of 6-OHDA is a critical issue when performing lobe-specific chemical denervation, and we apologize that our original description did not sufficiently clarify how this was controlled.

      In the revised Methods section, we will provide a detailed description of the denervation procedure, including the injection volume and concentration of 6-OHDA, as well as the physical separation and isolation of individual hepatic lobes during application to minimize diffusion to adjacent tissue.

      To directly assess the specificity of the chemical denervation, we included immunofluorescence and Western blot analyses demonstrating a selective reduction of sympathetic markers in the targeted lobe, with minimal effects on non-targeted lobes. These results support the effectiveness and relative spatial confinement of the 6-OHDA treatment under our experimental conditions.

      We thank the reviewer for highlighting this point, which has helped us improve both the clarity and rigor of the manuscript.

      (7) The Western Blot images look like they are from different blots, but there are no details provided regarding protein amount (loading) or housekeeping. What was the reason to switch beta-actin and alpha-tubulin? In Figures 3F-G, the GS expression is not a good representative image. Were chemiluminescence or fluorescence antibodies used? Were the membranes reused?

      We thank the reviewer for this careful and detailed evaluation of the Western blot data. We apologize that insufficient methodological detail was provided in the original submission.

      (1) We would like to clarify that the protein bands shown within each panel were derived from the same membrane. To improve transparency, we will provide full, uncropped images of the corresponding membranes in the supplementary materials. In addition, detailed information regarding protein loading amounts, gel conditions, and housekeeping controls will be added to the Methods section.

      (2) The use of different loading controls (β-actin or α-tubulin) reflects a technical consideration rather than an experimental inconsistency. In our experiments, the molecular weight of TH (62 kDa) was too close to that of α-tubulin (55 kDa), so β-actin (42 kDa) was used instead to avoid band overlap and to ensure accurate quantification.

      (3) Regarding the GS signal shown in Figures 3F–G, we agree that the original representative image was suboptimal. This appears to be related to antibody performance rather than sample quality. To address this, we are repeating the GS Western blot using a newly validated antibody. The original tissue samples had been aliquoted and stored at −80 °C, allowing reliable re-analysis. We expect to complete this work within 8 weeks.

      (4) All Western blot experiments were detected using chemiluminescence, and membrane stripping and reprobing procedures are now explicitly described in the Methods section.

      We thank the reviewer for highlighting these issues, which significantly improve the rigor and clarity of our data presentation.

      (8) Key references using PRV for liver innervation studies are missing (Stanley et al, 2010 [PMID: 20351287]; Torres et al., 2021 [PMID: 34231420]; Desmoulins et al., 2025 [PMID: 39647176]).

      We thank the reviewer for pointing out these important and highly relevant references that were inadvertently omitted in our initial submission. The studies by Stanley et al. (Proc Natl Acad Sci U S A, 2010), Torres et al. (Am J Physiol Regul Integr Comp Physiol, 2021), and Desmoulins et al. (Auton Neurosci, 2025) represent key PRV-based retrograde tracing work that has mapped central neural circuits innervating the liver and thus provide essential context for our anatomical analyses.

      We agree that inclusion of these studies is necessary to properly situate our findings within the existing literature. Accordingly, we will incorporate citations to these references in the revised manuscript and discuss their relationship to our results.

      Reviewer #3 (Public review):

      Summary:

      This study found a lobe-specific, lateralized control of hepatic glucose metabolism by the brain and provides anatomical evidence for sympathetic crossover at the porta hepatis. The findings are particularly insightful to the researchers in the field of liver metabolism, regeneration, and tumors.

      Strengths:

      Increasing evidence suggests spatial heterogeneity of the liver across many aspects of metabolism and regenerative capacity. The current study has provided interesting findings: neuronal innervation of the liver also shows anatomical differences across lobes. The findings could be particularly useful for understanding liver pathophysiology and treatment, such as metabolic interventions or transplantation.

      Weaknesses:

      Inclusion of detailed method and Discussion:

      We sincerely thank the reviewer for the positive and constructive feedback, which will significantly enhance both the methodological rigor and the broader biological interpretation of our study. In direct response, we will revise the Discussion to elaborate on the potential physiological advantages of a lateralized and lobe-specific pattern of liver innervation. Furthermore, we will expand the Methods section to include a comprehensive description of the quantitative analysis applied to PRV-labeled neurons. Together, these revisions will strengthen the manuscript’s clarity, depth, and relevance to researchers in hepatic metabolism, regeneration, and disease. We expect to complete all updates within 8 weeks.

      (1) Quantitative results for PRV-labeled neurons are presented; please include the specific quantification methods.

      We thank the reviewer for this helpful suggestion. We will add a detailed description of the quantitative methods used to analyze PRV-labeled neurons in the revised Methods section. This includes information on the counting criteria, the brain regions analyzed, how the regions of interest were delineated, and the normalization procedures applied to obtain the reported neuron counts.

      (2) The Discussion can be expanded to include potential biological advantages of this complex lateralized innervation pattern.

      We appreciate the reviewer’s suggestion. We will expand the Discussion to include a paragraph addressing the potential biological significance of lateralized liver innervation. We will highlight that this asymmetric organization could allow for more precise, lobe-specific regulation of hepatic metabolism, enable integration of distinct physiological signals, and potentially provide robustness against perturbations. These points will be discussed in the revised manuscript.

      Reviewer #4 (Public review):

      Summary:

      The studies here are highly informative in terms of anatomical tracing and sympathetic nerve function in the liver related to glucose levels, but given that they are performed in a single species, it is challenging to translate them to humans, or to determine whether these neural circuits are evolutionarily conserved. Dual-labeling anatomical studies are elegant, and the addition of chemogenetic and optogenetic studies is mechanistically informative. Denervation studies lack appropriate controls, and the role of sensory innervation in the liver is overlooked.

      We sincerely appreciate the reviewer's thoughtful evaluation and fully agree that findings derived from a single-species model must be interpreted with caution in relation to human physiology. In direct response, we will revise the manuscript to explicitly clarify that all experimental data were obtained in mice and to provide a discussion of the limitations regarding direct extrapolation to humans. Concurrently, we will expand the Discussion section by integrating our findings with recent human and translational studies, including a multicenter clinical trial demonstrating that catheter-based endovascular denervation of the celiac and hepatic arteries significantly improved glycemic control in patients with poorly controlled type 2 diabetes, without major adverse events (Signal Transduct Target Ther. 2025;10(1):371). While our current work focuses on defining the anatomical organization and functional asymmetry of this circuit in mice, the clinical findings suggest that the core principles, sympathetic control of hepatic glucose metabolism via CG-liver pathways, may be conserved and of translational relevance. Additionally, we will clarify the interpretation of tyrosine hydroxylase labeling and expand the discussion of hepatic sensory and parasympathetic innervation, acknowledging their important roles in liver–brain communication and identifying them as key directions for future research. Collectively, these revisions will provide a more balanced, clinically informed, and rigorous framework for interpreting our findings, and we aim to complete all updates within 8 weeks.

      Specific Weaknesses - Major:

      (1) The species name should be included in the title.

      We thank the reviewer for this suggestion. We agree that the species should be clearly indicated. The findings presented in this study were obtained in mice using tissue clearing and whole-organ imaging approaches. Due to technical limitations, these observations are currently limited to mice. We will update the title and clarify the species used throughout the manuscript.

      (2) Tyrosine hydroxylase was used to mark sympathetic fibers in the liver, but this marker also labels a portion of sensory fibers that need to be ruled out in whole-mount imaging data.

      We thank the reviewer for pointing this out. We acknowledge that tyrosine hydroxylase (TH) labels not only sympathetic fibers but also a subset of sensory fibers. We will note this limitation in the revised manuscript. In addition, ongoing experiments using retrograde PRV labeling from the liver, combined with sectioning, are being used to distinguish sympathetic fibers from vagal and dorsal root ganglion–derived sensory fibers. These data will be included in a forthcoming update of the manuscript and are expected to be completed in approximately 6 weeks.

      (3) Chemogenetic and optogenetic data demonstrating hyperglycemia should be described in the context of prior work demonstrating liver nerve involvement in these processes. There is only a brief mention in the Discussion currently, but comparing methods and observations would be helpful.

      We thank the reviewer for this suggestion. Previous studies largely relied on electrical stimulation to modulate liver innervation, which provides relatively coarse control of neural activity (Eur J Biochem. 1992;207(2):399-411). By contrast, our use of chemogenetic and optogenetic approaches allows selective, cell-type–specific manipulation of LPGi neurons. We will revise the Discussion to place our functional data in the context of prior work, highlighting how these more precise approaches improve understanding of the contribution of liver-innervating neurons to hyperglycemia.

      (4) Sympathetic denervation with 6-OHDA can drive compensatory increases to tissue sensory innervation, and this should be measured in the liver denervation studies to implicate potential crosstalk, especially given the increase in LPGi cFOS that may be due to afferent nerve activity. Compensatory sympathetic drive may not be the only culprit, though it is clearly assumed to be. The sensory or parasympathetic/vagal innervation of the liver is altogether ignored in this paper and could be better described in general.

      We thank the reviewer for this insightful comment and agree that chemical sympathetic denervation with 6-OHDA may induce compensatory changes in non-sympathetic hepatic inputs, including sensory and parasympathetic (vagal) innervation. As the reviewer correctly points out, increased LPGi cFOS activity may reflect afferent nerve engagement rather than solely compensatory sympathetic drive.

      More broadly, we agree that the central nervous system functions as an integrated homeostatic network that continuously processes diverse afferent signals, including hepatic sensory and vagal inputs, as well as other interoceptive cues. From this perspective, the LPGi cFOS changes observed in our study likely represent one component of a complex integrative response rather than evidence for a single dominant pathway.

      We acknowledge that the present study did not directly assess hepatic sensory or parasympathetic innervation, which represents a limitation in scope. In the revised manuscript, we will expand the Discussion to explicitly note this limitation and provide a more balanced consideration of potential crosstalk among sympathetic, sensory, and parasympathetic pathways in shaping LPGi activity following hepatic denervation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Although the findings are interesting, this reviewer has major concerns about the experimental design, methodology, results, and interpretation of the data. Experimental details are lacking, including basic information (age, sex, strain of mice, procedures, magnification, etc.).

      We thank the reviewer for this important recommendation. We agree that comprehensive reporting of experimental details is essential for rigor and reproducibility.

      In the revised manuscript, we will add complete information regarding mouse strain, sex, age, and sample size for each experiment. In addition, detailed descriptions of surgical procedures, viral constructs, injection parameters, imaging magnification, and analysis methods will be incorporated into the Methods section.

      These revisions will ensure that all experiments are described with sufficient technical detail and clarity to allow accurate interpretation and replication of our findings.

      Reviewer #3 (Recommendations for the authors):

      Addressing a few questions might help:

      (1) The study found that liver-associated LPGi neurons are predominantly GABAergic. It would be informative to molecularly characterize the PRV-traced, liver-projecting LPGi neurons to determine their neurochemical phenotypes.

      We thank the reviewer for this insightful suggestion. We agree that molecular characterization of liver-projecting LPGi neurons is important for understanding their functional identity.

      This issue has been addressed in detail in our recent study (Cell Metab. 2025;37(11):2264-2279.e10), in which we performed single-cell RNA sequencing on retrogradely traced LPGi neurons connected to the liver. These analyses demonstrated that the majority of liver-projecting LPGi neurons are GABAergic, with a defined transcriptional profile distinct from neighboring non–liver-related populations.

      Based on these findings, the current study selectively targets GABAergic LPGi neurons using Gad1-Cre mice. We are now explicitly referencing and summarizing these molecular results in the revised manuscript to clarify the neurochemical identity of the PRV-traced LPGi neurons.

      (2) Is it possible to do a local microinjection of a sodium channel blocker (e.g., lidocaine) or an adrenergic receptor antagonist into the porta hepatis? That would potentially provide additional evidence for the porta hepatis as the functional crossover point.

      We appreciate the reviewer’s thoughtful suggestion. While pharmacological blockade at the porta hepatis could modulate local neural activity, the proposed approach may not fully capture the distinction between ipsilateral and contralateral inputs, and may not conclusively establish neural crossover at this particular site.

      In our view, the anatomical evidence provided by whole-mount tissue clearing, dual-labeled tracing, and direct visualization of decussating nerve bundles at the porta hepatis offers a more definitive demonstration of sympathetic crossover. Pharmacological blockade would affect both crossed and uncrossed fibers simultaneously and therefore would not specifically resolve the anatomical organization of this decussation.

      Nevertheless, we agree that functional interrogation of the porta hepatis represents an interesting direction for future work, and we will now acknowledge this possibility in the Discussion.

      (3) Is it possible to investigate the effects of unilateral LPGi manipulation or ablation of one side of CG/SMG on liver metabolism, such as hyperglycemia?

      We thank the reviewer for this important suggestion. We agree that unilateral ablation or silencing of the CG-SMG could provide additional insight into lateralized sympathetic control of liver metabolism.

      However, precise and selective ablation of one side of the CG-SMG through 6-OHDA without affecting the contralateral ganglion or adjacent autonomic structures remains technically challenging, particularly given the anatomical connectivity between the two sides. We are currently optimizing approaches to achieve reliable unilateral manipulation.

      If successful within the revision timeframe, we will include these experiments and corresponding metabolic analyses in the revised manuscript. If not, we will explicitly discuss this experimental limitation and the predicted metabolic consequences of unilateral CG-SMG ablation as an important direction for future studies. This work will be done in 6 weeks.

      Reviewer #4 (Recommendations for the authors):

      In the abstract and elsewhere, the use of the term 'sympathetic release' is unclear - do you mean release of nerve products, such as the neurotransmitter norepinephrine? This should be more clearly defined.

      We thank the reviewer for pointing out this ambiguity. We agree that the term “sympathetic release” was imprecise. In the revised manuscript, we will explicitly refer to the release of sympathetic neurotransmitters, primarily norepinephrine, from postganglionic sympathetic fibers.

      We will revise the wording throughout the manuscript to ensure accurate and consistent terminology and to avoid potential confusion regarding the underlying neurobiological mechanisms.

    1. eLife Assessment

      The findings are important, as they identify MIRO1 as a central regulator linking mitochondrial positioning and respiratory chain function to VSMC proliferation, neointima formation, and human vasoproliferative disease. Overall, the strength of evidence is convincing, with comprehensive in vivo and in vitro data, including human cells and added bioenergetic analyses, that broadly support the main claims despite some remaining limitations in mechanistic and mitochondrial assays.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      The revised manuscript includes additional data supporting mitochondrial bioenergetic impairment in MIRO1 knockout VSMCs. Measurements of oxygen consumption rate (OCR), along with Complex I (ETC-CI) and Complex V activity, have been added and analyzed across multiple experimental conditions. Collectively, these findings provide a more comprehensive characterization of the mitochondrial functional state. Following revision, the association between MIRO1 deficiency and impaired Complex I activity is more robust.

      Although the precise molecular mechanism of action remains to be fully elucidated, in this updated version, experiments using a MIRO1 reducing agent are presented with improved clarity.

      Although some limitations remain, the authors have addressed nearly all the concerns raised, and the manuscript has substantially improved.

      Weaknesses:

      Figure 6: The authors do not address the concern regarding the cristae shape; however, characterization of the cristae phenotype with MIRO1 ΔTM would have strengthened the mechanistic link between MIRO1 and the MIB/MICOS complex.

      Although the authors clarified their reasoning, they did not explore in vivo validation of key biochemical findings, which represents a limitation of the current study. While their justification is acknowledged, at least a preliminary exploratory effort could have been evaluated to reinforce the translational relevance of the study.

      Finally, in line with the explanations outlined in the rebuttal, the Discussion section should mention the limits of MIRO1 reducer treatment.

    3. Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and importantly in human cells. The subject provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

    4. Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.

      The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      The proposed link between MIRO1 and respiratory supercomplex biogenesis or function is not clearly defined.

      Completeness and integration of mitochondrial assays are marginal, undermining the strength of the conclusions regarding oxidative phosphorylation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima, and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied, and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      Weaknesses:

      (1) Figure 3:

      I appreciate the system used to assess mitochondrial distribution; however, I believe that time-lapse microscopy to evaluate mitochondrial movements in real time should be mandatory. The experimental timing is compatible with time-lapse imaging, and these experiments will provide a quantitative estimation of the distance travelled by mitochondria and the fraction of mitochondria that change position over time. I also suggest evaluating mitochondrial shape in control and MIRO1-/- VSMC to assess whether MIRO1 absence could impact mitochondrial morphology, altering fission/fusion machinery, since mitochondrial shape could differently influence the mobility.

      Mitochondrial motility experiments. WT and Miro1-/- VSMCs were transiently transfected with mito-ds-red and untargeted GFP adenoviruses to fluorescently label mitochondria and cytosol, respectively. Live-cell fluorescence confocal microscopy was used to acquire mitochondrial images at one-minute intervals over a 25-30-minute period. WT cells exhibited dynamic reorganization of the mitochondrial network, whereas Miro1-/- VSMCs displayed minimal mitochondrial movement, characterized only by limited oscillatory behavior without network remodeling (Supplemental Video 1).

      Mitochondrial shape (form factor) was assessed by confocal microscopy in WT and Miro1-/- VSMCs. Analysis of the mitochondrial form factor (defined as the ratio of mitochondrial length to width) during cell cycle progression revealed morphological changes in wild type (WT) cells, characterized by an increase in form factor. In contrast, Miro1-/- cells exhibited no significant alterations in mitochondrial morphology (Figure 3- Figure supplement 1B).
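
      The form factor described above can be illustrated with a minimal sketch. It assumes that each mitochondrion has already been segmented and that its length and width are available in micrometres; the class name, field names, and helper functions are hypothetical and are not taken from the authors' analysis pipeline.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Mitochondrion:
    """One segmented mitochondrion (hypothetical container, not the authors' format)."""
    length_um: float
    width_um: float


def form_factor(m: Mitochondrion) -> float:
    """Form factor as defined in the response: length divided by width."""
    if m.length_um <= 0 or m.width_um <= 0:
        raise ValueError("length and width must be positive")
    return m.length_um / m.width_um


def mean_form_factor(mitochondria: list[Mitochondrion]) -> float:
    """Average form factor over all mitochondria measured in one cell."""
    if not mitochondria:
        raise ValueError("no mitochondria supplied")
    return mean(form_factor(m) for m in mitochondria)
```

      Under this definition a more elongated organelle yields a larger value, so an increase in the per-cell mean would correspond to the elongation reported for WT cells during cell cycle progression.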

      (2) Figure 6:

      The evidence of MIRO1 ablation on cristae remodeling is solid; however, considering that the mechanism proposed to explain the finding is the modulation of MICOS/MIB complex, as shown in Figure 6D, I suggest performing EM analysis in each condition. In my mind, Miro1 KK and Miro1 TM should lead to different cristae phenotypes according to the different impact on MICOS/MIB complex assembly. Especially, Miro1 TM should mimic Miro1 -/- condition, while Miro1 KK should drive a less severe phenotype. This would supply a good correlation between Miro1, MICOS/MIB complex formation and cristae folding.

      I also suggest performing supercomplex assembly and complex I activity with each plasmid to correlate MICOS/MIB complex assembly with the respiratory chain efficiency.

      Complex I activity assays revealed that overexpression of MIRO1-WT fully restored enzymatic activity in MIRO1-/- cells, whereas MIRO1-KK provided partial rescue. In contrast, a MIRO1 mutant lacking the transmembrane domain failed to restore activity and resembled the Miro1-/- phenotype (Figure 6- Figure supplement 2).

      The Complex I activity in each Miro1 mutant correlated with the degree of MICOS/MIB complex assembly in pulldown assays, implying a functional link between Miro1 and mitochondrial cristae organization.

      Moreover, an in-gel Complex V activity assay was performed to evaluate the enzymatic activity of mitochondrial ATP synthase in a native gel following electrophoresis. To normalize the activity signal, a Blue Native PAGE of the same samples was probed for the ATP5F1 subunit. A modest, yet statistically significant reduction in Complex V activity was observed in Miro1-/- cells (Figure 6- Figure supplement 1).

      (3) I noticed that none of the in vitro findings have been validated in an in vivo model. I believe this represents a significant gap that would be valuable to address. In your animal model, it should not be too complex to analyze mitochondria by electron microscopy to assess cristae morphology. Additionally, supercomplex assembly and complex I activity could be evaluated in tissue homogenates to corroborate the in vitro observations.

      We appreciate the reviewer’s comment. However, our currently available samples have been processed for light microscopy and are therefore not suitable for embedding for electron microscopy.

      (4) I find the results presented in Figure S7 somewhat unclear. The authors employ a pharmacological strategy to reduce Miro1 and validate the findings previously obtained with the genetic knockout model. They report increased mitophagy and a reduction in mitochondrial mass. However, in my opinion, these changes alone could significantly impact cellular metabolism. A lower number of mitochondria would naturally result in decreased ATP production and reduced mitochondrial respiration. This, in turn, weakens the proposed direct link between Miro1 deletion and impaired metabolic function or altered electron transport chain (ETC) activity. I believe this section would benefit from additional experiments and a more in-depth discussion.

      We initially conducted experiments using the MIRO1 reducer to explore the translational potential of our findings. These experiments aimed to provide a foundation for in vivo studies. However, despite multiple attempts, we were unable to demonstrate a significant effect of the MIRO1 reducer, delivered via a Pluronic gel, on the mitochondria of the vascular wall. Of note, the role of MIRO1 in mitophagy has been well established in several studies (for example, PMID: 34152608), which show that genetic deletion of Miro1 delays the translocation of the E3 ubiquitin ligase Parkin onto damaged mitochondria, thereby reducing mitochondrial clearance in fibroblasts and cultured neurons. Furthermore, loss of Miro1 in the hippocampus and cortex increases mitofusin levels with the appearance of hyperfused mitochondria and activation of the integrated stress response. Thus, MIRO1 deletion in genetic models does not result in a substantial reduction of mitochondria but causes hyperfused mitochondria. The rationale for developing the MIRO1 reducer stems from genetic forms of Parkinson’s disease, where Miro1 is retained in PD cells but degraded in healthy cells following mitochondrial depolarization (PMID: 31564441). Consequently, the degradation of mutant MIRO1 by the reducer does not phenocopy the effects of genetic MIRO1 depletion. We therefore believe the data with the reducer demonstrate that MIRO1 can be acutely targeted in vitro, but that the mechanism of action (as the reviewer points out, the reduction of mitochondrial mass may lead to decreased ATP levels, potentially reducing cell proliferation) differs from that of chronic genetic deletion. In fact, we observe somewhat increased mitochondrial length in MIRO1-/- cells. We acknowledge that this is complex and have revised the paragraph to clarify the use of the MIRO1 reducer.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture, and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodeling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach, assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo, and importantly in human cells. The subject provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small molecule MIRO1 reducer.

      Weaknesses:

      (1) There is a consistent lack of reporting across figure legends, including group sizes, n numbers, how many independent experiments were performed, or whether the data is mean +/- SD or SEM, etc. This needs to be corrected.

      These data were added in the revised manuscript.

      (2) The in vivo carotid injury experiments are in male mice fed a high-fat diet; this should be explicitly stated in the abstract, as it's unclear if there are any sex- or diet-dependent differences. Is VSMC proliferation/neointima formation different in chow-fed mice after carotid injury?

      This is an important point, and we appreciate the feedback. In this model, the transgene is located on the Y chromosome. As a result, only male mice can be studied. However, in our previous experiments, we have not observed any sex-dependent changes in neointimal formation. Additionally, please note that smooth muscle cell proliferation in neointimal formation is enhanced in models of cholesterol-fed mice on a high-fat diet.

      (3) The main body of the methods section is thin, and it's unclear why the majority of the methods are in the supplemental file. The authors should consider moving these to the main article, especially in an online-only journal.

      We thank the reviewer for this suggestion. We moved the methods to the main manuscript.

      (4) Certain metabolic analyses are suboptimal, including ATP concentration and Complex I activity measurements. Measurement of ATP/ADP and ATP/AMP ratios for energy charge status (luminometer or mass spectrometry) and high-resolution respirometry (Oroboros) to determine mitochondrial complex I activity in permeabilized VSMCs would be more informative.

      ATP/ADP and ATP/AMP ratios were assessed on samples from WT and Miro1-/- VSMCs using an ATP/ADP/AMP Assay Kit (Cat#: A-125; Biomedical Research Service, University at Buffalo, New York). Miro1-/- samples exhibited reduced ATP levels accompanied by elevated concentrations of ADP and AMP. As a result, both ATP/ADP and ATP/AMP ratios were significantly lower in MIRO1-/- cells compared to WT, indicating impaired cellular energy homeostasis (Figure 5B, C).
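
      As an illustration of how these ratios relate to the "energy charge status" raised by the reviewer, the sketch below computes both ratios together with the standard adenylate energy charge, (ATP + 0.5 × ADP) / (ATP + ADP + AMP). The energy charge is not a quantity reported in the response, and the function name and input units are assumptions; all three nucleotides only need to be expressed in the same unit.

```python
def adenylate_ratios(atp: float, adp: float, amp: float) -> dict[str, float]:
    """Return ATP/ADP, ATP/AMP and the adenylate energy charge.

    Inputs are nucleotide concentrations in any common unit
    (hypothetical helper, not part of the assay kit's protocol).
    """
    if min(atp, adp, amp) <= 0:
        raise ValueError("all concentrations must be positive")
    total = atp + adp + amp
    return {
        "atp_adp": atp / adp,
        "atp_amp": atp / amp,
        # Adenylate energy charge: ranges from 0 (all AMP) to 1 (all ATP).
        "energy_charge": (atp + 0.5 * adp) / total,
    }
```

      A fall in ATP combined with a rise in ADP and AMP lowers all three values at once, which is the pattern the response describes for Miro1-/- cells.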

      (5) The statement that 'mitochondrial mobility is not required for optimal ATP production' is poorly supported. XF Seahorse analysis should be performed with nocodazole and also following MIRO1 reconstitution +/- EF hands.

      To evaluate the metabolic effects of Nocodazole, we conducted Seahorse metabolic assays on vascular smooth muscle cells (VSMCs) under various conditions. We used WT VSMCs, Miro1-/- VSMCs, and Miro1-/- VSMCs that expressed either MIRO1-WT, KK, or ΔTM mutants. Our results demonstrate that Nocodazole exposure did not compromise mitochondrial respiratory activity. However, Miro1-/- VSMCs displayed a trend toward reduced basal and maximal mitochondrial respiration when compared to WT cells. This deficit was only partially corrected by the expression of the MIRO1-KK mutant. In contrast, reintroducing MIRO1-WT through adenoviral delivery fully restored mitochondrial respiration to normal levels (Figure 5- Figure supplement 1).

      (6) The authors should consider moving MIRO1 small molecule data into the main figures. A lot of value would be added to the study if the authors could demonstrate that therapeutic targeting of MIRO1 could prevent neointima formation in vivo.

      We appreciate the reviewer's comment and attempted the suggested in vivo experiments using the commercially available Miro1 reducer. For these experiments, we used a pluronic gel to deliver the reducer to the adventitial area surrounding the carotid artery. Despite numerous attempts to optimize the experimental conditions, we were unable to reliably detect a significant effect of the reducer on mitochondria in the vascular wall.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are potentially useful for understanding the importance of mitochondrial positioning and function in this specific cell type within health and disease contexts, the evidence presented appears incomplete, with key bioenergetic and mechanistic claims lacking adequate support.

      Strengths:

      (1) The study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      (2) It explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a potentially significant area for both basic and translational biology.

      (3) The use of both in vivo and in vitro systems provides a potentially useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      (1) The central claim that MIRO1 loss impairs mitochondrial bioenergetics is not convincingly demonstrated, with only modest changes in respiratory parameters and no direct evidence of functional respiratory chain deficiency.

      (2) The proposed link between MIRO1 and respiratory supercomplex assembly or function is speculative, lacking mechanistic detail and supported by incomplete or inconsistent biochemical data.

      (3) Key mitochondrial assays are either insufficiently controlled or poorly interpreted, undermining the strength of the conclusions regarding oxidative phosphorylation.

      (4) The study does not adequately assess mitochondrial content or biogenesis, which could confound interpretations of changes in respiratory activity.

      (5) Overall, the evidence for a direct impact of MIRO1 on mitochondrial respiratory function in the experimental setting is weak, and the conclusions overreach the data.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Throughout the manuscript, the authors incorrectly use "mobility" to describe the active transport of mitochondria. The appropriate term is "mitochondrial motility," which refers to active, motor-driven movement. "Mobility" implies passive diffusion and is not scientifically accurate in this context.

      (2) "Super complex" should be consistently written as "supercomplex," in line with accepted mitochondrial biology terminology.

      We thank the reviewer for this comment and revised the text accordingly.

      (3) A significant limitation of the in vivo model is the mild phenotype observed, which is expected from an inducible knockout system. The authors should clarify whether a constitutive, tissue-specific knockout was considered and, if not, whether embryonic lethality or another limitation prevented its generation.

      This genetic model was originally developed by Dr. Janet Shaw at the University of Utah. In the original publication, Miro1 was constitutively knocked out in neurons. Germline inactivation of Miro1 was achieved by crossing mice harboring the Miro1F allele with a mouse line expressing Cre recombinase under the control of the hypoxanthine-guanine phosphoribosyltransferase (HPRT) promoter. Mating Miro1+/− mice resulted in Miro1−/− animals, which were cyanotic and died shortly after birth. Due to this outcome, we opted to develop an inducible, smooth muscle-specific model. Additionally, we considered testing whether the acute use of an inhibitor or a knockdown system targeting Miro1 could be evaluated as a potential therapeutic approach.

      (4) In Figure 1A and S1A, the authors use Western blotting to validate the knockout in the aorta and IHC in carotid arteries. The choice of different methods does not seem justified, and qPCR data are shown only for the aorta. IHC appears to be suboptimal for assessing MIRO1 levels in vascular tissue due to high autofluorescence, and IHC in Figure S1A is merely qualitative, with no quantification provided.

      We present complementary approaches to validate the deletion of Miro1. For Western blot analysis, we used the aorta because it provides more material for analysis. The autofluorescence observed via immunofluorescence is characteristic of elastin fibers within the media layer, making our results typical for this technique. As shown in Figure 1- Figure supplement 1, our data demonstrate a significant decrease, if not a complete knockout, of the target protein specifically in smooth muscle cells.

      (5) In Figure 1G, the bottom left panel (magnification) shows a lower green signal than the top left panel, suggesting these may have been collected with different signal intensity. This raises concerns about image consistency and representation.

      The top images in Figure 1G were taken at 63x magnification; the bottom images were taken at 20x magnification. The intensity differs between the two magnifications but is similar between genotypes.

      (6) In Figure S3, the sampling is uncontrolled: the healthy subject and the patient differ markedly in age. The claim of colocalization is not substantiated with any quantitative analysis.

      As outlined in the Methods section, our heart samples were obtained from LVAD patients or explanted hearts from transplant recipients. Due to the limited availability of such samples, there is indeed a difference in age between the healthy subject and the patient. While we acknowledge this limitation, the scarcity of samples made it challenging to control for age. Additionally, we determined that performing a quantitative analysis of colocalization would not yield robust or meaningful data given the constraints of our sample size and variability. 

      (7) Figure S4A lacks statistical analysis, which is necessary for interpreting the data shown.

      This appears to be a misunderstanding. In this manuscript, we do present statistically significant differences and focus on those that are biologically meaningful. Specifically, we highlight differences between PDGF treatment versus no treatment within the same genotype, as well as differences between the two genotypes under the same treatment condition (control or PDGF treatment). In this particular case, there is only a statistical difference between WT+PDGF and SM-Miro1-/-, but since this is not a meaningful comparison, it is not shown. Please note that this approach applies to all figures in the manuscript. Including all comparisons, whether statistically significant or not and whether biologically meaningful or not, may appear rigorous but, in our opinion, ultimately detracts from the main message of this paper.

      (8) The authors state, "given the generally poor proliferation of VSMCs from SM-MIRO1-/- mice, in later experiments we used VSMCs from MIRO1fl/fl mice and infected them with adenovirus expressing cre." This is not convincing, especially since in vivo cre efficiency is generally lower than in vitro. Moreover, the methods indicate that "VSMCs from littermate controls were subjected to the same procedure with empty vector control adenovirus," yet in Figure 2A, the control appears to be MIRO1fl/fl VSMCs transduced with Ad-EV. The logic and consistency of the controls used need clarification.

      For the initial experiments, cells were explanted from SM-MIRO1-/- mice (Figure 2- Figure supplement 1). In these mice, Cre recombination had occurred in vivo, and the cells exhibited very poor growth. In fact, their growth was so limited that we decided not to pursue this experimental approach after three independent experiments.

      For subsequent experiments, cells were explanted from Miro1fl/fl mice and passaged several times, which allowed us to generate the number of cells required for the experiments (Figure 2B). Once sufficient Miro1fl/fl cells were obtained, they were treated with adenovirus expressing Cre, as described in the Methods section. Control cells were treated with an empty vector adenovirus. To clarify, the control cells are Miro1fl/fl cells infected with an empty vector adenovirus, while the MIRO1-/- cells are Miro1fl/fl cells infected with adenovirus expressing Cre. The statement that "littermate controls were used" is incorrect: in fact, Miro1fl/fl cells from the same preparation were infected either with an empty vector adenovirus or with adenovirus expressing Cre. As mentioned, the knockdown was confirmed by Western blotting.

      (9) Figure 2C shows a growth delay in MIRO1-/- cells. Have the authors performed additional time points to determine when these cells return to G1 and quantify the duration of the lag?

      This is an excellent suggestion. So far, we have not performed this experiment.

      (10) In the 24 h time point of Figure 2C, MIRO1-/- cells appear to be cycling, yet no cyclin E signal is detected. How do the authors explain this inconsistency? Additionally, in Figure 2H, the quantification of cyclin E is unreliable, given that lanes 3 and 4 show no detectable signal.

      We agree with the reviewer—the inconsistency is driven by the exposure of the immunoblot presented. We revisited the data, reviewed the quantification, and performed an additional experiment. We are now presenting an exposure that demonstrates levels of cyclin E (Figure 2G).

      (11) In Figure 3D, the authors present mitochondrial probability map vs. distance from center curves. How was the "center" defined in this analysis? Were radial distances normalized across cells (e.g., to the cell radius or maximum extent)? If not, variation in cell and/or nucleus size or shape could significantly affect the resulting profiles. No statistical analysis is provided for this assessment, which undermines its quantitative value. Furthermore, the rationale behind the use of mito95 values is not clearly explained.

      The center refers to the center of the microchip's Y-shaped pattern, to which each cell is attached. Since all Y-shapes on the chip are identical in size, normalization is not required. The size of the optimal Y-shapes was tested as recommended by CYTOO. For further context, please refer to the papers by the Kittler group.

      Additionally, a graph demonstrating the percentage of mitochondria localized at specific distances can be produced for any given distance. Notably, the further from the center of the chip, the more pronounced the differences become.

      (12) The authors apply a 72 h oligomycin treatment to assess proliferation and a 16 h treatment to measure ATP levels. This discrepancy in experimental design is not justified in the manuscript. The length of treatment directly impacts the interpretation of the data in Figures 4C, 4D, and 4E, and needs to be addressed.

      Thank you for this comment. We have performed additional experiments to align these time points. In the revised manuscript, we now present proliferation and ATP production measured at the same time point (Figure 4A, B for proliferation and ATP levels).

      (13) The manuscript repeatedly suggests that MIRO1 loss causes a defect in mitochondrial ATP production, yet no direct demonstration of a bioenergetic defect is provided. The claim relies on a modest decrease in supercomplex species (of undefined composition) and a mild reduction in complex I activity that does not support a substantial OXPHOS defect. Notably, the respirometry data in Figure 5I do not align with the BN-PAGE results in Figure 6I. There is increasing evidence that respiratory chain supercomplexes do not confer a catalytic advantage. The authors should directly assess the enzymatic activities of all respiratory complexes. Reported complex I activity in MIRO1-/- cells appears rotenone-like (virtually zero, Figure 3K) or ~30% residual (Figure 3L), suggesting a near-total loss of functional complex I, which is not reflected in the BN-PAGE. Additionally, complex I activity has not been normalized to a mitochondrial reference, such as citrate synthase.

      Given that we work in primary cells and are limited by the number of cells we can generate, we concentrated on ETC1 and 5 and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants (Figure 6- Figure supplement 1). Please note that the addition of Rotenone abolishes the slope of NADH consumption (Figure 6- Figure supplement 2F).

      While the ETC1 activity is measured in Fig. 6K, the blue native gel shown in Figure 6I was run without substrate and is thus indicative of protein complex abundance rather than complex activity.

      In additional experiments, we normalized the activity to citrate synthase as requested.

      (14) In the methods section, the complex I activity assay is incorrectly described: complex I is a NADH dehydrogenase, so the assay measures NADH oxidation, not NADPH.

      We thank the reviewer for his comment and revised the manuscript accordingly.

      (15) The authors have not assessed mitochondrial mass, which is a critical omission. Differences in mitochondrial biogenesis or content could underlie several observed phenotypes and should be controlled for.

      A qPCR assay was used to assess mitochondrial DNA copy number in WT and Miro1-/- VSMCs. We determined the abundance of COX1 and MT-RNR1 DNA as mitochondrial gene targets and NDUFV DNA as the nuclear reference gene. While the results in Miro1-/- cells were highly variable, no statistically significant reduction of copy numbers was detected (Figure 3- Figure supplement 1B).
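
      The response does not state how copy number was derived from the qPCR readout. As a minimal sketch, assuming the common delta-Ct approach with roughly 100% amplification efficiency for both primer pairs and a nuclear reference present at two copies per cell, the calculation could look as follows. The function name and all Ct values are hypothetical and are not data from the study.

```python
def relative_mtdna_copy_number(ct_mito, ct_nuclear, nuclear_copies_per_cell=2):
    """Relative mtDNA copies per cell from qPCR Ct values (delta-Ct method).

    Assumes ~100% amplification efficiency for both primer pairs, so each
    cycle of difference corresponds to a two-fold difference in template.
    """
    delta_ct = ct_nuclear - ct_mito
    return nuclear_copies_per_cell * 2.0 ** delta_ct


# Hypothetical Ct values, for illustration only (not data from the study).
wt = relative_mtdna_copy_number(ct_mito=16.0, ct_nuclear=24.0)
ko = relative_mtdna_copy_number(ct_mito=16.5, ct_nuclear=24.0)
ratio = ko / wt  # knockout copy number relative to wild type
```

      Under these assumptions, a shift of half a cycle in the mitochondrial target corresponds to roughly a 30% difference in copy number, which gives a sense of how sensitive the readout is to variability between replicates.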

      (16) Complex IV signal is missing in Figure 6I. Its omission is not acknowledged or explained.

      Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV.

      (17) Figure 6D does not appear representative of the quantifications shown. C-MYC signal is visibly reduced in the mutant, consistent with the lower levels of interactors such as Sam50 and NDUFA9. Additionally, the SDHA band is aligned at the bottom of the blot box. The list of antibodies used and their catalog numbers is missing or was not provided to the reviewers. It seems plausible that the authors used a cocktail antibody set (e.g., Abcam ab110412), which includes anti-NDUFA9. This would contradict the claim of reduced complex I and SC levels, as the steady-state levels of NDUFA9 appear unchanged.

      We acknowledge that the expression of the myc-MIRO1 mutant is lower compared to myc-MIRO1 WT or myc-MIRO1 KK. Achieving identical expression levels when overexpressing multiple MIRO1 constructs is challenging. We agree that the lower expression of this mutant contributes to a reduced pull-down. Our quantification shows a reduction in association, although it is not statistically significant.

      A list of the antibodies was provided in the Methods section.

      We would like to clarify that we did not use an antibody cocktail in our experiments.

      (18) The title of Figure 6, "Loss of Miro1 leads to dysregulation of ETC activity under growth conditions," is vague. The term "dysregulation" should be replaced with a more specific mechanistic descriptor: what specific regulatory defect is meant?

      We thank the reviewer for this suggestion and rephrased the title.

      (19) In the results text for Figure 6, the authors state: "These data demonstrate that MIRO1 associates with MIB/MICOS and that this interaction promotes the formation of mitochondrial super complexes and the activity of ETC complex I." This conclusion is speculative and not mechanistically supported by the data presented.

      We appreciate the reviewer's feedback. We have revised the text to clarify the relationship between MIRO1, MIB/MICOS, supercomplex formation, and ETC activity. The updated text now states: "These data demonstrate that MIRO1 associates with MIB/MICOS. Additionally, MIRO1 promotes the formation of mitochondrial supercomplexes and enhances the activity of ETC complex I."

      (20) In Figure 7A, it is unclear what the 3x siControl/siMiro1 pairs represent: are these different cell lines or technical replicates of the same line? No loading control is shown. If changes in mitochondrial protein abundance are being evaluated, using COX4 as a loading control is inappropriate. The uneven COX4 signal across samples further complicates interpretation.

      Please note that we used primary cells, not cell lines. The three siControl/siMiro1 pairs represent independent cell isolations, each transfected with either siControl or siMIRO1 siRNA. While the possibility of a difference in mitochondrial mass is an interesting question, the primary objective of this experiment is to demonstrate that the technique effectively results in the knockdown of Miro1, which is exclusively localized to mitochondria and not present in the cytosol. As such, we believe that Cox4 serves as a reasonable loading control. Although Miro1 knockdown may lead to a reduction in mitochondrial mass, the focus of this experiment is not to assess mitochondrial mass but to confirm the reduction in Miro1 protein levels on mitochondria. We also performed anti-VDAC immunoblots on the same membranes as an alternative loading control (Author response image 1).

      Author response image 1.

      (21) Figure 7G is difficult to interpret. Why did the authors choose to use a sensor-based method instead of the chemiluminescent assay to measure ATP in these samples?

      Both methods were employed to assess ATP levels in human samples. ATP measurements obtained with the luminescent assay are provided.

    1. eLife Assessment

      This manuscript provides useful insights into how the brain can simultaneously represent events and the times when they occurred. The results include a comparison between two different basis functions for temporal selectivity and how these generate different predictions for the dynamics of neural populations. The conclusions are partly incomplete because of questions such as the impact of the linear separability assumption and whether joint encodings of event type and time can be made without it.

    2. Joint Public Review:

      Quite obviously, the brain encodes "time", as we are able to tell if something happened before or after something else. How this is done, however, remains essentially not understood. In the context of Working Memory tasks, many experiments have shown that the neural activity during the retention period "encodes" time, besides the stimulus to be remembered; that is, the time elapsed from stimulus presentation can be reliably inferred from the recordings, even if time per se is not important for the task. This implies 'mixed selectivity', in the weak sense of neural activity varying with both stimulus identity and time elapsed (since presentation).

      In this paper, the authors investigate the implications of a specific form of such mixed selectivity, that is, conjunctive coding of what (stimulus) and when (time) at the single-neuron level, on the resulting dynamics of the population activity when 'viewed' through linear dimensionality-reduction techniques, essentially Principal Component Analysis (PCA). The theoretical/modeling results presented provide a useful guide to the interpretation of the experimental results; in particular, with respect to what can, or cannot, be rightfully inferred from those experimental results (using PCA-like techniques). The results are essentially theoretical in nature; there are, however, some conclusions that require a more precise justification, in my opinion. More generally, as the authors themselves discuss in the paper, it is not clear how to generalize this coding scheme to more complicated, but behaviorally and cognitively relevant, situations, such as multi-item WM or WM for sequences.

      (1) It is unclear to me how the conjunctive code that the authors use (i.e., Equation (3)) is constrained by the theoretical desiderata (i.e., compositionality) they list, or whether it is simply an ansatz, partly motivated by theoretical considerations and experimental observations.

      The "what" part: What the authors mean by "relationships" between stimuli is never clearly defined. From their argument (and from Figure 1b), it would seem that what they mean is "angles" between population vectors for all pairs of stimuli. If this is so, then the effect of the passing time can only amount to a uniform rescaling of the components of the population vector (i.e., it must be a similarity transformation; rotations are excluded, if the linear-decoder vectors are to be time-independent); the scaling factor, then, must be a strictly monotonic function of time (increasing or decreasing), if one is to decode time. In other words, the "when" receptive fields must be the same for all neurons.

      The "when" part: The condition, \tau_3=\tau_1+\tau_2, does not appear to be used at all. In fact, it is unclear (to me at least) whether the model, as it is formulated, is able to represent time intervals between stimuli.
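
      The geometric point in the "what" part can be illustrated numerically. The sketch below is not the authors' model: it assumes, purely for illustration, a multiplicative code in which each neuron's response is a stimulus-tuning weight times a temporal receptive field, and all tuning weights, time constants, and function names are hypothetical. It compares the angle between the population vectors for two stimuli over time when all neurons share one temporal field versus when the fields differ across neurons.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_stimuli = 40, 3
W = rng.normal(size=(n_neurons, n_stimuli))  # hypothetical "what" tuning


def cosine(u, v):
    """Cosine of the angle between two population vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def shared_field(t):
    # Identical temporal profile for every neuron: a uniform rescaling.
    return np.exp(-t) * np.ones(n_neurons)


taus = np.linspace(0.5, 5.0, n_neurons)  # hypothetical time constants


def heterogeneous_field(t):
    # Neuron-specific temporal profiles: a non-uniform rescaling.
    return np.exp(-t / taus)


def angle_over_time(field, times):
    # Cosine between the responses to stimulus 0 and stimulus 1 at each time.
    return [cosine(W[:, 0] * field(t), W[:, 1] * field(t)) for t in times]


times = [0.0, 1.0, 3.0]
shared = angle_over_time(shared_field, times)
hetero = angle_over_time(heterogeneous_field, times)
```

      With a shared field the cosine stays constant over time, because every component is multiplied by the same factor; with neuron-specific fields it drifts, which is the sense in which preserving angles constrains the "when" receptive fields.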

      (2) For the specific case considered, i.e., conjunctive coding, it would seem that one should be able to analytically work out the demixed PCA (see Kobak et al., 2016). More generally, it seems interesting to compare the results of the PCA and the demixed PCA in this specific case, even just using synthetic data.

      (3) In the Section "Dimensionality of neural trajectories...", there is some claim about how the dimensionality of the population activity goes up with the observation window T, backed up by numerical results that somehow mimic the results of Cueva et al. (2020) on experimental data. Is this a result that can be formally derived? Related to this point, it would be useful to provide a little more justification for Equation (17). Naively, one would think that the correlation matrix of the temporal component is always full-rank nominally, but that one can get excellent low-rank approximations (depending on T, following your argument).
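
      The distinction drawn in point (3) between nominal rank and a good low-rank approximation can be checked on synthetic data. The sketch below assumes exponentially decaying temporal receptive fields with log-spaced time constants; this is an illustrative choice, not necessarily either of the basis functions used in the manuscript, and the function name and parameter values are hypothetical. It counts how many principal components are needed to capture a given fraction of the temporal variance for an observation window T.

```python
import numpy as np


def components_for_variance(T, n_neurons=50, dt=0.01, threshold=0.95):
    """Number of principal components needed to reach `threshold` of the
    variance of a synthetic temporal population code observed on [0, T)."""
    t = np.arange(0.0, T, dt)
    taus = np.logspace(-1, 1, n_neurons)      # hypothetical time constants
    X = np.exp(-t[None, :] / taus[:, None])   # neurons x time samples
    X = X - X.mean(axis=1, keepdims=True)     # center each neuron over time
    s = np.linalg.svd(X, compute_uv=False)    # singular values, descending
    explained = np.cumsum(s**2) / np.sum(s**2)
    # First component count whose cumulative variance reaches the threshold.
    return int(np.searchsorted(explained, threshold) + 1)


dims = {T: components_for_variance(T) for T in (1.0, 5.0, 20.0)}
```

      Inspecting `dims` for several windows shows how few components are needed relative to the number of neurons in this toy setting, and how that count depends on T and on the chosen variance threshold.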

    1. eLife Assessment

      The authors provide a scholarly review of intracranial research into the neural correlates of consciousness (NCCs). To our knowledge, this is the first such review, and it therefore may become a must-read for anyone working in the field of consciousness research. It is not so persuasive that intracranial recordings are better suited to identifying pure NCCs than other methods, which appears a problem instead solved through novel paradigms and better-developed theories - but this no doubt reflects an in-depth, timely, and insightful contribution to the literature.

    2. Reviewer #1 (Public review):

      Summary

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      Weaknesses

      Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.

      (1) Distinguishing NCCs from their prerequisites or consequences

      This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.

      As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.

      The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post- processing is a fair motivation for using intracranial over noninvasive measures.

      (2) Drawing misleading conclusions from certain studies

      There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:

      Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."

      It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.

      Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."

      The authors claim in this section that later (>200ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al. results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli.'). Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to make from these studies.

      This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."

      This seems to be directly in conflict with the Fisch et al results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.

      (3) Justifying single-neuron cortical correlates of consciousness

      The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.

      (4) No mention of combined fMRI-EEG research

      A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This includes population-level measurements in vision and in other sensory modalities, as well as single-neuron level studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      Weaknesses:

      The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.

      For example, the authors explain the difficulties in distinguishing true NCCs from their prerequisites and consequences. This constitutes a difficult conceptual problem that plagues all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with these problems.

      For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.

      The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.

      Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.

      Discussion of the likely impact of the work on the field:

      This work has the potential of becoming a must-read for anyone working in the field of consciousness research.

    4. Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradicting evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.

      Weaknesses:

      The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for separating correlates of consciousness from post-consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

    1. eLife Assessment

      This important study establishes a workflow based on environmental sampling for the discovery of bacteriophages capable of infecting antibiotic-resistant pathogens. The experimental design, analysis, and results demonstrating the effectiveness of the workflow are convincing, although a broader sampling scheme and more careful framing of the data within the current limitations of viral taxonomy could strengthen the work. This study will interest researchers working on bacterial infections, environmental microbiology, and phage-based alternatives for addressing antimicrobial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Pathogen-Phage Geomapping to Overcome Resistance," Do et al. present an impressive demonstration of using geographical sampling and metagenomics to guide sample choice for enrichment in human-associated microbes and the pathogen of interest to increase the chances of success for isolating phages active against highly resistant bacterial strains. The authors document many notable successes (17!) with highly resistant bacterial isolates and share a thoughtfully structured phage discovery effort, potentially opening the door to similar geomapping efforts across the field. While the work is methodologically strong and valuable for the community, there are a few areas where additional clarification and analysis could better align the claims with the data presented.

      Strengths:

      (1) The manuscript describes a well-executed and transparent example of overcoming a major obstacle in therapeutic virus identification, providing a practical success story that will resonate with researchers in microbiology and medicine.

      (2) Many phage researchers have anecdotally experienced a similar phenomenon, that a particular wastewater treatment plant always seems to have the pathogens you need. Quantifying this with metagenomics modernizes and adds evidence to this phenomenon in a way that could help researchers reproduce this success in a methodical way.

      (3) The methodology of combining environmental sampling, viral screening, and host-range analysis is clearly articulated and reproducible, offering a valuable blueprint for others in the field.

      (4) The data are presented with appropriate analytical rigor, and the results include robust sequencing and metagenomic profiling that deepen understanding of local viral communities.

      (5) The 17 successes yielding 35 phages have a lot of phylogenetic novelty beyond what the Tailor labs have typically found with previous methods.

      (6) The work highlights a practical and innovative solution to an increasingly important clinical problem, supporting the development of personalized antiviral strategies.

      Weaknesses:

      (1) The central concept of geomapping as a broadly applicable strategy is wonderfully supported by the 17 successes documented in the paper. While this is actually, of course, a strength, the study does not include a comparative analysis across multiple sites with varying sampling outcomes for different bacterial types, which would be necessary to validate this claim more generally.

      (2) Some elements, such as beta diversity comparisons and the metagenomics analysis of viral dark matter, would benefit from additional statistical analysis and clearer context.

      (3) Claims about therapeutic cocktails would be better framed as speculative and/or moved to the discussion section.

      (4) The manuscript could be strengthened by elaborating on the scope and composition of the phage and bacterial isolate collections, which are important for interpreting the broader significance of the findings.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Do and colleagues aims to develop a workflow for isolating and identifying bacteriophages with potential applications in phage therapy against antibiotic-resistant pathogens. The workflow integrates geΦmapping as a strategy to identify potential phage sources, ΦHD as a device for phage concentration, and RΦ as a phage library constructed from the initial sampling, resulting in the discovery of 36 new phages. The paper is overall interesting, and the proposed method appears robust and effective.

      Strengths:

      The proposed methods combine state-of-the-art strategies to address the ever-increasing problem of antibiotic resistance. The methods are robust, and the controls are appropriate. The integration of environmental sampling, concentration strategies, and downstream genomic characterization is a clear strength and provides a potentially scalable framework for identifying candidate therapeutic phages. The manuscript is clearly written overall, and the results support the main conclusions.

      Weaknesses:


      While the authors acknowledge several limitations, some aspects require clearer framing or additional clarification. The proposed workflow focuses exclusively on aquatic environments as sources of phages, which may limit the diversity of hosts and phage types recoverable using this approach. Some interpretations, particularly regarding taxonomic classification and sampling saturation, would benefit from more cautious wording given current limitations in viral taxonomy and the observed data.

    1. eLife Assessment

      This important work shows that a history of cocaine self-administration disrupts the orbitofrontal cortex's ability to encode similarities between distinct sensory stimuli that possess identical task information - hidden states. The evidence supporting these conclusions is compelling, with methods and analyses spanning self-administration, a novel 'figure 8' sequential odor task, recordings from 3,881 single units, and sophisticated firing analyses revealing complex orbitofrontal representations of task structure. These results will be of broad interest to psychologists, neuroscientists, and clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5⁺-0⁻-1⁻-2⁺; Sequence #2: 3⁺-0⁻-1⁻-4⁺), forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.
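
      The task structure summarized above can be sketched in a few lines of code. This is only an illustrative reconstruction from the sequences quoted in this review, not the authors' task code; the names `SEQUENCES`, `trial_stream`, and `hidden_state_table` are hypothetical, and strict alternation of the two sequences is assumed.

```python
# Illustrative sketch only: odor IDs and reward contingencies are taken from
# the sequences quoted in the review; all names here are hypothetical.
SEQUENCES = {
    1: [(5, True), (0, False), (1, False), (2, True)],  # 5+ 0- 1- 2+
    2: [(3, True), (0, False), (1, False), (4, True)],  # 3+ 0- 1- 4+
}


def trial_stream(n_loops):
    """Yield (sequence, position, odor, rewarded), assuming strict alternation."""
    for _ in range(n_loops):
        for seq_id in (1, 2):
            for pos, (odor, rewarded) in enumerate(SEQUENCES[seq_id], start=1):
                yield seq_id, f"P{pos}", odor, rewarded


def hidden_state_table():
    """Group odors and reward values by position across the two sequences."""
    table = {}
    for trials in SEQUENCES.values():
        for pos, (odor, rewarded) in enumerate(trials, start=1):
            entry = table.setdefault(f"P{pos}", {"odors": set(), "rewarded": set()})
            entry["odors"].add(odor)
            entry["rewarded"].add(rewarded)
    return table


if __name__ == "__main__":
    for position, entry in sorted(hidden_state_table().items()):
        print(position, sorted(entry["odors"]), entry["rewarded"])
```

      In this sketch, P2 and P3 share the same odor across sequences, whereas P1 and P4 use distinct odors that carry the same reward value, which is the sense in which those positions are functionally equivalent despite differing sensory features.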

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trials/positions involving an overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4), indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small, so idiosyncratic, animal-specific effects cannot be fully ruled out.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      (4) Description of the TCA component

      Line 220: authors wrote that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

    3. Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict that reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single-unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0-100%; can the y-axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3; is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned, e.g., in classrooms or treatment programs, to curtail their drug use. They do not acknowledge the limitations of their study (e.g., the exclusive use of male rats), or discuss alternative explanations of their data.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?
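
      To make the question above concrete, the following sketch checks whether an odor-blind rule could in principle predict reward. It is a thought experiment under stated assumptions, not an analysis of the authors' data: it assumes the two sequences alternate strictly and that reward follows the + / - pattern given in Reviewer #1's summary. The names `REWARD_PER_LOOP`, `odor_blind_prediction`, and `accuracy` are hypothetical.

```python
# Thought-experiment sketch: reward pattern per full loop (sequence 1, then
# sequence 2), assuming strict alternation and deterministic reward.
REWARD_PER_LOOP = [True, False, False, True,   # sequence 1: + - - +
                   True, False, False, True]   # sequence 2: + - - +


def reward_stream(n_loops):
    """Return the reward outcome of every trial over n_loops full loops."""
    return REWARD_PER_LOOP * n_loops


def odor_blind_prediction(history):
    """Predict the next outcome as the opposite of the outcome two trials back."""
    return not history[-2]


def accuracy(n_loops):
    """Fraction of trials (from the third onward) that the odor-blind rule predicts."""
    outcomes = reward_stream(n_loops)
    hits = sum(
        odor_blind_prediction(outcomes[:t]) == outcomes[t]
        for t in range(2, len(outcomes))
    )
    return hits / (len(outcomes) - 2)


if __name__ == "__main__":
    print(accuracy(10))
```

      Under these assumptions the rule never consults odor identity, so whether rats could rely on something like it is an empirical question that the sketch cannot answer.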

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

    1. eLife Assessment

      This important study provides the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention underlying inhibition of return, using an optimized IOR-Stroop fMRI paradigm to dissociate integration and segregation processes and to demonstrate that attentional orienting modulates semantic- and response-level conflict processing. Although the empirical evidence is compelling, clearer justification of the experimental logic, more cautious framing of behavioral and regional interpretations, and greater transparency in reporting and presentation are needed to strengthen the conclusions. The work will be of broad interest to researchers investigating visual attention, perception, cognitive control, and conflict processing.

    2. Reviewer #1 (Public review):

      Summary:

      This study makes a significant and timely contribution to the field of attention research. By providing the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention, it fills a critical gap in our understanding of the neural mechanisms underlying inhibition of return (IOR). The authors employ a carefully optimized cue-target paradigm combined with fMRI to elegantly dissociate the neural substrates of cue-target integration from those of segregation, thereby offering compelling support for the integration-segregation account. Beyond validating a key theoretical hypothesis, the study also uncovers an interaction between spatial orienting and cognitive conflict processing, suggesting that exogenous attention modulates conflict processing at both semantic and response levels. This finding sheds new light on the neural mechanisms that connect exogenous attentional orienting with cognitive control.

      Strengths:

      The experimental design is rigorous, the analyses are thorough, and the interpretation is well grounded in the literature. The manuscript is clearly written, logically structured, and addresses a theoretically important question. Overall, this is an excellent, high-impact study that advances both theoretical and neural models of attention.

      Weaknesses:

      While this study addresses an important theoretical question and presents compelling neuroimaging findings, a few additional details would help improve clarity and interpretation. Specifically, more information could be provided regarding the experimental conditions (SI and RI), the justification for the criteria used for excluding behavioral trials, and how the null condition was incorporated into the analyses. In addition, given the non-significant interaction effect in the behavioral results, the claim that the behavioral data "clearly isolated" distinct semantic and response conflict effects should be phrased more cautiously.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides evidence for the integration-segregation theory of an attentional effect, widely cited as inhibition of return (IOR), from a neuroimaging perspective, and explores neural interactions between IOR and cognitive conflict, showing that conflict processing is potentially modulated by attentional orienting.

      Strengths:

      The integration-segregation theory was examined in a sophisticated experimental task that also accounted for cognitive conflict processing, which is phenomenologically related to IOR but "non-spatial" by nature. This study was carefully designed and executed. The behavioral and neuroimaging data were carefully analyzed and largely well presented.

      Weaknesses:

      The rationale for the experimental design was not clearly explained in the manuscript; more specifically, why the current ER-fMRI study would disentangle integration and segregation processes was not explained. The introduction of "cognitive conflict" into the present study was not motivated clearly enough for a non-expert reader to follow.

      The presentation of the results can be further improved, especially the neuroimaging results. For instance, Figure 4 is challenging to interpret. If "deactivation" (or a reduction in activation) is regarded as a neural signature of IOR, this should be clearly stated in the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to provide the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention - a framework that has shaped behavioral research for more than two decades but has lacked clear neural validation. By combining an inhibition-of-return (IOR) paradigm with a modified Stroop task in an optimized event-related fMRI design, the authors examine how attentional integration and segregation processes are implemented at the neural level and how these processes interact with semantic and response conflicts. The central goal is to map the distinct neural substrates associated with integration and segregation and to clarify how IOR influences conflict processing in the brain.

      Strengths:

      The study is well-motivated, addressing a theoretically important gap in the attention literature by directly testing a long-standing behavioral framework with neuroimaging methods. The experimental approach is creative: integrating IOR with a Stroop manipulation expands the theoretical relevance of the paradigm, and the use of a genetic-algorithm-optimized fMRI design ensures high efficiency. Methodologically, the study is sound, with rigorous preprocessing, appropriate modeling, and analyses that converge across multiple contrasts. The results are theoretically coherent, demonstrating plausible dissociations between integration-related activity in the fronto-parietal attention network (FEF, IPS, TPJ, dACC) and segregation-related activity in medial temporal regions (PHG, STG). The findings advance the field by supplying much-needed neural evidence for the integration-segregation framework and by clarifying how IOR modulates conflict processing.

      Weaknesses:

      Some interpretive aspects would benefit from clarification, particularly regarding the dual roles ascribed to dACC activation and the circumstances under which PHG and STG are treated as a single functional cluster versus separate clusters. Reporting conventions are occasionally inconsistent (e.g., statistical formatting, abbreviation definitions), which may hinder readability. More detailed reporting of sample characteristics, exclusion criteria, and data-quality metrics, especially regarding the global-variance threshold, would improve transparency and reproducibility. Finally, some limitations of the study, including potential constraints on generalization, are not explicitly acknowledged and should be articulated to provide a more balanced interpretation.

    1. eLife Assessment

      This important work contributes a transcriptional dataset that identifies potential genes involved in axon initial growth and axon regrowth, followed by a characterization of axon phenotypes after knockdown of a subset of these genes. Focused experiments on a single gene, Pmvk, highlight the potential role of the mevalonate pathway in axon regrowth. The methods are convincing, though partially incomplete. The data establish a basis for further studies on axonal development and will be of interest to both developmental neurobiologists and those seeking to develop molecular tools to target, monitor, and manipulate axon morphology and function.

    2. Reviewer #1 (Public review):

      Summary:

      Fahdan et al. present a study investigating the molecular programs underlying axon initial growth and regrowth in Drosophila mushroom body (MB) neurons. The authors leverage the fact that different Kenyon cell (KC) subtypes undergo distinct axonal events on the same developmental timeline: γ KCs prune and then regrow their axons during early pupation, whereas α/β KCs extend their axons for the first time during the same pupal period. Using bulk Smart-seq2 RNA sequencing across six developmental time points, the authors identify genes enriched during γ KC regrowth and α/β KC initial outgrowth, and subsequently perform an RNAi screen to determine which candidates are functionally required for these processes.

      Among these, they focus on Pmvk, a key enzyme in the mevalonate pathway. Both RNAi knockdown and a CRISPR-generated mutant produce strong γ KC regrowth defects. Knockdown of other mevalonate pathway components (Hmgcr, Mvk) partially recapitulates this phenotype. The authors propose that Pmvk promotes axonal regrowth through effects on the TOR pathway.

      Overall, this work identifies new molecular players in developmental axon remodeling and provides intriguing evidence connecting Pmvk to γ KC regrowth.

      While the Pmvk knockdown and loss-of-function data are compelling, the evidence that the mevalonate pathway broadly regulates γ KC axon regrowth is less clear. RNAi knockdown of enzymes upstream of Pmvk (Hmgcr, Mvk) produces only mild phenotypes, and knockdown of several downstream enzymes produces no phenotype. The authors attribute this discrepancy to the possibility of weak RNAi constructs, which is plausible but not fully demonstrated. It would be helpful for the authors to discuss alternative explanations, including non-canonical roles for Pmvk that may not require the full pathway, and clarify the extent to which the current data support the conclusion that the mevalonate pathway, rather than Pmvk specifically, is a core regulator of regrowth.

      It is not clear from the Methods whether γ KCs and α/β KCs were sorted from the same brains using orthogonal binary expression systems (e.g., Gal4 > reporter 1 and LexA > reporter 2), or isolated separately from different fly lines. If the latter, differences in genetic background, staging, or batch effects could influence transcriptional comparisons. This should be explicitly clarified in the Methods, and any associated limitations discussed in the manuscript.

      The authors have made important findings that contribute to our understanding of axon growth and regrowth. As written, some major claims are only partially supported, but these issues can be addressed through reframing and clarification. In particular, the manuscript would benefit from (1) a more cautious interpretation of the mevalonate pathway's role, potentially considering non-canonical functions of Pmvk, and (2) addressing methodological ambiguities in the transcriptomic analysis.