  1. Jun 2024
    1. Reviewer #1 (Public Review):

      Summary:

      This study shows a connection between cancer-associated beta-catenin mutations and extracellular vesicle secretion, and a link between the beta-catenin mutation and expression of trafficking and exocytosis machinery. The authors used a multidisciplinary approach to explore expression levels of relevant proteins and single-particle imaging to directly explore the release of extracellular vesicles. These results suggest a role of extracellular vesicles in immune evasion in liver cancer, with the role needing to be further explored in other forms of cancer. I find this work to be compelling and of strong significance.

      Strengths:

      This paper uses multidisciplinary methods to demonstrate a compelling role of beta-catenin mutations in suppressing EV secretion in tumors. The results and imaging are extremely convincing and compelling.

    2. Reviewer #2 (Public Review):

      Summary:

      Dantzer and colleagues are investigating the pivotal role of ß-catenin, a gene that undergoes mutation in various cancer cells, and its influence on promoting the evasion of immune cells. In their initial experiments, the authors developed a HepG2 mutated ß-catenin KD model, conducting transcriptional and proteomic analyses. The results revealed that the silencing of mutated ß-catenin in HepG2 cells led to an up-regulation in the expression of exosome biogenesis genes.

      Furthermore, the researchers verified that these KD cells exhibited an increased production of exosomes, with the mutant form of ß-catenin concurrently decreasing the expression of SDC4 and Rab27a. Intriguingly, applying a GSK inhibitor to the cells resulted in reduced expression of SDC4 and Rab27a. Subsequent findings indicated that mutated ß-catenin actively facilitates immune escape through exosomes, and silencing exosome biogenesis correlates with a decrease in immune cell infiltration. In a crucial clinical correlation, the study demonstrated that patients with ß-catenin mutations exhibited low levels of exosome biogenesis.

      Strengths:

      Overall, the data robustly supports the outlined conclusions, and the study is commendably designed and executed. However, there are a few suggestions for manuscript improvement.

      Weaknesses: No weakness

    3. Reviewer #3 (Public Review):

      Summary:

      In this very important study by Dantzer et al., 'Emerging role of oncogenic b-catenin in exosome biogenesis as a driver of immune escape in hepatocellular carcinoma', the authors define a role for oncogenic b-catenin in exosome biology and explore the link between reduced exosome secretion and tumor immune cell evasion. Using transcriptional and proteomic analysis of hepatocellular carcinoma cells with either oncogenic or wildtype b-catenin, the authors find that oncogenic b-catenin negatively regulates exosome biogenesis.

      The authors can provide compelling evidence that oncogenic b-catenin in different hepatocellular carcinoma cells negatively regulates exosome biogenesis and secretion, by downregulation of, amongst others, SDC4 and RAB27A, two proteins involved in exosome biogenesis. The authors corroborate these results by inducing b-catenin activation using CHIR99021 in a hepatocarcinoma cell line with non-oncogenic bCatenin (Huh7 cells). The authors can further demonstrate convincingly that reduction in exosome release by hepatocarcinoma spheroids leads to a reduction in immune cell infiltration into the tumor spheroid.

      Strengths:

      This is a very important and well-conceived study, that appeals to a readership beyond the field of hepatocarcinoma. The authors demonstrate a compelling link between oncogenic bCatenin and exosome biogenesis. Their results are convincing and with well-designed control experiments. The authors included various complementary lines of investigation to verify their findings.

      Weaknesses:

      One limitation of this study is that the mechanistic relationship of exosome release and how they affect immune cells remains to be elucidated. In this context, the authors' conclusions rest on the assumption that hepatocarcinoma immune evasion is based exclusively on the reduced number of exosomes. However, the authors do not analyze exosome composition between exosomes of wildtype and oncogenic background, which could be different.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      While the role of Rab27 was strongly examined, the hits of the VAMP proteins were not explored in detail. I was wondering if the decrease in the presence of VAMPS directly suggests the final step of membrane fusion in the exocytosis of EVs is what is being impaired. Or if it is other trafficking steps along the EV secretion pathway.

      We appreciate the relevance of this comment and we agree that the decrease of VAMP gene expression in the β-catenin-mutated HepG2 cells could suggest an impairment of the final membrane fusion step in exocytosis of EVs. We have therefore expanded this important point in the discussion (page 10). Indeed, we identified an upregulation of VAMP2, VAMP5 and VAMP8 expressions after mutated β-catenin depletion in the transcriptomic analysis of HepG2 cells. However, these proteins were not detected in the mass spectrometry analysis. Only VAMP3 and VAMP7 proteins were detected in the proteomic analysis without any variation. This is why we didn't focus on this trafficking step, but it could be interesting to explore it further in the future. 

      Reviewer 2:

      (1) In Figure 1F, it is essential to investigate why mass spectrometry analysis indicated no significant changes in SDC4 levels.

      We agree with the reviewer: whereas we observed a significant alteration of syndecan-4 expression at the mRNA level, we did not observe significant changes in syndecan-4 levels by mass spectrometry. One possible explanation is that heparan sulfate proteoglycans like syndecan-4 exhibit a high degree of structural heterogeneity due to the biosynthetic process that produces linear polysaccharides. This characteristic can reduce the robustness of mass spectrometry analyses, leading to greater variability.

      (2) Figure 2G lacks clarity in explaining how the quantification of MVBs (multivesicular bodies) was conducted.

      We apologize for the lack of clarity in explaining how the quantification of MVBs was conducted in Figure 2G. The Materials and methods section (part electron microscopy-cells, page 23) has been modified in order to emphasize this point.

      (3) In Supplementary Figure 1F, there is a suggestion to highlight exosomes using arrowheads for enhanced clarity.

      According to the reviewer’s suggestions, we added arrowheads on supplementary figure 1F in order to highlight the exosomes (page 16). This indeed improves clarity.

      (4) Figure 3C prompts a question about the peculiar appearance of Actin staining in KD cells, requiring further investigation.

      The peculiar appearance of this intense phalloidin staining between hepatocytes corresponds to bile canaliculi (BC), features of more differentiated HepG2 cells. As phalloidin-stained BC are very bright, this may diminish the visibility of other, thinner actin structures. We decided to change the image of KD cells for a more relevant one (new Figure 3C).

      (5) An intriguing avenue for exploration is suggested in testing how the treatment of a GSK inhibitor on HepG2 cells might impact Rab27a and SDC4 expression.

      We appreciate the relevance of the suggestion to test how the treatment of HepG2 cells with a GSK inhibitor might impact Rab27a and SDC4 expression. According to the reviewer’s suggestions, experiments have been carried out and the data are presented in Author response image 1 below. In HepG2 cells, the GSK inhibitor stabilized the wild-type β-catenin protein but, surprisingly, the mutated form of β-catenin is slightly decreased (Author response image 1A). Regarding the expression levels of both Rab27a and SDC4 mRNA, a small increase is observed (Author response image 1B). Rab27a protein is also increased upon treatment of HepG2 cells with a GSK inhibitor (Author response image 1C). This increase in expression could be due to the decrease of the mutated form of β-catenin in HepG2 cells, confirming that Rab27a and SDC4 are repressed by the mutated β-catenin.

      Author response image 1.

      Impact of a GSK inhibitor (CHIR99021) on Rab27a and syndecan-4 (SDC4) expression in HepG2 cells. HepG2 cells were treated with 3 µM CHIR99021 or DMSO as control for 48h. A) Western-blot (upper panel) and quantification (lower panel) of wild-type (WT) and mutated (MUT) β-catenin proteins in HepG2 cells treated with DMSO (control) or with CHIR99021. B) qRT-PCR analysis of Rab27a and SDC4 expression in HepG2 cells treated with DMSO (control) or with CHIR99021. C) Western-blot (left panel) and quantification (right panel) of Rab27a protein in HepG2 cells treated with DMSO (control) or with CHIR99021. *P<0.05

      Reviewer 3:

      (1) One limitation of this study is that the mechanistic relationship of exosome release and how they affect immune cells remains to be elucidated. In this context, the authors' conclusions rest on the assumption that hepatocarcinoma immune evasion is based exclusively on the reduced number of exosomes. However, the authors do not analyze exosome composition between exosomes of wild type and oncogenic background, which could be different.

      We agree that the mechanistic relationship of exosome release and how they affect immune cells remains to be elucidated. In the discussion we mentioned that the content of ß-catenin-regulated EVs remains to be explored to fully understand their function in the immunomodulation of the tumor microenvironment. Along these lines, we have ongoing experiments to analyse the exosomal content in terms of proteins and microRNAs. According to our preliminary results, the exosome composition in mutated ß-catenin knock-down HepG2 cells seems to differ from that in control HepG2 cells, suggesting an involvement not only of the number of exosomes in the immunomodulation but also of their content.

      (2) The manuscript would benefit from minor language editing and the introduction from restructuring to enhance clarity.

      The manuscript has now benefited from language editing by Professor William A. Thomas (Colby-Sawyer College, New Hampshire). The Acknowledgments have been modified (page 12) to thank Professor William A. Thomas for proofreading the manuscript. The introduction has also been restructured and modified according to the reviewer's suggestions to enhance clarity (page 3).

      (3) I believe that within the abstract, the authors mean 'defect' not 'default' in the sentence: Then, we demonstrated in 3D spheroid models that activation of β-catenin promotes a decrease of immune cell infiltration through a default in exosome secretion.

      We apologize for the mistake between 'default' and 'defect' in the abstract. The abstract has been modified accordingly.

      (4) Within the 'Introduction' part of the manuscript, the authors might consider reviewing and reorganizing the first paragraph for more clarity - I suggest leading with the first three sentences of the second paragraph (HCC is the most...) and then introducing b-catenin and the effects and implications of oncogenic ß-catenin in HCC.

      If the authors prefer the current structure of the 'Introduction', I would like to propose exchanging some of the wording:

      -In line 4: 'despite' instead of 'in front of'? Sentence: Thus, in front of the therapeutic revolution for cancers, with the emergence of immunotherapy and more particularly immune checkpoint inhibitors (anti-PD1, anti-PD-L1)

      -Additionally in line 7: In these tumors, the oncogenic β-catenin is able to set up a microenvironment that favors tumor progression notably by promoting immune escape. Here, 'establish' might be a better choice instead of 'set up'.

      -In line 9 I suggest rephrasing the sentence: Few studies have reported that the defect of intercellular communication between cancer cells and immune cells is partly mediated by a decrease of chemokines production leading to a reduction of immune infiltrates.... and maybe adding a reference here.

      The introduction has been altered accordingly. Thanks for these suggestions that helped us to improve our manuscript.

    1. eLife assessment

      The study elucidates a detailed molecular mechanism of the initial stages of transport in the medically relevant Na+-coupled GABA neurotransmitter transporter GAT1 and thus generates useful new insights into this protein family. In particular, it presents convincing evidence for the presence of a "staging binding site" that locally concentrates Na+ ions to increase transport activity, whilst providing solid evidence for how Na+ binding influences larger scale dynamics.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript authored by Stockner and colleagues delves into the molecular simulations of Na+ binding pathway and the ionic interactions at the two known sodium binding sites site 1 and site 2. They further identify a patch of two acidic residues in TM6 that seemingly populate the Na+ ions prior to entry into the vestibule. These results highlight the importance of studying the ion-entry pathways through computational approaches and the authors also validate some of their findings through experimental work. They observe that sodium site 1 binding is stabilized by the presence of the substrate in the s1 site and this is particularly vital as the GABA carboxylate is involved in coordinating the Na+ ion unlike other monoamine transporters and binding of sodium to the Na2 site stabilizes the conformation of the GAT1 by reducing flexibility among the helical bundles involved in alternating access.

      Strengths:

      The study displays results that are generally consistent with available information from experiments on SLC6 transporters particularly GAT1 and puts forth the importance of this added patch of residues in the extracellular vestibule that could be of importance to the ion permeation in SLC6 transporters. This is a nicely performed study and could be improved if the authors could comment on and fix the following queries.

      Comments on revised version:

      The authors have satisfactorily addressed my comments and this has significantly improved the clarity of the manuscript.

      The only point that I would like to inquire about is the role of EL4 in modulating Na+ entry. In the simulations, do the authors see no role of EL4 in controlling Na+ entry? This is particularly intriguing, as some recent studies have shown charged mutations in EL4 of dDAT, SERT and GAT1 to be detrimental for substrate entry/uptake. It would therefore be nice to add a small discussion of whether EL4 has any role in Na+ entry.

    3. Reviewer #2 (Public Review):

      Summary

      Starting from an AlphaFold2 model of the outward-facing conformation of the GAT1 transporter, the authors primarily use state-of-the-art MD simulations to dissect the role of the two Na+ ions that are known to be co-transported with the substrate, GABA (and a co-transported Cl- ion). The simulations indicated that Na+ binding to OF GAT depends on the electrostatic environment. The authors identify an extracellular recruiting site including residues D281 and E283 which they hypothesized to increase transport by locally increasing the available Na+ concentration and thus increasing binding of Na+ to the canonical binding sites NA1 and NA2. The charge-neutralizing double mutant D281A-E283A showed decreased binding in simulations. The authors performed GABA uptake experiments and whole-cell patch clamp experiments that taken together validated the hypothesis that the Na+ staging site is important for transport due to its role in pulling in Na+.

      Detailed analysis of the MD simulations indicated that Na+ binding to NA2 has multiple structural effects: The binding site becomes more compact (reminiscent of induced fit binding) and there is some evidence that it stabilizes the outward-facing conformation.

      Binding to NA1 appears to require the presence of the substrate, GABA, whose carboxylate moiety participates in Na+ binding; thus the simulations predict cooperativity between binding of GABA and Na+ binding to NA1.

      Strengths

      - MD simulations were used to propose a hypothesis (the existence of the staging Na+ site) and then tested with a mutant in simulations AND in experiments. This is an excellent use of simulations in combination with experiments.

      - A large number of repeat MD simulations are generally able to provide a consistent picture of Na+ binding. Simulations are performed according to current best practices and different analyses illuminate the details of the molecular process from different angles.

      - The role of GABA in cooperatively stabilizing Na+ binding to the NA1 site looks convincing and intriguing.

      Weaknesses

      - Assessing the effects of Na+ binding on the large scale motions of the transporter is more speculative because the PCA does not clearly cover all of the conformational space and the use of an AlphaFold2 model may have introduced structural inconsistencies. For example, it is not clear if movements of the inner gate are due to an AF2 model that's not well packed or really a feature of the open outward conformation.

      - Quantitative analyses are difficult with the existing data; for example, the tICA "free energy" landscape is probably not converged because unbinding events haven't been observed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study elucidates a detailed molecular mechanism of the initial stages of transport in a medically relevant GABA neurotransmitter transporter GAT1 and thus generates useful new insights for this protein family. In particular, it presents convincing evidence for the presence of a "staging binding site" that locally concentrates Na+ ions to increase transport activity, whilst providing solid evidence for how Na+ binding affects the larger scale dynamics.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript authored by Stockner and colleagues delves into the molecular simulations of Na+ binding pathway and the ionic interactions at the two known sodium binding sites site 1 and site 2. They further identify a patch of two acidic residues in TM6 that seemingly populate the Na+ ions prior to entry into the vestibule. These results highlight the importance of studying the ion-entry pathways through computational approaches and the authors also validate some of their findings through experimental work. They observe that sodium site 1 binding is stabilized by the presence of the substrate in the S1 site and this is particularly vital as the GABA carboxylate is involved in coordinating the Na+ ion unlike other monoamine transporters and binding of sodium to the Na2 site stabilizes the conformation of the GAT1 by reducing flexibility among the helical bundles involved in alternating access.

      Strengths:

      The study displays results that are generally consistent with available information from experiments on SLC6 transporters particularly GAT1 and puts forth the importance of this added patch of residues in the extracellular vestibule that could be of importance to the ion permeation in SLC6 transporters. This is a nicely performed study and could be improved if the authors could comment on and fix the following queries.

      We thank our reviewer for the overall positive evaluation.

      Weaknesses:

      (1) How conserved are the residue pair of D281-E283 in other SLC6 transporters. The authors commented on the presence of these residues in SERT but it would be nice to know how widespread these residues are in other SLC6 transporters like NET, GlyT, and DAT.

      We have created a sequence alignment of the entire human SLC6 family (Supplementary Figure 1) and found that E283 is polar or charged in all SLC6 transporters. D281 shows a higher level of conservation across the family compared to E283. D281 is negatively charged in approximately 50% of the SLC6 family members, an aspartate in all GABA transporters and a glutamate in all monoamine transporters.

      (2) Further, one would like to see the effect of individual mutations D281A and E283A on transport, surface expression, and EC50 of Na+ to gauge the effect on transport.

      We have carried out experiments to investigate the effects of the individual mutations. The results revealed intermediate effects between WT and the double mutant (D281A-E283A) and showed that the effects mostly align with the degree of conservation, as neutralisation of D281 by alanine has a stronger effect than the E283A mutation. Both single mutants had minimal effects on the sodium dependence of uptake; D281A had a stronger effect on expression, Km and Vmax than E283A. Only D281A reduced surface expression, while E283A is expressed at a level similar to wild-type GAT1.

      (3) A clear figure of the S1 site, where Na+ tends to stay prior to Na1 site interactions, needs to be provided. Further, it is not entirely clear how access to S1 is altered if the transporter is in an outward-occluded conformation if F294 is blocking solvent access. Please comment.

      We have modified the structural images in Figure 1, 5, 6 and 7 to improve their comprehensibility. We have also added a comment on the role of F294 as part of the outer hydrophobic gate to the discussion. In short, F294 does not occlude the passage to the S1 as long as GAT1 is outward open, and we find that GAT1 is outward open in all sodium binding simulations.

      (4) The p-value of the EC50 differences between GAT1WT and GAT1double mutant need to be mentioned. The difference in sodium dependence EC50 seems less than twofold, and it would be useful to mention how critical the role of the recruitment site is. Since the transport is not affected the site could play a transient role in attracting ions.

      We have added p-values or standard deviation to our data.

      (5) It would be very nice to know how K+ ions are attracted by this recruitment site. This could further act as a control simulation to test the preference for Na+ ions among SLC6 members.

      We think that attraction of potassium to the recruitment site is not of relevance, as the residues are at the extracellular side and exposed to bulk, where the concentration of sodium is high (typically 130-150 mM), while the concentration of potassium is very small (3-5 mM). Exploring sodium binding by simulations for all SLC6 members could be interesting, but clearly outside the scope of this manuscript.
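      The concentration argument in this reply can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a hypothetical, fully non-selective site whose cation occupancy is simply proportional to bulk concentration, which the response does not claim. The function name `sodium_fraction` is invented for this example; only the concentration ranges come from the text.

```python
# Illustrative arithmetic only. Assumption (not from the authors): a
# non-selective site whose occupancy by a monovalent cation is
# proportional to that cation's bulk concentration.

def sodium_fraction(na_mM: float, k_mM: float) -> float:
    """Fraction of monovalent cations (Na+ plus K+) that are Na+."""
    return na_mM / (na_mM + k_mM)

# Extremes of the extracellular ranges quoted in the response:
# Na+ 130-150 mM, K+ 3-5 mM.
lowest = sodium_fraction(130.0, 5.0)   # least favourable case for Na+
highest = sodium_fraction(150.0, 3.0)  # most favourable case for Na+

print(f"Na+ share of monovalent cations: {lowest:.3f} to {highest:.3f}")
```

      Under this simplifying assumption, Na+ would account for well over 90% of the cations encountered by the site, even before any ion selectivity is considered.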

      (6) Some of the important figures are not very clear. For instance, there should be a zoomed-in view of the recruitment site. The current one in Fig. 1b and 1c could be made clearer. Similarly as mentioned earlier the Na residence at the S1 site away from the Na1 and Na2 sites needs to be shown with greater clarity by putting side chain information in Fig. 6d.

      We have modified the structural images in Figure 1, 5, 6 and 7 to improve their comprehensibility.

      (7) The structural features that comprise the two principal components PC1 and PC2 should be described in greater detail.

      We have modified Figure 6 and added images that show the motions along PC1 and PC2. In addition, these are now better explained in the text.

      Reviewer #2 (Public Review):

      Summary:

      Starting from an AlphaFold2 model of the outward-facing conformation of the GAT1 transporter, the authors primarily use state-of-the-art MD simulations to dissect the role of the two Na+ ions that are known to be co-transported with the substrate, GABA (and a co-transported Cl- ion). The simulations indicated that Na+ binding to OF GAT depends on the electrostatic environment. The authors identify an extracellular recruiting site including residues D281 and E283 which they hypothesized to increase transport by locally increasing the available Na+ concentration and thus increasing binding of Na+ to the canonical binding sites NA1 and NA2. The charge-neutralizing double mutant D281A-E283A showed decreased binding in simulations. The authors performed GABA uptake experiments and whole-cell patch clamp experiments that taken together validated the hypothesis that the Na+ staging site is important for transport due to its role in pulling in Na+.

      Detailed analysis of the MD simulations indicated that Na+ binding to NA2 has multiple structural effects: The binding site becomes more compact (reminiscent of induced fit binding) and there is some evidence that it stabilizes the outward-facing conformation.

      Binding to NA1 appears to require the presence of the substrate, GABA, whose carboxylate moiety participates in Na+ binding; thus the simulations predict cooperativity between binding of GABA and Na+ binding to NA1.

      Strengths:

      -  MD simulations were used to propose a hypothesis (the existence of the staging Na+ site) and then tested with a mutant in simulations AND in experiments. This is an excellent use of simulations in combination with experiments.

      -  A large number of repeat MD simulations are generally able to provide a consistent picture of Na+ binding. Simulations are performed according to current best practices and different analyses illuminate the details of the molecular process from different angles.

      -  The role of GABA in cooperatively stabilizing Na+ binding to the NA1 site looks convincing and intriguing.

      We thank the reviewer for the very supportive assessment.

      Weaknesses:

      -  Assessing the effects of Na+ binding on the large-scale motions of the transporter is more speculative because the PCA does not clearly cover all of the conformational space and the use of an AlphaFold2 model may have introduced structural inconsistencies. For example, it is not clear if movements of the inner gate are due to an AF2 model that's not well packed or really a feature of the open outward conformation.

      Based on our data, the long-range link between sodium binding to GAT1 and destabilisation of the inner gate is causal. PCA separates conformational motions into degrees of freedom and sorts them according to the largest motions. Motions of TM5a were among the 2 largest motions, which suggests that these are relevant motions. To directly quantify their behaviour, we measured informative distances at the inner gate of GAT1, as shown in Figure 6i,j,k, and separated the data according to the presence of sodium in NA2.

      For the following reasons we exclude that the results are a consequence of structural inconsistencies introduced by AlphaFold2 and therefore not reflecting functionally relevant effects:

      (1) If depending on the model instead of sodium binding, the effects should not be correlated with the presence of sodium in the NA2 binding site.

      (2) We carried out new simulations starting from the occluded GAT1 structure (Figure 6j,k). The data show that in the occluded state the distance across the inner vestibule and the length of TM5a differ, consistent with our interpretation of the data. As sodium binding fixes GAT1 in the outward-facing state, as also occurs in other SLC6 family members (Szöllősi and Stockner, 2022), the distances of the outward-open GAT1 are at the short extreme of the scale, distances of the inward-open state of the cryo-EM structure(s) are at the other extreme, while the occluded conformation of GAT1 shows intermediate values.

      (3) We have observed the same property in SERT, for which we used experimental structures as starting structures (Gradisch et al., 2024), suggesting that this could be a general mechanism.

      (4) All available structures from the entire SLC6 family are consistent with structural effects of TM5a in response to bundle domain motions and therefore to binding of sodium to NA2, as it stabilizes the outward-open state, as well as to the transition to the inward-facing conformation.

      - Quantitative analyses are difficult with the existing data; for example, the tICA "free energy" landscape is probably not converged because unbinding events haven't been observed.

      Simulations can always be too short and therefore not fully describe the complete underlying conformational ensemble. We added a statement in the discussion indicating this shortcoming. With respect to the tICA analysis in our manuscript, the tICA approach does not, by design, need long simulations that capture full binding and unbinding in multiple instances to construct a correct free energy landscape. Instead, the tICA method builds on Markov chain dependencies and relies only on the convergence of transitions between hundreds of conformational microstates and the fluxes between them. The free energy profile derived for the S1, including NA1, TMP and NA2 and up to the salt bridge of the outer gate, is well converged, and we observed many transitions. In contrast, the entry from the recruitment site to the S1 most likely has too low a density of microstates and too few transitions to be considered converged with respect to quantifying the free energy of binding from bulk. We now explain this shortcoming.
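      The idea that a free energy profile can be derived from transitions between microstates, without observing complete binding and unbinding events, can be sketched as follows. This is a generic Markov-state-model style estimate, not the authors' workflow: the three-state count matrix is invented toy data, and the function names are hypothetical.

```python
import numpy as np

def stationary_distribution(counts):
    """Stationary distribution of the row-normalised transition matrix."""
    counts = np.asarray(counts, dtype=float)
    T = counts / counts.sum(axis=1, keepdims=True)  # transition probabilities
    evals, evecs = np.linalg.eig(T.T)               # left eigenvectors of T
    idx = np.argmin(np.abs(evals - 1.0))            # eigenvalue closest to 1
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()                            # normalise (also fixes sign)

def free_energy_kT(pi):
    """Free energy per state in units of kT, shifted so the minimum is 0."""
    f = -np.log(pi)
    return f - f.min()

# Toy transition counts between three microstates (invented for illustration).
toy_counts = np.array([[90, 10, 0],
                       [20, 70, 10],
                       [0, 10, 40]])

pi = stationary_distribution(toy_counts)
F = free_energy_kT(pi)
print("populations:", np.round(pi, 3))
print("free energy (kT):", np.round(F, 3))
```

      The estimate is only as reliable as the transition counts behind it: a state connected to the rest by very few observed transitions, as described for the entry from the recruitment site, yields a poorly determined population and therefore a poorly determined free energy.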

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Authors should furnish p-values in the figure legends for experimental results.

      We have added the p-values to text and figure legends.

      Reviewer #2 (Recommendations For The Authors):

      -  Deposit simulation data in a public repository (input files, trajectories (possibly subsampled)).

      We deposited the data at Zenodo and provided the DOI (10.5281/zenodo.10686813). As we were unable to upload the trajectories to Zenodo, we deposited the starting and end structures of the simulations.

      -  Please include a short discussion of the reliability of using an AF2 model instead of experimental structures. What is expected to be correct/which parts of the structure are potentially incorrect? What makes you think that the AF2 model is a good model of the OF conformation of GAT1?

      Unfortunately, an outward-facing structure of GAT1 is not available. We initially worked with an outward-open homology model of GAT1 based on SERT (built with MODELLER), but the structural differences between SERT and GAT1 are sufficiently large that these models did not behave well in simulations and too frequently could not maintain a sealed inner gate, also forming a channel. In contrast to the SERT-based GAT1 model, the AlphaFold2 model of GAT1 behaved as expected, consistent with the behaviour of SERT in simulations and with general knowledge of protein dynamics from the literature. Based on structural analysis of our simulations and on the comparison to SERT, we could not identify a region of GAT1 that would potentially behave incorrectly or unexpectedly. We added a statement to the discussion on this potential limitation of the use of homology models.

      -  Fig 1a: Na+ densities are not very clear (both due to small size and the transparency). I have a hard time seeing where bulk, 2*bulk regions are --- are you showing "onion shells" of density? Perhaps investigate presenting as cuts through the full density?

      I like the labelling in terms of absolute density and multiples of bulk.

      We have created new images to improve the visualisation of data. The data are shown as onion shells (isosurfaces), with the shells at the indicated densities. This is now clearly stated. Transparency is needed; otherwise, for example, the inner onion shells would not be visible. The cut-through is intuitive, but we could not find a useful plane, as the densities are too extensively distributed in 3D and do not lie on a single plane.

      -  Fig 1h-k: would be clearer if "recruitment site" (TMP?) was indicated in the figure.

      We have created a new image for the recruiting site (Figure 1b,c) and temporary site (Figure 1g) and indicated these two sites as appropriate.

      -  Show time series of Na+ binding with a suitable order parameter (z or distances to NA1 and NA2?) to show how ions bind spontaneously. Mark the different sites. Mark pre- and post-binding parts of trajectories.

      We have added time series for every simulation that shows sodium binding to the NA1 or NA2 to the supplementary information Figure 2a,b,c. These quantify the distances to the recruiting site, the temporary site and the respective sodium binding site.
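      The distance time series described above can be illustrated with a minimal sketch. It assumes the ion and site coordinates have already been extracted from the trajectory as NumPy arrays in ångström and ignores periodic boundary conditions. The function names and the 3.0 Å cutoff are hypothetical choices for illustration, not the authors' analysis settings.

```python
import numpy as np


def distance_time_series(ion_xyz, site_xyz):
    """Per-frame distance between one ion and a reference site.

    ion_xyz:  array of shape (n_frames, 3)
    site_xyz: array of shape (n_frames, 3), or (3,) for a fixed site
    """
    ion_xyz = np.asarray(ion_xyz, dtype=float)
    site_xyz = np.asarray(site_xyz, dtype=float)
    return np.linalg.norm(ion_xyz - site_xyz, axis=-1)


def first_binding_frame(distances, cutoff=3.0):
    """Index of the first frame with distance below the cutoff, else None.

    The cutoff is an illustrative assumption, not a value from the source.
    """
    below = np.flatnonzero(np.asarray(distances, dtype=float) < cutoff)
    return int(below[0]) if below.size else None


if __name__ == "__main__":
    ion = [[10.0, 0.0, 0.0], [5.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    d = distance_time_series(ion, [0.0, 0.0, 0.0])
    print(d, first_binding_frame(d))
```

      Frames before the returned index would correspond to the pre-binding part of a trajectory and frames from that index onward to the post-binding part.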

      -  PCA - how much of the total variance was captured by PC1 and PC2?

      The variance captured by the PCs is shown as eigenvalues in supplementary information Figure 4. PC1 captures about 19% of the variance, PC2 8%.
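      As a brief illustration of how such fractions follow from the eigenvalues, the sketch below computes the share of total variance per principal component from a coordinate matrix. It assumes the frames have already been aligned and flattened to a (frames x features) array; the function name is hypothetical and this is not the authors' analysis code.

```python
import numpy as np


def explained_variance_ratio(coords):
    """Fraction of total variance captured by each principal component.

    coords: array of shape (n_frames, n_features), already aligned.
    Returns the ratios sorted from largest to smallest component.
    """
    x = np.asarray(coords, dtype=float)
    x = x - x.mean(axis=0)              # centre each feature
    cov = np.cov(x, rowvar=False)       # feature-by-feature covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # descending eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negatives
    return eigvals / eigvals.sum()


if __name__ == "__main__":
    toy = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    print(explained_variance_ratio(toy))
```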

      -  "We found that the inner hydrophobic gate is dynamic in the absence of Na2" -- is this instability due to the AF2 model or likely realistic? E.g. was similar behaviour ever observed in simulations of the occluded state?

      In simulations of the occluded state we do not see instabilities such as those observed in the outward-open state in the absence of sodium (Figure 6). As these larger-scale fluctuations are not randomly distributed across all simulations starting from the AlphaFold2 models, but are confined to the systems without sodium, they are unlikely to be an effect of the AlphaFold2 model.

      Please note, we have seen comparable behaviour in simulations of SERT starting from experimental structures (Gradisch et al., 2024), therefore suggesting a more general mechanism.

      -  Cooperativity between GABA-binding and Na+ binding to NA1: How would this lead to an experimentally measurable signature, i.e., which experiments could validate this interesting prediction?

      Direct detection of cooperativity is difficult to separate from other effects in experiments: sodium binding and transport involve both NA1 and NA2, NA2 has a higher affinity according to our data, and mutations will not only affect cooperativity but will also have other effects.

      Conformational changes can also complicate experimental detection, as NA2 stabilises the outward-open conformation, while NA1+GABA binding triggers the transition to the inward-open state. To quantify cooperativity, it would be important to isolate the cooperative effect from all other effects, which is a challenge. Support for cooperativity has been reported by Zhou, Zomot and Kanner (2006) and Meinild and Forster (2012) using this route. In the first paper the authors make use of lithium, which binds only to NA2, although lithium is not merely an NA2-selective ligand that is otherwise identical to sodium. By comparing two GABA concentrations, the authors showed that the sodium dependence of GABA transport is left-shifted at the higher GABA concentration, which is not the case in the absence of lithium. These data are indirect, but consistent with cooperativity between GABA and NA1-bound sodium, as GABA transport mainly reflects binding of sodium to NA1. Similar approaches could be explored further, for example by varying the GABA concentration instead of sodium. Another option could be to create an outward-facing, conformationally locked GAT1 and to measure the cooperativity of sodium and GABA binding using, for example, the scintillation proximity assay. Most likely the assay would also need to be made independent of NA2 binding. We are not aware of such a GABA transporter system.

      -  There are some instances of [SI Figure] or [citation needed] that should be cleaned up.

      We have corrected these instances.

      References

      Gradisch, R. et al. (2024) ‘Ligand coupling mechanism of the human serotonin transporter differentiates substrates from inhibitors’, Nature Communications, 15(1), p. 417. Available at: https://doi.org/10.1038/s41467-023-44637-6.

      Meinild, A.-K. and Forster, I.C. (2012) ‘Using lithium to probe sequential cation interactions with GAT1’, American Journal of Physiology. Cell Physiology, 302(11), pp. C1661-1675. Available at: https://doi.org/10.1152/ajpcell.00446.2011.

      Szöllősi, D. and Stockner, T. (2022) ‘Sodium Binding Stabilizes the Outward-Open State of SERT by Limiting Bundle Domain Motions’, Cells, 11(2), p. 255. Available at: https://doi.org/10.3390/cells11020255.

      Zhou, Y., Zomot, E. and Kanner, B.I. (2006) ‘Identification of a lithium interaction site in the gamma-aminobutyric acid (GABA) transporter GAT-1’, The Journal of Biological Chemistry, 281(31), pp. 22092–22099. Available at: https://doi.org/10.1074/jbc.M602319200.

    1. eLife assessment

      In this potentially important study, the authors report results of QM/MM simulations and kinetic measurements for the phosphoryl-transfer step in adenylate kinase. The results point to the mechanistic proposal that the transition state ensemble is broader in the most efficient form of the enzyme (i.e., in the presence of Mg2+ in the active site) and thus a different activation entropy. With a broad set of computations and experimental analyses, the level of evidence is considered solid by some reviewers. On the other hand, there remain limitations in the computational analyses, especially regarding free energy profiles using different methodologies and the activation entropy, leading some reviewers to the evaluation that the level of evidence is incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      This study investigated the phosphoryl transfer mechanism of the enzyme adenylate kinase, using SCC-DFTB quantum mechanical/molecular mechanical (QM/MM) simulations, along with kinetic studies exploring the temperature and pH dependence of the enzyme's activity, as well as the effects of various active site mutants. Based on a broad free energy landscape near the transition state, the authors proposed the existence of wide transition states (TS), characterized by the transferring phosphoryl group adopting a meta-phosphate-like geometry with asymmetric bond distances to the nucleophilic and leaving oxygens. In support of this finding, kinetic experiments were conducted with Ca2+ ions at different temperatures and pH, which revealed a reduced entropy of activation and unique pH-dependence of the catalyzed reaction.

      Strengths:

      A combined application of simulation and experiments is a strength.

      Weaknesses:

      The conclusion that the enzyme-catalyzed reaction involves a wide transition state is not sufficiently clarified with some concerns about the determined free energy profiles compared to the experimental estimate. (See Recommendations for the authors.)

    3. Reviewer #2 (Public Review):

      Summary:

      The authors report results of QM/MM simulations and kinetic measurements for the phosphoryl-transfer step in adenylate kinase. The main assertion of the paper is that a wide transition state ensemble is a key concept in enzyme catalysis as a strategy to circumvent entropic barriers. This assertion is based on observation of a "structurally wide" set of energetically equivalent configurations that lie along the reaction coordinate in QM/MM simulations, together with kinetic measurements that suggest a decrease of the entropy of activation.

      Strengths:

      The study combines theoretical calculations and supporting experiments.

      Weaknesses:

      The current paper hypothesizes a "wide" transition state ensemble as a catalytic strategy and key concept in enzyme catalysis. Overall, it is not clear the degree to which this hypothesis is fully supported by the data. The reasons are as follows:

      (1) Enzyme catalysis reflects a rate enhancement with respect to a baseline reaction in solution. In order to assert that something is part of a catalytic strategy of an enzyme, it would be necessary to demonstrate from simulations that the activation entropy for the baseline reaction is indeed greater and the transition state ensemble less "wide". Alternatively stated, when indicating there is a "wide transition state ensemble" for the enzyme system - one needs to indicate that is with respect to the non-enzymatic reaction. However, these simulations were not performed and the comparisons not demonstrated. The authors state "This chemical step would take about 7000 years without the enzyme" making it impossible to measure; nonetheless, the simulations of the nonenzymatic reaction would be fairly straight forward to perform in order to demonstrate this key concept that is central to the paper. Rather, the authors examine the reaction in the absence of a catalytically important Mg ion.

      (2) The observation of a "wide conformational ensemble" is not a quantitative measure of entropy. In order to make a meaningful computational prediction of the entropic contribution to the activation free energy, one would need to perform free energy simulations over a range of temperatures (for the enzymatic and non-enzymatic systems). Such simulations were not performed, and the entropy of activation was thus not quantified by the computational predictions. The authors instead use a wider TS ensemble as a proxy for larger entropy, and miss an opportunity to compare directly to the experimental measurements.

    4. Reviewer #3 (Public Review):

      Summary:

      By conducting QM/MM free energy simulations, the authors aimed to characterize the mechanism and transition state for the phosphoryl transfer in adenylate kinase. The qualitative reliability of the QM/MM results has been supported by several interesting experimental kinetic studies. However, the interpretation of the QM/MM results is not well supported by the current calculations.

      Strengths:

      The QM/MM free energy simulations have been carefully conducted. The accuracy of the semi-empirical QM/MM results was further supported by DFT/MM calculations, as well as qualitatively by several experimental studies.

      Weaknesses:

      (1) One key issue is the definition of the transition state ensemble. The authors appear to define this by simply considering structures that lie within a given free energy range from the barrier. However, this is not the rigorous definition of transition state ensemble, which should be defined in terms of committor distribution. This is not simply an issue of semantics, since only a rigorous definition allows a fair comparison between different cases - such as the transition state in an enzyme vs in solution, or with and without the metal ion. For a chemical reaction in a complex environment, it is also possible that many other variables (in addition to the breaking and forming P-O bonds) should be considered when one measures the diversity in the conformational ensemble.

      In the revised ms, the authors included committor analysis. However, the discussion of the result is very brief. In particular, if we use the common definition of the transition state ensemble (TSE) as those featuring the committor around 0.5, the reaction coordinate of the TSE would span a much narrower range than those listed in Table 1. This point should be carefully addressed.

      (2) While the experimental observation that the activation entropy differs significantly with and without the Ca2+ ion is interesting, it is difficult to connect this result with the "wide" transition state ensemble observed in the QM/MM simulations so far. Even without considering the definition of the transition state ensemble mentioned above, it is unlikely that a broader range of P-O distances would explain the substantial difference in the activation entropy measured in the experiment. Since the difference is sufficiently large, it should be possible to compute the value by repeating the free energy simulations at different temperatures, which would lead to a much more direct evaluation of the QM/MM model/result and the interpretation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a potentially important study that integrates QM/MM free energy simulations and experimental kinetic analyses to probe the nature of phosphoryl transfer transition state in adenylate kinase. The idea that the transition state ensemble encompasses conformations with substantially different structural features (including the breaking/forming bonds) is interesting and potentially applicable to many other enzyme systems. In the current form, however, the study is considered incomplete since the connection between the putative transition state ensemble from the computations and key experimental observables, such as the activation entropy, is not well established.

      Thank you so much for your great professional work as the senior editor. We thank you and the reviewers for carefully reading our manuscript and for the very valuable suggestions. In response, we have performed the recommended additional calculations and modified the manuscript as suggested, in order to improve the connection between the transition state ensemble obtained from simulations and experimental observables. Importantly, the new simulations fully corroborate our original findings, and your work has made the revised manuscript stronger and better.

      Below are our point-to-point responses:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the phosphoryl transfer mechanism of the enzyme adenylate kinase, using SCC-DFTB quantum mechanical/molecular mechanical (QM/MM) simulations, along with kinetic studies exploring the temperature and pH dependence of the enzyme's activity, as well as the effects of various active site mutants. Based on a broad free energy landscape near the transition state, the authors proposed the existence of wide transition states (TS), characterized by the transferring phosphoryl group adopting a meta-phosphate-like geometry with asymmetric bond distances to the nucleophilic and leaving oxygens. In support of this finding, kinetic experiments were conducted with Ca2+ ions (instead of Mg2+) at different temperatures, which revealed a negative entropy of activation. Overall, in its present form, the manuscript has more weaknesses in terms of interpretation of the simulation results than strengths, which need to be addressed by the authors.

      We thank the reviewer for carefully reviewing our manuscript and the great suggestions for the revisions. Thanks to these points raised we are able to submit a revised manuscript addressing all questions.

      There are several major concerns:

      First, the authors' claim that the catalytic mechanism of adenylate kinase (Adk) has not been previously studied by QM/MM free energy simulations is somewhat inaccurate. In fact, two different groups have previously investigated the catalytic mechanism of Adk. The first study, cited by the authors themselves, used the string method to determine the minimum free energy profile, but resulted in an unexpected intermediate; note that they obtained a minimum free energy profile, not a minimum energy profile. The second study (Ojeda-May et al., Biochemistry 2021 and Dulko-Smith et al., J Chem Inf Model 2023) overlaps substantially with the present study, but its main conclusions differ from those of the present study. Therefore, a thorough discussion comparing the results of these studies is needed.

      We thank the reviewer for pointing out two additional articles to the one we had discussed. Accordingly, we have changed the claim that the Adk mechanism was not previously studied using QM/MM, and added a discussion of the latter two citations. Notably, although the general outcome is consistent with our results, the conclusions and details of findings differ. The two additional papers agree with our findings of a concerted TS, and not the metastable intermediate as observed in the QM/MM simulation of Shibanuma et al., 2020.

      The difference of the two papers by Nam/Wolf-Watz and our manuscript pointed out by the reviewer is mainly in the interpretation. Importantly, the authors do not primarily focus on the nature of the Transition State for the P-transfer reaction, but on the connection between the chemical and conformational steps. We have extensively reported on the fact that the conformational changes of lid opening and closing are obviously unrelated to the chemical step, see also our free energy landscape in Fig. 1a. Consequently, there cannot be a coupling. We note that our group had extensively studied the lid opening step both experimentally and computationally before. In contrast, we discover here a fundamental concept for rate enhancement by an optimal enzyme: the reduction in the activation entropy by a wide TSE. New experiments were triggered by this finding, that then delivered experimental validation of this concept.

      In the revised version of the manuscript, and according to the reviewer’s suggestion we expanded our discussion to these two additional papers.

      Second, the interpretation of the TS ensemble needs deeper scrutiny. In general, the TS is defined as the hypersurface separating the reactant and product states. Consequently, if a correct reaction coordinate is defined, trajectories initiated at the TS should have equal probabilities of reaching either the reactant or product state; if an approximate reaction coordinate, such as the distance difference used in this study, is used, recrossing may be introduced as a correction into the probabilities. Thus, in order to establish the presence of a wide TS region, it is necessary to characterize the TS ensemble through a commitment analysis across the TS region.

      We thank the reviewer for suggesting to add a commitment analysis to our calculations. The newly performed commitment analysis is shown in Fig. 4b. The corresponding analysis further strengthens our original findings of the wide TS in the fully active enzyme.
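      For readers unfamiliar with the analysis, the idea can be sketched as follows: from each candidate configuration, many short trajectories are launched with randomised velocities, and the committor is the fraction that reaches the product basin first. The transition state ensemble is then the set of configurations whose committor is close to 0.5. The sketch below only post-processes the outcomes of such trajectories; the function names and the 0.4 to 0.6 window are illustrative assumptions, not the settings used in the manuscript.

```python
import numpy as np


def committor_estimate(reached_product):
    """Committor estimate and its binomial standard error.

    reached_product: iterable of booleans, one per shooting trajectory
    (True if the trajectory reached the product basin first).
    """
    outcomes = np.asarray(list(reached_product), dtype=float)
    p_b = outcomes.mean()
    stderr = np.sqrt(p_b * (1.0 - p_b) / outcomes.size)
    return float(p_b), float(stderr)


def tse_window(rc_values, committors, lo=0.4, hi=0.6):
    """Range of reaction-coordinate values whose committor lies in [lo, hi].

    Returns (min, max) or None if no point qualifies. The window bounds
    are an illustrative assumption.
    """
    rc = np.asarray(rc_values, dtype=float)
    pb = np.asarray(committors, dtype=float)
    mask = (pb >= lo) & (pb <= hi)
    if not mask.any():
        return None
    return float(rc[mask].min()), float(rc[mask].max())


if __name__ == "__main__":
    print(committor_estimate([True, False, True, False]))
    print(tse_window([-0.4, -0.2, 0.0, 0.2, 0.4], [0.1, 0.45, 0.5, 0.55, 0.9]))
```

      Comparing the width of this window between two systems gives a like-for-like measure of how broad each transition state ensemble is along the chosen coordinate.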

      The relatively flat free energy surface observed near TS in Figures 1c and 2a, may be attributed to the cleavage and formation of P-O bonds relative to the marginally stable phosphorane intermediate, as described in Zhou et al.'s work (Chem Rev 1998, 98:991). This scenario is clearly different from a wide TS ensemble concept. In addition, given the inherent similarity in reactivity of the two oxygens towards the phosphoryl atom, it is reasonable to expect a single TS as shown in Figure 1 - supplement 9, rather than two TSs with a marginally stable intermediate as shown in Figure 1c. Consequently, it remains uncertain whether the elongated P-O bonds observed near the TS and their asymmetry are realistic or potentially an artifact of the pulling/non-equilibrium MD simulations. Further validation in this regard is required.

      The reviewer raises the key issue of how realistic the observation of the wide TSE is, and the possibility of it being a potential artifact of the simulation strategy, and suggests that further validation is required in this regard. According to his/her suggestion, in the revised version we have further validated this key observation by two additional simulations. First, we performed a commitment analysis (see above), and second, we also performed umbrella sampling, see Fig. 4a. We consistently observe one wide TSE in the presence of Mg2+, but not in the absence of Mg2+. The fact that this wide TSE is observed with the three strategies (i.e., pulling/nonequilibrium MD, commitment analysis, and umbrella sampling) most likely rules out the possibility of an artifact related to the simulation strategy.

      Third, there are several inconsistencies in the free energy results and their discussion. First, the data from Kerns et al. (Kerns, NSMB, 2015, 22:124) indicate that the ATP/AMP -> ADP/ADP reaction proceeds at a faster rate than the ADP/ADP -> ATP/AMP reaction, suggesting that the ADP/ADP state has a lower free energy (approximately -1.0 kcal/mol) compared to the ATP/AMP state. This contrasts with Figure 1c, which shows a higher free energy of 6.0 kcal/mol for the ADP/ADP state. This discrepancy needs to be discussed.

      The reviewer correctly cites our experimental result of about -1 kcal/mol for the equilibrium of ADP/ADP relative to ATP/AMP with Mg. Importantly, that was measured at pH 7. With a pKa of about 7.2 for ADP, more than 50% is in the monoprotonated state under these experimental conditions. As we found in our QM/MM simulations, for the monoprotonated state ADP/ADP is much more stable than ATP/AMP (see Figure 1 – supplement 4, about 8 kcal/mol). In contrast, as shown in Fig. 1c and highlighted by the reviewer, for the nonprotonated state the equilibrium is flipped. Consequently, our QM/MM simulations roughly recapitulate the ensemble equilibrium of substrates/products measured at pH 7.

      We should have described these facts better in the manuscript, and we thank the reviewer for noting this point, as it prompted us to better explain this agreement between experiments and computation for the on-enzyme equilibrium between the substrate and product states (see page 11 in the revised manuscript).
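      The statement that more than half of the ADP is monoprotonated at pH 7 given a pKa of about 7.2 follows from the Henderson-Hasselbalch relation. A minimal sketch, treating the nucleotide as a single titratable site (a simplifying assumption) and using a hypothetical function name:

```python
def fraction_protonated(pH, pKa):
    """Fraction of a single titratable site in its protonated form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


if __name__ == "__main__":
    # pH 7 and pKa 7.2 are the values quoted in the response above.
    print(round(fraction_protonated(7.0, 7.2), 3))  # prints 0.613
```

      Under this simplification roughly 61% of the population is protonated at pH 7, in line with the "more than 50%" stated above.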

      Furthermore, the barrier for ATP/AMP -> ADP/ADP, calculated to be 20 kcal/mol for the fully charged state, exceeds the corresponding barrier for the monoprotonated state. This cautions against the conclusion that the fully charged state is the reactive state. In addition, the difference in the barrier for the no-Mg2+ system compared to the barriers with Mg2+ is substantially too large (21 kcal/mol from the calculation versus 7 kcal/mol from the experimental values). These inconsistencies raise questions as to their origins, whether they result from the use of the pulling/non-equilibrium MD simulation approach, which may yield unrealistic TS geometries, or from potential issues related to the convergence of the determined free energy values. To address this issue, a comparison of results obtained by umbrella sampling and similar methodologies is necessary.

      We agree that these points need to be clarified. For the resubmission, we performed umbrella sampling for the fully charged nucleotide with Mg2+ and for the system without Mg2+, and added these new figures to the manuscript (new Fig. 4). We agree with the reviewer that the free energy profiles obtained from umbrella sampling are more reliable; the original simulations for the monoprotonated state have larger errors, see Fig. 1, supplement 4. Importantly, we experimentally measured the pH dependence of the reaction in the direction ADP/ADP to AMP and ATP, and hence we compare the corresponding barriers in this direction.

      With respect to the comparison of the simulated (9.5 kcal/mol) and experimental barriers with and without Mg, the experimental barrier difference is 7 kcal/mol for Ca2+ versus no metal, but larger for Mg2+ versus no metal, for which the simulations were performed. The P-transfer with Mg2+ is faster than 500 s⁻¹, meaning the experimental barrier difference between no Mg and Mg is ≥ 11 kcal/mol, which is in quite good agreement with our umbrella sampling barrier differences (Fig. 4a). In response to this reviewer’s question, we added these points to the revised manuscript.
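      The link between rate constants and free energy barriers used in this comparison can be made explicit with the Eyring equation, k = (kB T / h) exp(-ΔG‡ / RT), with the transmission coefficient taken as 1. The sketch below converts a rate constant into a barrier and a barrier difference into a rate ratio. The temperature of 298.15 K is an assumption for illustration, since the measurement temperature is not stated in this passage, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J*s
R_KCAL = 1.987204e-3      # gas constant, kcal/(mol*K)


def eyring_barrier(rate_per_s, temperature_K=298.15):
    """Activation free energy (kcal/mol) from a rate constant via Eyring.

    Assumes a transmission coefficient of 1.
    """
    prefactor = K_B * temperature_K / H
    return R_KCAL * temperature_K * math.log(prefactor / rate_per_s)


def rate_ratio(barrier_difference_kcal, temperature_K=298.15):
    """Rate ratio implied by a difference in activation free energy."""
    return math.exp(barrier_difference_kcal / (R_KCAL * temperature_K))


if __name__ == "__main__":
    # 500 s^-1 and 7 kcal/mol are the values quoted in the response above.
    print(round(eyring_barrier(500.0), 2))
    print(f"{rate_ratio(7.0):.2e}")
```

      At the assumed temperature, a rate of 500 s⁻¹ corresponds to a barrier of roughly 13.8 kcal/mol, and a barrier difference of 7 kcal/mol to a rate ratio on the order of 10^5.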

      Reviewer #2 (Public Review):

      Summary:

      The authors report the results of QM/MM simulations and kinetic measurements for the phosphoryl-transfer step in adenylate kinase. The main assertion of the paper is that a wide transition state ensemble is a key concept in enzyme catalysis as a strategy to circumvent entropic barriers. This assertion is based on the observation of a "structurally wide" set of energetically equivalent configurations that lie along the reaction coordinate in QM/MM simulations, together with kinetic measurements that suggest a decrease in the entropy of activation.

      We thank the reviewer for the endorsement and the very useful suggestions for improving the manuscript. In response to the questions, we have edited our manuscript accordingly. All suggested additional simulations and analyses further support our original findings.

      Strengths:

      The study combines theoretical calculations and supporting experiments.

      Weaknesses:

      The role(s) of entropy in enzyme catalysis has been discussed extensively in the literature, from the Circe effect proposed by Jencks and many other works. The current paper hypothesizes a "wide" transition state ensemble as a catalytic strategy and key concept in enzyme catalysis. Overall, it is not clear the degree to which this hypothesis is supported by the data. The reasons are as follows:

      (1) Enzyme catalysis reflects a rate enhancement with respect to a baseline reaction in solution. In order to assert that something is part of a catalytic strategy of an enzyme, it would be necessary to demonstrate from simulations that the activation entropy for the baseline reaction is indeed greater and the transition state ensemble less "wide". Alternatively stated, when indicating there is a "wide transition state ensemble" for the enzyme system - one needs to indicate that is with respect to the non-enzymatic reaction. However, these simulations were not performed and the comparisons were not demonstrated.

      We agree with the reviewer that the ideal comparison to address enzyme catalytic power is with the baseline reaction in solution. However, as is the case for many biologically relevant reactions, the reaction in solution is too slow (i.e., has too high a barrier) and thus cannot be measured (this reaction would take about 7000 years without the enzyme). Moreover, in many cases, the reaction mechanism in solution is too different from that observed in the enzyme.

      To overcome this problem, another reference reaction is used instead of that in solution, such as a mutant enzyme or the enzyme lacking a key cofactor, hence a non-optimized enzyme. In the present case, this baseline reaction corresponds to the enzyme reaction in the absence of the Mg ion. Consistently, our results clearly show that the reaction without Mg, which displays a larger barrier, has a narrower TS. We also want to highlight the extensive and excellent literature on QM/MM calculations of ATP hydrolysis in solution, which shows narrow transition state ensembles; to mention just a few: Klähn, M., Rosta, E., & Warshel, A. (2006). On the mechanism of hydrolysis of phosphate monoesters dianions in solutions and proteins. Journal of the American Chemical Society, 128(47), 15310–15323. https://doi.org/10.1021/ja065470t; Wang, C., Huang, W., & Liao, J. lou. (2015). QM/MM investigation of ATP hydrolysis in aqueous solution. Journal of Physical Chemistry B, 119(9), 3720–3726. https://doi.org/10.1021/jp512960e.

      (2) The observation of a "wide conformational ensemble" is not a quantitative measure of entropy. In order to make a meaningful computational prediction of the entropic contribution to the activation of free energy, one would need to perform free energy simulations over a range of temperatures (for the enzymatic and non-enzymatic systems). Such simulations were not performed, and the entropy of activation was thus not quantified by the computational predictions.

      In the present work we do not intend to quantify entropy from the simulations, since such calculations are known to have errors that are too large. However, even if not strictly quantified, a wider TS ensemble is a proxy for a larger entropy.

      (3) The authors indicate that lid-opening, essential for product release, and not P-transfer is the rate-limiting step in the catalytic cycle and Mg2+ accelerates both steps. How is it certain that the kinetic measurements are reporting on the chemical steps of the reaction, and not other factors such as metal ion binding or conformational changes?

      These questions were indeed the absolutely critical ones we needed to answer early on in studying how adenylate kinase accelerates the reaction by more than 14 orders of magnitude. This was done by a combination of pre-steady-state and steady-state experiments together with NMR dynamics, published in (Kerns et al., 2015) and described at the beginning of this manuscript in Fig. 1a. We agree with the reviewer that for many other enzymes such an experimental examination of all microscopic steps of the enzymatic cycle has not been performed, leading to the risk of misinterpreting observed kinetic rates.

      (4) The authors explore different starting states for the chemical steps of the reaction (e.g., different metal ion binding and protonation states), and conclude that the most reactive enzyme configuration is the one with the more favorable reaction-free energy barrier. However, it is not clear what is the probability of observing the system in these different states as a function of pH and metal ion concentration without performing appropriate pKa and metal ion binding calculations. This was not done, and hence these results seem somewhat inconclusive.

      As noted by the reviewer, in the present work our aim was to compare the chemical step of the reaction in different metal ion and protonation states. Our computational results show that the most reactive enzyme configuration is the nonprotonated state with Mg2+ in our forward reaction.

      We actually know the probability of the metal-bound states for this enzyme. The experimental data were described in (Kerns et al., 2015): we directly determined the concentration needed to fully occupy the Mg site with Mg or Ca, so no metal binding calculations are needed, as the experiments are a direct measurement. From our x-ray structures we know the accurate binding site, and also see full occupancy. This is also true for the pH dependence of the chemical step, measured in this manuscript and shown in Fig. 5b. We note that the excellent agreement between our simulations and the experiments is one of the key features of the current manuscript. As stated in the manuscript, we analyzed the pH dependence of the P-transfer step and showed that the rate increases with higher pH in the presence of Ca2+, while without a metal the opposite trend is observed. These results further support the QM/MM results showing that the fully charged nucleotide state was the most reactive in the presence of the metal, whereas in the absence of the cation, only the monoprotonated nucleotides (low pH) were reactive.

      Reviewer #3 (Public Review):

      Summary:

      By conducting QM/MM free energy simulations, the authors aimed to characterize the mechanism and transition state for the phosphoryl transfer in adenylate kinase. The qualitative reliability of the QM/MM results has been supported by several interesting experimental kinetic studies. However, the interpretation of the QM/MM results is not well supported by the current calculations.

      Strengths:

      The QM/MM free energy simulations have been carefully conducted. The accuracy of the semiempirical QM/MM results was further supported by DFT/MM calculations, as well as qualitatively by several experimental studies.

      We thank the reviewer for the positive comments on the manuscript, particularly highlighting the support of the QM/MM results by additional DFT/MM calculations and several experiments.

      Weaknesses:

      (1) One key issue is the definition of the transition state ensemble. The authors appear to define this by simply considering structures that lie within a given free energy range from the barrier. However, this is not the rigorous definition of transition state ensemble, which should be defined in terms of committor distribution. This is not simply an issue of semantics, since only a rigorous definition allows a fair comparison between different cases - such as the transition state in an enzyme vs in solution, or with and without the metal ion. For a chemical reaction in a complex environment, it is also possible that many other variables (in addition to the breaking and forming P-O bonds) should be considered when one measures the diversity in the conformational ensemble.

      We thank the reviewer for noting this issue and for this great suggestion, which led to a strengthening of the key findings in the revised manuscript. Following the reviewer's suggestion, we performed a committor analysis to properly define the TSE and compared the results for the enzyme in the presence and absence of Mg2+ (see new Fig. 4b). The results further strengthen our previous finding and interpretation of a wider TSE for the reaction with Mg relative to the reaction without Mg.
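
      For readers unfamiliar with the committor criterion discussed in this exchange, the following is a minimal sketch of the idea on a toy system, not the authors' QM/MM protocol. It assumes a one-dimensional double-well potential V(x) = (x^2 - 1)^2 with overdamped Langevin dynamics; the function names, the temperature `kT`, the basin boundary `basin`, and the number of shots are all illustrative choices. A configuration belongs to the transition state ensemble when trajectories launched from it reach the product basin about half of the time.

```python
import numpy as np


def force(x):
    """Force -dV/dx for the toy double-well potential V(x) = (x**2 - 1)**2."""
    return -4.0 * x * (x**2 - 1.0)


def committor(x0, n_shots=200, kT=0.2, dt=1e-3, basin=0.8,
              max_steps=200_000, seed=0):
    """Estimate the probability of reaching the product basin (x >= basin)
    before the reactant basin (x <= -basin), starting from x0.

    All shots are propagated in parallel with overdamped Langevin dynamics
    (Euler-Maruyama integration, unit friction).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_shots, float(x0))
    outcome = np.full(n_shots, -1)  # -1: undecided, 0: reactant, 1: product
    for _ in range(max_steps):
        active = outcome < 0
        if not active.any():
            break
        noise = rng.standard_normal(active.sum())
        x[active] += force(x[active]) * dt + np.sqrt(2.0 * kT * dt) * noise
        outcome[active & (x >= basin)] = 1
        outcome[active & (x <= -basin)] = 0
    decided = outcome >= 0
    return outcome[decided].mean()


if __name__ == "__main__":
    for start in (-0.6, 0.0, 0.6):
        print(start, committor(start))
```

      In this toy model the barrier top (x = 0) has a committor close to 0.5, while points on either side commit almost entirely to one basin. Comparing the spread of configurations with committor near 0.5 is what allows transition state ensembles from different conditions to be compared on an equal footing.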

      (2) While the experimental observation that the activation entropy differs significantly with and without the Ca2+ ion is interesting, it is difficult to connect this result with the "wide" transition state ensemble observed in the QM/MM simulations so far. Even without considering the definition of the transition state ensemble mentioned above, it is unlikely that a broader range of P-O distances would explain the substantial difference in the activation entropy measured in the experiment. Since the difference is sufficiently large, it should be possible to compute the value by repeating the free energy simulations at different temperatures, which would lead to a much more direct evaluation of the QM/MM model/result and the interpretation.

      In the present work we do not intend to quantify the entropy from the simulations, since such calculations are known to carry errors that are too large. However, even if not strictly quantified, a wider TS ensemble is a proxy for a larger entropy. We believe that the additional committor calculations and the umbrella sampling (new Fig. 4a) strongly support our original findings, and are better suited to that purpose than repeating the free energy simulations at different temperatures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Make sure consistent units are used, either kJ/mol or kcal/mol.

      Thanks, we made the changes.

      In the case of the mono-protonated simulation, where does the proton transfer between AD(T)P and AMP occur in both the forward and reverse reactions? It is worthwhile to note that the proton transfer may take place at different reaction coordinate values (between the two reactions), as it is not explicitly defined in the reaction coordinate. In this context, it is also necessary to discuss how to combine the results to generate a single free energy profile.

      We agree with the reviewer on this point. Accordingly, for the monoprotonated reaction we have analyzed when (or where, in terms of the RC) the proton transfer occurs in both the forward and reverse reactions. The proton transfer occurs at a reaction coordinate value of -0.7 (average value; figures 3-supplement 5 e and f).

      The methods section needs improvements:

      (1) Computational setup of the system: Were the systems neutralized? If so, what types of ions were used, and how many of them were included? If systems were not neutralized, discuss a potential artifact in the results. In addition, if the system for the reverse reaction (and no-Mg2+ systems) was prepared separately, provide details regarding their preparation.

      We thank the reviewer for noting this issue. Accordingly, we have provided the requested additional details of the computational setup in the revised version.

      (2) Simulation parameters: Clarify how non-bonded interactions were treated in both MM and QM/MM simulations. For the QM/MM simulation, specify the time step used, whether the Shake was applied; whether the NPT simulations were performed, and any other relevant parameters.

      We thank the reviewer for noting this issue. Accordingly, we have provided the requested additional details of the simulation parameters.

      (3) Free energy determination strategy: Describe how the two profiles (forward and reverse profiles) were combined and provide a theoretical justification for this approach. Additionally, include a comment on whether Jarzynski's inequality equation is directly applicable to the NPT simulation.

      As the reviewer requested, in the revised version of the manuscript we have described how the two profiles were combined and provided a theoretical justification for this approach.
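
      As background to the reviewer's question about Jarzynski's relation, the sketch below shows the standard exponential work average, ΔF = -kT ln⟨exp(-W/kT)⟩, applied to a set of nonequilibrium work values. It is a generic illustration and makes no claim about how the authors combined their forward and reverse profiles; the function name and the synthetic Gaussian work values are assumptions made for the example.

```python
import numpy as np


def jarzynski_free_energy(work, kT):
    """Free energy difference from nonequilibrium work values using
    Jarzynski's equality: dF = -kT * ln < exp(-W / kT) >.

    The minimum work is factored out before exponentiating to keep the
    average numerically stable.
    """
    w = np.asarray(work, dtype=float) / kT
    w_min = w.min()
    return kT * (w_min - np.log(np.mean(np.exp(-(w - w_min)))))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic work values (in units of kT); purely illustrative.
    work = rng.normal(loc=5.0, scale=1.0, size=200_000)
    print(jarzynski_free_energy(work, kT=1.0), work.mean())
```

      For Gaussian-distributed work the estimate approaches the mean work minus variance/(2 kT), and it never exceeds the mean work, which is the inequality form of the relation.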

      Reviewer #3 (Recommendations For The Authors):

      See recommendations in the Public Review regarding the analysis of transition state ensemble and activation entropy.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Response to reviewer #1:

      We thank the reviewer for the further recommendations for improving our presentation. We would like to carefully address the remaining concerns of the reviewer.

      (1) I realize now that I didn't make my point clear enough, which was that as far as I know there is no reason to believe that an oscillatory state cannot be induced with synaptic depression as with spike frequency adaptation when used in the context of the author's model. I'm fine with how the authors have distinguished their model from R&T 2015, but I think the more interesting question is whether there is any reason to believe that STD is not equally capable of doing all the things mentioned in this paper as SFA, and if not why not. I would like the authors to go out on a limb and address this, if only with a few sentences in the discussion. 

      Thank you for pointing this out again. In response to your query regarding the comparison between STD and SFA in generating bump sweeps, we have done simulations based on STD. The results showed that both STD and SFA are capable of inducing bi-directional sweeps. However, (based on our simulations) only SFA can produce uni-directional sweeps. The absence of uni-directional sweeps based on STD may be due to the subtle yet important differences between the two mechanisms. Specifically, STD modulates the neural activity by weakening the recurrent connections, which theoretically can only inhibit recurrent inputs, while SFA can attenuate all forms of excitatory inputs, including external inputs. However, since we did not exhaustively explore the entire parameter space, we cannot conclude that STD is incapable of producing uni-directional sweeps. Future simulations are required.

      Following the Reviewer’s suggestion, we added a few sentences discussing the distinctions between STD and SFA in generating theta sweeps in the CANN, in lines 432 to 440 of the Discussion section:

      “Based on our simulation, both STD and SFA show the ability to produce bi-directional sweeps within a CANN model, with the SFA uniquely enabling uni-directional sweeps in the absence of external theta inputs. This difference might be due to the lack of exhaustively exploration of the entire parameter space. However, it might also attribute to the subtle yet important theoretical distinctions between STD and SFA. Specifically, STD attenuates the neural activity through a reduction in recurrent connection strength, whereas SFA provides inhibitory input directly to the neurons, potentially impacting all excitatory inputs. These differences might explain the diverse dynamical behaviors observed in our simulations. Future experiments could clarify these distinctions by monitoring changes in synaptic strength and inhibitory channel activation during theta sweeps.”

      (2) I appreciate the inclusion of the experimental data in Fig 6a (though I don't find the left-most panel very useful). I also understand what the authors are trying to convey with plots in 6c and 6c. However, I don't find the text that was added above very helpful at all. I was hoping for a simpler demonstration of the effect, by plotting a series of sequential sweeps (cell index vs time, with color indicating firing rate, as in Fig 2d) in the case of both the slow speed and fast speed regimes. Here, vertical lines could mark the individual theta cycles and the firing of individual cells, showing the constancy of the former but change of the latter. 

      Thank you for your constructive feedback. It seems there might be a misunderstanding in our previous explanation, for which we apologize. The phenomenon we want to elucidate is not an increase in the theta frequency as detected in LFPs, but rather the dependence of the slope of phase precession on the animal's movement speed. Due to phase precession, the oscillation frequency of place cells as the animal traverses the field is higher than the theta frequency. A plot like Fig. 2d will not make this point clearer, since it shows the baseline theta frequency (i.e., theta sweeps, as we claimed previously). A straightforward way of thinking about this point is the one we added previously: “…The faster the animal runs, the faster the extra half cycle can be accomplished. Consequently, the firing frequency will increase more (a steeper slope in Fig. 6c red dots) than the baseline frequency”. We hope this clarification addresses the concerns raised.
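
      The arithmetic behind this argument can be sketched as follows. The sketch assumes, as in the quoted sentence, that a cell gains an extra half cycle relative to theta while the animal crosses its place field; the 8 Hz baseline, the 0.5 m field length, and the function name are hypothetical values chosen only for illustration.

```python
def place_cell_frequency(theta_hz, speed, field_length, extra_cycles=0.5):
    """Single-cell oscillation frequency when the cell gains `extra_cycles`
    relative to the baseline theta rhythm during one field traversal."""
    traversal_time = field_length / speed  # seconds spent inside the field
    return theta_hz + extra_cycles / traversal_time


if __name__ == "__main__":
    for speed in (0.1, 0.2, 0.5):  # running speed in m/s
        print(speed, place_cell_frequency(8.0, speed, field_length=0.5))
```

      Under these assumptions the baseline theta frequency stays fixed, while the single-cell frequency rises linearly with running speed because the same extra half cycle is completed in a shorter traversal time.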

      (3) This is still confusing to me. I just don't understand how the *phase* of the oscillating activity bump has anything to do with the movement of the animal. I would like to see a plot of the sweeps (again, cell index vs time, with color indicating the firing rate) before and after inactivation for short and long duration inactivation. Perhaps I am not understanding or appreciating how the bump recovers after inactivation and how this is related to the motion of the animal. 

      Thank you for pointing this out again. After the inactivation is removed, the activity bump naturally reappears at the input location (which has moved further forward than before) and then starts to sweep again as it did before the inactivation. Single-cell phase precession and population theta sweeps are two sides of the same coin (if all cells start at roughly the same phase in the theta cycle). If the reviewer accepts this, then at the new location the activity bump sweeps again (around the new location), and phase precession therefore resumes at a phase further along the precession, since phase codes for position as the animal traverses the place field.

      (4) I am glad the authors are spending more time discussing this phenomenon, but I am unsure of their explanation: for a sweep moving at constant speed, neurons all along the path will be equally affected (inhibited), so where does the bias for suppressing the "end" neurons come from? 

      While it may appear that neurons along the path are equally inhibited as the bump sweeps over them, our model incorporates external inputs with Gaussian profiles. These inputs bias neurons closer to the input location, resulting in fewer activations in neurons further away from the input position.

      (5) Here I was hoping that the authors might comment on what they suspect happens when the animal starts (or stops) moving, and how the network shifts from tracking regime to oscillatory regime (or vice versa), as is typically seen in experimental data (see for example, Kay et al., 2020, fig 4b,c). My apologies for not making this point clearer. 

      Thank you for pointing this out. In our model, we observed that when the animal stops, the network continues to generate theta oscillations near the input location, albeit with reduced amplitude (so the network dynamics resemble those of the tracking regime). However, we hypothesize that when the animal pauses its movement for long enough (immobile but awake states), sensory input into the hippocampus also decreases, which is similar to removing external inputs in our model. In this case, the activity bump spontaneously moves away, resembling the phenomenon of replay (see also Romani & Tsodyks 2015).

      Regarding the experimental data (Kay et al.), it indeed appears that theta sweeps decoded from neural activity become less pronounced when the mouse moves at slower speeds. This observation could potentially correspond to a decrease in the amplitude of bump oscillations when external inputs associated with movement are halted but not entirely removed in our model. However, in experiments, when the mouse's movement slows down, hippocampal activity no longer oscillates at theta frequency, making it challenging to decode theta sweeps.

      We appreciate your clarification on this point and recognize the importance of further investigating how our model can accurately replicate the transition between tracking and oscillatory regimes observed in experimental data.

    2. eLife assessment

      This study provides valuable new insights on how a prevailing model of hippocampal sequence formation can account for recent data, including forward and backward sweeps, as well as constant cycling of sweeps across different arms of a T-maze. The convincing evidence presented in support of this work relies on classical analytical and computational techniques about continuous attractor networks.

    3. Reviewer #1 (Public Review):

      Continuous attractor networks endowed with some sort of adaptation in the dynamics, whether that be through synaptic depression or firing rate adaptation, are fast becoming the leading candidate models to explain many aspects of hippocampal place cell dynamics, from hippocampal replay during immobility to theta sequences during run. Here, the authors show that a continuous attractor network endowed with spike frequency adaptation and subject to feedforward external inputs is able to account for several previously unaccounted aspects of theta sequences, including (1) sequences that move both forwards and backwards, (2) sequences that alternate between two arms of a T-maze, (3) speed modulation of place cell firing frequency, and (4) the persistence of phase information across hippocampal inactivations.

      I think the main results of the paper (findings (1) and (2)) are likely to be of interest to the hippocampal community, as well as to the wider community interested in mechanisms of neural sequences. In addition, the manuscript is generally well written and the analytics are impressive. However, several issues should be addressed, which I outline below.

      Major comments:

      In real data, population firing rate is strongly modulated by theta (i.e., cells collectively prefer a certain phase of theta - see review paper Buzsaki, 2002) and largely oscillates at theta frequency during run. With respect to this cyclical firing rate, theta sweeps resemble "Nike" check marks, with the sweep backwards preceding the sweep forwards within each cycle before the activity is quenched at the end of the cycle. I am concerned that (1) the summed population firing rate of the model does not oscillate at theta frequency, and (2) as the authors state, the oscillatory tracking state must begin with a forward sweep. With regards to (1), can the authors show theta phase spike preference plots for the population to see if they match data? With regards to (2), can the authors show what happens if the bump is made to sweep backwards first, as it appears to do within each cycle?

      I could not find the width of the external input mentioned anywhere in the text or in the table of parameters. The implication is that it is unclear to me whether, during the oscillatory tracking state, the external input is large compared to the size of the bump, so that the bump lives within a window circumscribed by the external input and so bounces off the interior walls of the input during the oscillatory tracking phase, or whether the bump is continuously pulled back and forth by the external input, in which case it could be comparable to the size of the bump. My guess based on Fig 2c is that it is the latter. Please clarify and comment.

      I would argue that the "constant cycling" of theta sweeps down the arms of a T-maze was roughly predicted by Romani & Tsodyks, 2015, Figure 7. While their cycling spans several theta cycles, it nonetheless alternates by a similar mechanism, in that adaptation (in this case synaptic depression) prevents the subsequent sweep of activity from taking the same arm as the previous sweep. I believe the authors should cite this model in this context and consider the fact that both synaptic depression and spike frequency adaptation are both possible mechanisms for this phenomenon. But I certainly give the authors credit for showing how this constant cycling can occur across individual theta cycles.

      The authors make an unsubstantiated claim in the paragraph beginning with line 413 that the Romani and Tsodyks (2015) model could not account for forwards and backwards sweeps. Firing rate adaptation and synaptic depression are both symmetry-breaking mechanisms that should in theory be able to push sweeps of activity in both directions, so it is far from obvious to me that forward and backward sweeps are not both possible in the Romani and Tsodyks model. The authors should either prove that this is the case (with theory or simulation) or excise this statement from the manuscript.

      The section on the speed dependence of theta (starting with line 327) was very hard to understand. Can the authors show a more graphical explanation of the phenomenon? Perhaps a version of Fig 2f for slow and fast speeds, and point out that cells in the latter case fire with higher frequency than in the former?

      I had a hard time understanding how the Zugaro et al., (2005) hippocampal inactivation experiment was accounted for by the model. My intuition is that while the bump position is determined partially by the location of the external input, it is also determined by the immediate history of the bump dynamics as computed via the local dynamics within the hippocampus (recurrent dynamics and spike rate adaptation). So that if the hippocampus is inactivated for an arbitrary length of time, there is nothing to keep track of where the bump should be when the activity comes back on line. Can the authors please explain more how the model accounts for this?

      Can the authors comment on why the sweep lengths oscillate in the bottom panel of Fig 5b, starting at 0.5 seconds before crossing the choice point of the T-maze? Is this oscillation in sweep length another prediction of the model? If so, it should definitely be remarked upon and included in the discussion section.

      Perhaps I missed this, but I'm curious whether the authors have considered what factors might modulate the adaptation strength. In particular, might rat speed modulate adaptation strength? If so, this would yield interesting predictions for theta sequences at low vs high speeds.

      I think the paper has a number of predictions that would be especially interesting to experimentalists but are sort of scattered throughout the manuscript. It would be beneficial to have them listed more prominently in a separate section in the discussion. This should include (1) a prediction that the bump height in the forward direction should be higher than in the backward direction, (2) predictions about bimodal and unimodal cells starting with line 366, (3) prediction of another possible kind of theta cycling, this time in the form of sweep length (see comment above), etc.

    4. Reviewer #2 (Public Review):

      In this work, the authors elaborate on an analytically tractable, continuous-attractor model to study an idealized neural network with realistic spiking phase precession/procession. The key ingredient of this analysis is the inclusion of a mechanism for slow firing-rate adaptation in addition to the otherwise fast continuous-attractor dynamics. The latter continuous-attractor dynamics classically arises from a combination of translation invariance and nonlinear rate normalization.

      For strong adaptation/weak external input, the network naturally exhibits an internally generated, travelling-wave dynamics along the attractor with some characteristic speed. For small adaptation/strong external stimulus, the network recovers the classical externally driven continuous-attractor dynamics. Crucially, when both adaptation and external input are moderate, there is a competition between the internally generated and externally generated mechanisms, leading to an oscillatory tracking regime. In this tracking regime, the population firing profile oscillates around the neural field tracking the position of the stimulus. The authors demonstrate by a combination of analytical and computational arguments that oscillatory tracking corresponds to realistic phase precession/procession. In particular, the authors can account for the emergence of unimodal and bimodal cells, as well as some other experimental observations with respect to the dependence of phase precession/procession on the animal's locomotion.
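
      The internally generated travelling wave described above can be illustrated with a minimal sketch of a ring continuous-attractor network with firing-rate adaptation and no external input. This is a generic textbook-style formulation (Gaussian recurrent kernel, divisive normalization, slow adaptation variable V with strength `m`), not the authors' code; every parameter value, the rectification of U, and the small initial offset `kick` between U and V are assumptions made for illustration. In this class of models the bump is expected to start travelling once `m` exceeds roughly `tau / tau_v`.

```python
import numpy as np


def simulate_cann_sfa(m, tau=1.0, tau_v=50.0, n=128, a=0.4, k=0.5, j0=1.0,
                      dt=0.05, t_total=400.0, kick=0.05):
    """Simulate a ring attractor with adaptation and return the net angular
    distance (radians) travelled by the activity bump."""
    x = np.linspace(-np.pi, np.pi, n, endpoint=False)
    # Circular distance between preferred positions of every pair of neurons.
    d = (x[:, None] - x[None, :] + np.pi) % (2 * np.pi) - np.pi
    # Gaussian recurrent kernel; the neuron density times the grid spacing
    # equals one, so integrals over the ring become plain sums.
    J = j0 / (np.sqrt(2 * np.pi) * a) * np.exp(-d**2 / (2 * a**2))

    def bump(center):
        dd = (x - center + np.pi) % (2 * np.pi) - np.pi
        return np.exp(-dd**2 / (4 * a**2))

    U = bump(0.0)          # synaptic input, initial bump centred at 0
    V = m * bump(-kick)    # adaptation, slightly offset to break symmetry
    prev_center, travelled = 0.0, 0.0
    for _ in range(int(t_total / dt)):
        Up = np.maximum(U, 0.0)
        r = Up**2 / (1.0 + k * np.sum(Up**2))        # divisive normalization
        dU = (-U + J @ r - V) / tau
        dV = (-V + m * U) / tau_v
        U = U + dt * dU
        V = V + dt * dV
        # Decode the bump centre as a population vector and unwrap the step.
        center = np.angle(np.sum(np.maximum(U, 0.0) * np.exp(1j * x)))
        travelled += (center - prev_center + np.pi) % (2 * np.pi) - np.pi
        prev_center = center
    return travelled


if __name__ == "__main__":
    print("weak adaptation:  ", simulate_cann_sfa(m=0.01))
    print("strong adaptation:", simulate_cann_sfa(m=0.1))
```

      With weak adaptation the bump should settle close to where it started, whereas with strong adaptation it should keep moving around the ring. Adding a moving external input to this sketch is what would set up the competition that produces the oscillatory tracking regime.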

      The strengths of this work are at least three-fold: 1) Given its simplicity, the proposed model has a surprisingly large explanatory power of the various experimental observations. 2) The mechanism responsible for the emergence of precession/procession can be understood as a simple yet rather illuminating competition between internally driven and externally driven dynamical trends. 3) Amazingly, and under some adequate simplifying assumptions, a great deal of analysis can be treated exactly, which allows for a detailed understanding of all parametric dependencies. This exact treatment culminates with a full characterization of the phase space of the network dynamics, as well as the computation of various quantities of interest, including characteristic speeds and oscillating frequencies.

      As mentioned by the authors themselves, the main limitation of this work is that it deals with a very idealized model and it remains to be seen how the proposed dynamical behaviors would persist in more realistic models. For example, the model is based on a continuous attractor model that assumes perfect translation-invariance of the network connectivity pattern. Would the oscillating tracking behavior persist in the presence of connection heterogeneities? Another limitation is that the system needs to be tuned to exhibit oscillation within the theta range and that this tuning involves a priori variable parameters such as the external input strength. Is the oscillating-tracking behavior overly sensitive to input strength variations? The authors mentioned that an external pacemaker can serve to drive oscillation within the desired theta band, but no evidence is presented to support this. A final and perhaps secondary limitation has to do with the choice of parameters, namely the time constant of neural firing, which is chosen to be around 3 ms. This seems rather short given that the fast time scale of rate models (excluding synaptic processes) is usually given by the membrane time constant, which is typically about 15 ms. I suspect this latter point can easily be addressed.

    1. eLife assessment

      This solid study assesses a novel mitochondrial inhibitor in combination with the BCL-2 inhibitor venetoclax, with the aim to increase its activity in acute myeloid leukemia. It provides valuable findings of combinatorial efficacy using preclinical models, confirming the overall importance of targeting oxidative phosphorylation to overcome venetoclax resistance in acute myeloid leukemia, and could be strengthened through mechanistic studies demonstrating drug specificity, pharmacodynamic efficacy studies in vivo to test clinical utility and extended statistical analyses of the results. The study is of interest to hematologists because it addresses a key biomedical issue in acute myeloid leukemia (venetoclax resistance) and provides data regarding the safety and activity of a novel inhibitor of the mitochondrial polymerase addressed in combination with venetoclax.

    2. Reviewer #1 (Public Review):

      This study exploits a novel agent (IMT) that inhibits mitochondrial activity in combination with venetoclax. While the concept is not novel, the agent is novel (an inhibitor of the mitochondrial RNA polymerase, described in Nature in other tumor models), and the quest for safe mitochondrial inhibitors is highly warranted. The strength is the in vivo activity data shown in CLDX and in one of the two AML PDX models tested, and the apparent safety of the combination. However, the impact on survival is impressive in CLDX but not in PDX, and it is unclear why the Ven-sensitive PDX is resistant to the combination (the opposite of what the cell line data show). The paper is lacking mechanistic data beyond Seahorse and standard apoptosis assays, and even the transcriptome analysis from PDX cells is poorly analyzed. There is no real evidence that this agent overcomes Ven resistance, which could be tested, for example, in primary AML cells. Finally, no on-target pharmacodynamic endpoints are measured in vivo to support the activity of the compound on mitochondrial activity at the doses used (which are safe). These multiple weaknesses significantly reduce my enthusiasm for this manuscript.

      The cell line data show additive/synergistic effects of IMT and Ven on cell viability in p53-WT cells. However, no mechanisms of synergy beyond OCR are shown, which is a missed opportunity.

      No data are shown in primary AML cells in vitro. This could address venetoclax-resistant AML cells with distinct genomic profiles.

      The in vivo CLDX model (MV4;11) data are quite impressive, showing a reduction of tumor burden and a meaningful extension of survival in the combination cohort. It is unclear why venetoclax, used at the highest dose normally used in vivo (100 mg/kg), did not show any impact on survival in this Ven-sensitive model. It is disappointing that no biomarkers of mitochondrial activity (for example, simple pAMPK, or levels of mitochondrial subunits) are shown to support on-target pharmacodynamic activity. However, efficacy in human PDX is less impressive: for example, in Fig 6C the combination extended survival from 96 to 112 days, possibly due to early stopping of treatment (around day 30), and no extension of survival is seen in another PDX in Fig 7. Still, this is indicative of combinatorial activity in TP53-mutant PDX. There is, however, a discrepancy between the in vitro studies, which show no impact of the combination in TP53-mutant cells and synergy in TP53-wt cells, and the opposite findings in vivo, and this discrepancy is not explained. Overall, the activity of the combination is modest. The safety is encouraging, but again, no pharmacodynamic measurements are shown to support that IMT at least partially inhibited mitochondrial activity in AML cells.

      In the Discussion, the statement that inhibition of POLRMT can overcome venetoclax resistance is not supported by the data, as no additive effects are seen in vitro in TP53-mutant cells, and no other resistant models (such as primary AML cells) are tested. In vivo, as stated above, there is some activity in TP53-mutant PDX, but this alone cannot be used to justify this strong statement. Also, the sentence that "...we were able to reduce the tumor burden in all (cell- and patient-derived) xenografted mice treated with a combination of IMT and venetoclax" is not supported by the data in Fig 7.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Arabanian and colleagues presents studies showing how inhibition of mitochondrial transcription and replication with a novel inhibitor of the mitochondrial polymerase, IMT, can promote AML cell death in combination with the Bcl2 inhibitor venetoclax. They further show that this combinatorial efficacy is evident in vivo in both the AML cell line MV411 and in a PDX model. Given the multiple studies showing the importance of Oxphos in maintaining AML cell survival, the current studies provide an additional strategy to inhibit Oxphos and thus improve the therapeutic management of AML.

      Strengths:

      A novel aspect of this work is that IMT is a new class of mitochondrial inhibitor that acts by inhibiting the mitochondrial polymerase. In addition, the demonstration of therapeutic efficacy both in vitro and in vivo (including with PDX), together with some data showing minimal toxicity, adds to the impact of this work. Their overall conclusion that IMT increases the potency of Ven in treating AMLs is supported.

      Weaknesses:

      There are several deficiencies that should be addressed to substantiate the rigor and impact of this study. Of most importance, they need to show that IMT actually inhibits the mitochondrial polymerase in AML cells, and there are additional concerns with their models that if addressed would improve the ability of IMT to be developed clinically.

    1. eLife assessment

      This valuable study aims to present a mathematical theory for why the periodicity of the hexagonal pattern of grid cell firing would be helpful for encoding 2D spatial trajectories. The idea is supported by solid evidence, but some of the comparisons of theory to the experimental data seem incomplete, and the reasoning supporting some of the assumptions made should be strengthened. The work would be of interest to neuroscientists studying neural mechanisms of spatial navigation.

    2. Reviewer #1 (Public Review):

      Rebecca R.G. et al. set out to determine the function of grid cells. They present an interesting case claiming that the spatial periodicity seen in the grid pattern provides a parsimonious solution to the task of coding 2D trajectories using sequential cell activation. Thus, this work defines a probable function grid cells may serve (here, the function is coding 2D trajectories), and proves that the grid pattern is a solution to that function. This approach is somewhat reminiscent in concept of previous works that defined a probable function of grid cells (e.g., path integration) and constructed normative models for that function that yield a grid pattern. However, the model presented here gives clear geometric reasoning for its case.

      Stemming from 4 axioms, the authors present a concise demonstration of the mathematical reasoning underlying their case. The argument is interesting and the reasoning is valid, and this work is a valuable addition to the ongoing body of work discussing the function of grid cells.

      However, the case uses several assumptions that need to be clearly stated as assumptions, clarified, and elaborated on: Most importantly, the choice of grid function is grounded in two assumptions:
      (1) that the grid function relies on the activation of cell sequences, and
      (2) that the grid function is related to the coding of trajectories.
      While these are interesting and valid suggestions, since they are used as the basis of the argument, the current justification could be strengthened (references 28-30 deal with the hippocampus, reference 31 is interesting but cannot hold the whole case).

      The work further leans on the assumption that sequences in the same direction should be similar regardless of their position in space. It is not clear why that should necessarily be the case, or how the position is extracted for similar sequences in different positions. The authors also strengthen their model with the requirement that grid cells should code for infinite space. However, the grid pattern anchors to borders and might be used to code navigated areas locally. Finally, referencing ref. 14, the authors claim that there is no existing theory for the emergence of grid cell firing that unifies the experimental observations on periodic firing patterns and their distortions under a single framework. However, that same reference presents exactly that: a mathematical model of pairwise interactions that unifies experimental observations. The authors should clarify this point.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors consider why grid cells might exhibit hexagonal symmetry - i.e., for what behavioral function might this hexagonal pattern be uniquely suited? The authors propose that this function is the encoding of spatial trajectories in 2D space. To support their argument, the authors first introduce a set of definitions and axioms, which then lead to their conclusion that a hexagonal pattern is the most efficient or parsimonious pattern one could use to uniquely label different 2D trajectories using sequences of cells. The authors then go through a set of classic experimental results in the grid cell literature - e.g. that the grid modules exhibit a multiplicative scaling, that the grid pattern expands with novelty or is warped by reward, etc. - and describe how these results are either consistent with or predicted by their theory. Overall, this paper asks a very interesting question and provides an intriguing answer. However, the theory appears to be extremely flexible and very similar to ideas that have been previously proposed regarding grid cell function.

      Major strengths:

      The general idea behind the paper is very interesting - why *does* the grid pattern take the form of a hexagonal grid? This is a question that has been raised many times; finding a truly satisfying answer is difficult but of great interest to many in the field. The authors' main assertion that the answer to this question has to do with the ability of a hexagonal arrangement of neurons to uniquely encode 2D trajectories is an intriguing suggestion. It is also impressive that the authors considered such a wide range of experimental results in relation to their theory.

      Major weaknesses:

      One major weakness I perceive is that the paper overstates what it delivers, to an extent that I think it can be a bit confusing to determine what the contributions of the paper are. In the introduction, the authors claim to provide "mathematical proof that ... the nature of the problem being solved by grid cells is coding of trajectories in 2-D space using cell sequences. By doing so, we offer a specific answer to the question of why grid cell firing patterns are observed in the mammalian brain." This paper does not provide proof of what grid cells are doing to support behavior or provide the true answer as to why grid patterns are found in the brain. The authors offer some intriguing suggestions or proposals as to why this might be based on what hexagonal patterns could be good for, but I believe that the language should be clarified to be more in line with what the authors present and what the strength of their evidence is.

      Relatedly, the authors claim that they find a teleological reason for the existence of grid cells - that is, discover the function that they are used for. However, in the paper, they seem to instead assume a function based on what is known and generally predicted for grid cells (encode position), and then show that for this specific function, grid cells have several attractive properties.

      There is also some other work that seems very relevant, as it discusses specific computational advantages of a grid cell code but was not cited here: https://www.nature.com/articles/nn.2901.

      A second major weakness was that some of the claims in the section in which they compared their theory to data seemed either confusing or a bit weak. I am not a mathematician, so I was not able to follow all of the logic of the various axioms, remarks, or definitions to understand how the authors got to their final conclusion, so perhaps that is part of the problem. But below I list some specific examples where I could not follow why their theory predicted the experimental result, or how their theory ultimately operated any differently from the conventional understanding of grid cell coding. In some cases, it also seemed that the general idea was so flexible that it perhaps didn't hold much predictive power, as extra details seemed to be added as necessary to make the theory fit with the data.

      I don't quite follow how, for at least some of their model predictions, the 'sequence code of trajectories' theory differs from the general attractor network theory. It seems from the introduction that these theories are meant to serve different purposes, but the section of the paper in which the authors claim that various experimental results are predicted by their theory makes this comparison difficult for me to understand. For example, in the section describing the effect of environmental manipulations in a familiar environment, the authors state that the experimental results make sense if one assumes that sequences are anchored to landmarks. But this sounds just like the classic attractor-network interpretation of grid cell activity - that it's a spatial metric that becomes anchored to landmarks.

      It was not clear to me why their theory predicted the field size/spacing ratio or the orientation of the grid pattern to the wall.

      I don't understand how repeated advancement of one unit to the next, as shown in Figure 4E, would cause the change in grid spacing near a reward.

      I don't follow how this theory predicts the finding that the grid pattern expands with novelty. The authors propose that this occurs because the animals are not paying attention to fine spatial details, and thus only need a low-resolution spatial map that eventually turns into a higher-resolution one. But it's not clear to me why one needs to invoke the sequence coding hypothesis to make this point.

      The last section, which describes that the grid spacing of different modules is scaled by the square root of 2, says that this is predicted if the resolution is doubled or halved. I am not sure if this is specifically a prediction of the sequence coding theory the authors put forth though since it's unclear why the resolution should be doubled or halved across modules (as opposed to changed by another factor).
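
      One reading under which a factor of $\sqrt{2}$ would follow can be written out explicitly. This is only a sketch: it assumes that "resolution" means the number of firing fields per unit area, $\rho$, of a hexagonal lattice with spacing $s$. That definition is an assumption made here for illustration and is not taken from the manuscript.

```latex
% Assumption: resolution = areal density of fields, \rho, for a hexagonal lattice of spacing s.
% The primitive cell of such a lattice has area (\sqrt{3}/2)\, s^2, so
\[
  \rho(s) = \frac{2}{\sqrt{3}\, s^{2}} .
\]
% Doubling the resolution, \rho(s') = 2\,\rho(s), then gives
\[
  \frac{2}{\sqrt{3}\, s'^{2}} = \frac{4}{\sqrt{3}\, s^{2}}
  \quad\Longrightarrow\quad
  s' = \frac{s}{\sqrt{2}} .
\]
```

      Under this reading the $\sqrt{2}$ ratio is a consequence of choosing a factor of two for the density, so the open question remains why the factor should be two rather than some other number.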

    4. Reviewer #3 (Public Review):

      The manuscript presents an intriguing explanation for why grid cell firing fields do *not* lie on a lattice whose axes are aligned to the walls of a square arena. This observation, by itself, merits the manuscript's dissemination to the journal's audience.

      The presentation is quirky (but keep the quirkiness!).

      But let me recast the problem presented by the authors as one of combinatorics. Given repeating, spatially separated firing fields across cells, one obtains temporal sequences of grid cells firing. Label these cells by integers from $[n]$. Any two cells firing in succession should uniquely identify one of six directions (from the hexagonal lattice) in which the agent is currently moving.

      Now, take the symmetric group $\Sigma$ of cyclic permutations on $n$ elements. We ask whether there are cyclic permutations of $[n]$ such that

      So, for instance, $(4,2,3,1)$ would not be counted as a valid permutation of $(1,2,3,4)$, as $(2,3)$ and $(1,4)$ are adjacent.

      Furthermore, given $[n]$, are there two distinct cyclic permutations such that *no* adjacencies are preserved when considering any pair of permutations (among the triple of the original ordered sequence and the two permutations)? In other words, if we consider the permutation required to take the first permutation into the second, that permutation should not preserve any adjacencies.

      **Key question**: is there any difference between the solution to the combinatorics problem sketched above and the result in the manuscript? Specifically, the text argues that for $n=7$ there is only *one* solution.

      Ideally, one would strive to obtain a closed-form solution for the number of such permutations as a function of $n$.
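
      To make the counting question concrete, here is a minimal brute-force sketch in Python. It assumes that the condition left incomplete above is that no two labels that are cyclically adjacent in the original order remain adjacent in the permuted cycle, which is how the $(4,2,3,1)$ example reads. Labels run from 0 to n-1, rotations are factored out by pinning label 0, and the two traversal directions of a cycle are counted separately. The function names are illustrative and do not come from the manuscript, and the sketch covers only the single-permutation condition, not the pairwise condition on two permutations.

```python
from itertools import permutations


def preserves_adjacency(cycle):
    """Return True if some pair of neighbours in `cycle` was also a pair of
    cyclic neighbours in the original order 0, 1, ..., n-1."""
    n = len(cycle)
    for i in range(n):
        a, b = cycle[i], cycle[(i + 1) % n]
        # Labels differing by +1 or -1 modulo n were adjacent originally.
        if (a - b) % n in (1, n - 1):
            return True
    return False


def count_adjacency_free_cycles(n):
    """Count cyclic arrangements of 0..n-1 that preserve no original adjacency.

    Label 0 is pinned to the first slot so that rotations of the same cycle
    are not counted twice; a cycle and its reversal are counted separately.
    """
    count = 0
    for rest in permutations(range(1, n)):
        if not preserves_adjacency((0,) + rest):
            count += 1
    return count


if __name__ == "__main__":
    for n in range(4, 9):
        print(n, count_adjacency_free_cycles(n))
```

      For small $n$ the output can be compared directly with the manuscript's claim. The enumeration grows as $(n-1)!$, so it does not replace a closed-form answer.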

    1. eLife assessment

      Notch1 is expressed uniformly throughout the mouse endocardium during the initial stages of heart valve formation, yet it remains unclear how Notch signaling is activated in specific regions to induce valve formation. To answer this question, the authors used a combination of in vivo and ex vivo experiments in mice to demonstrate ligand-independent activation of Notch1 by circulation-induced mechanical stress and provide partially convincing evidence for stimulation of a novel mechanotransduction pathway involving post-translational modification of mTORC2 and Protein Kinase C (PKC) upstream of Notch1. While these findings represent an important advance in our understanding of Notch1-mediated valve formation, data supporting the main claims are incomplete.

    2. Joint Public Review:

      The overall goal of this manuscript is to understand how Notch signaling is activated in specific regions of the endocardium, including the OFT and AVC, that undergo EMT to form the endocardial cushions. Using dofetilide to transiently block circulation in E9.5 mice, the authors show that Notch receptor cleavage still occurs in the valve-forming regions due to mechanical shear stress as Notch ligand expression and oxygen levels are unaffected. The authors go on to show that changes in lipid membrane structure activate mTOR signaling, which causes phosphorylation of PKC and Notch receptor cleavage.

      The strengths of the manuscript include the dual pharmacological and genetic approaches to block blood flow in the mouse, the inclusion of many controls including those for hypoxia, the quality of the imaging, and the clarity of the text. However, several weaknesses were noted surrounding the main claims where the supporting data are incomplete.

      PKC - Notch1 activation:

      (1) Does deletion of Prkce and Prkch affect blood flow, and if so, might that be suppressing Notch1 activation indirectly?

      (2) It would be helpful to visualize the expression of prkce and prkch by in situ hybridization in E9.5 embryos.

      (3) PMA experiments: Line 223-224: A major concern is related to the conclusion that "blood flow activates Notch in the cushion endocardium via the mTORC2-PKC signaling pathway". To make that claim, the authors show that a pharmacological activation with a potent PKC activator, PMA, rescues NICD levels in the AVC in dofetilide-treated embryos. This claim would also need proof that a lack of blood flow alters the activity of mTORC2 to phosphorylate the targets of PKC phosphorylation. Also, this observation does not explain the link between PKC activity and Notch activation.

      (4) In addition, the authors hypothesise that shear stress lies upstream of PKC and Notch activation, and that because shear stress is highest at the valve-forming regions, PKC and Notch activity is localised to the valve-forming regions. Since PMA treatment affects the entire endocardium which expresses Notch1, NICD should be seen in areas outside of the AVC in the PMA+dofetilide condition. Please clarify.

      Lipid Membrane:

      (1) It is not clear how the authors think that the addition of cholesterol changes the lipid membrane structure or alters Cav-1 distribution. Can this be addressed? Does adding cholesterol make the membrane more stiff? Does increased stiffness result from higher shear stress?

      (2) The loss of blood flow apparently affects Cav1 membrane localization and causes a redistribution from the luminal compartment to lateral cell adhesion sites. Cholesterol treatment of dofetilide-treated hearts (lacking blood flow) rescued Cav1 localization to luminal membrane microdomains and rescued NICD expression. It remains unclear how the general addition of cholesterol would result in a rescue of regionalized membrane distribution within the AVC and in high-shear stress areas.

      (3) The authors do not show the entire heart in that rescue treatment condition (cholesterol in dofetilide-treated hearts). Also, there is no quantification of that rescue in Figure 4B. Currently, only overview images of the heart are shown but high-resolution images on a subcellular scale (such as electron microscopy) are needed to resolve and show membrane microdomains of caveolae with Cav1 distribution. This is important because Cav-1 could have functions independent of caveolae (e.g. Lolo et al., https://doi.org/10.1038/s41556-022-01034-3).

      Figure Legends, missing data, and clarity:

      (1) The number of embryos used in each experiment is not clear in the text or figure legends. In general, figure legends are incomplete (for instance in Figure 1).

      (2) Line 204: The authors refer to unpublished endocardial RNAseq data from E9.5 embryos. These data must be provided with this manuscript if it is referred to in any way in the text.

      (3) Figure 1 shows Dll4 transcript levels, which do not necessarily correlate with protein levels. It would be important to show quantifications of these patterns as Notch/Dll4 levels are cycling and may vary with time and between different hearts.

      (4) Line 212-214: The authors describe cardiac cushion defects due to the loss of blood flow and refer to some quantifications that are not completely shown in Figure 3. For instance, quantifications for cushion cellularity and cardiac defects at three hours (after the start of treatment?) are missing.

      (5) Related to Figure 5. The work would be strengthened by quantification of the effects of dofetilide and verapamil on heartbeat at the doses applied. Is the verapamil dosage used here similar to the dose used in the clinic?

      Overstated Claims:

      (1) The authors claim that the lipid microstructure/mTORC2/PKC/Notch pathway is responsive to shear stress, rather than other mechanical forces or myocardial function. Their conclusions seem to be extrapolated from various in vitro studies using non-endocardial cells. To solidify this claim, the authors would need additional biomechanical data, which could be obtained via theoretical modelling or using mouse heart valve explants. This issue could also be addressed by the authors simply softening their conclusions.

      (2) Line 263-264: In the discussion, the authors conclude that "Strong fluid shear stress in the AVC and OFT promotes the formation of caveolae on the luminal surface of the endocardial cells, which enhances PKCε phosphorylation by mTORC2." This link was shown rather indirectly, rather than by direct evidence, and therefore the conclusion should be softened. For example, the authors could state that their data are consistent with this model.

      (3) In the Discussion, it says: "Mammalian embryonic endocardium undergoes extensive EMT to form valve primordia while zebrafish valves are primarily the product of endocardial infolding (Duchemin et al., 2019)." In the paper cited, Duchemin and colleagues described the formation of the zebrafish outflow tract valve. The zebrafish atrioventricular valve primordium is formed via partial EMT through Dll-Notch signaling (Paolini et al. Cell Reports 2021) and the collective cell migration of endocardial cells into the cardiac jelly. Then, a small subset of cells that have migrated into the cardiac jelly give rise to the valve interstitial cells, while the remainder undergo mesenchymal-to-endothelial transition and become endothelial cells that line the sinus of the atrioventricular valve (Chow et al., doi: 10.1371/journal.pbio.3001505). The authors should modify this part of the Discussion and cite the relevant zebrafish literature.

    1. Reviewer #2 (Public Review):

      Summary:

      The authors demonstrated that maternal choline supplementation (MCS) improved spatial memory, reduced a marker of hyperexcitability/epilepsy (FosB expression), and reduced oxidative stress (as measured by restored NeuN expression) in an Alzheimer's disease mouse model. This multidisciplinary study spanned behavior, EEG, and histological measures and constituted a large amount of work. Overall, the results supported that MCS does have important effects on hippocampal function, which may substantially impact human AD.

      Strengths:

      The strength of the group was the ability to monitor the incidence of interictal spikes (IIS) over the course of 1.2-6 months in the Tg2576 Alzheimer's disease model, combined with meaningful behavioral and histological measures. The authors were able to demonstrate MCS had protective effects in Tg2576 mice, which was particularly convincing in the hippocampal novel object location task.

      Weaknesses:

      Although choline deficiency was associated with impaired learning and elevated FosB expression, consistent with increased hyperexcitability, IIS was reduced with both low and high choline diets. Although not necessarily a weakness, it complicates the interpretation and requires further evaluation.

    1. eLife assessment

      In this fundamental work, the authors demonstrated that maternal choline supplementation improved spatial memory, reduced hyperexcitability, and restored NeuN expression in a familial Alzheimer's disease mouse model. Interestingly, choline deficiency increased mortality, while paradoxically reducing hyperexcitability. Through behavioral, electrophysiological, and histological measures, the authors present convincing evidence supporting the significant role of maternal choline supplementation in protecting hippocampal functions vulnerable to Alzheimer's disease.

    2. Joint Public Review:

      Chartampila et al. describe the effect of early-life choline supplementation on cognitive functions and epileptic activity in a mouse model of Alzheimer's disease. The cognitive abilities were assessed by the novel object recognition test and the novel object location test, performed in the same cohort of mice at 3 months and 6 months of age. Neuronal loss was tested using NeuN immunoreactivity, and neuronal hyperexcitability was examined using deltaFosB and video-EEG recordings, providing multi-level correlations between these different parameters.

      The study was designed as a 6-month follow-up, with repeated behavioral and EEG measurements through disease development and multilevel correlations providing valuable and interesting findings on AD progression and the effect of early-life choline supplementation. Moreover, the behavioral data that suggest an adverse effect of low choline in WT mice are interesting and important also beyond the context of AD, highlighting the dramatic effect of diet on the phenotypes of animals.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Weaknesses:

      The readability could be improved.

      We have gone through the paper again and tried to revise the text to improve readability.

      Reviewer #1 (Recommendations For The Authors):

      (1) Thank you for adding the discrimination ratio. However, as Fig 2 and 3 depict the same experimental data, consider harmonizing the presentation (symbols and colors) and consolidating the Figs for clarity.

      This is an excellent point but it is actually very hard to harmonize symbols and colors because the data are divided in different ways. Upon considering this further, we actually don’t want to make the symbols and colors the same because it would be misleading. For example, WT and Tg training and testing session data are divided into grey and white throughout Figure 2, but in Figure 3, training and testing session data are pooled. To color code them grey and white in Figure 3 might make it seem that in Figure 3 training and testing were separated.

      (2) Fig 5 is missing

      We are not sure why Figure 5 was absent since it was present in our copy of the submitted pdf. We have double checked and in the revised manuscript we are sure Figure 5 is included.  

      (3) Fig 6 add raw data for WT

      We have added raw WT data. Revised figure 6 includes the raw data in part A4.

      (4) Fig 7 add raw data for WT

      We have added raw WT data. Revised Figure 7 includes the raw data in part A4.

    1. eLife assessment

      In this important work, a quantitative analysis method for three-dimensional morphogenetic processes during embryonic development is introduced. The proposed method is a pipeline combining several methods, allowing quantitative analysis of developmental processes without cell segmentation and tracking. Upon application of their method, the authors obtain convincing evidence that ascidian gastrulation is a two-step process. This work should be of interest to a broad range of developmental biologists who aim to obtain a quantitative understanding of morphogenesis.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors propose a new method to quantitatively assess morphogenetic processes during organismal development. They apply their method to ascidian morphogenesis and thus find that gastrulation is a two-step process.

      The method applies to morphogenetic changes of surfaces. It consists of the following steps: first, surface deformations are quantified based on microscopy images without requiring cellular segmentation and tracking. This is achieved by mapping, at each time point, a polygonal mesh initially defined on a sphere to the surface of the embryo. The mapped vertices of this polygonal mesh then serve as (Lagrangian) markers for the embryonic surface. From these, one can infer the deformation of the surface, which can be expressed in terms of the strain tensor at each point of the surface. Changes in the strain tensor give the strain rate, which captures the morphogenetic processes. Second, at each time point, the strain rate field is decomposed in terms of spherical harmonics. Finally, the evolution of the weights of the various spherical harmonics in the decomposition is analysed via wavelet analysis. The authors apply their workflow to ascidian development between 4 and 8.7 hpf. From their analysis, they find clear indications for gastrulation and neurulation and identify two sub-phases of gastrulation, namely, endoderm invagination and 'blastopore closure'.

      Strengths:

      The combination of various tools allows the authors to obtain a quantitative description of the developing embryo without the necessity of identifying fiducial markers. Visual inspection shows that their method works well. Furthermore, this quantification then allows for an unbiased identification of different morphogenetic phases.

      Weaknesses:

      At times, the explanation of the method is hard to follow, unless the reader is already familiar with concepts like level-set methods or wavelet transforms. Furthermore, the software for performing the determination of Lagrangian markers or the subsequent spectral analysis does not seem to be available to the readers.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors proposed a method to quantitatively analyze 3D live imaging data of early developing embryos, using ascidian development as an example. For this purpose, the previously proposed level set method was used to computationally track the temporal evolution of reference points introduced on the embryo surface. Then, from the obtained three-dimensional trajectories, the velocity field was obtained, from which the strain rate field was computed according to the idea of continuum mechanics. The information in the strain rate field was reduced to a scalar field, determined by taking the square root of the sum of the squares of the eigenvalues. The scalar field is then further decomposed into a spectrum using spherical harmonics. In this paper, the authors focused on the modes with lower order with real coefficients. The time evolution of these modes was analyzed using wavelet transforms. The authors claimed that the results reflected the developmental stages of ascidian embryos.
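
      The reduction of the strain rate field to a scalar, as summarized above, can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes a flat patch with a known velocity gradient given as a nested list, whereas the manuscript works on a curved embryo surface, and the function names are hypothetical. It uses the fact that for a symmetric tensor the sum of the squared eigenvalues equals the sum of the squared entries, so no eigen-decomposition is needed.

```python
import math


def strain_rate(grad_v):
    """Symmetric part of a velocity gradient: D = (L + L^T) / 2."""
    n = len(grad_v)
    return [
        [0.5 * (grad_v[i][j] + grad_v[j][i]) for j in range(n)]
        for i in range(n)
    ]


def strain_rate_magnitude(grad_v):
    """Square root of the sum of squared eigenvalues of the strain rate.

    For a symmetric matrix this equals its Frobenius norm, i.e. the square
    root of the sum of squared entries, which is what is computed here.
    """
    d = strain_rate(grad_v)
    return math.sqrt(sum(x * x for row in d for x in row))


if __name__ == "__main__":
    simple_shear = [[0.0, 1.0], [0.0, 0.0]]     # eigenvalues of D are +0.5 and -0.5
    rigid_rotation = [[0.0, -1.0], [1.0, 0.0]]  # no deformation, so D is zero
    print(strain_rate_magnitude(simple_shear))
    print(strain_rate_magnitude(rigid_rotation))
```

      Because the scalar discards the sign and orientation of the principal strain rates, expansion and contraction of equal size give the same value, which is worth keeping in mind when interpreting the spectral modes.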

      Strengths:

      In this way, this manuscript proposes a pipeline of analyses combining various methods. The strength of this method lies in its ability to quantitatively analyze the deformation of the entire embryo without the requirement for cellular segmentation and tracking.

      Weaknesses:

      The limitations of the proposed analysis pipeline are not clearly indicated. Claims such as the identification of developmental stages need more quantitative validation. In addition, it is not clearly shown how the proposed method can distinguish between the superposition of individual cell behavior and the collective behavior of cells.

    1. eLife assessment

      This important study describes a neural circuit contributing to two behavioral processes affecting pathogen avoidance in the nematode C. elegans. The method used to identify specific contributing neurons is innovative and the experimental evidence supporting the major claims is solid. This study will be of interest to neuroscientists studying behavior, in particular in C. elegans.

    2. Reviewer #1 (Public Review):

      This study identifies two behavioral processes that underlie learned pathogen avoidance behavior in C. elegans: exiting and re-entry of pathogenic bacterial lawns. Long-term behavioral tracking indicates that animals increase the prevalence of both behaviors over long-term exposure to the pathogen Pseudomonas aeruginosa. Using an optogenetic silencing screen, the authors identify groups of neurons, whose activity regulates lawn occupancy. Surprisingly, they find that optogenetic inhibition of neurons during only the first two hours of pathogen exposure can establish subsequent long-term changes in pathogen aversion. By leveraging a compressed sensing approach, the authors define a set of neurons involved in either lawn exit or lawn re-entry behavior using a constrained set of transgenic lines that drive Arch-3 expression in overlapping groups of neurons. They then measure the calcium activity of the candidate neurons involved in lawn re-entry in freely moving animals using GCaMP, and observe a reduction in their neural activity after exposure to a pathogen. Optogenetic inhibition of AIY and SIA neurons during acute pathogen exposure in naïve animals delays lawn entry whereas activating these neurons in animals previously exposed to pathogen enhances lawn entry, albeit transiently.

      This work is missing several controls that are necessary to substantiate their claims. My most important concern is that the optogenetic screen for neurons that alter pathogenic lawn occupancy does not have an accompanying control on non-pathogenic OP50 bacteria. Hence, it remains unclear whether these neuronal inhibition experiments lead to pathogen-specific or generalized lawn-leaving alterations. For strains that show statistical differences between - and + ATR conditions, the authors should perform follow-up validation experiments on non-pathogenic OP50 lawns to ensure that the observed effect is PA14-specific. Similarly, neuronal inhibition experiments in Figures 5E and H are only performed with naïve animals on PA14 - we need to see the latency to re-entry on OP50 as well, to make general conclusions about these neurons' role in pathogen-specific avoidance.

      My second major concern is regarding the calcium imaging experiments of candidate neurons involved in lawn re-entry behavior. Although the data shows that AIY, AVK, and SIA/SIB neurons all show reduced activity following pathogen exposure, the authors do not relate these activity changes to changes in behavior. Given the well-established links between these cells and forward locomotion, it is essential to not only report differences in activity but also in the relationship between this activity and locomotory behavior. If animals are paused outside of the pathogen lawn, these neurons may show low activity simply because the animals are not moving forward. Other forward-modulated neurons may also show this pattern of reduced activity if the animals remain paused. Given that the authors have recorded neural activity before and after contact with pathogenic bacteria in freely moving animals, they should also provide an analysis of the relationship between proximity to the lawn and the activity of these neurons.

      This work is missing methodological descriptions that are necessary for the correct interpretation of the results shown here. Figure 2 suggests that the determination of statistical significance across the optogenetic inhibition screen will be found in the Methods, but this information is not to be found there. At various points in the text, authors refer to "exit rate", "rate constant", and "entry rate". These metrics seem derived from an averaged measurement across many individual animals in one lawn evacuation assay plate. However "latency to re-entry" is only defined on a per-animal basis in the lawn re-exposure assay. These differences should be clearly stated in the methods section to avoid confusion and to ensure that statistics are computed correctly.

      This work also contains mislabeled graphs and incorrect correspondence with the text, which make it difficult to follow the authors' claims. The text suggests that Pdop-2::Arch3 and Pmpz-1::Arch3 show increased exit rates, whereas Figure 2 shows that Pflp-4::Arch3 but not Pmpz-1::Arch3 has an increased exit rate. The authors should also make a greater effort to correctly and clearly label which type of behavioral experiment is used to generate each figure and describe the differences in experimental design in the main text, figure legends, and methods. Figure 2E depicts trajectories of animals leaving a lawn over a 2.5-minute interval but it is unclear when this time window occurs within the 18-hour lawn leaving assay. Likewise, Figure 2H depicts a 30-minute time window which has an unclear relationship to the overall time course of lawn leaving. This figure legend is also mislabeled as "Infected/Healthy", whereas it should be labeled "-/+ ATR".

      This work raises the interesting possibility that different sets of neurons control lawn exit and lawn re-entry behaviors following pathogen exposure. However, the authors never directly test this claim. To rigorously show this, the authors would need to show that lawn-exit-promoting neurons (CEPs, HSNs, RIAs, RIDs, SIAs) are dispensable for lawn re-entry behavior and that lawn re-entry promoting neurons (AVK, SIA, AIY, MI) are dispensable for lawn exit behavior in pathogen-exposed animals. The authors identify AVK neurons as important for modulating lawn re-entry behavior by brief inhibition at the start of pathogen exposure but fail to find that these neurons are required for increased latency to re-entry in naïve animals (Figure 5D). Recent work from Marquina-Solis et al (2024) shows that chronic silencing of these neurons delays pathogen lawn leaving, due to impaired release of flp-1 neuropeptide. Authors may wish to connect their work more closely with the existing literature by investigating the behavioral process by which AVK contributes to lawn evacuation.

      If the authors work through these criticisms, this work can become an important contribution to the field of pathogen learning in C. elegans. However, in its current form, this work remains incomplete.

    3. Reviewer #2 (Public Review):

      In this manuscript, Hallacy et al. used a compressed sensing-based optogenetic screening method to investigate the crucial neurons that regulate pathogenic avoidance behavior in C. elegans. They further substantiate their findings using complementary optogenetic activation and imaging techniques to confirm the roles of the key neurons identified through extensive screening efforts. Notably, they identified AIY and SIA as pivotal neurons in the dynamic process of pathogenic avoidance. Their significant discovery is the delayed or stalled reentry process, which drives avoidance behavior; to my knowledge, this dynamic has not been previously documented. Additionally, the successful integration of quantitative optogenetic tools and compressed sensing algorithms is noteworthy, demonstrating the potential for obtaining highly quantitative data from the C. elegans nervous system. This approach is quite rare in this field, yet it represents a promising direction for studying this simple nervous system.

      However, the paper's main weakness lies in its lack of a detailed mechanism explaining how the delayed reentry process directly influences the actual locomotor output that results in avoidance. The term 'delayed reentry' is used as a dynamic metric for quantifying the screening, yet the causal link between this metric and the mechanistic output remains unclear. Despite this, the study is well-structured, with comprehensive control experiments, and is very well constructed.

    4. Reviewer #3 (Public Review):

      Summary:

      Using a compressed sensing-based approach applied previously by the authors' group, the authors conducted an initial screen for neurons that, when optogenetically down-regulated, influenced learned pathogen avoidance consisting of two component behaviors, exit from the bacterial lawn and lawn re-entry. The authors found that 4 classes of neurons (AVK, SIA, AIY, and MI) were inferred over a wide range of sparsity parameters, thereby indicating the importance of lawn re-entry. They found six classes of neurons required for lawn exit. The authors then went on to further analyze the neurons for the re-entry behavior, and conducted calcium imaging of those neurons in the freely behaving animals. They found that the activities of AIY and SIA neurons decreased after the animals that had been exposed to the pathogenic bacteria tried to re-enter the bacterial lawn. They also found that when those neurons of the animals that had not been exposed to pathogenic bacteria were downregulated by optogenetics, those operated animals increased the latency of the re-entry, which is a similar behavioral modification to that of the animals that had been exposed to the pathogen. Conversely, when those neurons of the animals that were exposed to pathogenic bacteria were up-regulated by optogenetics, those animals showed a shortened latency of the re-entry, which is similar to the behavior observed in the animals not exposed to pathogen.

      Strengths:

      This is overall a very nice piece of work. Most importantly, an initial screening of neurons was conducted by a compressed sensing-based approach previously applied by the same group. It is also worth emphasizing that this compressed sensing analysis is applicable when the behavior of interest involves a small number of neurons, as the authors pointed out in the Introduction section. Therefore, the readers should keep in mind that the validity and significance of this work heavily depend on the justification of the sparsity parameters that the authors chose. Nevertheless, this work is well justified because neurons identified by the initial screening were thoroughly analyzed by various methods including calcium imaging and optogenetic manipulation of neuronal activities and behavioral analyses using an animal-tracking system.

      Weaknesses:

      My only concern is that the authors should be more careful about describing their "compressed sensing-based approach". The authors often cite their previous Nature Methods paper, but should explain more because this method is critical for this manuscript. Also, this analysis is based on the hypothesis that only a small number of neurons are responsible for a given behavior. The authors should explain more about how to determine the sparsity parameters, for example.

    1. eLife assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

    2. Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, the lack of a critical control cohort that could provide evidence for a higher E/I ratio in CC patients without recovered sight, for example, and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      - Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      - Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      - MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      - Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      - Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      - Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      - P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      - Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      - The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.
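
      The request above to control for age when correlating neurochemistry with the aperiodic intercept can be illustrated with a first-order partial correlation. This is a minimal sketch, not the authors' analysis: the function names and the toy numbers are hypothetical and only show how a raw correlation can vanish once a shared covariate is accounted for.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up, centered toy values (NOT study data): both measures depend on age.
age = [-1, -1, 1, 1]
glx = [-2, 0, 0, 2]
intercept = [-2, 0, 2, 0]

raw_r = pearson(glx, intercept)
adjusted_r = partial_corr(glx, intercept, age)
print(f"raw r = {raw_r:.2f}, age-adjusted r = {adjusted_r:.2f}")
```

      In this toy case the raw correlation is 0.5 while the age-adjusted correlation is essentially zero, which is the kind of confound the reviewer asks the authors to rule out.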

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested an increased excitation/inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as is common in the field).

      MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the exploratory correlational analyses do not help to build a coherent picture. A weak correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls, the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

    4. Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its relationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and slope of the aperiodic EEG signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitive period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      (2) Sample size:

      - The small sample size is a major concern as it may not provide sufficient power to detect subtle effects and/or may overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      - The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and it is not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      (3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      - Statistical analyses for the MRS: The authors should consider some additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (a 2x2 ANOVA design) is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or neurochemicals. In addition, the authors did not provide any information on the alpha level or on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, the duration of visual deprivation, (and sex?), these should be included as covariates as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      - Figure 2c. Eyes closed condition: The highest score of the Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.
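
      The check on the Figure 4 p-value raised above can be reproduced from the correlation coefficient and the sample size alone, using the standard t-test for a Pearson correlation. The sketch below is an illustration under that assumption (a two-tailed test with n - 2 degrees of freedom); the function names are hypothetical, and the tail probability is obtained by simple numerical integration so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_value(r, n, steps=10_000):
    """Two-tailed p-value for a Pearson r from n pairs (requires |r| < 1)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the t density between 0 and |t| (steps must be even).
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3
    # The two tails hold whatever probability is not between -|t| and |t|.
    return 1 - 2 * central_mass


p = pearson_p_value(-0.42, 10)
print(f"r = -0.42, n = 10 -> two-tailed p = {p:.3f}")
```

      For r = -0.42 and n = 10 the result lands close to the reviewer's estimate of roughly 0.22 and nowhere near 0.999, assuming the reported value was meant to be an uncorrected two-tailed p-value.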

      (4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      - The aperiodic intercept in particular is very sensitive to many influences (e.g. skull thickness, electrode impedance...). As crucial results (the correlation between the aperiodic intercept and MRS measures) face this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than on the intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could therefore be controlled for.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      - The authors further wrote: "We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018)." This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation of what actually influences the 1/f slope in EEG measurements.

      (5) Problems with EEG preprocessing and analysis:

      - It seems that the authors neither identified bad channels nor addressed the line noise issue (which remains a problem even if a low-pass filter below the line-noise frequency was applied).

      - What was the percentage of segments that needed to be rejected due to the 120 μV criterion? This should be reported specifically for EO & EC and for controls and patients.

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? The subsequent spectral analyses are compromised by this choice (see the Nyquist theorem).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.
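
      The Nyquist concern raised above about the 60 Hz downsampling can be made concrete with a few lines of arithmetic: a signal sampled at 60 Hz can only represent frequencies below 30 Hz, and anything above that folds back into the lower range. The sketch below is illustrative only; the helper names are hypothetical, and whether the 20-40 Hz slope fit was actually computed on the downsampled data is an assumption, not something stated in the review.

```python
import math


def nyquist_hz(sampling_rate_hz):
    """Highest frequency that can be represented at a given sampling rate."""
    return sampling_rate_hz / 2


def band_is_resolvable(band_hz, sampling_rate_hz):
    """True if the whole (low, high) band lies below the Nyquist frequency."""
    _, high = band_hz
    return high < nyquist_hz(sampling_rate_hz)


def alias_hz(signal_hz, sampling_rate_hz):
    """Apparent frequency of a sinusoid after sampling without anti-aliasing."""
    return abs(signal_hz - sampling_rate_hz * round(signal_hz / sampling_rate_hz))


fs = 60  # Hz, the downsampled rate quoted in the review
for band in [(1, 20), (20, 40)]:
    print(band, "resolvable at", fs, "Hz:", band_is_resolvable(band, fs))
print("A 40 Hz component sampled at 60 Hz appears at", alias_hz(40, fs), "Hz")
```

      Under these assumptions the 1-20 Hz range is fully resolvable at 60 Hz, whereas the upper part of a 20-40 Hz range is not, which is why the reviewer asks for the rationale behind the downsampling step.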

      (6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of the gamma-aminobutyric acid (GABA) measurement can vary significantly depending on acquisition and modeling parameters. Thus, did the authors address these challenges? Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at Supplementary Figure 5 (which would be better presented as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability does not seem to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, given that the group differences in the Glx/GABA ratio seem to be driven by a decreased Glx concentration in CC rather than an increased GABA concentration (see Figure 2)?

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      (7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also the problems with influences on the aperiodic intercept and the small sample size). This is a result of the exploratory analyses correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore, and importantly: why should this hold specifically in CC patients, but not in the SC control group?

      (8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size, the exploratory analysis approach, and the failure to account for possible confounding factors.

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement that infers a causal relationship from cross-sectional data.

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification "We nevertheless think that our results are valid. Our findings neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors: "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

    5. Author response:

      eLife assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are only very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question to answer in future work, with individuals who had developmental cataracts, is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; on the contrary, they are essential to understand future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far-reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first few studies that relate MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since present research has been using aperiodic activity as a correlate of the E/I ratio, and partially of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what might be measured with aperiodic neurophysiological activity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, the lack of a critical control cohort that could provide evidence for a higher E/I ratio in CC patients without recovered sight, for example, and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, from the original 11 CC and 11 SC participants tested, one participant each from the CC and SC group had to be rejected, as their data had been corrupted, resulting in 10 participants in each group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 1.5 years.

      We ensured the validity of our results with two measures. First, we assessed not just MRS but also EEG measures of the E/I ratio. The latter allowed us to link results to a larger population of CC individuals; that is, we replicated in our sub-group the results of a larger group of 38 individuals (Ossandón et al., 2023).

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most of the studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Thus, accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late blind individuals, and then individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (for a recent example, see Rideaux et al., 2022). Nevertheless, we chose the frontal control region rather than a parietal voxel, given the potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have used the frontal cortex as a control region as well (Pitchaimuthu et al., 2017; Rideaux et al., 2022).

      In the present study, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 +/- 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations (I/E ratio assessed in narcolepsy (Gao et al., 2024), GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022)) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study would have been of sufficient quality to uncover group differences.

      In the revised manuscript, we will add the recently published MRS quality assessment form to the supplementary materials. Additionally, we would like to point to our a priori prediction of group differences for the visual cortex, but not for the frontal cortex voxel.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).

      In the revised manuscript, we will more clearly indicate that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject.

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g. the extant literature on congenital permanent blindness, including anophthalmic individuals (Coullon et al., 2015; Weaver et al., 2013)), some uncertainty remains regarding whether the (remaining, in our case) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In the case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not receive cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, the link between GABA concentration and, for example, interocular suppression in strabismus has so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction depending on the amblyopic vs. non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting the view that the differences in Glx/GABA+ concentration which we observed were driven by congenital deprivation, and not amblyopia-associated visual acuity or eye movement differences.

      In the revised manuscript, we will discuss the inclusion criteria in more detail, and the aforementioned reasons why our data remains interpretable.

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we will clearly indicate that the exploratory correlation analyses are reported to put forth hypotheses for future studies.

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we will add the linear regressions with age as a covariate included below, for the relationship between aperiodic intercept and Glx concentration in the CC group. 

      a. A linear regression was conducted within the CC group to predict the intercept during visual stimulation, based on age and visual cortex Glx concentration. The results of the regression analysis indicated that the model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.82, F(2,7) = 16.1, p = 0.0024. Note that the coefficient for age was not significant, β = 0.007, t(7) = 0.82, p = 0.439. The regression coefficients and their respective statistics are presented in Author response table 1.

      Author response table 1.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Visual Stimulation) in the CC group

      b. A linear regression was conducted to predict the intercept during eye opening at rest, based on age and visual cortex Glx concentration. The results of the regression analysis indicated that the model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.842, F(2,7) = 18.6, p = 0.00159. Note that the coefficient for age was not significant, β = −0.005, t(7) = −0.90, p = 0.400. The regression coefficients and their respective statistics are presented in Author response table 2.

      Author response table 2.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Eyes Open) in the CC group

      c. Given that the Glx coefficient is significant in both models and age does not significantly predict either outcome, it can be concluded that Glx predicts the aperiodic intercept independently of age.
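
      As a consistency check on the two models above, the overall test statistic with (2, 7) degrees of freedom, an F statistic, and its p-value can be recovered from R² alone. The sketch below assumes n = 10 CC participants and k = 2 predictors (age and Glx); the function names are illustrative and not part of the reported analysis, and the closed-form p-value holds only when the numerator degrees of freedom equal 2.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F statistic of a linear model with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def p_from_f_two_predictors(f: float, df2: int) -> float:
    """Upper-tail p-value of F(2, df2); this closed form is valid only for df1 = 2."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


# Assumed design: n = 10 CC participants, k = 2 predictors (age, Glx).
n, k = 10, 2
for label, r2 in [("visual stimulation", 0.82), ("eyes open", 0.842)]:
    f = f_from_r2(r2, n, k)
    p = p_from_f_two_predictors(f, n - k - 1)
    print(f"{label}: F({k},{n - k - 1}) = {f:.1f}, p = {p:.4f}")
```

      The values obtained this way should lie close to the reported statistics; small differences are expected because R² enters here only in its rounded form.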

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we will improve the phrasing. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandón et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (Fp1, Fp2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we will add this analysis to the supplementary material.

      Author response image 1.

      Aperiodic intercept (top) and slope (bottom) for congenital cataract-reversal (CC, red) and age-matched normally sighted control (SC, blue) individuals. Distributions of these parameters are displayed as violin plots for three conditions; at rest with eyes closed (EC), at rest with eyes open (EO) and during visual stimulation (LU). Aperiodic parameters were calculated across electrodes Fp1 and Fp2. Solid black lines indicate mean values, dotted black lines indicate median values. Coloured lines connect values of individual participants across conditions.

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes has been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested an increased excitation/Inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration. However, longitudinal studies involving neuroimaging are an effortful challenge, particularly in research conducted outside of major developed countries and dedicated neuroimaging research facilities. Crucially, however, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, in order to justify and better tailor longitudinal studies, cross-sectional studies are an initial step.

      (2.2) MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the explorative correlational analyses do not help to build a coherent picture. A bland correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature.

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the ratio of Glx/GABA+ is indicative of E/I balance rather than solely the individual neurotransmitter levels. One of the motivations for assessing the ratio vs the absolute concentration is that as per the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, which has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of the lower E/I ratio is not just from the Glx/GABA+ ratio, but additionally, based on the steeper EEG aperiodic slope (1-20 Hz).  

      As in the discussion section and response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss the possible reasons for the direction of the correlation with visual acuity and aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that patients with the highest (compensatory) balancing of the consequences of congenital blindness (hyperexcitation), in light of visual stimulation, are those who recover best. Note that the sighted control group was selected based on their “normal” vision. Thus, clinical visual acuity measures are not expected to vary sufficiently, nor to have the resolution to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients varying widely in visual outcomes, and was thus ideal for investigating such correlations.

      This holds for the correlation between Glx and the aperiodic intercept, as well. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981) might drive a correlation not observed in healthy populations.

      In the revised manuscript, we will more clearly indicate in the discussion that these are possible post-hoc interpretations. We argue that given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such. The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we will detail the advantages and disadvantages of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern as it may not provide sufficient power to detect subtle effects and/or overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our discussion, and make clear that small sample sizes were due to the nature of investigations in special populations. It is worth noting that our EEG results fully align with those of a larger sample of CC individuals (Ossandón et al., 2023), giving us confidence in their validity and reproducibility. Moreover, our MRS results and correlations of those with EEG parameters were spatially specific to occipital cortex measures, as predicted.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      We marked the correlation analyses as exploratory; note that we do not base most of our discussion on the results of these analyses. As indicated by Reviewer 1, reporting them allows for deriving more precise hypotheses for future studies. It has to be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal data sets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, are quite a lot. Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. They nevertheless provided valuable direction for future research, and integration of results across multiple studies provides scientific insights.

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider some additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (2x2) design ANOVA is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or neurochemicals. Finally, the authors did not provide any information on the alpha level nor any information on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, the duration of visual deprivation, (and sex?), these should be included as covariates as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      In the revised manuscript, we will add permutation analyses for our outcomes, as well as multiple regression models investigating whether the variance in visual history might have driven these results. Note that in the supplementary materials (S6, S7), we have reported the correlations between visual history metrics and MRS/EEG outcomes.
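
      To illustrate the kind of permutation analysis referred to above, the following is a minimal sketch of a two-sided permutation test on a difference in group means. It is a simplified one-factor comparison rather than the full group-by-condition design, the function name is hypothetical, and the numbers are placeholders chosen purely for illustration; they are not data from the study.

```python
import random


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Placeholder values for illustration only; NOT data from the study.
cc_ratio = [2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.0, 2.7, 2.4, 2.2]
sc_ratio = [3.0, 3.3, 2.9, 3.6, 3.1, 3.4, 3.2, 3.7, 3.0, 3.5]
print(f"permutation p = {permutation_test(cc_ratio, sc_ratio):.4f}")
```

      Because the null distribution is built from the data themselves, this approach does not rely on normality assumptions, which is the main reason it is often suggested for samples of this size.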

      The alpha level used for the ANOVA models specified in the methods section was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008, after correcting for (6) multiple comparisons using the Bonferroni correction, also specified in the methods. Note that the p-values following correction are expressed as multiplied by 6, due to most readers assuming an alpha level of 0.05 (see response regarding large p-values).

      We used a control group matched for age and sex. Moreover, the controls were recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted independently for the eyes open and eyes closed conditions. The ANOVA conducted on the EEG metrics was 2x3 because it had group (CC, SC) and condition (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the methods and figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.
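
      The arithmetic behind this exchange can be sketched as follows: the uncorrected two-sided p-value for a Pearson correlation is obtained from a t statistic with n − 2 degrees of freedom, and the Bonferroni-adjusted value is that p-value multiplied by the number of comparisons and capped at 1. The sketch assumes r = −0.42, n = 10 and 6 comparisons, as stated in the text; the helper names are illustrative, and the t distribution is integrated numerically to keep the example free of external dependencies.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_value(r: float, n: int, steps: int = 2000) -> float:
    """Two-sided p-value for a Pearson correlation r from n observations."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the central area between 0 and |t| (steps must be even).
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area


def bonferroni(p: float, m: int) -> float:
    """Bonferroni-adjusted p-value for m comparisons, capped at 1."""
    return min(1.0, p * m)


p_raw = pearson_p_value(-0.42, 10)
p_adj = bonferroni(p_raw, 6)
print(f"uncorrected p = {p_raw:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

      Under these assumptions the uncorrected value falls in the range the reviewer expected, while the adjusted value reaches the cap of 1, which is consistent with both statements.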

      Additionally, to check all statistical analyses, we put the manuscript through an independent Statistics Check (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/) and will upload the consistency report with the revised supplementary material.

      - Figure 2c. Eyes closed condition: The highest score of the *Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work by Gao et al., multiple papers that we also cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we will cite those studies not already included in the introduction.

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched for age and sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects. There is no evidence available suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference in, for example, skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretations of the lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. We will clarify our interpretation in the revised manuscript. Note that Ossandón et al. (2023) reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and oscillatory activity in the gamma range. We will allude to these findings in the revised manuscript.

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation of what actually influences the 1/f slope in EEG measurements.

      Note that Muthukumaraswamy & Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in addition to monkey ECoG. Medel et al. (2020; now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG. We will make clearer in the introduction of the revised manuscript that this metric is indirect.

      While a full understanding of aperiodic activity has yet to be provided, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors neither identified bad channels nor addressed the line noise issue (which remains a problem even if a low-pass filter below the line noise frequency was applied).

      As pointed out in the methods and Figure 1, we only analyzed data from two channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023).

      In both published works, we did not consider frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contamination. The low-pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used the cleanline.m function to remove line noise before filtering, and the group differences remained stable. We will report this analysis in the supplementary version of the revised manuscript. Further, both groups were measured in the same lab, making line noise highly unlikely as an account for the observed group effects. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120 μV criterion? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1-second segments rejected for each resting state condition is given below. The mean percentage of 6.25-second segments rejected in each group for the visual stimulation condition is also included, and both will be added to the revised manuscript:

      Author response table 3.

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which ranged in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).
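
      To make the role of the anti-aliasing filter concrete, the following minimal sketch computes where a sinusoid would appear in the spectrum if a signal were resampled at 60 Hz without prior filtering. The function name and the 50 Hz line frequency are assumptions chosen for illustration; they are not taken from the manuscript or from EEGLAB.

```python
def alias_frequency(signal_hz, sampling_hz):
    """Apparent frequency of a sinusoid sampled at sampling_hz with no anti-aliasing filter."""
    folded = signal_hz % sampling_hz
    # Frequencies above the Nyquist limit fold back into [0, sampling_hz / 2].
    return min(folded, sampling_hz - folded)


sampling_hz = 60.0             # target rate after downsampling
nyquist_hz = sampling_hz / 2   # highest frequency that can be represented

line_noise_hz = 50.0           # assumed mains frequency, for illustration only
aliased_hz = alias_frequency(line_noise_hz, sampling_hz)

print(f"Nyquist limit: {nyquist_hz} Hz")
print(f"{line_noise_hz} Hz would appear at {aliased_hz} Hz without an anti-aliasing filter")
```

      In this hypothetical case an unfiltered component above the Nyquist limit would fold back into the analyzed band, which is why a downsampling routine that low-pass filters before decimation matters for the subsequent spectral analysis.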

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This will be explicitly stated in the revised manuscript.
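
      The baseline removal described here amounts to subtracting each epoch's own mean from every sample in that epoch. A minimal sketch, with a hypothetical function name and made-up amplitude values:

```python
def remove_epoch_mean(epoch):
    """Subtract the mean over the whole epoch from every data point."""
    mean = sum(epoch) / len(epoch)
    return [sample - mean for sample in epoch]


# Hypothetical amplitudes (in microvolts) for one short epoch.
epoch_uv = [12.0, 15.0, 9.0, 14.0, 10.0]
baseline_corrected = remove_epoch_mean(epoch_uv)
print(baseline_corrected)
```

      After this step each epoch has zero mean, so the whole epoch serves as its own baseline window.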

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values.  Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al. (2023), the comparison between the piecewise fits and FOOOF fits led the authors to use the former as it outperformed the FOOOF algorithm for their data.
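
      The log-log fit with the alpha range excluded, as described in this response, can be sketched as follows. This is a simplified illustration on a synthetic spectrum, not the analysis pipeline itself: the function names, the ordinary least-squares fit, and the synthetic alpha peak are all assumptions made for the example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def aperiodic_fit(freqs, power, fit_range=(1.0, 20.0), exclude=None):
    """Fit log10(power) against log10(frequency), optionally skipping a band."""
    xs, ys = [], []
    for f, p in zip(freqs, power):
        if not fit_range[0] <= f <= fit_range[1]:
            continue
        if exclude is not None and exclude[0] <= f <= exclude[1]:
            continue
        xs.append(math.log10(f))
        ys.append(math.log10(p))
    return fit_line(xs, ys)


# Synthetic spectrum: a 1/f^1.5 background plus a Gaussian peak at 10 Hz.
true_slope = -1.5
freqs = [0.5 * k for k in range(2, 41)]  # 1.0 to 20.0 Hz in 0.5 Hz steps
power = [
    f ** true_slope * (1 + 5 * math.exp(-((f - 10.0) ** 2) / (2 * 0.8 ** 2)))
    for f in freqs
]

slope_all, offset_all = aperiodic_fit(freqs, power)
slope_no_alpha, offset_no_alpha = aperiodic_fit(freqs, power, exclude=(8.0, 14.0))

# Corrected spectrum: log power minus the fitted aperiodic component.
corrected = [
    math.log10(p) - (offset_no_alpha + slope_no_alpha * math.log10(f))
    for f, p in zip(freqs, power)
]

print(f"slope with alpha band included: {slope_all:.3f}")
print(f"slope with alpha band excluded: {slope_no_alpha:.3f}")
```

      In this synthetic case, the fit that skips 8-14 Hz recovers the background slope more closely than the fit over the full range, which illustrates the concern that a large alpha peak can bias the exponent estimate.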

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group. We will add the fit quality metrics and show individual subjects’ fits in the revised manuscript.

      (3.6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of the gamma-aminobutyric acid (GABA) measurement can vary significantly depending on acquisition and modeling parameters. Thus, did the authors address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey, which was released in 2020 and uses linear combination modeling to fit the peak as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited spectral analysis toolbox at the time, and still is widely used. In the revised manuscript, we will include a supplementary section reanalyzing the main findings with Osprey.

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at the supplementary Figure 5 (which would be better plotted as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability seems not to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript we will remove the statement regarding stability, and add the Bland-Altman plot.
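
      For readers unfamiliar with the plot, the quantities behind a Bland-Altman analysis can be sketched as below: the per-subject mean and difference of two sessions, the mean difference (bias), and the 95% limits of agreement. The function name and the six pairs of values are hypothetical and are not data from the study.

```python
import statistics


def bland_altman(session1, session2):
    """Return per-subject means and differences, the bias, and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(session1, session2)]
    means = [(a + b) / 2 for a, b in zip(session1, session2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical metabolite ratios for six subjects measured on two occasions.
session1 = [3.1, 2.8, 3.4, 3.0, 2.9, 3.3]
session2 = [3.0, 2.9, 3.2, 3.1, 2.8, 3.4]

means, diffs, bias, (lower, upper) = bland_altman(session1, session2)
print(f"bias = {bias:.3f}, limits of agreement = [{lower:.3f}, {upper:.3f}]")
```

      Plotting the differences against the means, with horizontal lines at the bias and the two limits, gives the usual figure. With only six subjects the limits are estimated imprecisely, in line with the caveat about sample size raised above.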

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we will rephrase this sentence accordingly. 

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also problem with influences on aperiodic intercept and small sample size). This is a result of the exploratory analyses of correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore and importantly: why should this be specifically only in CC patients, but not in the SC control group?

      We indicate clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the discussion section as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revision, we will check that speculations are clearly marked and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and point to the convergent evidence from both MRS and neurophysiological measures. The latter correspond to results observed in a larger sample of CC individuals (Ossandón et al., 2023).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study which might justify future longitudinal work. To advance science, we put forward new testable hypotheses at the end of the manuscript.

      In the revised manuscript we will add “might imply” to better indicate the hypothetical character of this idea.

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification "We nevertheless think that our results are valid. Our findings neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) " is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we will add a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al. (2023). We will adapt the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicates that of the 28 additional subjects of Ossandón et al. (2023).

      References

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & de Graaf, R. A. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Medel, V., Irani, M., Ossandón, T., & Boncompte, G. (2020). Complexity and 1/f slope jointly reflect cortical states across different E/I balances. bioRxiv, 2020.09.15.298497. https://doi.org/10.1101/2020.09.15.298497

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/f electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Vreeswijk, C. V., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

    1. eLife assessment

      This study by Cuaya et al. reveals and characterizes two distinct forms of spike timing-dependent long-term depression (t-LTD) at the synapses between excitatory afferents from lateral (LPP) and medial (MPP) perforant pathways to granule cells (GC) of the dentate gyrus (DG) in mice. The findings are valuable for the field of synaptic physiology and are based on solid electrophysiological data. The study extends current knowledge by elucidating additional plasticity mechanisms at PP-GC synapses, complementing existing literature.

    2. Reviewer #1 (Public Review):

      Summary:

      The study characterized the cellular and molecular mechanisms of spike timing-dependent long-term depression (t-LTD) at the synapses between excitatory afferents from lateral (LPP) and medial (MPP) perforant pathways to granule cells (GC) of the dentate gyrus (DG) in mice.

      Strengths:

      The electrophysiological experiments are thorough. The experiments are systematically reported and support the conclusions drawn.

      This study extends current knowledge by elucidating additional plasticity mechanisms at PP-GC synapses, complementing existing literature.

      Weaknesses:

      To more conclusively define the pivotal role of astrocytes in modulating t-LTD at MPP and LPP GC synapses through SNARE protein-dependent glutamate release, as posited in this study, the authors could adopt additional methods, such as alternative mouse models designed to regulate SNARE-dependent exocytosis, as well as optogenetic or chemogenetic strategies for precise astrocyte manipulation during t-LTD induction. This would provide more direct evidence of the influence of astrocytic activity on synaptic plasticity.

    3. Reviewer #2 (Public Review):

      Summary:

      This work reports the existence of spike timing-dependent long-term depression (t-LTD) of excitatory synaptic strength at two synapses of the dentate gyrus granule cell, which are differently connected to the entorhinal cortex via either the lateral or medial perforant pathways (LPP or MPP, respectively). Using patch-clamp electrophysiological recording of t-LTD in combination with either pharmacology or a genetically modified mouse model, the authors provide information on the differences in the molecular mechanism underlying this t-LTD at the two synapses.

      Strengths:

      The two synapses analyzed in this study have been understudied. This new data thus provides interesting new information on a plasticity process at these synapses, and the authors demonstrate subtle differences in the underlying molecular mechanisms at play. Experiments are in general well controlled and provide robust data that are properly interpreted.

      Weaknesses:

      - Caution should be taken when extrapolating the results to the adult brain, as the data were obtained in P13-21 mice, a period during which synapses are still maturing and highly plastic.

      - In experiments where the drugs FK506 or thapsigargin are loaded intracellularly, the concentrations used are as high as for extracellular application. Could there be an error of interpretation when stating that the targeted actors are necessarily in the postsynaptic neuron? Is it not possible for the drug to diffuse out of the cell, given that it evidently can enter the cell when applied extracellularly?

      - The experiments implicating glutamate release from astrocytes in t-LTD would require additional controls to better support the conclusions made by the authors. As the data stand, it is not clear how the authors identified astrocytes to load BAPTA, or whether dnSNARE expression in astrocytes indirectly perturbs glutamate release in neurons.

      Significance:

      While this is the first report of t-LTD at these synapses, this plasticity process has been mechanistically well investigated at other synapses in the hippocampus and in the cortex. Nevertheless, this new data suggests that mechanistic differences in the induction of t-LTD at these two DG synapses could contribute to the differences in the physiological influence of the LPP and MPP pathways.

    4. Reviewer #3 (Public Review):

      Coatl et al. investigated the mechanisms of synaptic plasticity at two important hippocampal synapses, the excitatory afferents from the lateral and medial perforant pathways (LPP and MPP, respectively) of the entorhinal cortex (EC) connecting to granule cells of the hippocampal dentate gyrus (DG). They find that these two different EC-DG synaptic connections in mice show a presynaptically expressed form of long-term depression (LTD) requiring postsynaptic calcium, eCB synthesis, CB1R activation, astrocyte activity, and metabotropic glutamate receptor activation. Interestingly, LTD at MPP-GC synapses requires ionotropic NMDAR activation, whereas LTD at LPP-GC synapses is NMDAR independent. Thus, they discovered two novel forms of t-LTD that require astrocytes at EC-GC synapses. Although plasticity of EC-DG granule cell (GC) synapses has been studied using classical protocols, this is the first analysis of the synaptic plasticity induced by spike timing-dependent protocols at these synapses. Interestingly, the data also indicate that t-LTD at each type of synapse requires a different group I mGluR, with LPP-GC t-LTD dependent on mGluR5 and MPP-GC t-LTD requiring mGluR1.

      Through a detailed analysis combining several approaches (failure rate, PPRs, coefficient of variation (CV) of the EPSP slopes, and mEPSP frequency and amplitude), the authors demonstrate a decrease in the probability of neurotransmitter release and a presynaptic locus for these two forms of LTD at both types of synapses. By using elegant electrophysiological experiments and taking advantage of the conditional dominant-negative (dn) SNARE mice, in which doxycycline administration blocks exocytosis and impairs vesicle release by astrocytes, they demonstrate that both LTD forms require the release of gliotransmitters from astrocytes. These data add in an interesting way to the ongoing discussion on whether LTD induced by STDP participates in refining synapses, potentially weakening excitatory synapses under the control of different astrocytic networks. The conclusions of this paper are mostly well supported by data, but some aspects of the results must be clarified and extended.

      (1) It should be clarified whether the present results were obtained with or without functional inhibitory synapse activation. It is not clear whether GABAergic synapses are blocked. If GABAergic synapses are not blocked, the authors must discuss whether the LTD of the EPSPs is due to a decrease in glutamatergic receptor activation or an increase in GABAergic receptor activation. Moreover, it is recommended to analyze not only the EPSPs but also the EPSCs to address whether the decrease in synaptic transmission is caused by a decrease in the input resistance or by a decrease in the space constant (lambda).

      (2) The authors show that thapsigargin loaded in the postsynaptic neuron prevents the induction of LTD at both synapses. Analyzing the effects of blocking postsynaptic IP3Rs (heparin in the patch pipette) and ryanodine receptors (ruthenium red in the patch pipette) is recommended for a deeper analysis of the mechanism implicated in the induction of these novel forms of LTD in the hippocampus.

      (3) The authors nicely demonstrate that CB1R activation is required in these forms of LTD by blocking CB1Rs with AM251; however, an interesting unanswered question is whether CB1R activation is sufficient to induce this synaptic plasticity. This reviewer suggests studying whether applying puffs of the CB1R agonist WIN 55,212-2 could induce these forms of LTD.

      (4) Finally, adding a last figure with a cartoon summarizing the proposed model of action in these novel forms of LTD would add value and make the manuscript easier to read, especially in those aspects related to the discussion of the results.

      Extending these results would improve the manuscript, which provides interesting results showing two novel forms of presynaptic t-LTD at brain synapses, with different mechanisms of action that are probably implicated in different aspects of information processing.

  2. www.researchsquare.com
    1. eLife assessment

      In this important study, the authors use a genetically engineered mouse model to reveal a tumor suppressive role for focal adhesion kinase in right-sided colon cancer. The evidence in support of the authors' claims is generally solid, although the data supporting the mechanism through which FAK deletion promotes tumorigenesis are incomplete. This work will be of interest to cancer researchers and others studying the biological consequences of tuning signal transduction pathways.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors provide solid evidence from a mouse model, supported by in vitro experiments and analysis of clinical samples, that loss of Fak increases the development of BRAF V600E-induced dysplastic lesions and carcinomas in the cecum via downregulation of Egfr-mediated Erk phosphorylation. This fine-tuning of Erk phosphorylation increases Lgr4 mRNA expression and promotes Lgr4 stability through downregulation of the E3 ubiquitin ligase Nedd4. The high Lgr4 expression correlates with an increased intestinal stem cell transcriptional signature that the authors suggest drives higher rates of transformation. This provides important insight that factors such as FAK may be able to modulate MAPK-driven tumorigenesis in specific circumstances. The data presented here are largely specific to the cecum. While these specific findings may ultimately have practical implications for human CRC outside the cecum and even therapeutic implications, these remain unexplored and will be a point for future investigations.

      Strengths:

      The authors use a mouse model (intestinal specific BRAF V600E +/- Fak knockout) as well as supporting in vitro analyses and clinical sample characterization to support their model. For both in vitro and in vivo studies, the authors use a combination of genetic and pharmacologic (including EGFR, FAK, and MEK inhibitors) tools to modulate the MAPK pathway. They also use a combination of transcriptional (RNA-Seq) and protein (IHC and Western blotting) readouts to support their proposed model. Importantly, they use a distinct mouse model (mutant Kras) to demonstrate their findings with Fak loss are specific to instances where EGFR can modulate ERK activation, providing strong evidence for their model. Finally, they also correlate their findings in the murine model with patient samples and with trends in the TCGA database. Collectively, these create a solid and convincing basis for their proposed model.

      Weaknesses:

      (1) The murine data is largely confined to the cecum. While the analysis of the cecum is appropriate based on the cecum specificity of their phenotype, they often use these findings to make broader generalizations about the nature of tumorigenesis in the intestinal epithelia and in CRC more generally. In my opinion, there was insufficient evidence presented supporting the extension of the proposed model beyond the cecum. While this is a weakness, it could be part of a growing effort to characterize left and right-sided malignancies as related but separate disease processes.

      (2) The authors generally do a good job of focusing their analysis on the cecum and supporting their model. For example, Figure 5A examines different colon compartments, including the cecum. However, the authors fail to demonstrate that Fak loss only promotes Lgr4 upregulation in the cecum, where they observe an increase in BRAF V600E dysplasia and carcinoma. This is again seen in Figure 6A, where they only characterize Nedd4 expression in the cecum and not other compartments of the colon.

      (3) The authors evaluate a broad range of tissues, including normal colonic mucosa, polyps, pre-cancerous dysplastic lesions, adenocarcinomas, and adenocarcinoma cell lines. While this breadth is a strength of the paper, the authors, at times, equate experimental observations in each of these conditions, despite the difference in the biology of these tissues/cells. For example, in their mouse model, they equate the development of dysplastic lesions and carcinoma lesions. This makes it difficult to accurately interpret their data and conclusions.

      (4) The experiment in Figure 5i was only completed in one cell line (HT29), despite the conclusion that Lgr4 expression is increased by decreased ERK phosphorylation due to protein stabilization. HT29 cells are a transformed human CRC cell line, quite different from a pre-malignant cecal intestinal epithelial cell. While the result is convincing, the authors could have performed this key experiment in non-transformed murine cecal organoids (as they did for other experiments in Figure 5E), which would better recapitulate the mouse and pre-malignant setting to explain their mouse phenotype.

      (5) While a large portion of the discussion focusses on the therapeutic implications of these findings, the authors only really investigate tumorigenesis. They likely have additional investigations planned for future manuscripts.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Gao et al. described a study identifying the role of FAK in fine-tuning the activation levels of ERK signaling in BRAF-V600E-driven colorectal cancer. The authors generated new mouse models combining Vill-Cre mediated BRAF-V600E expression with FAK deletion. Analyses of intestinal tumor phenotypes revealed that FAK-loss promotes BRAF-V600E-induced tumor formation, specifically in the cecum. Interestingly, these tumors closely resemble human sessile serrated adenoma/polyps. Using bioinformatics analysis, the authors found that FAK deletion upregulates the intestinal stem cell and fetal-type transcriptomic signatures compared to mice expressing BRAF-V600E alone. In addition, FAK-loss decreases the phosphorylation of ERK whereas it increases the expression of Lgr4 at both mRNA and protein levels. To mechanistically connect FAK-mediated downregulation of ERK and upregulation of Lgr4 in the context of BRAF-V600E mutation, results from biochemical experiments showed that MEK inhibitor treatment decreases the expression of NEDD4, a previously identified ubiquitin E3 ligase of Lgr4, which coincides with increased Lgr4 protein expression both in cells and in vivo. Moreover, the FAK-dependent modulation of ERK signaling is specific to BRAF-V600E-driven tumorigenesis only as knockout of FAK has no effect in Vill-Cre/KRAS-G12D mice. Collectively, the authors proposed a "just right" model in that a tunable FAK expression controls the optimal level of ERK pathway output needed for BRAF-V600E-induced cecal tumor formation.

      Strengths:

      This study provides new insights into the mechanisms underlying the serrated pathway-driven tumorigenesis in colorectal cancer. The newly established mouse model with compound mutations of BRAF and FAK offers a useful resource for future studies of the serrated pathway. The conclusions of this paper are mostly supported by data.

      Weaknesses:

      However, some aspects of the paper can be strengthened with additional mechanistically focused experiments.

      (1) Some of the conclusions of the paper mainly rely on bioinformatic analyses of RNA-seq data. For example, it has been noted in several places in the paper that the knockout of FAK in Vill-Cre/BRAF-V600E mice does not affect the transcriptional outcome downstream of ERK while ERK phosphorylation levels are decreased. This statement is based on the lack of significant difference in the MAPK signature according to GSEA. However, whereas a significant enrichment of certain pathways can be used as support evidence, the lack of enrichment does not necessarily indicate those pathways are not involved. Other experiments are needed to examine the expression of ERK target genes to confirm. Similarly, the upregulation of fetal stem cell signature in FAK knockout mice needs to be verified using other methods besides GSEA.

      (2) According to Figure 5i, the half-life of Lgr4 is around 48 hours in HT29 cells. However, it has been reported by at least two other publications cited in this paper (Ref. 44 and 45) that the half-life of Lgr4 is much shorter. This discrepancy is not explained.

      (3) The effect of decreased ERK signaling on NEDD4 expression has only been briefly explored in Figure 6. The mechanisms by which FAK-loss and/or inhibition of MEK/ERK activity regulate NEDD4 expression are currently unclear. Moreover, the levels of NEDD4 expression are only analyzed in one mouse per group in Figure 6a. Quantitative analysis of NEDD4 as well as Lgr4 expression in additional numbers of mice will provide more solid support for the inverse correlation between NEDD4 and Lgr4 proteins. Since MEK inhibitor treatment also increases Lgr4 mRNA expression as shown in Figure 5f-g, the relative contribution of this altered mRNA expression vs. NEDD4L-mediated ubiquitination has not been investigated.

      (4) It is an interesting finding that FAK knockout has no effect on KRAS-G12D-driven hyperplasia, as shown in Figure 7. However, additional studies are needed to further explore the potential mechanisms by which FAK-loss specifically decreases EGFR/ERK signaling in the context of BRAF-V600E mutation.

    4. Reviewer #3 (Public Review):

      Summary:

      Right-sided colorectal cancer (CRC) is very different from left-sided CRC. Therefore, it is important to model this cancer in mice and find new molecular targets. A broad set of data exists on FAK (Focal Adhesion Kinase) being important in colorectal cancer. However, this has focussed on APC mutant CRC, which tends to be left-sided. BRAF mutation is common in right-sided CRC (and rarely co-occurs with APC mutation). Therefore, the authors have tested whether FAK is important in this context. The authors show that FAK deletion surprisingly accelerates BRAF mutant CRC. Tumours arise in the proximal colon (which recapitulates BRAF mutant right-sided CRC). These tumours are low for Lgr5 and high for foetal programmes. Mechanistically, they suggest a pathway from FAK to NEDD4 to Lgr4 may underpin this phenotype.

      Strengths:

      Strong genetic data from FAK revealed that there is an acceleration of tumourigenesis and mice now develop proximal colon tumours and can be viewed as a good model of right-sided CRC.

      The expression data between humans and mice is strong.

      Weaknesses:

      The functional mechanism of how FAK loss promotes tumourigenesis is still quite correlative. An alternative hypothesis is that it drives inflammation in the proximal colon that drives tumourigenesis.

      We still do not know the functional role of LGR4 (loss leads to a loss of Paneth cells in homeostasis), so I'm not sure you can hypothesise a stem cell role.

    5. Author response:

      We thank the editor and reviewers for the time invested in our manuscript and their valuable and insightful critiques. However, we believe that the results justified our conclusions in the manuscript well; therefore, we have decided not to revise it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I have one major concern regarding this draft of the manuscript:

      (1) In the manuscript (lines 130-31) it is stated that "About 55% (8/15) of mice with unilateral AAV-hM3Dq centered in the PMv showed an increase in LH release above 0.5ng/ml within 10-20 min following the CNO injection". However, data at time zero are not shown for 4 of the 8 "LH peak" animals. The missing data at time zero seem problematic for the analysis of the CNO-stimulated cohort. As mentioned in the manuscript, the area under the curve was calculated over the range of -10 to 20 min post-injection. Because diestrus animals have spontaneous LH pulses, it is highly possible that an LH pulse is initiated in the 10 minutes prior to drug delivery, as seen in the AAV-mCherry group in 1D, and similarly in 2C. Given the current form of analysis, it seems possible that a spontaneous LH pulse initiated anywhere up to 10 minutes prior to drug delivery could conceivably count as an experimentally induced "LH peak". Can you address this concern?

      We understand the reviewer’s concern about the spontaneous LH pulses. This is the reason we have been very strict in our analysis and have taken multiple approaches to analyze these data. In our hM3Dq group, 55% of the animals responded to CNO with an increase in LH, while none responded in the negative control group. In addition, in the clozapine group, where no time 0 points were missing, 100% of the animals with hM3Dq showed an LH increase after the injection, while only 28% (2/7) showed the increase in the negative control group. Rigorously, the DREADDs approach doubled the chances of LH increase. Note that the spontaneous LH peaks observed in negative controls or during baseline show a very sharp increase followed by a decrease at the next time point, while the 4 “PMv hits” without time 0 and with an increase in LH in the CNO-hM3Dq group showed a sustained rise after 10 min or prolonged high LH levels (above 1ng/ml) even 30 min after the injection. Ultimately, the cFOS levels in the PMv of the CNO-hM3Dq group with an increase in LH are significantly higher than in any other group, and the number of cFOS neurons is highly correlated with LH levels. Another important aspect that should not be dismissed is that in this experimental design we used unilateral injection in animals that are in a fed state; therefore, the role of leptin in raising LH levels is probably dampened.

      We have added a statement to clarify this issue.

      The following are minor concerns:

      a) In Figure 4a-d, it is clear that Vglut2 is absent in the VMH, but it seems more relevant to show this expression pattern in the PMv.

      We chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2 only cells and it illustrates very well the conditional Vglut2 deletion at small and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study, the PMv is more relevant and therefore, we have included images of the PMv showing a control and a LeprCre-Vglut2floxed animal in higher magnification.

      b) Methods section, targeting PMv: please check the injection coordinate: "dura-mater [dorsoventral -0.54]"

      Thank you for noticing this mistake; all coordinates for the injection have now been corrected (-5.4 mm, ±0.5 and -5.4mm).

      Reviewer #2 (Recommendations For The Authors):

      This is a very well-written manuscript by Saenz de Meira and colleagues on a careful study reporting on the key role of glutamate transporter vGlut2 expression in the neurons of the ventral premammillary nucleus (PMv) of the hypothalamus expressing the leptin receptor LepRb in energy homeostasis, puberty, and estrous cyclicity. The authors first show, using cre-dependent chemogenetic viral tools, that the selective activation of PMv LepRb neurons induces luteinizing hormone (LH) release. Then the authors demonstrate that the selective invalidation of vGlut2 in LepRb-expressing cells throughout the body induces obesity and mild alteration of sexual maturation in both sexes and blunted estrous cyclicity in females. Finally, the authors knock out vGlut2 in PMv neurons in which they reintroduce LepRb expression in an otherwise LepRb-null background using an AAV Cre approach. This latter very elegant experiment shows that while the sole re-expression of LepRb in PMv neurons in LepRb-null mice was shown before to restore puberty onset, deleting vGlut2 in LepRb-expressing PMv neurons blunts this effect.

      My specific comments are as follows. Please note that none of them require additional experiments and that they can be answered by amending the text.

      (1) Please provide information on the serotypes and promoters of the AAVs used in the study to enhance reproducibility.

      Thank you, serotypes and promoters have been added for all AAVs.

      (2) Please reformulate lines 220-221. Indeed, this reviewer does not agree with the fact that balanopreputial separation (BPS) is a sign of puberty completion. BPS is merely a sign of the advancement of sexual maturation, akin to vaginal opening in females. In certain mouse strains, BPS coincides with mini puberty rather than puberty. The definitive sign of puberty completion involves the presence of spermatozoa in the vas deferens (equivalent to the first ovulation/first estrus in females).

      Thank you for this remark, this statement has now been modified.

      (3) The authors convincingly show that the potential contamination of the arcuate nucleus of the hypothalamus (ARH) with the AAV injections targeted to the PMv should not account for the DREADD-mediated activation of LH release. However, do the authors believe that DREADD activation of LepRb-expressing PMv neurons, inducing cFOS expression in these neurons, could also activate ARH kisspeptin neurons (which do not express LepRb) via transsynaptic action? Alternatively, do they posit direct activation of GnRH cell bodies in the preoptic region or GnRH axon/dendrites in the ARH/median eminence region?

      Thank you for this comment. We don’t have enough evidence from this DREADDs experiment to make a strong prediction on the downstream pathways. However, as discussed, from the DREADDs khrGFP females, we observed very few kisspeptin cells expressing cFOS, reducing the evidence for a PMv to ARH kisspeptin action in this case. With the evidence from our LepR-Cre;Vglut2flox animals that showed no alterations in kiss1 gene expression but a strong decrease in GnRH release, we hypothesize that this acute activation of LH is mediated by direct inputs from PMv to GnRH neurons, while acknowledging the possible existence of alternative pathways. These arguments have been added to the discussion. 

      (4) This reviewer finds it intriguing that glutamatergic signaling is required for LepRb re-expression in the PMv to restore fertility. Given that the authors and others have shown that PMv neurons heavily express NOS1, the activity of which is known to heavily rely on glutamatergic NMDAR activation, the authors may want to contextualize their results in light of the recent study showing that NOS1 is found to be a new causative gene in people with congenital hypogonadotropic hypogonadism.

      Thank you for the advice, we have added a paragraph discussing the possible involvement of nNos from PMv neurons in the discussion.

      (5) Does the absence of vGlut2 have any impact on the obesity phenotype in mice where LepRb is selectively re-expressed in the PMv?

      We have followed the weight of these animals after the AAV injections. However, due to the difficulty of generating dual homozygous mice (LepRnull homozygous mice are infertile) and producing adequate stereotaxic injections with minimum contamination of adjacent nuclei, the groups could not be run all together and thus we refrained from performing comparative analysis of energy balance. Analyses of body weight in LepRnull mice with reactivation of LepR in PMv neurons have been published before (Donato et al., 2011 using the Flp/Frt model and Mahany et al., 2018 using the Cre/loxP system). No difference in body weight was observed in either study. Below is the progression of body weight in mice with reactivation of LepR and deletion of Vglut2 in PMv neurons. We added a comment in this regard.

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      The authors examined the effects of glutamate release from PMv LepR neurons in the regulation of puberty and reproduction in female mice. Multiple genetic mouse models were utilized to either manipulate PMv LepR neuron activities, or to delete glutamate vesicle transporters from LepR neurons. The authors have been quite rigorous in validating these models and exploring potential contaminations. Most of the data presented are solid and convincing, and support the conclusion. This reviewer has the following suggestions for the authors to further improve this work and the manuscript.

      (1) The DREADD study had some issues. For example, "2 out of 7 control mice with no AAV showed an increase in LH...", indicating that LH increase may just happen randomly. More importantly, 45% of PMv-hit mice did not show LH response to CNO, making it hard to interpret the positive LH responses from the other 55% PMv-hit mice undergoing the same treatment. Overall, there are just too many variabilities in these DREADD data for anyone to come up with a clean and convincing conclusion. This reviewer suggests repeating these experiments or removing the DREADD data altogether. After all, the rest of the results are much more convincing and stand alone to support the role of glutamate release from these PMv LepR neurons.

      We appreciate the reviewer’s concern. Indeed, LH shows spontaneous pulsatility, which is one of the biggest challenges in our field. We have answered this concern for Reviewer 1 above and modified the text accordingly. We decided to keep the data in the publication because we believe that this is very important evidence supporting our observations, since this is the only experiment that approaches the role of the PMv in a free-moving, ad libitum fed mouse model that is not deficient in leptin signaling or glutamatergic neurotransmission. Altogether, this paper strongly supports a role for glutamate signaling in leptin’s action on reproductive function. Until now, evidence for this role had been dismissed or considered contentious.

      (2) The mCherry signals in Figure 3 are of low quality and do not look like cell bodies.

      We have now equally increased the contrast and brightness in all higher magnification images of mCherry neurons (Fig 3F, G, I and J) to improve their visibility. The lower magnification images are high quality images of areas with a high density of mCherry-positive neurons. Thick sections (30 µm) at low magnification compromise the focus at different Z-axis levels. We feel that images 3E and 3H are important to define the location of cells in the arcuate nucleus. Colocalization and mCherry expression are clear in high magnification images.

      (3) The validation of Vglut2 deletion in LepR neurons (Fig. 4A-D) is very nice and convincing, but the images are from the VMH region. Why not show the PMv region?

      As mentioned to Reviewer 1, we chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2-only cells, and it illustrates the Vglut2 deletion very well at low and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that, for the current study, the PMv is more relevant, and we have therefore included images of the PMv showing a control and a LeprCre-Vglut2floxed animal at higher magnification.

      (4) Figures 4-5 used LepR-Cre as controls, while Figure 6 used Vglut2flox as controls. Why? Also, how did the authors set up the breedings to generate "littermates" in each of these studies?

      We used LepR-Cre mice as controls for our experiments since Cre homozygosity is needed for proper Cre expression and we had the LepR-Cre homozygous colony from the DREADDs experiment. Also, these mice had previously been thoroughly evaluated and no metabolic and/or reproductive disruption was noticed (please see lines 213-214 of the original submission). However, our LepR-Cre colony had to be drastically reduced during COVID and suffered from unexpected Δ recombination leading to loss of Vglut2 homozygotes. To overcome these issues, we used VGlut2-floxed controls for the gene expression and GnRH immunoreactivity experiments. These mice had previously been used as controls for metabolic experiments with the LepCre-Vglut2fl genotype (Xu et al., 2013 Mol Metab), showing no deficiencies in the metabolic phenotype.

      As described in the methods section (lines 464-466 of the original preprint), to inactivate glutamate in leptin responsive cells, LepRb-Cre mice were crossed with mice carrying loxP-modified Vglut2 alleles. Our experimental mice were homozygous for the LepRb-Cre allele (LepRb_cre/cre_) and homozygous for the Vglut2-loxP allele (Vglut2_fl/fl_). Our controls consisted of mice homozygous for the Cre allele (LepRb_cre/cre_;Vglut2_+/+_, named LepRb-Cre) or homozygous for the Vglut2-loxP allele (LepRb_+/+_;Vglut2_fl/fl_, named Vglut2_flox_). Both experimental (LepRb_cre/cre_;Vglut2_fl/fl_, named LepRbΔVglut2) and control mice were derived from the same litters with parents homozygous for one of the genes and heterozygous for the other gene (LepRb_cre/cre_;Vglut2_fl/+_ or LepRb_cre/+_;Vglut2_fl/fl_). Mice were genotyped at weaning (21 days) and again at the end of the experiments.
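
      As an illustration only (not part of the authors' methods), the expected genotype ratios from such a breeding scheme can be sketched with a small Mendelian enumeration. The sketch assumes an intercross in which both parents are LepRb cre/cre;Vglut2 fl/+ and that the two loci assort independently; the function names `gametes` and `cross` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """List every gamete (one allele per locus) a parent can produce."""
    # genotype: tuple of loci, each locus a tuple of two alleles
    return list(product(*genotype))


def cross(parent_a, parent_b):
    """Expected offspring genotype frequencies under independent assortment."""
    counts = Counter()
    for ga, gb in product(gametes(parent_a), gametes(parent_b)):
        # Sort alleles within a locus so ("fl", "+") and ("+", "fl") count as one genotype
        child = tuple(tuple(sorted((a, b))) for a, b in zip(ga, gb))
        counts[child] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# Assumed intercross: both parents LepRb cre/cre; Vglut2 fl/+
parent = (("cre", "cre"), ("fl", "+"))
offspring = cross(parent, parent)

for genotype, freq in sorted(offspring.items()):
    print(genotype, freq)
```

      Under these assumptions, one quarter of each litter is expected to be experimental (cre/cre;fl/fl) and one quarter LepRb-Cre control (cre/cre;+/+), which is consistent with experimental and control mice coming from the same litters.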

      (5) The labeling of Figures 5E-F is missing, making it hard to read.

      We have confirmed that Figures 5E and 5F were mentioned in the figure legends and in the results text. To make the figure easier to read, we have added Y-axis titles to Figure 5C, D, E and F; previously they were shown only in Fig 5A and B.

      (6) The last experiment was very nice confirming the role of glutamate release from PMv LepR neurons. However, the key phenotypes (puberty development, pregnancy) were not graphed and only stated in the text.

      Thank you for your comment. Since the key result is that none of the LeprLoxTb;Vglut2flox animals showed vaginal opening or pregnancy, we don’t feel the need to graph this. All the details of the reproductive and metabolic phenotyping of the Lepr-loxTB with re-expression of LepR in the PMv were described in Mahany et al., 2018.

    2. Reviewer #1 (Public Review):

      Summary:

      In previous work, the Elias group has shown that leptin-sensing PMv neurons make connections with the neuroendocrine reproductive axis and are involved in reproductive function/s. Sáenz de Miera et al. build on this body of work to investigate the sufficiency of leptin-sensing PMv neurons to evoke the release of luteinizing hormone. The team further investigates how glutamate signaling from leptin-sensing neurons can influence pubertal timing in females, along with mature estrous cycles. Genetic ablation of Slc17a6 (Vglut2) from LepRb-expressing cells resulted in a delay of the first estrous cycle after the pubertal transition, along with a significantly lengthened estrous cycle in mature females. However, this deficit did not lengthen the latency to birth of the first litter in experimental dams. Restoration of leptin signaling in LepRb PMv neurons was previously shown to induce puberty and instate reproductive function in LepRb knock-out female mice (Mahany et al., 2018). Here, Sáenz de Miera et al. use a combined genetic and viral strategy to demonstrate that glutamate signaling in LepRb PMv neurons is required for sexual maturation in LepRb knock-out female mice.

      Strengths:

      Most of the experiments performed in this manuscript are well justified and rigorously tested. The genetic method to simultaneously remove glutamate signaling and restore the leptin receptor in LepRb PMv neurons was well executed and showed that glutamate signaling in LepRb PMv neurons is necessary for leptin-dependent fertility.

      Weaknesses:

      Analysis of experimentally induced luteinizing hormone release could be confounded by spontaneous pulses of luteinizing hormone that are independent of LepRb PMv neurons.

    3. Reviewer #2 (Public Review):

      Summary:

      This is a very well-written manuscript by Sáenz de Miera and colleagues on a careful study reporting on the key role of glutamate transporter vGlut2 expression in the neurons of the ventral premammillary nucleus (PMv) of the hypothalamus expressing the leptin receptor LepRb in energy homeostasis, puberty, and estrous cyclicity. The authors first show, using Cre-dependent chemogenetic viral tools, that the selective activation of PMv LepRb neurons induces luteinizing hormone (LH) release. Then the authors demonstrate that the selective invalidation of vGlut2 in LepRb-expressing cells throughout the body induces obesity and mild alteration of sexual maturation in both sexes and blunted estrous cyclicity in females. Finally, the authors knock out vGlut2 in PMv neurons in which they reintroduce LepRb expression in an otherwise LepRb-null background using an AAV Cre approach. This latter very elegant experiment shows that while the sole re-expression of LepRb in PMv neurons in LepRb-null mice was shown before to restore puberty onset, deleting vGlut2 in LepRb-expressing PMv neurons blunts this effect.

      Strengths:

      The authors employ state-of-the-art methods and their conclusions are robustly supported by the results.

      Weaknesses:

      None identified. Only minor comments have been formulated.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors examined the effects of glutamate release from PMv LepR neurons in the regulation of puberty and reproduction in female mice.

      Strengths:

      Multiple genetic mouse models were utilized to either manipulate PMv LepR neuron activities or to delete glutamate vesicle transporters from LepR neurons. The authors have been quite rigorous in validating these models and exploring potential contaminations. Most of the data presented are solid and convincing and support the conclusion.

      Comments on revised version:

      The authors have addressed most of my comments.

    1. eLife assessment

      The findings of this study are valuable as they challenge the dogma regarding the link between lowered bacterial metabolism and tolerance to aminoglycosides. The authors propose that the well-known tolerance to AG of mutants such as those of complexes I and II is not due to a decrease in the proton motive force and thus antibiotic uptake. The results presented here are convincing.

    2. Reviewer #2 (Public Review):

      Summary:

      This interesting study challenges the dogma regarding the link between bacterial metabolism decrease and tolerance to aminoglycosides (AG). The authors demonstrate that mutants well-known for being tolerant to AG, such as those of complexes I and II, are not so due to a decrease in the proton motive force (PMF) and thus antibiotic uptake, as previously reported in the literature.

      Strengths:

      This is a complete study that employs several read-outs.

      In this revised version, the authors have carefully addressed all the reviewers' comments. I appreciate the effort made in this new version to clarify that this study does not refute the PMF-dependent mechanism of aminoglycoside uptake (in the discussion, lines 731-734).

      The addition of the requested experiments using lower concentrations of aminoglycosides is a considerable improvement as it allows for comparison with previously published results.

    1. eLife assessment

      In this useful study, Wang and colleagues investigate the potential probiotic effects of Bacillus velezensis in a murine model. They provide solid evidence that B. velezensis limits the growth of Salmonella typhimurium in lab culture and in mice, together with beneficial effects on the microbiota. The overall presentation of the manuscript and logical flow requires improvement and the work will be of interest to infectious disease researchers.

    2. Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of pig-origin bacteria Bacillus velezensis HBXN2020, for its released genome sequence, in vivo safety issue, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed; the main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Strengths:

      An extensive study on the probiotic properties of the Bacillus velezensis strain HBXN2020.

      Weaknesses:

      The main results are descriptive without mechanistic insight. Additionally, most of the results and analysis parts are separated without a link or a story-telling way to deliver a concise message.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have potential benefit to serve as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals but data for human use are scarce, particularly with respect to salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test it for its ability to control colitis in mice.

      Key findings:

      (1) The authors sequence an isolate of B. velezensis, HBXN2020, and describe its genome (roughly 4 Mb, 46% GC content, etc.).

      (2) The authors next describe the growth of this strain in broth culture and survival under acid and temperature stress. The susceptibility of HBXN2020 was tested against various antibiotics and against various pathogenic bacteria. In the case of the latter, the authors set out to determine if HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data, indicating that this is indeed the case, are presented.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors infected mice with the strain and monitored weight, together with cytokine profiles. Infected mice displayed no significant weight loss and expression of inflammatory cytokines remained unchanged. Blood cell profiles of infected mice were consistent with those of uninfected mice. No significant differences in tissues, including the colon, were observed.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit growth of Salmonella typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose-dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction in STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colon pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      (5) After noting the beneficial (and anti-inflammatory) effects of HBXN2020, the authors set out to investigate effects on microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to those seen in healthy mice.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      (2) Most observations are supported using multiple approaches.

      (3) Mouse experiments are very convincing.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      (2) Mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using bacillus spores with the current gold standard for treatment.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation.

      Strengths:

      B. velezensis HBXN2020 is a novel species of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Weaknesses:

      There are few studies about the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this useful study, Wang and colleagues investigate the potential probiotic effects of Bacillus velezensis to prevent colitis in a mouse model. They provide solid evidence that B. velezensis limits the growth of Salmonella typhimurium in lab culture and in mice, together with beneficial effects on the microbiota. The work will be of interest to infectious disease researchers and those studying the microbiome.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of pig-origin bacteria Bacillus velezensis HBXN2020, for its released genome sequence, in vivo safety issue, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      An extensive study on the probiotic properties of the Bacillus velezensis strain HBXN2020.

      Response: Thank you very much for reading and commenting on our manuscript.

      Weaknesses:

      - The main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Response: Thank you for your comments and suggestions on our manuscript. In future work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. We appreciate your review and feedback.

      - Most of the results and analysis parts are separated without a link or any story-telling to deliver a concise message.

      Response: Thank you for your comments and suggestions on our manuscript. The comments improve the quality and depth of the manuscript. Based on your suggestions, we have revised the entire manuscript.

      The updated contents were presented in the revised manuscript.

      - For the Salmonella Typhimurium-induced mouse model of colitis, it is not clear how an oral infection of C57BL/6 would lead to colitis. Streptomycin is always pretreated (https://link.springer.com/protocol/10.1007/978-1-0716-1971-1_17).

      Response: Thank you very much for reading and commenting on our manuscript. The S. Typhimurium ATCC14028 (STm) used in this study is a highly virulent strain. The findings of the preliminary trial indicated that mice infected with 10^7 CFU STm exhibited notable symptoms in the absence of streptomycin pretreatment. Hence, streptomycin was not used as a pretreatment for mice in this study. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have the potential benefit of serving as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals but data for human use are scarce, particularly with respect to salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test it for its ability to control colitis in mice.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Key findings:

      (1) The authors sequence an isolate of B. velezensis, HBXN2020, and describe its genome (roughly 4 Mb, 46% GC content, etc.).

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (2) The authors next describe the growth of this strain in broth culture and survival under acid and temperature stress. The susceptibility of HBXN2020 was tested against various antibiotics and against various pathogenic bacteria. In the case of the latter, the authors set out to determine if HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data, indicating that this is indeed the case, are presented.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors infected mice with the strain and monitored weight, together with cytokine profiles. Infected mice displayed no significant weight loss and expression of inflammatory cytokines remained unchanged. Blood cell profiles of infected mice were consistent with those of uninfected mice. No significant differences in tissues, including the colon, were observed.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit the growth of Salmonella typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose-dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction in STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colon pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (5) After noting the beneficial (and anti-inflammatory) effects of HBXN2020, the authors set out to investigate the effects on microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to those seen in healthy mice.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting them with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      Response: Thank you very much for reading and commenting on our manuscript.

      (2) Most observations are supported using multiple approaches.

      Response: Thanks for the comments and the positive reception of the manuscript.

      (3) The mouse experiments are very convincing.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      Response: Thank you for pointing this out. We apologize for the lack of mechanistic investigation in the manuscript. In future work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      (2) The mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using bacillus spores with the current gold standard for treatment.

      Response: We greatly appreciate your valuable comments. The objective of this study is to investigate the potential of B. velezensis spores in mitigating bacterial-induced colitis. The animal experimental design followed the methods described in previous studies, with slight modifications (10.1038/s41467-019-13727-9, 10.1126/scitranslmed.abf4692). We appreciate your review and feedback. We hope that our response adequately addresses your concerns.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation. Overall, the manuscript is of potential interest to readers.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Strengths:

      B. velezensis HBXN2020 is a novel species of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      There are few studies about the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

      Response: Thanks for your suggestion. This study serves as an exploratory investigation before the application of Bacillus velezensis. The main purpose of this study is to explore the application potential of Bacillus velezensis. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract:

      It is quite wordy, without a clear emphasis on the major point of the study. It is obvious how the host-probiotic-microbiota behaves and why it works out well, which is the key part.

      Response: Thank you for your valuable suggestion. The comments improve the quality of manuscript. We have modified this in the revised manuscript as suggested.

      The updated contents were presented in line 30-32, 34-39 and 41-46 in abstract section of the revised manuscript.

      Please remove "novel". Many previous works have already documented the probiotic Bacillus velezensis. It is also NOT a novel species...

      Response: Thank you for your suggestion. We have corrected it as suggested. Please see line 26 in abstract section of the revised manuscript.

      Lines 44-46. The way this conclusion is delivered is inappropriate; it should be clarified exactly according to the supported results.

      Response: Thank you for your valuable suggestion. The comments improve the quality of manuscript. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 44-46 in abstract section of the revised manuscript.

      Introduction:

      Lines 71-71, Lines 75-77, Line 92 "the homeostasis of", please remove.

      Response: Thank you for pointing this out. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 96 in introduction section of the revised manuscript.

      Are the Salmonella loads the key indicator for this model?

      Response: We greatly appreciate your valuable comments. In this study, we aimed to evaluate whether B. velezensis can alleviate S. Typhimurium-induced colitis in mice. It has been reported that S. Typhimurium enters the intestine, colonizes and proliferates in the intestinal epithelium, and then breaks through the intestinal barrier and spreads throughout the body via the blood circulation, leading to systemic infection. Therefore, the load of Salmonella in the intestine and tissue organs is also one of the key indicators reflecting Salmonella infection. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The introduction should really focus on the knowledge gap in general and in a specific field, which is not available in the current version.

      Response: Thank you for your valuable suggestion. The comments improve the depth of the manuscript. We have corrected it as suggested.

      The updated contents were presented in line 53-57, 61-64, 69-75, 85-88 and 97-100 in introduction section of the revised manuscript.

      Results:

      "Genomic Characteristics" of B. velezensis HBXN2020 are separated. There are no links between this work for safety and probiotic effects.

      Response: Thank you for your suggestion. Based on your suggestion, we have revised the "genomic characteristics" part of the results section. Please see lines 104-110 and Supplementary Table 2 in the revised manuscript and supplemental material.

      Are the AMR and virulent genes available on the chromosome? Is there any gene cluster that codes useful stuff that is linked to probiotic efficacy in vitro and in vivo?

      Response: Thanks for your suggestion. The comments improve the quality and depth of the manuscript. In this study, the HBXN2020 genome contains fragments of AMR and virulence genes. However, the results of the antibiotic sensitivity test and safety test showed that HBXN2020 did not exhibit resistance or toxicity. Furthermore, the HBXN2020 genome contains 13 different clusters of secondary metabolite synthesis genes, such as surfactin (genomic position: 323,509), macrolactin H (genomic position: 1,384,185), bacillaene (genomic position: 1,691,549), fengycin (genomic position: 1,865,856), difficidin (genomic position: 2,270,091), bacillibactin (genomic position: 3,000,977) and bacilysin (genomic position: 3,589,078) (Table S2). These secondary metabolites have been shown to have varying degrees of inhibition on fungi (10.3390/foods11020140), Gram-positive pathogens (10.1371/journal.pone.0251514) and Gram-negative pathogens (10.1007/s00253-017-8095-x). We appreciate your review and feedback and hope that our response adequately addresses your concerns. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 108-110 in results section of the revised manuscript and supplementary Table 2 in the revised supplemental material.

      Finally, the raw data (Illumina, Pacbio) should also be provided.

      Response: Thanks for pointing this out. According to your suggestion, we have submitted the raw data of the HBXN2020 genome to the GenBank database, GenBank accession number CP119399.1. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The updated contents were presented in line 770-773 in data availability section of the revised manuscript.

      Lines 100-108, please replace this part for a more meaningful investigation that could be possibly supported by the following experimental assays.

      Response: We greatly appreciate your valuable comments. The comments improve the quality and depth of the manuscript. Based on your suggestion, we have done our best to remove some minor results and add more meaningful research findings. We appreciate your review and feedback, and have marked the updated contents in the revised manuscript. Please see lines 104-110 and Supplementary Table 2 in the revised manuscript and supplemental material.

      Lines 119-126, which are not important, did you further check what or which parts make the bacteriostasis?

      Response: Thanks for pointing this out. Following your suggestion, we have done our best to remove some minor results by cutting unnecessary words and sentences. Furthermore, in future research, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. We appreciate your review and feedback and hope that our response adequately addresses your concerns. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 122-124 in results section of the revised manuscript.

      "Biosafety"? Is there a standard way to conduct this investigation? please clarify.

      Response: Thank you for pointing out this problem in the manuscript. In this experiment, the biosafety assessment of B. velezensis HBXN2020 followed the method described by Zhou et al., with slight modifications (10.1038/s41467-022-31171-0). We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The updated contents were presented in line 651-652 in results section of the revised manuscript.

      Why are spores used, not whole bacteria? Please clarify.

      Response: Thanks for pointing this out. We apologize for any confusion caused by the use of B. velezensis HBXN2020 spores in the manuscript. In this study, mice were treated with B. velezensis by oral gavage, and gastric acid drastically reduces the activity of B. velezensis. Spores, however, tolerate strongly acidic environments well. Additionally, previous studies provide precedents for using spores (10.1126/scitranslmed.abf4692). Thank you for your comments and feedback; we hope that our response adequately addresses your concerns.

      Line 196, line 287, repeated assays were conducted, but the logical link is missing.

      Response: We greatly appreciate your valuable comments. We apologize for any inconvenience caused by the organization and coherence of our Results section. Following your suggestion, we improved the manuscript's layout by removing unnecessary words and revising sentences. We apologize once again and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 195-198, 246-248, 256-257 and 285-287 of the Results section of the revised manuscript.

      Discussion:

      Please shorten it; it is wordy but without focus.

      Response: We greatly appreciate your valuable comments, which have improved the quality and depth of the manuscript. Following your suggestion, we shortened the Discussion by removing unnecessary words and revising sentences. We apologize and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 353-355, 358-360, 366-371, 381-385, 395-401, 417-419, 430-438, 459-466, 478-481 and 484-485 of the Discussion section of the revised manuscript.

      Conclusion:

      Please clarify and rework it.

      Response: Thanks for your suggestion, which improved the quality and depth of the manuscript. Based on your suggestion, we have rewritten the Conclusion.

      The updated content is presented in lines 492-496 of the Conclusion section of the revised manuscript.

      Materials and Methods:

      Much more detailed information should be provided.

      Response: Thank you for your suggestion, which improved the quality and depth of the manuscript. Based on your suggestion, we have added more detail to the experimental methods. We appreciate your review and feedback, and have marked the updated content in the revised manuscript. Please see lines 513-515 and 530-533 and Supplementary Table 5 in the revised manuscript and supplemental material.

      All previous bacterial sampling and a list of results should be provided as the supplemental document.

      Response: Thank you for your valuable suggestion, which improved the quality and depth of the manuscript. In this study, we conducted preliminary biological activity testing of 362 Bacillus isolates against pathogenic bacteria, which included S. Typhimurium ATCC14028, E. coli ATCC35150, and S. aureus ATCC43300 and ATCC29213. We found that four Bacillus strains (B. subtilis H1, B. velezensis HBXN2020, B. amyloliquefaciens 6-1 and B. licheniformis BSK14) showed antagonistic activity against these pathogenic bacteria, while the rest had no significant activity. We therefore chose these four strains to further evaluate their antibacterial activity against Gram-negative and Gram-positive pathogens (Supplementary Table 5). Based on the antibacterial test results, B. velezensis HBXN2020 had the best antibacterial activity, so we chose it for subsequent experiments.

      The updated content is presented in Supplementary Table 5 in the supplemental material.

      Minor points:

      All bacterial genera and species should be italicized.

      Response: Thank you for pointing this out. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 26 of the Abstract, lines 67 and 69 of the Introduction, and line 111 of the Results section of the revised manuscript.

      Line 39, remove repeated "importantly"

      Response: Thanks for your useful suggestion. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 39 of the Abstract of the revised manuscript.

      Lines 55-56, please rewrite.

      Response: Thanks for your suggestion. We have now rephrased the sentence.

      The updated content is presented in lines 56-57 of the Introduction of the revised manuscript.

      The relevant references should be updated, in the right format.

      Response: Thanks for your suggestion. Based on your suggestion, we have reformatted the references according to the reference style of eLife.

      The updated content is presented in the References section of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      (1) In Figure 2, the authors make the argument that the increased survival of Bacillus spores at high temperatures and low pH renders the strain useful as a probiotic as it would survive in the gut. However, the gut temperature is not significantly higher than the rest of the body (certainly not 95 degrees). One assumes the pH argument applies to surviving in stomach acid so that spores can travel to the gut. These conclusions should be clarified/revised. The survival in bile salts, gastric fluid, etc. makes more sense.

      Response: Thank you for your suggestion, which improved the quality and depth of the manuscript. Based on your suggestion, we have revised these conclusions. We apologize and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 129-132 of the Results section of the revised manuscript.

      (2) The overall differences in the microbiota on the stacked bar graphs are difficult to determine. In many cases, it looks like the HBXN2020 does not have a significant effect. The subsequent scattergrams are more convincing. Perhaps the authors can think of a better way to compare composite populations. If not, I suggest moving these stacked graphs to the supplementary information.

      Response: We greatly appreciate your valuable comments, which have improved the quality and depth of the manuscript. Based on your suggestion, we have moved the stacked graphs to the supplemental material. In addition, we replaced the bar graphs with heatmaps, in which differences in microbial community composition among the experimental groups are shown by color depth. We appreciate your review and feedback, and have marked the updated figures in the revised manuscript. Please see Figures 7 and 10 in the revised manuscript and supplemental material.

      Minor editorial:

      (1) Line 55 - "....antibiotic therapy is...".

      Response: Thank you for your suggestion. We have corrected it as suggested.

      The updated content is presented in lines 56-57 of the Introduction of the revised manuscript.

      (2) Line 60 - replace "emergent search" - poor syntax.

      Response: Thank you for your suggestion, which improved the quality of the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in lines 61-62 of the Introduction of the revised manuscript.

      (3) Line 63 - "...play an important...".

      Response: Thanks for pointing this out. We have now rephrased the sentence.

      The updated content is presented in lines 63-64 of the Introduction of the revised manuscript.

      (4) Figure 1C is not very useful, simply reinforces the data from 1A and 1B - this can be moved to the supplementary information.

      Response: Thank you for your valuable suggestion, which improved the quality and depth of the manuscript.

      Based on your suggestion, we have moved Figure 1C to the supplemental material. We appreciate your review and feedback, and have marked the updated figures in the revised manuscript. Please see the figures in the revised manuscript and supplemental material.

      (5) Line 126, "...that the growth of B. velezensis HBXN2020 was relatively stable." What do the authors mean by this? "Stable" implies no increase in biomass, but the growth curve does not indicate this, there was an increase in biomass after which, the culture appeared to reach a stationary phase. This should be clarified.

      Response: Thanks for pointing this out. Your comment improved the quality of the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in lines 122-124 of the Results section of the revised manuscript.

      (6) In Figure 5 - all the graphs in panel A can be amalgamated into one figure using different colours/symbols.

      Response: Thank you for your suggestion, which improved the quality and depth of the manuscript. Based on your suggestion, we have merged all the graphs in panel A of Figure 5 into one figure.

      The updated content is presented in Figure 5 of the revised manuscript.

      (7) The overall cohesiveness of the manuscript could be improved.

      Response: Thank you for your valuable comments, which improved the quality and depth of the manuscript. We have revised the entire manuscript based on your suggestions. The updated content is presented in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The following issues require clarification to further improve the quality of the manuscript.

      (1) L.55: Replace "antibiotic therapies" with "antibiotic therapy".

      Response: Thank you for your suggestion. We have corrected it as suggested.

      The updated content is presented in lines 56-57 of the Introduction of the revised manuscript.

      (2) "Bacillus" should be modified to italics in the manuscript (see e.g., L. 26, 65, 68, 109).

      Response: Thank you for your suggestion, which improved the quality of the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 26 of the Abstract, lines 67 and 69 of the Introduction, and line 111 of the Results section of the revised manuscript.

      (3) The first appearance of bacterial names in the manuscript requires the full English name (see e.g., L. 158, 159, 160).

      Response: Thank you for pointing out this problem in the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in lines 153-156 of the Results section of the revised manuscript.

      (4) L.166 and 167: "we evaluated its biological safety in a mouse model" suggest modifying to "we evaluated the biological safety of HBXN2020 in a mouse model".

      Response: Thanks for your suggestion. We have corrected this as suggested.

      The updated content is presented in lines 163-164 of the Results section of the revised manuscript.

      (5) L.229: Replace "suggest" with "suggested".

      Response: Thanks for your suggestion. We have corrected this as suggested.

      The updated content is presented in line 226 of the Results section of the revised manuscript.

      (6) L.367: The tense of "can" should be consistent with "demonstrated".

      Response: Thanks for pointing this out. We have corrected this as suggested.

      (7) L.368 and L. 369: Replace "Gram positive and Gram negative" with "Gram-positive and Gram-negative".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      (8) L.372: Replace "and" with "as well as".

      Response: Thanks for your useful suggestion. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 365 of the Discussion section of the revised manuscript.

      (9) Please supply the NCBI accession number for the 16S rRNA sequencing raw data.

      Response: Thank you for your suggestion. We have added it in the revised manuscript.

      The updated content is presented in lines 770-773 of the Data Availability section of the revised manuscript.

      (10) L. 1020 and L. 1073: It's recommended to reduce the word count in the annotations of Figures 5 and 8.

      Response: Thank you for your valuable suggestion. We have corrected it as suggested.

      The updated content is presented in the legends of Figure 5 and Figure 8 in the Figure Legends section of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Duan et al. analyzed brain imaging data from the UK Biobank and found patterns of change in brain structure with aging. They identified two patterns and found associations that differ between the two categories.

      Strengths:

      This discovery harbors a substantial impact on aging and brain structure and function.

      Weaknesses:

      (1) Therefore, the study requires more validation efforts. Most importantly, data underlying the stratification of the two groups are not obvious and lack further details. Can they also be stratified by different methods, e.g. PCA?

      Response: Thanks for the comment. In this study, principal component analysis (PCA) was applied to the individualized deviations of anatomical regions of interest (ROIs) for dimensionality reduction, which yielded the first 15 principal components, explaining approximately 70% of the total variance, for identifying longitudinal brain aging patterns. The two patterns can be stratified by both linear and non-linear dimensionality reduction methods: PCA and locally linear embedding (LLE) (ref. 1). The grey matter volume (GMV) of 40 ROIs at baseline was linearly adjusted for sex, assessment center, handedness, ethnicity, intracranial volume (ICV), and a second-degree polynomial in age, to be consistent with the whole-brain GMV trajectory model. There was a clear boundary between the two patterns in the projected coordinate space, indicating distinct structural differences in brain aging between them (Author response image 1).

      Author response image 1.

      Stratification of the identified brain aging patterns using linear and non-linear dimensionality reduction methods. (a) The principal component space of PC1 and PC2, and (b) two-dimensional projected locally linear embedding space derived from brain volumetric measures. Points have been colored and shaped according to grouping labels of the brain aging patterns.
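
      The PCA step described in this response can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the function name `pca_until_variance` and the matrix sizes (500 participants, 40 ROIs) are assumptions made for illustration, and only the 70% threshold is taken from the response. The sketch centers an ROI-deviation matrix, obtains principal components by singular value decomposition, and keeps the smallest number of components whose cumulative explained variance reaches the threshold.

```python
import numpy as np


def pca_until_variance(X, threshold=0.70):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches `threshold`.

    Returns (scores, k, cumulative_ratio).
    """
    Xc = X - X.mean(axis=0)                      # center each ROI column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                  # explained-variance ratio per PC
    cumulative = np.cumsum(ratio)
    # first index where the cumulative ratio is >= threshold, as a count
    k = int(np.searchsorted(cumulative, threshold) + 1)
    scores = Xc @ Vt[:k].T                       # participant coordinates in PC space
    return scores, k, cumulative


# Synthetic stand-in for the ROI deviations (hypothetical sizes).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
loadings = rng.normal(size=(5, 40))
roi_deviation = latent @ loadings + rng.normal(scale=2.0, size=(500, 40))

scores, k, cumulative = pca_until_variance(roi_deviation, threshold=0.70)
print(f"{k} components explain {cumulative[k - 1]:.1%} of the variance")
```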

      (2) Are there any external data that can be used for validation?

      Response: Thanks for the comment. We were given access to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, which aimed at determining the relationships between clinical, cognitive, imaging, genetic, and biochemical biomarkers across the entire spectrum of Alzheimer’s disease. ADNI recruits participants aged between 55 and 90 years at 57 sites in the United States and Canada, who undergo a series of initial tests that are repeated at intervals over subsequent years. 

      Unfortunately, there are no appropriate and sufficient data, especially clinical, cognitive, and genetic data, to support unbiased validation of the heterogeneity in structural brain aging patterns. Only 890 (31.83%) of the 2796 subjects included in ADNI were cognitively normal, of whom 656 were included in the analyses after quality control of structural MRI and exclusion of subjects with missing covariates, with a mean age at the screening visit of 70.8 years (SD = 6.48 years); 60.21% of the subjects were female. Thus, there are significant differences between ADNI and UK Biobank in population composition, with ADNI enrolling older subjects because of its focus on defining the progression of Alzheimer’s disease.

      Moreover, among the 656 subjects with structural imaging data, the data needed to validate the clinical, cognitive, and genetic manifestations of the brain aging patterns were missing to varying degrees. For example, blood biochemistry tests and telomere length data were missing at baseline for approximately 58% and 82% of subjects, respectively, and genotype data were not assayed for more than 70 percent of the subjects. As for cognitive function tests, only the results of the Mini-Mental State Examination were complete, while other tests such as the Trail Making Test and Digit Span Backward were available for less than 10 percent of subjects.

      (3) Other previous discoveries or claims supporting the results of the study should be explored to support the conclusion.

      Response: Thanks for the suggestion. As mentioned in lines 274-277 of the manuscript, participants with brain aging pattern 2 (lower baseline total GMV and more rapid GMV decrease) were characterized by accelerated biological aging and cognitive decline. Previous research on brainAGE (refs. 2, 3), the difference between chronological age and the age predicted by a machine learning model of brain imaging data, showed that, as a biomarker of accelerated brain aging, older brainAGE is accompanied by accelerated biological aging and early signs of cognitive decline, which is consistent with our findings in this study (lines 302-306).

      Further, genome-wide association studies identified significant genetic loci contributing to accelerated brain aging, some of which can be found in previous GWAS of image-derived phenotypes (ref. 4), such as regional and tissue volume, cortical area and white matter tract measurements, and of a specific brain aging mode identified using a data-driven decomposition approach (ref. 5) (lines 207-213).

      In addition, we demonstrated the “last in, first out” mirroring patterns between structural brain aging and brain development, and found that the mirroring patterns are predominantly localized to the lateral / medial temporal cortex and the cingulate cortex, as noted in lines 231-234 of the manuscript. Large differences in the patterns of change between late adolescent development and aging in the medial temporal cortex were previously found in studies of brain development and aging patterns (ref. 6) (lines 315-317).

      (4) Sex was merely used as a covariate. Were there sex differences during brain aging? What was the sex ratio difference in groups 1 and 2?

      Response: Thanks for the comment. Sex differences during brain aging can be observed by investigating sex-stratified whole-brain GMV trajectories. We fitted the growth curve and estimated the rate of change of total grey matter volume (TGMV) separately for males and females using generalized additive mixed effect models (GAMM), which included 40,921 observations from 17,055 males and 19,958 females (Author response image 2). Among healthy participants aged 44-82 years in UK Biobank, males had higher total GMV and a faster rate of GMV decrease over time, while females had lower total GMV and a slower rate of GMV decrease. A similar conclusion can be found in normative brain-volume trajectories across the human lifespan (ref. 7). Supplementary Table 5 shows baseline and demographic characteristics for all participants and for participants stratified by brain aging pattern. There were slightly more females than males among the total participants and in brain aging pattern 1 (53.4%) and pattern 2 (54.4%), and a χ^2 test showed no significant difference in the sex ratio between the two patterns (P = 0.06).

      Author response image 2.

      Total gray matter volume (TGMV) (a) and the estimated rate of change (b) for females (red) and males (blue). Rates of volumetric change for total gray matter and each ROI were estimated using GAMM, which incorporates both cross-sectional between-subject variation and longitudinal within-subject variation from 22,067 observations for 19,958 females, and 18,854 observations for 17,055 males. Covariates include assessment center, handedness, ethnicity, and ICV. Shaded areas around the fit lines denote 95% CI.
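
      The sex-ratio comparison in this response is a χ^2 test on a 2 × 2 table (pattern by sex). The sketch below shows how such a test can be computed; it is an illustration, not the authors' analysis. The group sizes (10,000 per pattern) are hypothetical, because the response reports only the percentages of females (53.4% and 54.4%), so the printed P value is not expected to reproduce the reported P = 0.06. The function name `chi_square_2x2` is an assumption, and the test is Pearson's χ^2 without continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (chi2, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 10,000 participants per pattern (assumed, not from the source).
females_p1, males_p1 = 5340, 4660   # 53.4% female in pattern 1
females_p2, males_p2 = 5440, 4560   # 54.4% female in pattern 2

chi2, p_value = chi_square_2x2(females_p1, males_p1, females_p2, males_p2)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```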

      (5) Although statistically significant, Figure 3 shows minimal differences. LTL and phenoAge are displayed in adjusted values but what are the actual values that differ between patterns 1 and 2?

      Response: Thanks for the comment. We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots. Associations between biological aging biomarkers and brain aging patterns are listed in Supplementary Table 6. Compared to brain aging pattern 1, participants in pattern 2, who had a more rapid GMV decrease, had shorter leucocyte telomere length (P = 0.009, Cohen’s D = -0.028) and higher PhenoAge (P = 0.019, Cohen’s D = 0.027) without covariate adjustment. Specifically, participants in brain aging pattern 1 had an average Z-standardized LTL of 0.083 (SD 0.98) and an average PhenoAge of 41.35 years (SD 8.17 years), and those in pattern 2 had an average Z-standardized LTL of 0.055 (SD 0.97) and an average PhenoAge of 41.58 years (SD 8.32 years).
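
      As a worked check of the effect sizes quoted above, Cohen's d can be approximated from the reported group means and standard deviations. The sketch assumes equal weighting of the two groups in the pooled standard deviation, because group sizes are not given in this response; the function name `cohens_d_from_summary` is hypothetical. Under that assumption the results fall within about 0.001 of the reported values (-0.028 and 0.027), with the remaining gap plausibly due to rounding of the inputs and to the actual group sizes.

```python
import math


def cohens_d_from_summary(mean_ref, sd_ref, mean_cmp, sd_cmp):
    """Cohen's d of the comparison group relative to the reference group,
    using a pooled SD that weights both groups equally (an assumption)."""
    pooled_sd = math.sqrt((sd_ref**2 + sd_cmp**2) / 2)
    return (mean_cmp - mean_ref) / pooled_sd


# Summary statistics as reported in the response (pattern 1 = reference).
d_ltl = cohens_d_from_summary(0.083, 0.98, 0.055, 0.97)        # Z-standardized LTL
d_phenoage = cohens_d_from_summary(41.35, 8.17, 41.58, 8.32)   # PhenoAge in years

print(f"LTL d = {d_ltl:.4f}, PhenoAge d = {d_phenoage:.4f}")
```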

      (6) It is not intuitive to link gene expression results shown in Figure 8 and brain structure and functional differences between patterns 1 and 2. Any overlap of genes identified from analyses shown in Figure 6 (GWAS) and 8 (gene expression)?

      Response: Thanks for the comment. We apologize for the confusion. As mentioned in the Results section "Gene expression profiles were associated with delayed brain development and accelerated brain aging", seventeen of the 45 genes mapped to GWAS-significant SNPs were found in the Allen Human Brain Atlas (AHBA) dataset. Gene expression of LGR4 (r_spearman = 0.56, P_permutation = 2.5 × 10^-4) was significantly associated with delayed brain development, and expression of ESR1 (r_spearman = 0.53, P_permutation = 1.5 × 10^-4) and FAM3C (r_spearman = -0.37, P_permutation = 0.004) was significantly associated with accelerated brain aging. BDNF-AS was positively associated with both delayed brain development and accelerated brain aging after the spatial permutation test. Full associations between the gene expression profiles of mapped genes and the estimated APC during brain development and aging are presented in Supplementary Tables 12 and 13, respectively.

      Furthermore, we screened genes based on their contributions and effect directions to the first PLS components in brain development and brain aging. Among the genes screened for inclusion in the functional enrichment analysis, we found genes mapped to GWAS-significant SNPs (Author response table 1): LGR4 (PLS_w1(LGR4) = 3.70, P_FDR = 0.002) was associated with delayed development, and ESR1 (PLS_w1(ESR1) = 3.91, P_FDR = 6.12 × 10^-4) and FAM3C (PLS_w1(FAM3C) = -3.68, P_FDR = 0.001) were associated with accelerated aging.

      Author response table 1.

      Contributions and effect directions of the first PLS components in brain development and brain aging of genes that mapped to GWAS significant SNP. The bold P values reflect significance (P < 0.005, inclusion in the functional enrichment analysis) after FDR correction.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to understand the heterogeneity of brain aging by analyzing brain imaging data. Based on the concept of structural brain aging, they divided participants into two groups based on the volume and rate of decrease of gray matter volume (GMV). The group with rapid brain aging showed accelerated biological aging and cognitive decline and was found to be vulnerable to certain neuropsychiatric disorders. Furthermore, the authors claimed the existence of a "last in, first out" mirroring pattern between brain aging and brain development, which they argued is more pronounced in the group with rapid brain aging. Lastly, the authors identified genetic differences between the two groups and speculated that the cause of rapid brain aging may lie in genetic differences.

      Strengths:

      The authors supported their claims by analyzing a large amount of data using various statistical techniques. There seems to be no doubt about the quality and quantity of the data. Additionally, they demonstrated their strength in integrating diverse data through various analysis techniques to conclude.

      Weaknesses:

      There appears to be a lack of connection between the analysis results and their claims. Readers lacking sufficient background knowledge of the brain may find it difficult to understand the paper. It would be beneficial to modify the figures and writing to make the authors' claims clearer to readers. Furthermore, the paper gives an overall impression of being less polished in terms of abbreviations, figure numbering, etc. These aspects should be revised to make the paper easier for readers to understand.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Gray matter volume (GMV) is defined later in the manuscript and may confuse readers.

      Response: Thanks for the comment. We have now defined GMV upon its first appearance in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) In conducting GWAS, the authors used total GMV at the age of 60 as a phenotype (line 195). It would be beneficial to provide additional explanation as to why only the data from individuals aged 60 were utilized, especially considering the ample availability of GMV data.

      Response: Thanks for the comment, and we apologize for the confusion. As mentioned in the Methods section "Genome Wide Association Study to identify SNPs associated with brain aging patterns", we performed genome-wide association studies (GWAS) on individual deviations of total GMV relative to the population average at 60 years using PLINK 2.0. Data from all individuals were therefore used in the GWAS, rather than only those aged 60 years. To accomplish this, the deviation of total GMV from the population average at age 60 years was calculated for each participant using a mixed effect regression model, as described in the Methods section "Identification of longitudinal brain aging patterns".

      (2) Whole-brain gene expression data was linked to GMV (Line 237). Gray matter is known to account for about 40% of the total brain. Thus, interpreting whole-brain data in connection with GMV might introduce significant errors. Could this potential source of error be addressed?

      Response: Thanks for the comment. In our study, the Allen Human Brain Atlas (AHBA) dataset was processed using the abagen toolbox version 0.1.3 (https://doi.org/10.5281/zenodo.5129257) with the Desikan-Killiany atlas (ref. 8), resulting in a matrix (83 regions × 15,633 gene expression levels) of transcriptional level values covering cortical and subcortical structures in both hemispheres, and the brainstem. Only data from 34 cerebral cortex regions, not the whole brain, were included in the analysis of the association between the regional rate of change of gray matter volume and gene expression profiles using partial least squares (PLS) regression. We have clarified in the revised manuscript that we used AHBA microarray expression data from regions of interest (ROIs) in the cortex.
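
      To make the PLS step in this response concrete, the sketch below computes the first component of a single-response PLS regression (PLS1) between a regions × genes expression matrix and a regional rate-of-change vector. It is an illustration on synthetic data, not the authors' pipeline: the function name `first_pls_component`, the reduced gene count (200 instead of 15,633) and the simulated values are assumptions, and the bootstrap and permutation steps that usually accompany such analyses are omitted. For PLS1, the first weight vector is the normalized cross-product of the centered predictors and the centered response, which is the direction in gene space whose regional scores covary most strongly with the response.

```python
import numpy as np


def first_pls_component(X, y):
    """First PLS1 component: unit-norm gene weights w and regional scores t."""
    Xc = X - X.mean(axis=0)          # center each gene across regions
    yc = y - y.mean()                # center the regional response
    w = Xc.T @ yc                    # covariance direction between genes and response
    w = w / np.linalg.norm(w)        # normalize to unit length
    t = Xc @ w                       # regional scores on the first component
    return w, t


# Synthetic stand-in: 34 cortical regions x 200 genes (hypothetical sizes).
rng = np.random.default_rng(42)
expression = rng.normal(size=(34, 200))
change_rate = expression[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=34)

w, t = first_pls_component(expression, change_rate)
top_genes = np.argsort(-np.abs(w))[:5]   # indices of the largest-magnitude weights
print("largest-weight gene indices:", top_genes)
```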

      (3) The paper lacks biological interpretation of the important genetic factors (SNPs and genes) for brain aging discovered in this study, as well as the results of gene ontology analysis. Many readers would be curious about the biological significance of these genetic differences and what kind of outcomes they may produce.

      Response: Thanks for the suggestion. As mentioned in our manuscript, six independent single nucleotide polymorphisms (SNPs) were identified at the genome-wide significance level (P < 5 × 10^-8) (Fig. 6). Among them, two SNPs (rs10835187 and rs779233904) were also found to be associated with multiple brain imaging phenotypes in previous studies, such as regional and tissue volume, cortical area and white matter tract measurements. Compared with the GWAS using global gray matter volume as the phenotype, our GWAS revealed an additional signal on chromosome 7 (rs7776725), which maps to an intron of FAM3C, a gene that encodes a secreted protein involved in pancreatic cancer and Alzheimer's disease. This signal was further validated as associated with a specific brain aging mode by another study using a data-driven decomposition approach. In addition, another significant locus (rs10835187, P = 1.11 × 10^-13) is an intergenic variant between the genes LGR4-AS1 and LIN7C, and was reported to be associated with bone density, brain volume and total cortical area measurements. LIN7C encodes the Lin-7C protein, which is involved in the localization and stabilization of ion channels in polarized cells, such as neurons and epithelial cells. A previous study revealed an association of both allelic and haplotypic variations in the LIN7C gene with ADHD. In addition, ESR1 was found to be involved in I-kappaB kinase/NF-kappaB signaling in the functional enrichment associated with accelerated brain aging (Figure 8 and Supplementary Figure 5), and activation of this signaling leads to a variety of human pathologies such as neurodegenerative, inflammatory, autoimmune and cancerous diseases (ref. 9).

      In summary, the analyses using the GO biological process and KEGG pathway databases indicate that synaptic transmission is an important process in the common mechanisms of brain development and aging, and that cellular processes (autophagy), as well as the progression of neurodegenerative diseases, are important processes in the mechanisms of brain aging.

      (4) As mentioned in the public review, it would be helpful if figures were revised to more clearly represent the claims.

      (4.1) For Figure 1, it would be beneficial to explain how the authors analyzed the differences between the mentioned cross-section and longitudinal trajectory, which they identified as a strength of the study.

      Response: We have added the strengths of using longitudinal data to model brain aging trajectories, compared with using only cross-sectional data, to the Figure 1 caption in the revised manuscript:

      “Fig. 1 Overview of the study workflow. a, Population cohorts (UK Biobank and IMAGEN) and data sources (brain imaging, biological aging biomarkers, cognitive functions, genomic data) involved in this study. b, Brain aging patterns were identified using longitudinal trajectories of the whole brain GMV, which enabled the capture of long-term and individualized variations compared with using only cross-sectional data, and associations between brain aging patterns and other measurements (biological aging, cognitive functions and PRS of major neuropsychiatric disorders) were investigated. c, Mirroring patterns between brain aging and brain development were investigated using a z-transformed brain volumetric change map and gene expression analysis.”

      (4.2) In Figure 3, it's challenging to distinguish differences between patterns 1 and 2 in LTL and PhenoAge. (e.g. It's unclear whether Pattern 1 is higher or lower). Clarifying this visually would be useful.

      Response: We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots.

      Author response image 3.

      Distributions of biological aging biomarkers (leucocyte telomere length (LTL) and PhenoAge) among participants with brain aging patterns 1 and 2.

      (4.3) Figure 7 explains the mirroring pattern, but it's hard to discern significant differences from the figures alone (especially in Figures 7b and 7c). Using an alternative method (graph, etc.) to clearly represent this would be appreciated.

      Response: We have included an arrow pointing to the brain regions with significant differences in each subfigure.

      Author response image 4.

      The “last in, first out” mirroring patterns between brain development and brain aging.

      (5) Abbreviations should be explained when they are first introduced in the paper. For example, GMV continues to be used without explanation, and in line 203, it is written out as 'gray matter volume'. ADHD and ASD first appear at line 172, but the explanation is found in lines 177-178. Additionally, there are terms without explanations in the manuscript. For instance, BMI is not explained in the main manuscript but is defined in the Supplementary Information (Table S6).

      Response: We have corrected the inappropriate formatting regarding misplaced and missing abbreviations in the revised manuscript and Supplementary Information.

      (6) Figure numbers should follow the order of appearance in the paper. The first Supplementary Fig. in the manuscript is Supplementary Figure 3. It should be Supplementary Figure 1.

      Response: We have relabeled the figures with the order of appearance in the paper in the revised manuscript and Supplementary Information.

      References:

      (1) Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).

      (2) Christman, S. et al. Accelerated brain aging predicts impaired cognitive performance and greater disability in geriatric but not midlife adult depression. Translational Psychiatry 10, 317 (2020).

      (3) Elliott, M. L. et al. Brain-age in midlife is associated with accelerated biological aging and cognitive decline in a longitudinal birth cohort. Molecular Psychiatry 26, 3829–3838 (2021).

      (4) Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature Neuroscience 24, 737–745 (2021).

      (5) Smith, S. M. et al. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. eLife 9, e52677 (2020).

      (6) Tamnes, C. K. et al. Brain development and aging: overlapping and unique patterns of change. Neuroimage 68, 63–74 (2013).

      (7) Bethlehem, R. A. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).

      (8) Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968–980 (2006).

      (9) Singh, S. & Singh, T. G. Role of nuclear factor kappa B (NF-κB) signalling in neurodegenerative diseases: an mechanistic approach. Current Neuropharmacology 18, 918–935 (2020).

    2. eLife assessment

      Duan et al. analyzed brain imaging data from the UK Biobank and divided structural brain aging into two groups, revealing that one group is more vulnerable to aging and brain-related diseases than the other. Such subtyping could be valuable for predicting and diagnosing cognitive decline and neurodegenerative brain disorders in the future. This discovery, supported by solid evidence, has substantial implications for the study of aging and of brain structure and function.

    3. Reviewer #1 (Public Review):

      Summary:

      Duan et al. analyzed brain imaging data from the UK Biobank and identified two patterns of age-related change in brain structure. They then found associations that distinguish the two resulting groups.

      Strengths:

      This discovery has substantial implications for the study of aging and of brain structure and function.

      Weaknesses:

      The study requires more validation. Most importantly, the data underlying the stratification into two groups are not clear and lack detail. Can participants also be stratified by a different method, e.g., PCA?

      Can any external data be used for validation?

      Previous discoveries or claims that support the results of the study should be explored to strengthen the conclusion.

      Sex was used merely as a covariate. Were there sex differences during brain aging? Does the sex ratio differ between groups 1 and 2?

      Although statistically significant, the differences shown in Fig. 3 are minimal. LTL and PhenoAge are displayed as adjusted values, but what are the actual values that differ between patterns 1 and 2?

      It is not intuitive to link the gene expression results shown in Fig. 8 to the structural and functional brain differences between patterns 1 and 2. Is there any overlap between the genes identified in the analyses shown in Fig. 6 (GWAS) and Fig. 8 (gene expression)?

    4. Reviewer #2 (Public Review):

      Summary:

      The authors aimed to understand the heterogeneity of brain aging by analyzing brain imaging data. Based on the concept of structural brain aging, they divided participants into two groups based on the volume and rate of decrease of gray matter volume (GMV). The group with rapid brain aging showed accelerated biological aging and cognitive decline and was found to be vulnerable to certain neuropsychiatric disorders. Furthermore, the authors claimed the existence of a "last in, first out" mirroring pattern between brain aging and brain development, which they argued is more pronounced in the group with rapid brain aging. Lastly, the authors identified genetic differences between the two groups and speculated that the cause of rapid brain aging may lie in genetic differences.

      Strengths:

      The authors supported their claims by analyzing a large amount of data using various statistical techniques. There seems to be no doubt about the quality and quantity of the data. Additionally, they demonstrated their strength in integrating diverse data through various analysis techniques to reach their conclusions.

      Weaknesses:

      The authors provided appropriate answers to the reviewers' questions and revised the manuscript accordingly, and as a result, the paper has been edited to be more easily understood.

    1. eLife assessment

      This study presents an important dataset that captures the transition from epiblast to amnion using a novel in vitro model of human amnion formation. The supporting evidence for the authors' claims is convincing. Key strengths of the study include the efficiency and purity of the cell populations produced, a high degree of synchrony in the differentiation process, comprehensive benchmarking with single-cell data and immunocytochemistry from primate embryos, and the identification of critical markers for specific differentiation phases. A notable limitation, however, is the model's exclusion of other embryonic tissues.

    2. Reviewer #2 (Public Review):

      In this study, Sekulovski and colleagues report refinements to an in vitro model of human amnion formation. Working with 3D cultures and BMP4 to induce differentiation, the authors chart the time course of amnion induction in human pluripotent stem cells in their system using immunofluorescence and RNA-seq. They carry out validation through comparison of their data to existing embryo datasets, and through immunostaining of post-implantation marmoset embryos. Functional experiments show that the transcription factor TFAP2C drives the amnion differentiation program once it has been initiated.

      There is currently great interest in the development of in vitro models of human embryonic development. While it is known that the amnion plays an important structural supporting role for the embryo, its other functions, such as morphogen production and differentiation potential, are not fully understood. Since a number of aspects of amnion development are specific to primates, models of amniogenesis will be valuable for the study of human development. Advantages of this model include its efficiency and the purity of the cell populations produced, a significant degree of synchrony in the differentiation process, benchmarking with single-cell data and immunocytochemistry from primate embryos, and identification of key markers of specific phases of differentiation. Weaknesses are the absence of other embryonic tissues in the model, and overinterpretation of certain findings, in particular relating bulk RNA-seq results to scRNA-seq data from published analyses of primate embryos and results from limited (though high quality) embryo immunostainings.

    3. Reviewer #3 (Public Review):

      In this work, the authors tried to profile time-dependent changes in gene and protein expression during BMP-induced amnion differentiation from hPSCs. The authors depicted a GATA3 - TFAP2A - ISL1/HAND1 order of amniotic gene activation, which provides a more detailed temporal trajectory of amnion differentiation compared to previous works. As a primary goal of this study, the above temporal gene/protein activation order is amply supported by experimental data. However, the mechanistic insights on amniotic fate decision, as well as the transcriptomic analysis comparing amnion-like cells from this work and other works, remain limited. While this work allows us to see more details of amnion differentiation and understand how different transcription factors were turned on in a sequence and might be useful for benchmarking the identity of amnion in ex utero cultured human embryos/embryoids, it provides limited insights on how amnion cells might diverge from primitive streak / mesoderm-like cells, despite some transcriptional similarity they shared, during early development.

      [Editors' note: In the revised manuscript, the authors have added new results and made textual revisions that address the reviewers' concerns. These changes have significantly enhanced the clarity, quality, and impact of the study.]

    4. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the reviewers for their insightful comments, which have helped to improve the manuscript. We provide specific examples and a point-by-point response to all comments below. Based on the Reviewers’ comments, we revised our manuscript, adding a considerable amount of new data (found in Fig. 1A,B, 4E-G, 7C,D, 8C,E, S1B,C, S2C-G, S4C, and Video 1). In the main manuscript text, blue fonts indicate added or revised texts. An additional author (Lauren N. Juga) is added for the newly generated data in the revised manuscript.

      Reviewer #1: 

      Sekulovski et al present an interesting and timely manuscript describing the temporal transition from epiblast to amnion. The manuscript builds on their previous work describing this process using stem cell models. 

      They suggest a multi-step process initiated by BMP induction of GATA3, followed by expression of TFAP2A, followed by ISL1/HAND1 in parallel with loss of pluripotency markers. This transition was reproduced through IF analysis of CS6/7 NHP embryo. 

      There are significant similarities in the expression of trophectoderm and the amnion. There are also ample manuscripts showing trophoblast induction following BMP stimulation of primed pluripotent stem cells. The authors should ensure that the amnion indeed is only amnion and not trophectoderm (or the amount of contribution to trophectoderm). As an extension, does the amnion character remain after the 48h BMP4 treatment, and is a trophectoderm-like state adopted as suggested by Ohgushi et al 2022?  

      Thank you for this insightful comment. As pointed out, Ohgushi et al. showed that, in their culture method, amnion is first induced, and extended culturing leads to the formation of trophectoderm-like cells (Ohgushi et al., 2022).

      Importantly, we would like to note that our culture system differs substantially from that of Ohgushi et al. in several respects. First, our system uses a 3D culture method while Ohgushi et al. employ 2D hPSC monolayers. Second, the two systems are chemically quite distinct. In our Glass-3D+BMP protocol, cells are cultured in mTeSR media (which contains FGF2 and TGFb1) for two days, by which time they generate 3D pluripotent cysts. BMP is then added to the culture medium for 24 hours, followed by another 24 hours without BMP4. In stark contrast, Ohgushi et al. employ A83-01, an Activin/Nodal signaling inhibitor, and PD173074, an FGF signaling inhibitor (a protocol which they call AP). This treatment leads to spontaneous activation of BMP signaling, but it also clearly inhibits Activin/Nodal and FGF signaling pathways, which remain active in our system. As a result of these distinct chemical as well as geometrical culturing protocols, their system produces amnion and trophectoderm, while our system produces exclusively amnion.

      Further analysis of gene expression data provides additional data supporting our contention that our system produces amnion. Though the gene expression profiles of amnion and trophectoderm are quite similar, specific markers of trophectoderm have been identified including GCM1, PSG1, PSG4 and CGB (Blakeley et al., 2015; Meistermann et al., 2021; Ohgushi et al., 2022; Okae et al., 2018; Petropoulos et al., 2016; Yabe et al., 2016). Importantly, while all of these markers are abundantly expressed in the Ohgushi et al. system, bulk RNA sequencing analysis of our Glass-3D+BMP hPSC-amnion cells reveals that none of these markers are detectable. Indeed, SDC1, a marker that Ohgushi et al. claim distinguishes trophoblast from amnion, actually decreases (more than 8-fold) as pluripotent cysts transition to amnion in Glass-3D+BMP. Finally, Ohgushi et al. report that ISL1, a key marker of the specified amnion population, is initially increased in their system, but is reduced to a basal level over time. In contrast, in Glass-3D+BMP hPSC-amnion, ISL1 expression continuously increases with time, and ISL1 protein expression is seen uniformly throughout the amnion cysts. This uniform expression is also seen in CS6/7 cynomolgus macaque amnion. Together, these results support our conclusion that the Glass-3D+BMP system leads to the formation of amniotic cells, and not trophectoderm cells.

      The functional data does not support a direct function of GATA3 prior to TFAP2A and the authors suggest compensatory mechanisms from other GATAs. If so, which GATAs are expressed in this system, with and without GATA3 targeting? Would it not be equally likely that the other early genes could be the key drivers of amnion initiation, such as ID2? 

      We appreciate this helpful comment. We agree that our data do not provide sufficient evidence for the role of GATA3 in early amniogenesis. We also agree that other early genes could be key drivers, and apologize for including our speculation that focuses only on GATA2. GATA2 was selected because, among the other GATAs, GATA2 and GATA3 are the only abundantly expressed GATA factors. This point suggesting a potentially redundant role of GATA2 is now removed from the manuscript (Line#355 of the original manuscript).

      The targeting of TFAP2A displays a very interesting phenotype which suggests that amnion and streak share an initial trajectory but where TFAP2A is necessary to adopt amnion fate. It would again be important to ensure that this alternative fate is indeed in streak and not misannotated alternative lineages, including trophoblast. 

      Is TBXT induced in this setting as well as in the wt situation during amnion induction? This should be displayed as in Figure 3D and would be nice to be complemented by NHP IF analysis.

      We will address these two closely related comments together.

      TFAP2A-KO cysts contain ISL1+ squamous cells as well as SOX2+ pluripotent cells, suggesting that, while the initial focal amniogenesis is seen, the subsequent spreading event is not. Interestingly, our new data show that TFAP2A-KO cysts display cells with high TBXT expression (Fig. 8E, Line#373-374). This result suggests that, in the absence of TFAP2A, once amnion lineage progression is halted, a more primitive streak-like (TBXThigh) lineage emerges. It is important to note that TBXT expression is not seen in the trophectoderm population of cynomolgus macaque peri-gastrula (Sasaki et al., 2016; Yang et al., 2021).

      As suggested, we now include a TBXT expression time course during hPSC-amnion formation in Fig. S2D of the revised manuscript. These data show weak TBXT expression (transcripts) starting at the 24-hr timepoint. However, a clear TBXT protein signal could not be detected using IF (Fig. S2C), likely because TBXT expression is very low (Line#264-265). While statistically significant compared to the 12-hr timepoint, TBXT expression is 31 FPKM +/- 0.8 (standard deviation) at 24-hr and 48 FPKM +/- 6 at 48-hr. These are low expression values compared to, for example, TFAP2A, which displays 572 FPKM +/- 23 at 12-hr and 1169 FPKM +/- 27 at 24-hr, at which TFAP2A is readily detected using IF. While weak nuclear TFAP2A is seen using IF at 6-hr (187 FPKM +/- 7), no clear TFAP2A is detected at 3-hr (74 FPKM +/- 7). Another example is ISL1, which displays 758 FPKM +/- 55 at 24-hr and 1505 FPKM +/- 26 at 48-hr, when ISL1 can be detected using IF. Importantly, we were not able to detect ISL1 protein expression using IF at 12-hr, at which its expression level is 12 FPKM +/- 18. Lastly, we now show that, in the cynomolgus macaque peri-gastrula, while pSMAD1/5+ primitive streak-derived disseminating cells show abundant TBXT expression, no clear TBXT expression is seen in the amnion territory (Fig. S2G, Line#291-293).

      Together, these results show that while a TBXTlow state clearly emerges during hPSC-amnion development, in wild-type hPSC cultured in Glass-3D+BMP, TBXT levels remain low throughout amnion differentiation. However, in the absence of TFAP2A, a TBXThigh state is seen, suggesting that TFAP2A is critical for suppressing this TBXThigh state in fate spreading cells, perhaps by preventing BMP responding cells from acquiring embryonic lineages (e.g., mesodermal and/or primordial germ cells).

      The authors should address why they get different results from Castillo-Venzor et al 2023 DOI: 10.26508/lsa.202201706  

      Thank you very much for this helpful suggestion, and we now include a section detailing this in the Discussion (Line#410-432). In short, we propose several possibilities. First, culturing conditions are highly distinct. Castillo-Venzor et al. (Castillo-Venzor et al., 2023) utilize initial “pre-mesoderm” conditioning by Activin and CHIR, followed by treating floating embryoid bodies with a growth factor cocktail (BMP, SCF, EGF and LIF). In contrast, our system (Glass-3D+BMP) employs BMP stimulation of pluripotent cysts. Thus, we suspect that, in the PGCLC differentiation condition, cells are conditioned to the pre-mesodermal lineage. Moreover, we propose that amnion fate spreading may not be present in the PGCLC system, perhaps due to differences in geometry (aggregates versus cysts), or due to differing lineage commitment programs. That is, while initial amniogenesis is seen in the PGCLC system, most cells may already be committed to the PGC-like or mesodermal lineages by the time amnion fate spreading can occur. Alternatively, because several cell types (PGC-like, mesodermal and amniotic) co-exist in the culture by Castillo-Venzor et al., PGC-like and/or mesodermal cells may compensate for the loss of TFAP2A.

      Reviewer #2: 

      In this study, Sekulovski and colleagues report refinements to an in vitro model of human amnion formation. Working with 3D cultures and BMP4 to induce differentiation, the authors chart the time course of amnion induction in human pluripotent stem cells in their system using immunofluorescence and RNA-seq. They carry out validation through comparison of their data to existing embryo datasets, and through immunostaining of post-implantation marmoset embryos. Functional experiments show that the transcription factor TFAP2C drives the amnion differentiation program once it has been initiated. 

      There is currently great interest in the development of in vitro models of human embryonic development. While it is known that the amnion plays an important structural supporting role for the embryo, its other functions, such as morphogen production and differentiation potential, are not fully understood. Since a number of aspects of amnion development are specific to primates, models of amniogenesis will be valuable for the study of human development. Advantages of this model include its efficiency and the purity of the cell populations produced, a significant degree of synchrony in the differentiation process, benchmarking with single-cell data and immunocytochemistry from primate embryos, and identification of key markers of specific phases of differentiation. Weaknesses are the absence of other embryonic tissues in the model, and overinterpretation of certain findings, in particular relating bulk RNA-seq results to scRNA-seq data from published analyses of primate embryos and results from limited (though high quality) embryo immunostainings.  

      We are happy that Reviewer #2 agrees that our Glass-3D+BMP model is important for investigating additional roles of amniogenesis, as well as roles of amnion as a signaling hub, due to the purity of the amniotic cell population, and a high degree of synchrony of differentiation.

      We respectfully disagree that the absence of other embryonic tissues in the model is a weakness: rather, we believe it is a strength because this single lineage amnion model allows us to directly (and independently) investigate mechanisms underlying amnion lineage progression. For example, as noted above in our response to Reviewer #1, use of our hPSC-amnion model allowed us to see a very specific and interesting phenotype in the absence of TFAP2A (reduced amnion formation and emergence of an alternative lineage), though previous findings by Castillo-Venzor et al. concluded that amniogenesis is not affected by loss of TFAP2A. We noted that the culture method used by Castillo-Venzor et al. contains several cell types (amniotic, mesodermal and PGC-like), and that amniogenesis may be intact in that model due to compensation by the presence of these other cell types. That is, while cell-cell interactions can indeed be gleaned in culture systems with several cell types, the presence of multiple cell types and their additional signaling inputs can also confound some aspects of mechanistic investigations. We now include a paragraph in the Discussion of the revised manuscript (Line#410-432), in which we detail these ideas, and suggest that, because of the cell purity, our Glass-3D+BMP model enables robust mechanistic examinations, specifically during amnion formation.

      We address Reviewer #2’s point about bulk vs. single cell transcriptomic similarity analysis in Reviewer’s specific point #4 below. We do, however, want to note here that we have performed the same analysis using a 14-day old cynomolgus macaque peri-gastrula single cell RNA sequencing dataset generated by Yang et al. (Yang et al., 2021), and obtained a lineage trajectory (Fig. 4F, Line#265-268) similar to that seen when the Tyser et al. dataset (Tyser et al., 2021) was used (Fig. 4C).

      Importantly, while cynomolgus macaque early embryo samples are limited, we now include additional staining (Fig. S2G). 

      Reviewer #2 (Recommendations For The Authors): 

      Provide more confirmation of key findings in more than one stem cell line. 

      We now confirm key findings in the H7 human embryonic stem cell line (Fig. S1C).

      Provide stronger evidence e.g. scRNA-seq to support the existence of intermediate cells or tone down the conclusions.  

      We agree that this is a very important point. In our recent study (Sekulovski et al., 2023), we performed single cell RNA sequencing of Gel-3D, another hPSC-amnion model. In this study, we comprehensively described the transcriptome associated with the “intermediate” cell types, as well as CLDN10 as a marker of these cell types. Moreover, we now include additional data showing the molecular characteristics of the TBXTlow intermediate cells during amniogenesis in hPSC-amnion (Fig. S2C, S2D) and d14 cynomolgus macaque peri-gastrula (Fig 4G, replot of single cell RNAseq by (Yang et al., 2021), Line#264-268).

      Provide more data on the expression of DLX5 in the model. 

      We now provide a DLX5 staining time course in Fig. 7C. We find that, similar to ISL1, prominent DLX5 staining is seen in the focal cells at 24-hr post-BMP. Interestingly, at 48-hr, some cells show high levels of DLX5 while others show low levels; this is of interest for future investigation.

      (1) L159 - the authors should repeat more of the key results in at least one other hPSC line, to ensure reproducibility of the method. Figure S1 contains minimal information (one timepoint, three genes, one biological replicate) on a single different hPSC line. 

      We now include additional validation analysis using the H7 human ESC line (Fig. S1).

      (2) Figure 1- it is a little difficult to appreciate cyst formation from images taken at one level in the stack, can the authors perhaps show a 3D rendering or video to display morphogenesis better? 

      We now provide all optical sections of cysts shown in Movie 1.

      (3) Figure 1-did the authors carry out podocalyxin staining? This is a standard marker for lumenogenesis.  

      We now provide PODXL staining (Fig. 1A,1B).

      (4) L248 onwards and Figure 4-I am a little skeptical concerning conclusions drawn from an overlay of bulk RNA-seq onto scRNA-seq UMAP plots. I think the authors need to provide some strong justification for this approach. I would be particularly careful about concluding that cells depicted in Fig 4D represent an intermediate close to primitive streak and even more careful about claiming any lineage relationship between T-positive "primitive streak like intermediates" and the trajectory of cells in the model. UMAP is a dimension-reduction technique for the visualization of clusters in high-dimensional data. It is not a lineage-tracing methodology. It would have been preferable for the authors to present their own scRNA-seq data from the model.  

      We are sorry that it was not clear that our approach to finding similarity between bulk and single cell RNA-seq data is largely based on a published method named projectLSI (Granja et al., 2019, Nature Biotechnology). Please refer to our Methods section for details of the implementation and how we modified it for better visualization (addressed in Line#667-676 of the original manuscript, now in Line#718-730). The performance of projectLSI was extensively evaluated in the original article. Furthermore, as pointed out, UMAP is indeed a dimension reduction method that has been widely used in single cell RNA-seq research. In addition to visualizing clusters, trajectory analysis, such as RNA-velocity (which is used in this study), is another successful and widely adopted application of UMAP to gauge fate progression. Therefore, we believe that UMAP can be effectively used as a lineage prediction methodology, and that our use of bulk to single cell transcriptomic similarity analysis leveraging projectLSI is well justified at conceptual and technical levels.

      As illustrated in Fig. 5A, we performed RNA-velocity analysis of the Tyser et al. dataset, and our result clearly predicts a differentiation trajectory from Epiblast, through a part of the TBXTlow population shown in Fig. 4D, and then to Ectoderm/Amnion cells. Consistent with this bioinformatic result, we now show that some cells show weak TBXT expression (at the transcript level) at the 24-hr post-BMP timepoint in control hPSC-amnion (Fig. S2D, Line#264-265). Importantly, our conclusion is drawn from a trajectory based on our time course (0, 0.5, 1, 3, 6, 12, 24, and 48 hours post-BMP treatment) which shows a clear transition from epiblast cells to TBXTlow and then finally to the ectoderm/amnion population. Moreover, using the transcriptomic similarity analysis, we found that the loss of TFAP2A leads to emergence of more primitive streak-like transcriptional characteristics (Fig. 8D). Indeed, using IF, we now show that several fate spreading cells in the TFAP2A-KO cysts are TBXThigh (Fig. 8E, Line#373-374). Thus, the new data provide additional evidence for the successful implementation of this bulk/single cell transcriptomic similarity analysis.

      Together, our bioinformatic and localization analyses show that the Glass-3D+BMP system recapitulates the trajectory found in our Tyser et al. RNA-velocity analysis, further supporting the validity of this differentiation trajectory. To avoid confusion, however, we now omit the “primitive streak-like” phrase when describing the TBXTlow cells because, while they may show some TBXT expression, they are likely intermediate fate transitioning cells. Indeed, a recent study by Ton et al. (Ton et al., 2023) showed that the Tyser et al. Primitive Streak cells consist of a mix of several lineage progressing cells (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC). Therefore, these cells are now specifically described as “TBXTlow” state; TBXThigh cells are described as primitive streak-like state.
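      The bulk-to-single-cell projection discussed in the response to point (4) can be illustrated with a minimal sketch of latent semantic indexing (LSI). This is not the authors' implementation or the projectLSI code itself: the function names (`tf_idf`, `fit_lsi`, `project`), the particular TF-IDF variant, and the toy count matrices are all assumptions made for illustration, and any preprocessing of the bulk samples used in the published method is not shown. The idea is to fit a TF-IDF transform and an SVD on the single-cell reference, then place new samples in the same low-dimensional space by reusing the reference gene weights and singular vectors.

```python
import numpy as np


def tf_idf(counts, idf=None):
    """TF-IDF transform of a genes x samples count matrix.

    If `idf` is None it is estimated from `counts`; otherwise the
    supplied (reference) gene weights are reused unchanged.
    """
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each sample's counts scaled to sum to 1.
    tf = counts / counts.sum(axis=0, keepdims=True)
    if idf is None:
        # Assumed IDF variant: log(1 + n_samples / total counts per gene).
        idf = np.log1p(counts.shape[1] / counts.sum(axis=1))
    return tf * idf[:, None], idf


def fit_lsi(ref_counts, n_components):
    """Fit LSI on the reference; return gene loadings, IDF, embedding."""
    tfidf, idf = tf_idf(ref_counts)
    u, _, _ = np.linalg.svd(tfidf, full_matrices=False)
    u = u[:, :n_components]          # genes x components
    ref_embedding = tfidf.T @ u      # samples x components
    return u, idf, ref_embedding


def project(new_counts, u, idf):
    """Project new samples into the reference LSI space."""
    tfidf, _ = tf_idf(new_counts, idf)
    return tfidf.T @ u


# Toy data (hypothetical): 20 genes x 30 reference cells, strictly positive.
rng = np.random.default_rng(0)
ref = rng.poisson(5, size=(20, 30)) + 1
u, idf, ref_embedding = fit_lsi(ref, n_components=3)

# A pseudo-bulk sample built by summing five reference cells.
bulk = ref[:, :5].sum(axis=1, keepdims=True)
bulk_embedding = project(bulk, u, idf)
```

      In practice the projected coordinates would then be passed to the same downstream embedding (for example, a UMAP fitted on the reference) so that bulk samples and single cells can be displayed together; that step is omitted here.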

      (5) L276 Tyser data do come from a primate model; the authors mean NHP.  

      We now specifically state that the validation is performed in a non-human primate model (Line#280).

      (6) Figure 5-though the immunostaining of the CS6/7 monkey embryos is excellent, the authors should not overinterpret these images. What is shown is not a time course, and one can only infer that a particular pattern of gene expression exists in a spatial sense from these images. In the model (Figure 2), the epiblast markers gradually fade and overlap for a time with emergent amnion markers, but in Figure 5 the transition between epiblast and amnion in the embryo seems pretty sharp, at least in terms of gene expression. There may be a few cells in D that show overlap of SOX2 and TFAP2A, but if the authors want to claim that a transition zone exists, they need to produce stronger evidence. Figure 7 is more convincing but see the next point. 

      Thank you for this insightful comment. We now address the nature of the transitioning boundary cell population extensively in our other recent study (Sekulovski et al., 2023).

      (7) Figure 7 further confuses the issue. A zone at either end of the epiblast is clearly positive for Sox2 and the two amnion markers, clearer than in Figure 5, but why does the marker DLX5 overlap with SOX2 in the embryo (7d) but not the model (7C)? Arguments regarding intermediate cell populations would be greatly strengthened by scRNA-seq data on the model system. 

      In our original manuscript, our DLX5 staining was performed at 48-hr post-BMP, at which SOX2 expression is absent in all cells. Our new analysis at the 24-hr timepoint now shows that DLX5 is expressed in SOX2+ cells (this is now presented in Fig. 7C).

      As stated in point #6, our recent study comprehensively describes the transcriptomic and spatial characteristics of the transitioning boundary cell population (Sekulovski et al., 2023).

      (8) L357 TFAP2C KO does not resemble intermediate cysts in Figure 2. In Figure 2, both SOX2 and amnion markers are co-expressed in the same cells. In 8C, SOX2 and ISL1 are mutually exclusive.  

      We agree with this comment, and have now removed the statement pointing out the resemblance (Line#359 of the original manuscript).

      (9) Figure 8d-the same caveats noted above regarding the interpretation of superposition of bulk RNA-seq data with scRNA-seq UMAP analysis apply here.  

      Please refer to our explanation in point#4.

      Reviewer #3: 

      In this work, the authors tried to profile time-dependent changes in gene and protein expression during BMP-induced amnion differentiation from hPSCs. The authors depicted a GATA3 - TFAP2A - ISL1/HAND1 order of amniotic gene activation, which provides a more detailed temporary trajectory of amnion differentiation compared to previous works. As a primary goal of this study, the above temporal gene/protein activation order is amply supported by experimental data. However, the mechanistic insights on amniotic fate decision, as well as the transcriptomic analysis comparing amnion-like cells from this work and other works remain limited. While this work allows us to see more details of amnion differentiation and understand how different transcription factors were turned on in a sequence and might be useful for benchmarking the identity of amnion in ex utero cultured human embryos/embryoids, it provides limited insights on how amnion cells might diverge from primitive streak / mesoderm-like cells, despite some transcriptional similarity they shared, during early development.  

      We are happy that Reviewer #3 appreciates that our model can be used effectively to identify a previously unrecognized amniotic gene activation cascade, providing a comprehensive time-course transcriptomic resource.

      As detailed below, we address specific concerns raised by Reviewer #3. We now provide additional mechanistic insights into amnion fate progression, and include additional transcriptomic comparisons with a cynomolgus macaque single cell RNA sequencing dataset.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors generated KO cell lines lacking GATA3 and TFAP2A, respectively. Their results showed some disrupted amnion differentiation only in TFAP2A-KO. Therefore, these data do not provide sufficient evidence to support whether these transcription factors are crucial for amnion fate specification. Perhaps an experiment could be done with overexpression of these markers and testing if they could force hPSC to adopt amnion-like fate.  

      Thank you for this insightful comment. We generated cell lines that enable us to inducibly express GATA3 or TFAP2A, and the transgene expression was induced at d2 (when BMP treatment is normally initiated) until d4. However, this inducible expression did not lead to amniogenesis, and cysts maintained pluripotency. Because these results are difficult to interpret, they are not included in the revised manuscript.

      As detailed extensively in the manuscript, within each cyst, amniogenesis is initially seen focally, then spreads laterally resulting in fully squamous amnion cysts. This is also seen in our previously published Gel-3D amnion model (extensively described in (Shao et al., 2017)). In the absence of TFAP2A, we showed that the focal amniogenesis is observed, but spreading is not seen, suggesting that TFAP2A controls amnion fate progression. Therefore, while TFAP2A is not critical for the amnion fate specification in the focal cells, our results show that TFAP2A indeed helps to promote amniotic specification of cells neighboring the focal amniotic cells. Moreover, in the revised manuscript, we now show that TFAP2A transgene expression in the TFAP2A-KO background restores formation of fully squamous hPSC-amnion, further establishing the role of TFAP2A in amnion fate progression (Fig. 8C of the revised manuscript, Line#362-364).

      (2) The transcriptomic analysis made by the authors provides some comparison between BMP-induced amnion-like cells in vitro and the amnion-like cells from CS7 human embryo in vivo. However, the data set from the human embryo contains only a limited number of cells, and might not provide a sufficient base for decisive assessment of the true identity of amnion-like cells obtained in vitro. It might help if the authors could integrate their bulk sequencing data with other primate embryo data sets.

      Thank you for this helpful comment. We have now performed our transcriptional similarity analysis using early (day 14) cynomolgus macaque embryo datasets generated in a study by Yang et al. (2021), and found that the bulk time-course transcriptome of our hPSC-amnion model overlaps with the cynomolgus macaque amniotic lineage progression (Fig. 4F, Line#265-268). We also now provide the expression of key markers within the Yang et al. dataset (GATA3, TFAP2A, ISL1, TBXT, DLX5, Fig. 4G, S2F).

      (3) Following the point above, the authors used transcriptomic analysis to identify several intermediate states of cells during amnion differentiation and claimed that there is a primitive streak-like intermediate. However, this might be an overstatement. During stem cell culture and differentiation, intermediate states showing a mixture of biomarkers are very common and do not imply that such intermediates have any biological meaning. However, stating that amnion differentiation passes through primitive streak-like intermediates might imply a certain connection between these two lineages, for which there is a lack of solid support. Instead, a more interesting question might be how amnion and primitive streak differentiation, despite some transcriptomic similarity, diverge from each other during early development. What factors make this difference? The authors might further analyze RNA-seq data to provide some insights.

      Thank you very much for the insightful comments. 

      We understand Reviewer #3’s concern that the intermediate state that we see may not recapitulate a primitive streak-like state. However, in our original manuscript, we described these cells as “Primitive Streak-like” because those cells were annotated as Primitive Streak in the dataset by Tyser et al. Interestingly, a recent study by Ton et al. showed that the Tyser et al. Primitive Streak cells actually consist of a mixture of different cell lineages (e.g., Epiblast, Nonneural ectoderm, Anterior or caudal primitive streak, PGC (Ton et al., 2023)). Therefore, we agree that it was an overstatement to call them “Primitive Streak-like”, and, to avoid confusion, we now label the TBXTlow sub-population found in the Tyser et al. Primitive Streak population as “TBXTlow state” throughout the manuscript.

      Our data indicate that TFAP2A may play a role in controlling the lineage decision between amnion and primitive streak cells that abundantly express TBXT (TBXThigh). In the original manuscript, we included data showing that 48-hr TFAP2A-KO cysts show transcriptomic characteristics similar to some Primitive Streak cells (Fig. 8D). Intriguingly, our new data show that, in the absence of TFAP2A, some TBXThigh cells are indeed seen (Fig. 8E, Line#373-374). These results provide a body of evidence for the role of TFAP2A in promoting the amniotic lineage, perhaps by suppressing the TBXThigh state. This point is now addressed in the Discussion (Line#401-409).

      Additional new data:

      Using Western blot, we now show that GATA3 is absent in the GATA3-KO lines (Fig. S4C). We noticed that this was lacking in the original manuscript.

      We now show that inducible expression of TFAP2A in the TFAP2A-KO cysts leads to control-like cysts (Fig. 8C, Line#362-364).

      Additional changes:

      Typos were fixed in Fig. 5I – “boundary” and “disseminating” were not spelled correctly.

      Line#350 – we originally noted “GATA3 expression precedes TFAP2A expression by approximately 12 hours”. This was incorrect, and is changed to 9 hours in the revised manuscript. We apologize for this mistake.

      REFERENCES

      Blakeley, P., Fogarty, N.M., del Valle, I., Wamaitha, S.E., Hu, T.X., Elder, K., Snell, P., Christie, L., Robson, P., and Niakan, K.K. (2015). Defining the three cell lineages of the human blastocyst by single-cell RNA-seq. Development 142, 3151-3165.

      Castillo-Venzor, A., Penfold, C.A., Morgan, M.D., Tang, W.W., Kobayashi, T., Wong, F.C., Bergmann, S., Slatery, E., Boroviak, T.E., Marioni, J.C., et al. (2023). Origin and segregation of the human germline. Life Sci Alliance 6.

      Granja, J.M., Klemm, S., McGinnis, L.M., Kathiria, A.S., Mezger, A., Corces, M.R., Parks, B., Gars, E., Liedtke, M., Zheng, G.X.Y., et al. (2019). Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature biotechnology 37, 1458-1465.

      Meistermann, D., Bruneau, A., Loubersac, S., Reignier, A., Firmin, J., Francois-Campion, V., Kilens, S., Lelievre, Y., Lammers, J., Feyeux, M., et al. (2021). Integrated pseudotime analysis of human pre-implantation embryo single-cell transcriptomes reveals the dynamics of lineage specification. Cell stem cell 28, 1625-1640 e1626.

      Ohgushi, M., Taniyama, N., Vandenbon, A., and Eiraku, M. (2022). Delamination of trophoblast-like syncytia from the amniotic ectodermal analogue in human primed embryonic stem cell-based differentiation model. Cell reports 39, 110973.

      Okae, H., Toh, H., Sato, T., Hiura, H., Takahashi, S., Shirane, K., Kabayama, Y., Suyama, M., Sasaki, H., and Arima, T. (2018). Derivation of Human Trophoblast Stem Cells. Cell stem cell 22, 50-63 e56.

      Petropoulos, S., Edsgard, D., Reinius, B., Deng, Q., Panula, S.P., Codeluppi, S., Plaza Reyes, A., Linnarsson, S., Sandberg, R., and Lanner, F. (2016). Single-Cell RNA-Seq Reveals Lineage and X Chromosome Dynamics in Human Preimplantation Embryos. Cell 165, 1012-1026.

      Sasaki, K., Nakamura, T., Okamoto, I., Yabuta, Y., Iwatani, C., Tsuchiya, H., Seita, Y., Nakamura, S., Shiraki, N., Takakuwa, T., et al. (2016). The Germ Cell Fate of Cynomolgus Monkeys Is Specified in the Nascent Amnion. Developmental cell 39, 169-185.

      Sekulovski, N., Juga, L.L., Cortez, C.L., Czerwinski, M., Whorton, A.E., Spence, J.R., Schmidt, J.K., Golos, T.G., Gumucio, D.L., Lin, C.-W., et al. (2023). Identification of amnion progenitor-like cells at the amnion-epiblast boundary in the primate peri-gastrula. bioRxiv doi: 10.1101/2023.09.07.556553.

      Shao, Y., Taniguchi, K., Townshend, R.F., Miki, T., Gumucio, D.L., and Fu, J. (2017). A pluripotent stem cell-based model for post-implantation human amniotic sac development. Nature communications 8, 208.

      Ton, M.N., Keitley, D., Theeuwes, B., Guibentif, C., Ahnfelt-Ronne, J., Andreassen, T.K., Calero-Nieto, F.J., Imaz-Rosshandler, I., Pijuan-Sala, B., Nichols, J., et al. (2023). An atlas of rabbit development as a model for single-cell comparative genomics. Nature cell biology 25, 1061-1072.

      Tyser, R.C.V., Mahammadov, E., Nakanoh, S., Vallier, L., Scialdone, A., and Srinivas, S. (2021). Single-cell transcriptomic characterization of a gastrulating human embryo. Nature 600, 285-289.

      Yabe, S., Alexenko, A.P., Amita, M., Yang, Y., Schust, D.J., Sadovsky, Y., Ezashi, T., and Roberts, R.M. (2016). Comparison of syncytiotrophoblast generated from human embryonic stem cells and from term placentas. Proceedings of the National Academy of Sciences of the United States of America 113, E2598-2607.

      Yang, R., Goedel, A., Kang, Y., Si, C., Chu, C., Zheng, Y., Chen, Z., Gruber, P.J., Xiao, Y., Zhou, C., et al. (2021). Amnion signals are essential for mesoderm formation in primates. Nature communications 12, 5126.

    1. eLife assessment

      This study investigates plant-microbe interactions for an invasive plant, Ageratina adenophora. The findings are valuable in advancing our understanding of how leaf and soil microbes separately affect its performance, with solid experimental evidence revealing the importance of litter microbes in shaping A. adenophora populations. The work will be of interest to invasion biologists.

    2. Reviewer #1 (Public Review):

      Summary:

      The work by Zeng et al. comprehensively explored the differences in the effects of leaf and soil microbes on the seed germination, seedling survival and seedling growth of an invasive forb, Ageratina adenophora, and found evidence of stronger adverse effects of leaf microbes on Ageratina compared with soil microbes. By further DNA sequencing and fungal strain cultivation, the authors were able to identify some of the key microbial guilds that may facilitate such negative and positive feedbacks.

      Strengths:

      (1) The theoretical framework is well-established;

      (2) Relating the direction of plant-microbe feedback to certain microbial guilds is always hard, but the authors have done a great job in identifying and interpreting such relationships.

      Weaknesses:

      (1) Allelopathic effects can't be directly accounted for;

      (2) The fungal strains accumulated in dead seedlings may also accumulate in live seedlings, thus more evidence is needed to validate the claim by the authors that Allophoma and Alternaria can increase seedling mortality.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The work by Zeng et al. comprehensively explored the differences in the effects of leaf and soil microbes on the seed germination, seedling survival, and seedling growth of an invasive forb, Ageratina adenophora, and found evidence of stronger effects of leaf microbes on Ageratina compared with soil microbes, which were negative for seed germination and seedling survival but positive for seedling growth. By further DNA sequencing and fungal strain cultivation, the authors were able to identify some of the key microbial guilds that may facilitate such negative and positive feedback.

      Thank you very much for your assessment.

      Strengths:

      (1) The theoretical framework is well-established.

      (2) Relating the direction of plant-microbe feedback to certain microbial guilds is always hard, but the authors have done a great job of identifying and interpreting such relationships.

      Thank you very much for your assessment.

      Weaknesses:

      (1) In the G0 and G21 inoculation experiments, allelopathic effects from leaf litters had not been accounted for, while these two experiments happened to be the ones where negative feedback was detected.

      We did not directly test the allelopathic effects. However, we also recorded seed germination time (GT) and rate (GR), as well as the seedling mortality rate (MR), for the treatments inoculated with soil and leaf 28 days after sowing (G28 inoculation). This allowed us to observe possible allelopathic effects by comparing the sterile samples with the control (nothing inoculated during the first 28 days). In this version, we added the GT, GR and MR results for the uninoculated control to Figure 1 and described the results as: “When inoculated at G0 period, the sterile leaf inoculation significantly delayed germination time more than soil and sterile leaves inoculation and control (nothing inoculated) (Fig. 1a, P < 0.05)” (see Lines 102-104). We have also discussed this point in the resubmitted version as: “Our study did not directly test the allelopathic effects of leaf litter. However, leaf litter possibly produces allelochemicals that adversely impact A. adenophora seed germination time and seedling survival. We observed that sterile leaf litter inoculation caused longer GTs than sterile soil and the control (nothing inoculated) (Fig. 1a). Interestingly, sterile leaf litter inoculation also caused longer GTs than nonsterile leaf litter inoculation, suggesting that some pathways through which leaf microbes alleviate the adverse effects of leaf allelopathy on GTs are unknown. Moreover, sterile leaf inoculation at G0 caused a 19.7% mortality rate for seedlings growing in petri dishes (Fig. 1c), but no dead seedlings were observed when the plants were not inoculated (Fig. 1a, S1).

      Nonetheless, our study highlighted the adverse microbial role of leaf litter in seedling mortality because nonsterile leaves have significantly greater seedling mortality (96.7%) than sterile leaves (19.7%) (Fig. 1c)” in Line 289-301. 

      (2) The authors did not compare the fungal strains accumulated in dead seedlings to those accumulated in live seedlings to prove that the live seedlings indeed accumulated lower abundances of the strains that were identified to increase seedling mortality.

      Thank you for raising this concern. We have not isolated fungi from healthy seedlings to make a comparative study. However, our team previously found that the seedling-killing Allophoma strains obtained in this study had the same ITS genes as the leaf endophyte and leaf spot pathogen Allophoma associated with mature A. adenophora individuals; some seedling-killing Alternaria also occur in healthy seedlings inoculated with leaf litter. We thus assumed that these seedling-killing fungi, e.g., Allophoma and Alternaria, likely exist in mature A. adenophora individuals, undergo a lifestyle switch from endophytic to pathogenic, and can kill seedlings only at a very early life stage of A. adenophora.

      Thus, we discussed this point as: “In particular, the numerically dominant Allophoma strains obtained in this study had the same ITS genes as the leaf endophyte and leaf spot pathogen Allophoma associated with A. adenophora (Chen et al., 2022; Kai Fang et al., 2021; Yang et al., 2023). Interestingly, a previous report revealed that the dominant genera in healthy seedlings inoculated with leaf litter were Didymella and Alternaria (Kai Fang et al., 2019). We did not isolate fungi from healthy seedlings to determine whether the live seedlings indeed lacked or accumulated a lower abundance of the seedling-killing strains than did the dead seedlings in this study. We could assume that these fungal genera likely exist in A. adenophora mature individual experiencing a lifestyle switch from endophytic to pathogenic and play an essential role in limiting the population density of A. adenophora monocultures by killing seedlings only at very early stages. Thus, it is worth exploring the dynamic abundance of these strains and host resistance variation during A. adenophora seedling development.” in Line 432-444.

      (3) The data of seed germination and seedling mortality could have been analyzed in the same manner as that of seedling growth, which makes the whole result section more coherent. I don't understand why the authors had not calculated the response index (RI) for germination/mortality rate and conducted analyses on the correlation between these RIs with microbial compositions.

      Thanks so much. The response index (RI) was calculated as:

      $\mathrm{RI} = (\mathrm{variable}_{\mathrm{nonsterile}} - \mathrm{variable}_{\mathrm{sterile}}) / \mathrm{variable}_{\mathrm{sterile}}$

      Because the mortality rates of some sterile groups were zero, it is impossible to calculate their RIs. In addition, only leaf microbes affected seed germination time (GT); neither leaf nor soil microbes affected germination rate (GR) (see Fig. 1a,b). Therefore, we preferred to directly compare the nonsterile and sterile treatments (also see Figure 1d) to assess the microbial effect, and we correlated these values, rather than RIs, with microbial composition (see Fig. 3). We emphasized this point in the Materials and Methods in our resubmitted revision as: “Because the mortality rates of some sterile groups were zero and their RIs were impossible to calculate, we had to directly compare the seedling mortality caused by nonsterile with by sterile samples and perform the analysis of correlation between the mortality rate and microbial composition.” in Line 565-568.
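
      The zero-denominator problem described above can be illustrated with a minimal sketch. The function names `response_index` and `direct_difference` are hypothetical and are not taken from the authors' analysis scripts; the mortality values 96.7% and 19.7% are those quoted in this response, and the zero-mortality case is an illustrative assumption.

```python
def response_index(nonsterile, sterile):
    """Return (nonsterile - sterile) / sterile, or None if sterile is zero.

    Hypothetical helper: RI is undefined when the sterile value is zero,
    which is the situation reported for some sterile mortality groups.
    """
    if sterile == 0:
        return None
    return (nonsterile - sterile) / sterile


def direct_difference(nonsterile, sterile):
    """Fallback comparison that stays defined when sterile is zero."""
    return nonsterile - sterile


# Mortality rates (%) quoted in the response for leaf inoculation at G0.
ri_leaf = response_index(96.7, 19.7)

# Illustrative assumption: a sterile group with no dead seedlings.
ri_zero = response_index(5.0, 0.0)       # undefined, returns None
diff_zero = direct_difference(5.0, 0.0)  # still defined
```

      Under these assumptions, the direct difference remains usable for every group, which is consistent with the authors' choice to compare nonsterile and sterile treatments directly for mortality.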

      (4) The language of the manuscript could be improved to increase clarity.

      We have improved the language in the resubmitted version.

      Reviewer #2 (Public Review):

      Summary: 

      The study provides strong evidence that leaf microbes mediate self-limitation at an early life stage. It highlights the importance of leaf microbes in population establishment and community dynamics. 

      Thank you very much for your assessment.

      The authors conducted three experiments to test their hypothesis, elucidating the effects of leaf and soil microbial communities on the seedling growth of A. adenophora at different stages, screening potential microbial sources associated with seed germination and seedling performance, and identifying the fungus related to seedling mortality. The conclusions are justified by their results. Overall, the paper is well-structured, providing clear and comprehensive information.

      Thank you very much for your assessment.

      Reviewing Editor (Recommendations For The Authors):

      In addition to the assessments from the reviewers, we have the following comments on your paper:

      (1) The experimental design is complicated with regard to the multiple interacting treatments. The statistical analyses show that the interaction terms are important and significant. In this case, it could be more informative to show the detailed results at the sub-level than at the main level in the main text. For example, the main effects of inoculation sources and nutrients shown in Figure 2 are difficult to interpret, because the effects of inoculation sources and nutrients have important dependencies with each other and other factors such as inoculation time as shown in Figure S3. Therefore, Figure S3 is more informative than Figure 2. Please also be cautious that it would be necessary to clarify this context dependence when showing and citing results of the main effect to avoid any possible misunderstanding, such as the case of Figure 2 and S3.

      Thanks for your suggestion. We have deleted Figure 2 and placed Figure S3 in the text as Figure 2. The corresponding results have been rewritten as “leaf inoculation caused significantly greater seedling mortality than did soil inoculation (P < 0.001); the nonsterile sample caused greater seedling mortality than did the sterile sample, especially leaf inoculation during the G0 and G21 periods. Moreover, nonsterile leaf inoculation at earlier stages significantly increased seedling mortality compared with that at later stages (Fig. 1d, P < 0.05). However, seedling mortality did not differ between the high- and low-nutrient conditions, regardless of leaf or soil inoculation (Fig. 1d, both P > 0.05).” in Line 109-115.

      (2) Response index (RI) is already a measure of microbial feedback effect, so that feedback may not be necessary as an explanatory variable in the model with RI as the response variable.

      We are sorry that our wording was misleading. Here the word “feedback” (e.g., foliage or soil feedback) does not represent the microbial feedback effect; it means leaf or soil inoculation. We have replaced “feedback” with “inoculation source” in the figures and text for clarity.

      (3) Mortality rate is a ratio. It is unclear whether assuming a Gaussian error distribution is fine in your case. It would be important to check the residual distribution and to see whether data transformation (e.g., log) or using other error assumptions (e.g., binomial) is necessary.

      Thanks for your suggestion. As you say, it is not appropriate to use generalized linear models (GLMs) with Gaussian error distributions (identity link) to evaluate seedling mortality, because the mortality rate is a ratio and does not meet the normality assumption. Thus, we deleted the GLM results for seedling mortality and directly compared seedling mortality between microbial treatments, inoculation times, nutrient levels and inoculation sources using the Mann–Whitney U test and the Kruskal–Wallis test (see Fig. 1d). All corresponding results have also been rewritten as “leaf inoculation caused significantly greater seedling mortality than did soil inoculation (P < 0.001); the nonsterile sample caused greater seedling mortality than did the sterile sample, especially leaf inoculation during the G0 and G21 periods. Moreover, nonsterile leaf inoculation at earlier stages significantly increased seedling mortality compared with that at later stages (Fig. 1d, P < 0.05). However, seedling mortality did not differ between the high- and low-nutrient conditions, regardless of leaf or soil inoculation (Fig. 1d, both P > 0.05).” in Line 109-115.
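
      As a rough illustration of why a rank-based test suits ratio data such as mortality rates, the sketch below computes the Mann–Whitney U statistics by hand. It is not the authors' analysis code: the function names and the per-replicate mortality values are hypothetical, and only the U statistics (no P value) are computed.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) from the rank sum of x in the pooled sample."""
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical per-replicate mortality proportions (not the study's data).
nonsterile_leaf = [0.90, 1.00, 0.95]
sterile_leaf = [0.10, 0.20, 0.00]
u_nonsterile, u_sterile = mann_whitney_u(nonsterile_leaf, sterile_leaf)
```

      Because only the ordering of the observations enters the statistic, no Gaussian error assumption is needed, which is the property the response relies on when replacing the GLM.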

      (4) Please be consistent about the wording of different treatment names throughout the texts, tables, and figures. For example, "feedback" should only be used for microbial treatment, but not for inoculation source treatment (e.g., Figure 2). We can say there is an effect of microbial feedback only if we compare sterile vs. non-sterile groups, otherwise, there could be other effects, for example, the allelopathic effect pointed out by Reviewer #1. When writing inoculation, please be specific about whether it is for inoculation time or inoculation source (e.g., within multiple statistical tables in the appendix).

      Thanks for your good suggestion. We have changed “different feedback” into “different inoculation source” for clarity.

      (5) Please clarify which inoculation periods they are for Figures 1d-g.

      Thanks for your good suggestion. We have added the inoculation periods in Fig. 1.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments:

      Lines 12-15: This sentence is too long and complicated, making it unclear what had been done and what had not in previous studies.

      Thanks a lot. We have reorganized this sentence as: “However, how the phyllosphere and rhizosphere soil microbes distinctively affect seedling mortality and the growth of invasive plants across ontogeny under varying soil nutrient levels remains unclear.”.

      Line 19: is it appropriate to use "enrich" here?

      Thanks. We have changed “Microbial inoculation at different growth stages altered the microbial community and functions enriched in seedlings” into “Microbial inoculation at different growth stages altered the microbial community and functions of seedlings”.

      Line 24-25: "litter exhibited phylogenetic signals"? not clear what this means.

      Thanks. A significant phylogenetic signal means that the seedling-killing effects of fungal strains on A. adenophora were related to the phylogenetic relatedness of these strains. So, we have changed “fungal strains isolated from dead seedlings inoculated with litter exhibited significant phylogenetic signals to seedling mortality” into “the A. adenophora seedling-killing effects of fungal strains isolated from dead seedlings by non-sterile leaf inoculation exhibited significant phylogenetic signals, by which strains of Allophoma and Alternaria generally caused high seedling mortality.”

      Line 29: using "in turn" in the first sentence seems weird.

      We deleted this.

      Lines 32-33: PSFs are usually positive because of?

      We have changed “PSFs have positive effects by escaping soil pathogens and recruiting some beneficial microbes” into “PSFs are usually positive because of escaping soil pathogens and recruiting some beneficial microbes”.

      Line 54: why emphasize "a single soil microbe"?

      Although Geisen et al. (2021) assessed the effect of each of 34 isolates on seed germination and plant growth, Jevon et al. (2020) focused on the effect of the soil microbial community on seedling and adult plant survival. Thus, we changed “a single soil microbe” into “soil microbes”.

      Lines 85-86: "tested their mortality to seedlings"? not clear what this means.

      We are sorry that our wording was unclear. We have changed “we also isolated the fungi associated with the dead seedlings and tested their mortality to seedlings.” into “we also isolated the fungi associated with the dead seedlings and tested their seedling-killing effects on A. adenophora.”.

      Results: no statistics and no references for the statistical tables that could support the results were presented in this section.

      We have deleted the inappropriate generalized linear models (GLMs) with Gaussian error distributions (identity link) for evaluating seedling mortality, and all corresponding results have also been described (see Line 109-115 and Fig. 1d).

      Lines 100-102: this subtitle reads more like a summary of the following results than a title. All subtitles in the Result section have similar issues (i.e. Lines 148-150, 207-209).

      Thanks. We subdivided our Results into four sections and changed the subtitles to: “Effects of leaf litter and rhizosphere soil on the mortality and growth of A. adenophora seedlings”, “Correlations of microbial community composition and potential function with seedling mortality at the early stage”, “Enrichment of microbial community and function by A. adenophora seedlings under different treatments”, and “Correlations of the enriched microbial community and function with A. adenophora seedling growth”.

      Lines 148-206: since there are a lot of results concerning the microbial composition, I suggest focusing on those that could directly explain the positive or negative feedback. The one concerning diversity (e.g. Figure 3 and corresponding texts) does not seem necessary.

      Thanks for your suggestion. We have moved Figure 3 into the supplementary figures as Figure S2. To focus on the core microbes that could directly explain the positive or negative feedback, we reordered Figure 3, which now first shows the core soil and leaf bacteria and bacterial functions, as well as the core soil and leaf fungi and fungal functions (Fig. 3a-h), and then shows the correlations of the top 30 bacterial and fungal genera from soil and leaf with the seedling mortality rate (Fig. 3i-j).

      Line 180: is it not common sense that ectomycorrhiza can only be found in soil?

      Yeah, it is. We have deleted this sentence.

      Line 199: "the seedling mortality of these strains"? not clear what this means,

      We have changed “The seedling mortality of these strains” into “The seedling-killing of these strains on A. adenophora”.

      Line 291-292: I don't see how the authors can distinguish between allelopathic and pathogenic effects based on their results.

      We did not directly test the allelopathic effects. However, we also recorded seed germination time (GT) and rate (GR), as well as the seedling mortality rate (MR), for the treatments inoculated with soil and leaf 28 days after sowing (G28 inoculation). This allowed us to observe possible allelopathic effects by comparing the sterile samples with the control (nothing inoculated during the first 28 days). In this version, we added the GT, GR and MR results for the uninoculated control to Figure 1 and described the results as: “When inoculated at G0 period, the sterile leaf inoculation significantly delayed germination time more than soil and sterile leaves inoculation and control (nothing inoculated) (Fig. 1a, P < 0.05)” (see Lines 102-104). We have also discussed this point in the resubmitted version as: “Our study did not directly test the allelopathic effects of leaf litter. However, leaf litter possibly produces allelochemicals that adversely impact A. adenophora seed germination time and seedling survival. We observed that sterile leaf litter inoculation caused longer GTs than sterile soil and the control (nothing inoculated) (Fig. 1a). Interestingly, sterile leaf litter inoculation also caused longer GTs than nonsterile leaf litter inoculation, suggesting that some pathways through which leaf microbes alleviate the adverse effects of leaf allelopathy on GTs are unknown. Moreover, sterile leaf inoculation at G0 caused a 19.7% mortality rate for seedlings growing in petri dishes (Fig. 1c), but no dead seedlings were observed when the plants were not inoculated (Fig. 1a, S1).

      Nonetheless, our study highlighted the adverse microbial role of leaf litter in seedling mortality because nonsterile leaves have significantly greater seedling mortality (96.7%) than sterile leaves (19.7%) (Fig. 1c)” in Line 289-301.

      Lines 383-414: Correlations are not necessarily causations. Sometimes a strong correlation may result from higher-order interaction. The authors should be more cautious about the discussion of microbial function in this section.

      Thanks. We deleted all descriptions of adverse or beneficial effects on the growth of the host plant A. adenophora and cautiously used “negative correlation” or “positive correlation” to discuss the functions of the microbes enriched by A. adenophora. Finally, we also added the sentence: “It is necessary to isolate these enriched microbes to test the interactions with the early life stage of A. adenophora.” (see Line 411-413).

      Lines 489-490: I don't really understand why the authors performed a combination treatment. What did they expect from such a combination?

      Thanks. We described our consideration as: “Leaf inoculation at G28 was performed to simulate natural microbial spread from the leaf litter to the above part of the seedlings by suspending the leaf bag over the transplanted seedlings without direct contact all the time (see Zaret et al. (2021)). This method may result in only microbial species with easy air transmission to infect seedlings. Thus, an additional combination inoculation (named G21+28) was performed on both the 21st (with seedling contact) and 28th days (without seedling contact) to ensure that most leaf microbes had the opportunity to reach the seedlings.” see Line 498-505.

      Figure 1: why not use "mortality rate" instead of "death rate"?

      Thanks. We have changed “death rate” to “mortality rate” in all corresponding figures and text.

      Figure 8: This is a very complicated experimental setup. Why did the authors harvest the plants treated with nutrient addition after the 12th day of the experiment and harvest those without nutrient addition after the 16th day? Why the time lag?

      Thanks. We explained this as: “Seedlings were harvested after 8 weeks of growth under high-nutrient conditions because they grew too fast and touched the PTFE cover; however, we harvested those plants grown under low-nutritional conditions after another 4 weeks of growth due to their very small size (see Fig. S6).”

      (see Method in Line 514-517).

    1. Reviewer #1 (Public Review):

      Summary:

      The authors addressed the influence of DKK2 on colorectal cancer (CRC) metastasis to the liver using an orthotopic model transferring AKP-mutant organoids into the spleens of wild-type animals. They found that DKK2 expression in tumor cells led to enhanced liver metastasis and poor survival in mice. Mechanistically, they associate Dkk2-deficiency in donor AKP tumor organoids with reduced Paneth-like cell properties, particularly Lyz1 and Lyz2, and defects in glycolysis. Quantitative gene expression analysis showed no significant changes in Hnf4a1 expression upon Dkk2 deletion. Ingenuity Pathway Analysis of RNA-Seq data and ATAC-seq data point to a Hnf4a1 motif as a potential target. They also show that HNF4a binds to the promoter region of Sox9, which leads to LYZ expression and upregulation of Paneth-like properties. By analyzing available scRNA data from human CRC data, the authors found higher expression of LYZ in metastatic and primary tumor samples compared to normal colonic tissue; reinforcing their proposed link, HNF4a was highly expressed in LYZ+ cancer cells compared to LYZ- cancer cells.

      Strengths:

      Overall, this study contributes a novel mechanistic pathway that may be related to metastatic progression in CRC.

      Weaknesses:

      The main concerns are related to incremental gains, missing in vivo support for several of their conclusions in murine models, and missing human data analyses. Additionally, methods and statistical analyses require further clarification.

      Main comments:

      (1) Novelty
      The authors previously described the role of DKK2 in primary CRC, correlating increased DKK2 levels to higher Src phosphorylation and HNF4a1 degradation, which in turn enhances LGR5 expression and "stemness" of cancer cells, resulting in tumor progression (PMID: 33997693). A role for DKK2 in metastasis has also been previously described (sarcoma, PMID: 23204234).

      (2) Mouse data
      a) The authors analyzed liver mets, but the main differences between AKP and AKP/Dkk2 KO organoids could arise during the initial tumor cell egress from the intestinal tissue (which cannot be addressed in their splenic injection model), or during pre-liver stages, such as endothelial attachment. While the analysis of liver mets is interesting, given that Paneth cells play a role in the intestinal stem cell niche, it is questionable whether a study that does not involve the intestine can appropriately address this pathway in CRC metastasis.
      b) The overall number of Paneth cells found in the scRNA-seq analysis of liver mets was strikingly low (17 cells, Figure 3), and assuming that these cells are driving the differences seems somewhat far-fetched. Adding to this concern is inappropriate gating in the flow plot shown in Figure 6. This should be addressed experimentally and in the interpretation of data.
      c) Figures 3, 5, and 6 show the individual gene analyses with unclear statistical data. It seems that the p-values were not adjusted, and it is unclear how they reached significance in several graphs. Additionally, it was not stated how many animals per group and cells per animal/group were included in the analyses.
      d) Figure 6 suggests a signaling cascade in which the absence of DKK2 leads to enhanced HNF4A expression, which in turn results in reduced Sox9 expression and hence reduced expression of Paneth cell properties. It is therefore crucial that the authors perform in vivo (splenic organoid injection) loss-of-function experiments, knockdown of Sox9 expression in AKP organoids, and Sox9 overexpression experiments in AKP/Dkk2 KO organoids to demonstrate Sox9 as the central downstream transcription factor regulating liver CRC metastasis.
      e) Given the previous description of the role of DKK2 in primary CRC, it is important to define the step of liver metastasis affected by Dkk2 deficiency in the metastasis model. Does it affect extravasation, liver survival, etc.?

      (3) Human data
      Can the authors address whether the expression of Dkk2 changes in human CRC and whether mutations in Dkk2 are correlated with metastatic disease or CRC stage?

      (4) Bioinformatic analysis
      The authors did not provide sufficient information on bioinformatic analyses. The authors did not include information about the software, cutoffs, or scripts used to make their analyses or output those figures in the manuscript, which challenges the interpretation and assessment of the results. Terms like "Quantitative gene expression analyses" (line 136) and "visualized in a Uniform Approximation and Projection" (line 178) do not explain what was inputted and the analyses that were executed. There are multiple ways to align, preprocess, and visualize bulk, single-cell, ATAC, and ChIP-seq data, and depending on which was used, the results vary greatly. For example, in the single-cell data, the authors did not report how many cells were sequenced, nor how many remained after alignment and quality filtering (RNA count, mt count, etc.), so the Paneth+ to Goblet+ percentages in lines 184 and 185 cannot be reproduced because they depend on this information. The absence of a clustering cutoff for the single-cell data is concerning since this greatly affects the resulting cluster number (https://www.nature.com/articles/s41592-023-01933-9). The authors should provide a comprehensive explanation of all the data analyses and the steps used to obtain those results.

      (5) Clarity of methods and experimental approaches
      The methods were incomplete and they require clarification.
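
      To illustrate comment (4), the following is a minimal sketch, not the authors' pipeline, of why a reported cell-type percentage cannot be interpreted without the quality-filtering thresholds. The `Cell` record, the toy values, and both threshold settings are hypothetical and chosen only for illustration; the fields stand in for the RNA count and mitochondrial (mt) count mentioned in that comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    total_counts: int  # total RNA counts detected in the cell
    mt_counts: int     # counts assigned to mitochondrial genes
    label: str         # hypothetical annotation, e.g. "Paneth" or "Goblet"


def qc_filter(cells, min_counts, max_mt_fraction):
    """Keep cells with enough counts and a low enough mitochondrial fraction."""
    kept = []
    for cell in cells:
        if cell.total_counts == 0 or cell.total_counts < min_counts:
            continue  # empty or low-count cell
        if cell.mt_counts / cell.total_counts > max_mt_fraction:
            continue  # high mitochondrial fraction
        kept.append(cell)
    return kept


def label_percent(cells, label):
    """Percentage of the given cells that carry the given label."""
    if not cells:
        return 0.0
    return 100.0 * sum(c.label == label for c in cells) / len(cells)


# Toy data (assumed values, not taken from the manuscript).
cells = [
    Cell(5000, 200, "Paneth"),
    Cell(800, 40, "Paneth"),
    Cell(4000, 600, "Goblet"),
    Cell(3000, 90, "Goblet"),
    Cell(2500, 100, "Other"),
    Cell(2000, 60, "Other"),
]

# Two hypothetical threshold settings applied to the same cells.
lenient = qc_filter(cells, min_counts=500, max_mt_fraction=0.20)
strict = qc_filter(cells, min_counts=1000, max_mt_fraction=0.10)

for name, kept in (("lenient", lenient), ("strict", strict)):
    print(name, len(kept), round(label_percent(kept, "Paneth"), 1))
```

      With these toy values, the lenient thresholds keep all six cells and the strict thresholds keep four, so the Paneth-labelled share moves from one third to one quarter without any change in the underlying data. This is why the number of cells before and after filtering, together with the cutoffs used, needs to be reported alongside any percentage.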

    2. eLife assessment

      This valuable study proposes that protein secreted by colon cancer cells induces cells with Paneth-like properties that favor colon cancer metastasis. The evidence supporting the conclusions is incomplete and would benefit from more direct experiments to test the functional role of Paneth-like cells and to monitor metastasis from colon tumors. The work will be of interest to researchers studying colon cancer metastasis.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors propose that DKK2 is necessary for the metastasis of colon cancer organoids. They then claim that DKK2 mediates this effect by permitting the generation of lysozyme-positive Paneth-like cells within the tumor microenvironmental niche. They argue that these lysozyme-positive cells have Paneth-like properties in both mouse and human contexts. They then implicate HNF4A as the causal factor responsive to DKK2 to generate lysozyme-positive cells through Sox9.

      Strengths:

      The use of a genetically defined organoid line is state-of-the-art. The data in Figure 1 and the dependence of DKK2 for splenic injection and liver engraftment, as well as the long-term effect on animal survival, are interesting and convincing. The rescue using DKK2 administration for some of their phenotype in vitro is good. The inclusion and analysis of human data sets help explore the role of DKK2 in human cancer and help ground the overall work in a clinical context.

      Weaknesses:

      In this work by Shin et al., the authors expand upon prior work regarding the role of Dickkopf-2 in colorectal cancer (CRC) progression and the necessity of a Paneth-like population in driving CRC metastasis. The general topic of metastatic requirements for colon cancer is of general interest. However, much of the work focuses on characterizing cell populations in a mouse model of hepatic outgrowth via splenic transplantation. In particular, the concept of Paneth-like cells is primarily based on transcriptional programs seen in single-cell RNA sequencing data and needs more validation. Although including human samples is important for potential generality, the strength could be improved by doing immunohistochemistry in primary and metastatic lesions for Lyz+ cancer cells. Experiments that further bolster the causal role of Paneth-like CRC cells in metastasis are needed.

    1. eLife assessment

      Through a genome-wide screen for functional alternative transcription start sites (TSS) in Arabidopsis, the authors provide evidence for widespread transcription of potential microproteins from previously annotated protein-coding genes. Functional analysis of AtHB2-miP, derived from the C-terminal region of transcription factor AtHB2 and predicted to form non-productive dimers with AtHB2, suggested that this microprotein could affect AtHB2 functions in shade responses, root growth, and iron homeostasis. The work is valuable as a case study of how new microproteins could act to modulate gene regulation in response to environmental change, but the focus on a single gene, the lack of precision in AtHB2-miP measurement and missing controls, and the relatively minor phenotypic effects mean that data supporting microprotein production as a vital regulatory strategy are incomplete.

    1. eLife assessment

      This valuable study reports a novel function of ATG14 in preventing pyroptosis and inflammation in oviduct cells, thus allowing smooth transport of the early embryo to the uterus and implantation. However, the data supporting the main conclusion remain incomplete. This work will be of interest to reproductive biologists and physicians practicing reproductive medicine.

    2. Reviewer #1 (Public Review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+;Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Popli et al investigated the roles of the autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), the authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation and uterus receptivity due to impaired response to P4 stimulation and stromal decidualization. In addition to the uterus defect, the authors also discovered that early embryos are trapped inside the oviduct and cannot be efficiently transported to the uterus in these females. They went on to show that oviduct epithelium in Atg14 cKO females showed increased pyroptosis, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. Therefore, the authors concluded that autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      Strengths:

      This study revealed an important and unexpected role of the autophagy-related gene Atg14 in preventing pyroptosis and maintaining oviduct epithelial integrity, which is poorly studied in the field of reproductive biology. The study is well designed to test the roles of ATG14 in mouse oviduct and uterus. The experimental data in general support the conclusion and the interpretations are mostly accurate. This work should be of interest to reproductive biologists and scientists in the field of autophagy and pyroptosis.

      Weaknesses:

      Despite the strengths, there are several major weaknesses raising concerns. In addition, the mismatched figure panels, the undefined acronyms, and the poor description/presentation of some of the data significantly hinder the readability of the manuscript.

      (1) In the abstract, the authors stated that "autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport". This statement is not substantiated. Although Atg14 is an autophagy-related gene and plays a critical role in oviduct homeostasis, the authors did not show a direct link between autophagy and pyroptosis/oviduct integrity. In addition, the authors pointed out in the last paragraph of the introduction that none of the other autophagy-related genes (ATG16L, FIP200, BECN1) exhibited any discernable impact on oviduct function. Therefore, the oviduct defect is caused by Atg14 specifically, not necessarily by autophagy.

      (2) In lines 412-414, the authors stated that "Atg14 ablation in the oviduct causes activation of pyroptosis", which is also not supported by the experimental data. The authors did not show that Atg14 is expressed in oviduct cells. PR-Cre is also not specific to oviduct cells. It is possible that Atg14 knockout in other PR-expressing tissues (such as the uterus) indirectly activates pyroptosis in the oviduct. More experiments will be required to support this claim. Given that no defect was observed when Atg14 was knocked out in oviduct ciliary cells, it would be good to use a secretory-cell Cre, such as Pax8-Cre, to demonstrate that Atg14 functions in the secretory cells of the oviduct, thus supporting this conclusion.

      (3) With FOXJ1-Cre, the authors attempted to specifically knockout Atg14 in ciliary cells, but there are no clear fertility and embryo implantation defects in Foxj1/Atg14 cKO mice. The author should provide the verification data to show that Atg14 had been effectively depleted in ciliary cells if Atg14 is normally expressed.

      (4) In lines 307-313, the authors tested whether ATG14 is required for the decidualization of HESCs. The authors stated that "Control siRNA transfected cells when treated with EPC seemed to change their morphological transformation from fibroblastic to epithelioid (Fig. 2E) and had increased expression of the decidualization markers IGFBP1 and PRL by day three only (Fig. 2F)". First, the labels in Figure 2 do not correspond to the description in the text. Second, the morphology of the HESCs in the control and Atg14 siRNA groups showed no obvious difference even at day 3 and day 6. The authors should point out the difference in each panel and explain it in the text or figure legend.

      (5) In lines 332-336, the authors pointed out that the oviduct lining of cKO mice shows marked eosinophilic cytoplasmic change, but there are no data to support the claim. In addition, the authors further described that "some of the cells showed degenerative changes with cytoplasmic vacuolization and nuclear pyknosis, loss of nuclear polarity, and loss of distinct cell borders giving an appearance of fusion of cells (Fig. 3D)". First, Figure 3D did not show all these phenotypes and is likely a mismatch with Figure 3E. Even in Figure 3E, not all the phenotypes described here are obvious. The figure legend is overly simple, and there is no explanation of the arrowheads in the panel. More data/images are required to support the claim, together with clear labeling and explanation in the figure legend.

      (6) In lines 317-325, the description of the proportion of embryos from the oviduct and uterus is rather confusing. In addition, the total number of embryos was not provided. I would recommend presenting numerical data showing the average number of embryos from the oviduct and uterus instead of the percentage data in Figures 3A and 5G.

      (7) In lines 389-391, authors tested whether Polyphyllin VI treatment led to activated pyroptosis and blocked embryo transport. Although Figures 5F-G showed the expected embryo transport defect, the authors did not show the pyroptosis and oviduct morphology. It will be important to show that the Polyphyllin VI treatment indeed led to oviduct pyroptosis and lumen disruption.

      (8) In line 378, it would be better to include a description of pyroptosis and its molecular mechanisms to help readers to better understand your experiments. Alternatively, you can add it in the introduction.

      (9) Please make sure to provide definitions for the acronyms such as FRT, HESCs, GSDMD, etc.

      (10) It is rather confusing to use oviducal cell plasticity in this manuscript. The work illustrated the oviducal epithelial integrity, not the plasticity.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using PrCre and also Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The manuscript has some interesting findings, however there are also areas that could be improved.

      Strengths:

      The importance of Atg14 and autophagy in the female reproductive tract is incompletely understood. The manuscript also provides partial evidence about a new mechanism linking Atg14 to pyroptosis.

      Weaknesses:

      (1) It is not clear why the loss of Atg14 selectively induces pyroptosis within oviduct cells but not in other cellular compartments. The authors should demonstrate that these events are not happening in uterine cells.

      (2) The manuscript never showed any effect on the autophagy upon loss of Atg14. Is there any effect on autophagy upon Atg14 loss? If so does that contribute to the observation?

      (3) It is not clear what the authors meant by cellular plasticity and integrity. There is no evidence provided in that aspect that the plasticity of oviduct cells is lost. Similarly, more experimental evidence is necessary for the conclusion about cellular integrity.

      (4) The mitochondrial phenotype shown in Figure 3 didn't appear as severe as it is described in the results section. The analyses should be more thorough. They should include multiple frames (in supplemental information) showing mitochondrial morphology in multiple cells. The authors should also test that aspect in uterine cells. The authors should measure Feret's diameter, differences in membrane potential, etc., for a definitive conclusion.

      (5) The comment that the loss of Atg14 and pyroptosis leads to the narrowing of the lumen in the oviduct should be experimentally shown.

      (6) The manuscript never showed the proper mechanism through which Atg14 loss induces pyroptosis. The authors should link the mechanism.

    5. Author response:

      Reviewer #1 (Public Review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+; Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

      We thank the reviewer for the insightful comments and helpful suggestions. We will address the majority of the concerns. Specifically, we will evaluate whether loss of Atg14 leads to pyroptosis in other reproductive tract tissues, namely the uterus and ovary. To determine the spatiotemporal expression of ATG14, we will assess ATG14 expression in the oviducts of WT and cKO mouse models. Further, to understand the impact of Atg14 loss on different regions of the oviduct, we will provide additional images from cKO mice and quantify FOXJ1-positive cells. To address the concerns about cyclicity and steroid hormone levels, we will measure the E2 or P4 levels and assess E2-target genes in the uterus from control and cKO mice. We will also include the ampullary section images from the oviducts of Atg14 cKO and control females.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Popli et al investigated the roles of the autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), the authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation and uterus receptivity due to impaired response to P4 stimulation and stromal decidualization. In addition to the uterus defect, the authors also discovered that early embryos are trapped inside the oviduct and cannot be efficiently transported to the uterus in these females. They went on to show that oviduct epithelium in Atg14 cKO females showed increased pyroptosis, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. Therefore, the authors concluded that autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      Strengths:

      This study revealed an important and unexpected role of the autophagy-related gene Atg14 in preventing pyroptosis and maintaining oviduct epithelial integrity, which is poorly studied in the field of reproductive biology. The study is well designed to test the roles of ATG14 in mouse oviduct and uterus. The experimental data in general support the conclusion and the interpretations are mostly accurate. This work should be of interest to reproductive biologists and scientists in the field of autophagy and pyroptosis.

      Weaknesses:

      Despite the strengths, there are several major weaknesses raising concerns. In addition, the mismatched figure panels, the undefined acronyms, and the poor description/presentation of some of the data significantly hinder the readability of the manuscript.

      (1) In the abstract, the authors stated that "autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport". This statement is not substantiated. Although Atg14 is an autophagy-related gene and plays a critical role in oviduct homeostasis, the authors did not show a direct link between autophagy and pyroptosis/oviduct integrity. In addition, the authors pointed out in the last paragraph of the introduction that none of the other autophagy-related genes (ATG16L, FIP200, BECN1) exhibited any discernable impact on oviduct function. Therefore, the oviduct defect is caused by Atg14 specifically, not necessarily by autophagy.

      We agree with the reviewer on this. We will take a cautious approach and modify the statements to say that ATG14-dependent autophagy might be critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport.

      (2) In lines 412-414, the authors stated that "Atg14 ablation in the oviduct causes activation of pyroptosis", which is also not supported by the experimental data. The authors did not show that Atg14 is expressed in oviduct cells. PR-Cre is also not specific to oviduct cells. It is possible that Atg14 knockout in other PR-expressing tissues (such as the uterus) indirectly activates pyroptosis in the oviduct. More experiments will be required to support this claim. Given that no defect was observed when Atg14 was knocked out in oviduct ciliary cells, it would be good to use a secretory-cell Cre, such as Pax8-Cre, to demonstrate that Atg14 functions in the secretory cells of the oviduct, thus supporting this conclusion.

      To address Atg14 action in the oviduct, we will perform ATG14 IHC staining in the oviduct and also evaluate GSDMD expression in the uterus and ovary, wherein PR-Cre expression is active. Further, we will provide literature-based evidence for PR-Cre expression in the oviduct, which is well-established. However, generating a secretory-cell Pax8-Cre mouse model will require a substantial amount of time and effort, and we respectfully argue that this is currently out of the scope of this manuscript.

      (3) With FOXJ1-Cre, the authors attempted to specifically knockout Atg14 in ciliary cells, but there are no clear fertility and embryo implantation defects in Foxj1/Atg14 cKO mice. The author should provide the verification data to show that Atg14 had been effectively depleted in ciliary cells if Atg14 is normally expressed.

      We will perform expression analysis for ATG14 in Foxj1/Atg14 cKO mice to determine the effective ablation in cilia.

      (4) In lines 307-313, the authors tested whether ATG14 is required for the decidualization of HESCs. The authors stated that "Control siRNA transfected cells when treated with EPC seemed to change their morphological transformation from fibroblastic to epithelioid (Fig. 2E) and had increased expression of the decidualization markers IGFBP1 and PRL by day three only (Fig. 2F)". First, the labels in Figure 2 do not correspond to the description in the text. Second, the morphology of the HESCs in the control and Atg14 siRNA groups showed no obvious difference even at day 3 and day 6. The authors should point out the difference in each panel and explain it in the text or figure legend.

      We will correct the labels and include high-magnification images to explain the morphological differences in HESCs.

      (5) In lines 332-336, the authors pointed out that the oviduct lining of cKO mice shows marked eosinophilic cytoplasmic change, but there are no data to support the claim. In addition, the authors further described that "some of the cells showed degenerative changes with cytoplasmic vacuolization and nuclear pyknosis, loss of nuclear polarity, and loss of distinct cell borders giving an appearance of fusion of cells (Fig. 3D)". First, Figure 3D did not show all these phenotypes and is likely a mismatch with Figure 3E. Even in Figure 3E, not all the phenotypes described here are obvious. The figure legend is overly simple, and there is no explanation of the arrowheads in the panel. More data/images are required to support the claim, together with clear labeling and explanation in the figure legend.

      Dr. Ramya Masand, Chief Pathologist in our department and a contributing author, critically evaluated the stained sections from Figure 3 and provided the pathological assessment as outlined in lines 332-336. We will consult Dr. Masand and will modify the statements accordingly.

      (6) In lines 317-325, the description of the proportion of embryos from the oviduct and uterus is rather confusing. In addition, the total number of embryos was not provided. I would recommend presenting numerical data showing the average number of embryos from the oviduct and uterus instead of the percentage data in Figures 3A and 5G.

      We will calculate the average number of embryos from the oviduct and uterus and provide numerical data.

      (7) In lines 389-391, authors tested whether Polyphyllin VI treatment led to activated pyroptosis and blocked embryo transport. Although Figures 5F-G showed the expected embryo transport defect, the authors did not show the pyroptosis and oviduct morphology. It will be important to show that the Polyphyllin VI treatment indeed led to oviduct pyroptosis and lumen disruption.

      We will perform the GSDMD staining to determine whether Polyphyllin VI treatment resulted in oviductal pyroptosis activation and lumen disruption.

      (8) In line 378, it would be better to include a description of pyroptosis and its molecular mechanisms to help readers better understand your experiments. Alternatively, you can add it in the introduction.

      We will include more literature-based discussion on pyroptosis and its mechanism.

      (9) Please make sure to provide definitions for the acronyms such as FRT, HESCs, GSDMD, etc.

      We will provide definitions for the acronyms such as FRT, HESCs, and GSDMD.

      (10) It is rather confusing to use oviducal cell plasticity in this manuscript. The work illustrated the oviducal epithelial integrity, not the plasticity.

      We will correct the statement.

      A few of the additional comments for authors to consider improving the manuscript are listed below.

      (1) Some of the figures are missing scale bars, while others have inconsistent scale bars. It would be better to be consistent.

      (2) On a couple of occasions, the DAPI signal cannot be seen, such as in Figure 2B and Figure 3D.

      (3) Overall, the figure legends can be improved to provide more detailed information to help the reader to interpret the data.

      As suggested, we will include scale bars with high-quality images and will expand the figure legends.

      (4) In Figure 2D, the Y-axis showed the stimulated/unstimulated uterine weight ratio, why did the author put "Atg14" at the top of the graph? At the same time, the X-axis title is missing in Figure 2D.

      (5) In the left panel of Figure 2G, "ATG14" at the top should be "Atg14" to be consistent.

      (6) In line 559, "(A)" is missing in front of Immunofluorescence analysis of GSDMD.

      We will make these necessary changes.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using Pr Cre and Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The manuscript has some interesting findings, however there are also areas that could be improved.

      Strengths:

      The importance of Atg14 and autophagy in the female reproductive tract is incompletely understood. The manuscript also provides spatial evidence about a new mechanism linking Atg14 to pyroptosis.

      Weaknesses:

      (1) It is not clear why the loss of Atg14 selectively induces pyroptosis within oviduct cells but not in other cellular compartments. The authors should demonstrate that these events are not happening in uterine cells.

      We will carry out GSDMD staining in uterine tissues and discuss the findings.

      (2) The manuscript never showed any effect on the autophagy upon loss of Atg14. Is there any effect on autophagy upon Atg14 loss? If so, does that contribute to the observation?

      We will assess the expression of autophagy-related markers in response to Atg14 loss and will discuss the findings. 

      (3) It is not clear what the authors meant by cellular plasticity and integrity. There is no evidence provided in that aspect that the plasticity of oviduct cells is lost. Similarly, more experimental evidence is necessary for the conclusion about cellular integrity.

      We agree with the reviewer on the cellular plasticity aspect. We will remove the word "plasticity" and refer only to integrity.

      (4) The mitochondrial phenotype shown in Figure 3 didn't appear as severe as it is described in the results section. The analyses should be more thorough. They should include multiple frames (in supplemental information) showing mitochondrial morphology in multiple cells. The authors should also test that aspect in uterine cells. The authors should also measure Feret's diameter, differences in membrane potential, etc., for a definitive conclusion.

      We will perform additional mitochondrial staining to determine the mitochondrial morphology in both the oviduct and uterus. Based on the results, we would consider measuring the Feret's diameters. However, we respectfully argue that performing complex membrane potential studies will take time and are beyond the scope of current focus.

      (5) The comment that the loss of Atg14 and pyroptosis leads to the narrowing of the lumen in the oviduct should be experimentally shown.

      As shown in Figure 3E, staining the oviduct epithelia with KRT8 clearly showed a disorganized oviduct with abnormally fused cells leaving no lumen space.  We could provide higher magnification images in supplementary figures to highlight this observation.

      (6) The manuscript never showed the proper mechanism through which Atg14 loss induces pyroptosis. The authors should link the mechanism.

      Autophagy has been shown to inhibit pyroptosis by either inhibiting the cleavage of GSDMD or by suppressing various pyroptosis-related factors, including NFLRs and STING proteins. We found that the loss of Atg14 results in elevated GSDMD levels, a potential mechanism through which Atg14 suppresses pyroptosis in the oviduct. Importantly, Atg14 may regulate GSDMD through several intermediary factors, and resolving this intricate nexus necessitates conducting complex biochemical, cellular, and molecular screenings, which is one focus of our future investigations.

    1. eLife assessment

      This study is a computational analysis using publicly available deep sequencing datasets and the findings support the models that propose widespread gene transfer amongst DNA viruses. The evidence supporting the claims of the authors is solid, but reproducing the analysis based only on the information as presented in the Materials and Methods would be difficult as the data are currently presented. A flow chart that details the process would help. This is an almost entirely computational study without experimental evidence but one that has the potential to become a fundamental resource for virus hunters - an activity of increasing importance.

    2. Reviewer #1 (Public Review):

      This paper discusses the identification of viral genes in publicly available DNA and RNA sequencing datasets. In many cases, these datasets have been assembled into contigs. Many viral genes were identified and contigs containing genes from more than one type of virus were more common than expected. The analysis appears to be sound and the results presented will be of great interest to the community.

      The strengths of the paper are in the analysis itself, which is detailed, complex, and on a very large scale. To my knowledge, the identification of DNA viral proteins in sequencing datasets not deliberately infected with viruses has not previously been performed on this scale. Many proteins were identified which are at the limit of our current capacity to detect divergent proteins. I think the use of multiple methodologies strengthens the study, as it increases the depth of the results. The authors are also clear about the limitations of their study and give many caveats about their results, which is excellent.

      I have two major concerns about the study. The first is the presentation, which in places makes it difficult to tell exactly how and why the analysis has been performed. I do not think it would be possible to reproduce this analysis based only on the information presented in the Materials and Methods section. This makes it difficult to assess the exact details of the method and whether they are appropriate. I would appreciate something like a flow chart to show, for each SRA dataset and each assembled contig, the exact steps taken for classification and the hierarchy of tools, plus the threshold values, applied to the results. An overview of the results at the beginning of the results section would also be helpful - how many proteins were identified, what were their host species, how many contigs were assembled and how many of these were chimeric, etc.

      My second concern is that it is not clear how each protein was determined to be either viral or non-viral or how contigs were assigned as chimeric or non-chimeric. Positive and negative controls are not mentioned and false positive or negative rates are not calculated. Given that many of the identified proteins are highly divergent from known viral proteins, it would be good to see how likely it is that a random protein would be assigned as viral, or a viral protein as non-viral. Chimeric contigs could occur due to misassembly or endogenous viral elements. It seems that viruses in these categories may have been filtered using Cenote Taker, but no checks are described to confirm that the filtering was successful.
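
      As an illustration of the control-based check requested above, the following is a minimal sketch, assuming a set of control proteins with known viral or non-viral status has been run through the same classification pipeline. The function name, the control identifiers, and the calls are all hypothetical and are not taken from the manuscript.

```python
def error_rates(calls, truth):
    """Return (false_positive_rate, false_negative_rate).

    calls: dict mapping control id -> True if the pipeline called it viral.
    truth: dict mapping control id -> True if the control really is viral.
    """
    fp = sum(1 for k, is_viral in truth.items() if not is_viral and calls[k])
    tn = sum(1 for k, is_viral in truth.items() if not is_viral and not calls[k])
    fn = sum(1 for k, is_viral in truth.items() if is_viral and not calls[k])
    tp = sum(1 for k, is_viral in truth.items() if is_viral and calls[k])
    # Guard against an empty control class.
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


# Hypothetical controls: two known viral proteins, two known host proteins.
truth = {"ctrl_viral_1": True, "ctrl_viral_2": True,
         "ctrl_host_1": False, "ctrl_host_2": False}
calls = {"ctrl_viral_1": True, "ctrl_viral_2": False,
         "ctrl_host_1": False, "ctrl_host_2": True}

fpr, fnr = error_rates(calls, truth)
print(f"false positive rate = {fpr:.2f}, false negative rate = {fnr:.2f}")
```

      Shuffled or randomly drawn host proteins could serve as the negative controls, and divergent members of established virus families as the positive controls.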

      Overall, I think that the study is useful and of interest, but I think more clarity in the presentation of the results would increase the value of the paper for many readers.

    3. Reviewer #2 (Public Review):

      Summary:

      A large-scale computational analysis of published sequences of various animal species provides evidence for extensive gene transfer amongst DNA viruses.

      Strengths:

      The study provides evidence for a large number of previously uncharacterized DNA viruses and supports a model whereby DNA viruses have evolved by combining distinct shared replication modules and some of these evolutionary oddities likely remain in the biosphere. The work provides a useful repository and potential framework for additional virus discovery efforts.

      Weaknesses:

      This is an entirely computational story, with very limited experimental validation. A large number of often confusing new acronyms are introduced that may be "cute" (such as the reference to the delicious half-smoke sausage) but are not particularly useful. This is not helped by the somewhat "telegraphic" presentation of the data that is sometimes difficult to digest. Not all paragraphs deliver what they promise. For example under the title "Polyomaviruses and papillomaviruses" there is no discussion of papillomaviruses. Overall, however, these weaknesses do not diminish my enthusiasm for this paper, which will be an important resource for computational and non-computational virus hunters.

    4. Reviewer #3 (Public Review):

      Summary:

      Buck et al. set out to characterize small DNA tumor viruses through the generation and analysis of ~100,000 public sequencing datasets from the SRA and other databases. Using a variety of powerful bioinformatic methods including alignment-based searches, statistical modelling, and structure-aware detection, the authors successfully classify novel protein sequences which support the occurrence of evolutionary gene transfer between DNA virus families. The authors propose a naming scheme to better capture viral diversity and uncover novel chimeric viruses, those containing genes from multiple established virus families. Additional analysis using the generated dataset was performed to search for DNA and RNA viruses of interest, demonstrating the utility of generated datasets for exploratory screens. The assembled sequencing datasets are publicly available, providing invaluable resources for current and future investigations within this subfield.

      Strengths:

      The scope of data analysis (100,000+ SRA records and additional libraries) is substantial, and the authors have contributed to further insight into the modularity of previously uncharacterized viral genomes, through computationally demanding advanced bioinformatics analyses in addition to extensive manual inspection.

      The publicly available resources generated as a result of these analyses provide useful data for further experiments to inspect viral diversity and modularity. Other scanning experiments and further investigation of biologically relevant viruses using these contigs may uncover, for example, animal reservoirs or novel recombinant viruses of significance.

      Novel instances of genomic modularity provide excellent starting points for understanding virus evolutionary pathways and gene transfer events.

      Weaknesses:

      Overall, the methods section of this paper requires more detail.

      The inclusion criteria for which "SRA" datasets were or were not utilized within this study are poorly defined. This means the comprehensiveness of the study for a given search space of the SRA is not defined, and the results are ultimately not reproducible, or expandable. For example, are all vertebrate RNA-seq samples processed? Or just aquatic vertebrate RNA-seq? Were samples randomly sampled from a more comprehensive data set? What is the make-up of the search space and how much was DNA-seq or RNA-seq? This section should be expanded and explicit accounting provided for how dataset selection was performed. This would provide additional confidence in the results and conclusions, as well as allow for future analysis to be conducted.

      Hallmark virus genes require further clarification, as it is unclear what genes are utilized as bait, or in the initial search process. The reported "Hallmark gene sets" are not described in a systematic way. What is the sensitivity and specificity of these gene sets? Was there a validation of the performance characteristics (ROC) for this gene set with different tools? How is this expected to be utilized? Which kinds of viruses are excluded/missed? Are viroids included?

      For the Tailtomavirus, additional information is needed for sufficient confidence. Was this "chimeric" genomic arrangement detected in a single library? This raises a greater issue of how technical artifacts, which may appear as chimeric assemblies, are ruled out in the workflow. If two viral genomes share a k-mer of length greater than the assembly k, the graph may become merged. Are there read pairs that span all regions of the genome? Is there evidence for multiple homologous viruses with synteny between them that supports the combination of these genes as an evolving genome, or is this an anomalous observation? Read alignments should be included and Bandage graph visualization for all cases of chimeric assemblies and active steps to disprove the baseline hypotheses that these are technical artifacts of genome assembly.
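
      The k-mer concern above can be checked directly. The following is a minimal sketch, assuming two candidate parental sequences are available as plain strings; the toy sequences and function names are hypothetical. It considers only the forward strand, whereas a real check would also include reverse complements.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seq_a, seq_b, k):
    """Return k-mers present in both sequences (forward strand only)."""
    return kmers(seq_a, k) & kmers(seq_b, k)


# Toy sequences (hypothetical) that share the 8-base stretch CGTACGTT.
genome_a = "ATGCGTACGTTAGC"
genome_b = "CCCGTACGTTGGA"

# Any shared k-mer at the assembler's k is a point where a de Bruijn
# graph could join the two genomes into one apparent contig.
print(sorted(shared_kmers(genome_a, genome_b, 7)))
print(sorted(shared_kmers(genome_a, genome_b, 9)))
```

      An empty result at the assembly k would argue against this particular artifact, although read pairs spanning the junction remain the more direct evidence.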

      Justification for exclusion of endogenized sequences is not included and must be described, as small DNA tumor viruses may endogenize into the host genome as part of their life cycle. How is such an integration resolved from an evolutionary "endogenization"? What's the biological justification for this step?

      Additional supporting information, clear presentation, and context are needed to strengthen results and conclusions.

      Basic reporting of global statistics, such as the total number of viruses found per family, should be included in the main text to better support the scope of the results. How many viruses (per family) were previously known, and therefore what is the magnitude of the expansion performed here?

      Additional parameters and information should be included in bioinformatic tool outputs to provide greater clarity and interpretation of results. For example, reporting the "BLASTp E-val", as for the PolB homology (BLASTp 6E-12) is not informative, and does not tell the reader this is (we assume) an expectancy value. For each such case, please report the top database hit accession, percent identity, query coverage, and E-value. Otherwise, a judgment cannot be adequately made regarding the quality of evidence for homology. Similarly, for HHpred what does the number represent - confidence, identity, or coverage?
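
      The four requested fields can be pulled from BLAST+ tabular output. The following is a minimal sketch, assuming the search was run with `-outfmt "6 qseqid sacc pident qcovs evalue"`; the query name and accessions in the example are placeholders, not real hits from the manuscript.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    accession: str
    pident: float   # percent identity
    qcov: float     # query coverage per subject (%)
    evalue: float


def top_hits(tsv_text):
    """Return the lowest-E-value hit for each query in tabular BLAST output."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = Hit(row[0], row[1], float(row[2]), float(row[3]), float(row[4]))
        if hit.query not in best or hit.evalue < best[hit.query].evalue:
            best[hit.query] = hit
    return best


# Placeholder rows: qseqid, sacc, pident, qcovs, evalue.
example = (
    "contig1_orf2\tHYPO_ACC_1\t31.5\t82\t6e-12\n"
    "contig1_orf2\tHYPO_ACC_2\t28.0\t60\t1e-5\n"
)

for hit in top_hits(example).values():
    print(f"{hit.query}: {hit.accession}, {hit.pident}% identity, "
          f"{hit.qcov}% query coverage, E = {hit.evalue:g}")
```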

      Some findings described in the Results section may require revision. Several of the Nidoviruses (Nidovirus takifugu, Nidovirus hypomesus, Nidovirus ambystoma, etc...) have been previously described by three groups, first by Edgar et al., (https://www.nature.com/articles/s41586-021-04332-2), then Miller et al., (https://academic.oup.com/ve/article/7/2/veab050/6290018) and then Lauber et al., (https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1012163). This is now the 4th description of the same set of viruses. These sequences are in GenBank (https://www.ncbi.nlm.nih.gov/nuccore/OV442424.1), although it is unclear why they're not returned as BLAST hits. Miller also described the Togavirus co-segment previously.

      It is also uncertain what is being described with HelPol/maldviruses which was not previously described in distantly similar relatives. How many were described in the previous literature and how many are described by this work?

      Co-phylogenies should be used to convey gene transfer and flow clearly to support the conclusions made in the text.

      Statements such as, "The group encompasses a surprising degree of genomic diversity...", should be supported by additional information to strengthen conclusions (e.g., what the expected diversity is). What is the measurement for genomic diversity here, and why is this surprising? There is overall a lack of quantification to support the conclusions made throughout the paper.

    1. eLife assessment

      This study investigates the role of queuosine (Q) tRNA modification in aminoglycoside tolerance in Vibrio cholerae and presents convincing evidence to conclude that Q is essential for the efficient translation of TAT codons, although this depends on the context. The absence of Q reduces aminoglycoside tolerance potentially by reprogramming the translation of an oxidative stress response gene, rsxA. Overall, the findings point to an important mechanism whereby changes in Q modification levels control the decoding of mRNAs enriched in TAT codons under antibiotic stress.

    2. Reviewer #1 (Public Review):

      Summary of the work: In this work, Fruchard et al. study the enzyme Tgt and how it modifies guanine in tRNAs to queuosine (Q), essential for Vibrio cholerae's growth under aminoglycoside stress. Q's role in codon decoding efficiency and its proteomic effects during antibiotic exposure is examined, revealing Q modification impacts tyrosine codon decoding and influences RsxA translation, affecting the SoxR oxidative stress response. The research proposes Q modification's regulation under environmental cues reprograms the translation of genes with tyrosine codon bias, including DNA repair factors, crucial for bacterial antibiotic response.

      The experiments are well-designed and conducted and the conclusions, for the most part, are well supported by the data. However, a few clarifications will significantly strengthen the manuscript.

      Major:

      Figure S4 A-D. These growth curves are important data and should be presented in the main figures. Moreover, given that it is not possible to make a rsxA mutant, I wonder if it would be possible to connect rsx and tgt using the following experiment: expression of tgt results in resistance to TOB (in B), while expression of only rsx lowers resistance to TOB (in D). Then simultaneous overexpression of both tgt/rsx in the WT strain should have either no effect on TOB resistance or increased resistance, relative to the WT. Perhaps the authors have done this, and if so, the data should be included as it will significantly strengthen their model.

      Figure S4 - Is there a rationale for why it is possible to make rsx mutants in E. coli, but not in V. cholerae? For example, does E. coli have a second gene/protein that is redundant in function to rsxA, while V. cholerae does not? I think your data hint at this, since in the right panel growth data, your double mutant does not fully rescue back to rsx single mutant levels, suggesting another factor in tgt mutant also acts to lower resistance to TOB. If so, perhaps a line or two in text will be helpful for readers.

      -For growth curves in Figure 2 and relative comparisons like in Figure 5D and Figure S4 (and others in the paper), statistics and error bars, along with replicate information should be provided.

      -Figure 6A - Is the transcript fold change on a linear or log scale? If linear, then tgt expression should not be classified as being upregulated in TOB. It is barely up, by ~2-fold, with TOB 0.6, which is a mild phenotype at best.

      -Line 779- 780: "This indicates that sub-MIC TOB possibly induces tgt expression through the stringent response activation." To me, the data presented in this figure, do not support this statement. The experiment is indirect.

      -Figure 3B and D. - These samples only have tobramycin, correct? The legend says both carbenicillin and tobramycin.

      -Figure 5. The color schemes in bars do not match up with the color scheme in cartoons below panels B and C. That makes it confusing to read. Please fix.

      -A lot of abbreviations have been used. This makes reading a bit cumbersome. Ideally, fewer abbreviations would be used.

    3. Reviewer #2 (Public Review):

      Fruchard et al. investigate the role of the queuosine (Q) modification of the tRNA (Q-tRNA) in the human pathogen Vibrio cholerae. First, the authors state that the absence of Q-modified tRNAs (tgt mutant) increases the translation of TAT codons and proteins with a high TAT codon bias. Second, the absence of Q increases rsxA translation, because rsxA gene has a high TAT codon bias. Third, increased RsxA in the absence of Q inhibits SoxR response, reducing resistance towards the antibiotic tobramycin (TOB). Authors also predict in silico which genes harbor a higher TAT bias and found that among them are some involved in DNA repair, experimentally observing that a tgt mutant is more resistant to UV than the wt strain. It is worth noting that authors employ a wide variety of techniques, both experimental and bioinformatic. However, some aspects of the work need to be clarified or reevaluated.

      (1) The statement that the absence of Q increases the translation of TAT codons and proteins encoded by TAT-enriched genes presents the following problems that should be addressed:

      (1.1) The increase in TAT codon translation in the absence of Q is not supported by proteomics, since there was no detected statistical difference for TAT codon usage in proteins differentially expressed. Furthermore, there are some problems regarding the statistics of proteomics. Some proteins shown in Table S1 have adjusted p-values higher than their p-values, which makes no sense. Maybe there is a mistake in the adjusted p-value calculation. In addition, it is not common to assume that proteins that are quantitatively present in one condition and absent in another are differentially abundant proteins. Proteomics data software typically addresses this issue and applies some corrections. It would be advisable to review that.

      (1.2) Problems with the interpretation of Ribo-seq data (Figure 4D). On the one hand, the Ribo-seq data should be corrected (normalized) with the RNA-seq data in each of the conditions to obtain ribosome profiling data, since some genes could have more transcription in some of the conditions studied. In other articles in which this technique is used (such as Tuorto et al., EMBO J. 2018; doi: 10.15252/embj.201899777), it is interpreted that those positions in which the ribosome moves most slowly (and which are therefore less efficiently translated) are the most abundant. Assuming this interpretation, according to the hypothesis proposed in this work, the fragments enriched in TAT codons should have been less abundant in the absence of Q-tRNA (tgt mutant) in the Ribo-seq experiment. However, what is observed is that TAT-enriched fragments are more abundant in the tgt mutant, and yet the Ribo-seq results are interpreted as RNA-seq, stating that this is because the genes corresponding to those sequences have greater expression in the absence of Q. On the other hand, it would be interesting to calculate the mean of the protein levels encoded by the transcripts with high and low ribosome profiling data.
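
      The normalization requested in (1.2) is commonly expressed as a per-gene ratio of footprint abundance to mRNA abundance. The following is a minimal sketch, assuming raw per-gene read counts from matched Ribo-seq and RNA-seq libraries; the gene names, counts, and pseudocount are hypothetical, and the same coding region is assumed for both libraries so that gene length cancels in the ratio.

```python
import math


def cpm(counts):
    """Scale raw counts to counts per million of the library total."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_ribo_over_rna(ribo_counts, rna_counts, pseudo=1.0):
    """Per-gene log2 ratio of Ribo-seq to RNA-seq abundance."""
    ribo, rna = cpm(ribo_counts), cpm(rna_counts)
    return {
        gene: math.log2((ribo[gene] + pseudo) / (rna[gene] + pseudo))
        for gene in ribo
        if gene in rna
    }


# Hypothetical counts: geneA has more footprints than geneB but also
# proportionally more mRNA, so the normalized ratio does not differ.
ribo = {"geneA": 600, "geneB": 400}
rna = {"geneA": 600, "geneB": 400}
print(log2_ribo_over_rna(ribo, rna))
```

      Comparing this ratio between wt and tgt, rather than raw footprint counts, separates changes in ribosome occupancy from changes in transcript level. Whether higher occupancy reflects more translation or slower elongation still has to be argued separately, as noted above.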

      (1.3) This statement is contrary to most previously reported studies on this topic in eukaryotes and bacteria, in which ribosome profiling experiments, among others, indicate that translation of TAT codons is slower than (or no different from) translation of TAC codons, and the same phenomenon is observed for the rest of the NAC/T codons. This is completely opposed to the results shown in Figure 4. However, the results of these studies are either not mentioned or not discussed in this work. Some examples of articles that should be discussed in this work:

      - "Queuosine-modified tRNAs confer nutritional control of protein translation" (Tuorto et al., 2018; 10.15252/embj.201899777)

      - "Preferential import of queuosine-modified tRNAs into Trypanosoma brucei mitochondrion is critical for organellar protein synthesis" (Kulkarni et al., 2021; doi: 10.1093/nar/gkab567)

      - "Queuosine-tRNA promotes sex-dependent learning and memory formation by maintaining codon-biased translation elongation speed" (Cirzi et al., 2023; 10.15252/embj.2022112507)

      - "Glycosylated queuosines in tRNAs optimize translational rate and post-embryonic growth" (Zhao et al., 2023; 10.1016/j.cell.2023.10.026)

      - "tRNA queuosine modification is involved in biofilm formation and virulence in bacteria" (Diaz-Rullo and Gonzalez-Pastor, 2023; doi: 10.1093/nar/gkad667)

      In the last of these, the authors indicate that Q-tRNA increases NAT codon translation in most bacterial species. Could the regulation of TAT codon-enriched proteins by Q-tRNAs in V. cholerae be an exception? In addition, those authors use a bioinformatic method, similar to the one used in this work, to identify genes enriched in NAT codons and to find which biological processes involve the genes whose expression is affected by Q-tRNAs (as discussed for the UV resistance phenotype). It would be worth discussing all of this.

      (1.4) It is proposed that the stress produced by the TOB antibiotic causes greater translation of genes enriched in TAT codons. On the one hand, it is shown that the GFP-TAT version (gene enriched in TAT codons) and the RsxA-TAT-GFP protein (native gene naturally enriched in TAT) are expressed more, compared to their versions enriched in TAC, in a tgt mutant than in a wt, in the presence of TOB (Fig. 5C). However, in the absence of TOB, and in a wt context, although the two versions of GFP have a similar expression level (Fig. S3D), the same does not occur with RsxA, whose RsxA-TAT form (the native one) is expressed significantly more than the RsxA-TAC version (Fig. S3A). How can it be explained that in a wt context, in which there is also tRNA Q-modification, a gene naturally enriched in TAT is translated better than the same gene enriched in TAC? It would be expected that in the presence of Q-tRNAs the two versions would be translated equally (as happens with GFP) or even that the TAT version would be less translated. On the other hand, in the presence of TOB the fluorescence of WT GFP(TAT) is higher than the fluorescence of WT GFP(TAC) (Figure S3E) (mean fluorescence data for the RsxA-GFP version in the presence of TOB is not shown). These results may indicate that the apparent better translation of TAT versions could be due to indirect effects rather than to TAT codon translation.

      (2) Another problem is related to the already known role of Q in prevention of stop codon readthrough, which is not discussed at all in the work. In the absence of Q, stop codon readthrough is increased. In addition, it is known that aminoglycosides (such as tobramycin) also increase stop codon readthrough ("Stop codon context influences genome-wide stimulation of termination codon readthrough by aminoglycosides"; Wangen and Green, 2020; 10.7554/eLife.52611). Absence of Q and presence of aminoglycosides can be synergistic, producing devastating increases in stop codon readthrough and a large alteration of global gene expression. All of this needs to be discussed in the work. Moreover, it is known that stop codon readthrough can alter gene expression, and that mRNA sequence context influences the likelihood of stop codon readthrough. Thus, this process could also affect the expression of the recoded GFP and RsxA versions.

      (3) The statement that TOB resistance depends on RsxA translation, which is related to the presence of Q, also presents some problems:

      (3.1) It is observed that the absence of tgt produces a growth defect in V. cholerae when exposed to TOB (Figure 1A), and it is stated that this is mediated by an increase in the translation of RsxA, because its gene is TAT enriched. However, in Figure S4F, it is shown that the same phenotype is observed in E. coli, but its rsxA gene is not enriched in TAT codons. Therefore, the growth defect observed in the tgt mutant in the presence of TOB may not be due to the increase in the translation of TAT codons of the rsxA gene in the absence of Q. This phenotype is very interesting, but it may be related to another molecular process regulated by Q. Maybe the role of Q in preventing stop codon readthrough is important in this process, reducing cellular stress in the presence of TOB and growing better.

      (3.2) All experiments related to the effect of Q on the translation of TAT codons have been performed with the tgt mutant strain. Considering that the authors have a pSEVA-tgt plasmid to overexpress this gene, they would have to show whether tgt overexpression in a wt strain produces a decrease in the translation of proteins encoded by TAT-enriched genes such as RsxA. This experiment would allow them to conclude that Q reduces RsxA levels, increasing resistance to TOB.

      (3.3) On the other hand, Fig. 1B shows that when the wt and tgt strains compete, both overexpressing tgt, the tgt mutant strain grows better in the presence of TOB. This result is not very well understood, since according to the hypothesis proposed, the absence of modification by Q of the tRNA would increase the translation of genes enriched in TAT, therefore, a strain with a higher proportion of Q-modified tRNAs as in the case of the wt strain overexpressing tgt would express the rsxA gene less than the tgt strain overexpressing tgt and would therefore grow better in the presence of TOB. For all these reasons, it would be necessary to evaluate the effect of tgt overexpression on the translation of RsxA.

      (3.4) According to Figure 1I, the overexpression of tRNA-Tyr(GUA) caused better growth of the tgt mutant in comparison to WT. If the growth defect observed in the tgt mutant in the presence of TOB is due to better translation of the TAT codons of the rsxA gene, the overexpression of tRNA-Tyr(GUA) in the tgt mutant should have resulted in even better RsxA translation and worse growth, not the opposite result.

      (4) It cannot be stated that DNA repair is more efficient in the tgt mutant of V. cholerae, as indicated in the text of the article and in Fig 7. The authors only observe that the tgt mutant is more resistant to UV radiation and it is suggested that the reason may be TAT bias of DNA repair genes. To validate the hypothesis that UV resistance is increased because DNA repair genes are TAT biased, it would be necessary to check if DNA repair is affected by Q. UV not only produces DNA damage, but also oxidative stress. Therefore, maybe this phenotype is due to the increase in proteins related to oxidative stress controlled by RsxA, such as the superoxide dismutase encoded by sodA. It is also stated that these repair genes were found up for the tgt mutant in the Ribo-seq data, with unchanged transcription levels. Again, it is necessary to clarify this interpretation of the Ribo-seq data, since the fact that they are more represented in a tgt mutant perhaps means that translation is slower in those transcripts. Has it been observed in proteomics (wt vs tgt in the absence of TOB) whether these proteins involved in repair are more expressed in a tgt mutant?

      (5) The authors demonstrate that in E. coli the tgt mutant does not show greater resistance to UV radiation (Fig. 7D), unlike what happens in V. cholerae. It should be discussed that in previous works it has been observed that overexpression in E. coli of the tgt gene or the queF gene (Q biosynthesis) is involved in greater resistance to UV radiation (Morgante et al., Environ Microbiol, 2015 doi: 10.1111/1462-2920.12505; and Díaz-Rullo et al., Front Microbiol. 2021 doi: 10.3389/fmicb.2021.723874). As an explanation, it was proposed (Diaz-Rullo and Gonzalez-Pastor, NAR 2023 doi: 10.1093/nar/gkad667) that the observed increase in the capacity to form biofilms in strains that overexpress genes related to Q modification of tRNA would be related to this greater resistance to UV radiation.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript the authors begin with the interesting phenotype of sub-inhibitory concentrations of the aminoglycoside tobramycin proving toxic to a knockout of the tRNA-guanine transglycosylase (Tgt) of the important human pathogen, Vibrio cholerae. Tgt is important for incorporating queuosine (Q) in place of guanosine at the wobble position of GUN codons. The authors go on to define a mechanism of action where environmental stressors control expression of tgt to control translational decoding of particularly tyrosine codons, skewing the balance from TAC towards TAT decoding in the absence of the enzyme. The authors use advanced proteomics and ribosome profiling to reveal that the loss of tgt results in increased translation of proteins like RsxA and a cohort of DNA repair factors, whose genes harbor an excess of TAT codons in many cases. These findings are bolstered by a series of molecular reporters, mass spectrometry, and tRNA overexpression strains to provide support for a model where Tgt serves as a molecular pivot point to reprogram translational output in response to stress.

      Strengths:

      The manuscript has many strengths. The authors use a variety of strains, assays, and advanced techniques to discover a mechanism of action for Tgt in mediating tolerance to sub-inhibitory concentrations of tobramycin. They observe a clear phenotype for a tRNA modification in facilitating reprogramming of the translational response, and the manuscript certainly has value in defining how microbes tolerate antibiotics.

      Weaknesses:

      The conclusions of the manuscript are mostly very well-supported by the data, but in some places control experiments or peripheral findings cloud precise conclusions. Some additional clarification, discussion, or even experimental extension could be useful in strengthening these areas.

      (1) The authors have created and used a variety of relevant molecular tools. In some cases, using these tools in additional assays as controls would be helpful. For example, testing for compensation of the observed phenotypes by overexpression of the Tyrosine tRNA(GUA) in Figure 2A with the 6xTAT strain, Figure 5C with the rsxA-GFP fusion, and/or Figure 7B with UV stress would provide additional information on the ability of tRNA overexpression to compensate for the defect in these situations.

      (2) The authors present a clear story with a reprogramming towards TAT codons in the knockout strain, particularly regarding tobramycin treatment. The control experiments often hint at other codons also contributing to the observed phenotypes (e.g., His or Asp), yet these effects are mostly ignored in the discussion. It would be helpful to discuss these findings at a minimum in the discussion section, or possibly experimentally address the role of His or Asp by overexpression of these tRNAs together with Tyrosine tRNA(GUA) in an experiment like that of Figure 1I to see if a more "wild type" phenotype would present. In fact, the synergy of Tyr, His, and/or Asp codons likely helps to explain the effects observed with the DNA repair genes in later experiments.

      (3) Regarding Figure 6D, the APB northern blot feels like an afterthought. It was loaded with different amounts of RNA as input and some samples are repeated three times, but Δcrp only once. Collectively, it makes this experiment very difficult to assess.

      Minor Points:

      (4) Fig S2B: do the authors have a hypothesis for why the Asp and Phe tRNAs lead to a growth decrease in the untreated samples? It appears that Phe(GAA) partially compensates for the defect.

      (5) Lines 655 to 660 seem more appropriate as speculation in the discussion rather than as a conclusion in the results, where no direct experiments are performed. The authors might take advantage of the "Ideas and Speculation" section that eLife allows.

    1. eLife assessment

      This study provides valuable new insights into insect cognition and problem-solving in bumblebees. The authors present convincing evidence that bumblebees lack causal understanding in a string-pulling task, although evidence that bumblebees instead use image-matching for this task, which would benefit from further experiments, is currently incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, the researchers aimed to address whether bees causally understand string-pulling through a series of experiments. I first briefly summarize what they did:

      - In experiment 1, the researchers trained bees without string and then presented them with flowers in the test phase that either had connected or disconnected strings, to determine what their preference was without any training. Bees did not show any preference.

      - In experiment 2, bees were trained to have experience with string and then tested on their choice between connected vs. disconnected string.

      - Experiment 3 was similar, except that instead of one option being an attached string broken in the middle, the string was completely disconnected from the flower.

      - In experiment 4, bees were trained on green strings and tested on white strings to determine if they generalize across color.

      - In experiment 5, bees were trained on blue strings and tested on white strings.

      - In experiment 6, bees were trained with black tape covering the area between the string and the flower (i.e., so they would not be able to see or learn whether it was connected or disconnected).

      - In experiments 2-6, bees chose the connected string in the test phase.

      - In experiment 7, bees were trained as in experiment 3 and then tested where the string was either disconnected or coiled i.e. still being 'functional' but appearing different.

      - In experiment 8, bees were trained as before and then tested on a string that was in a different coiled orientation, either connected or disconnected.

      - In experiments 7 and 8 the bees showed no preference.

      Strengths:

      I appreciate the amount of work that has gone into this study and think it contains a nice, thorough set of experiments. I enjoyed reading the paper and felt that overall it was well-written and clear. I think experiment 1 shows that bees do not have an untrained understanding of the function of the string in this context. The rest of the experiments indicate that with training, bees have a preference for unbroken over broken string and likely use visual cues learned during training to make this choice. They also show that as in other contexts, bees readily generalize across different colors.

      Weaknesses:

      (1) I think there are 2 key pieces of information that can be taken from the test phase - the bees' first choice and then their behavior across the whole test. I think the first choice is critical in terms of what the bee has learned from the training phase - then their behavior from this point is informed by the feedback they obtain during the test phase. I think both pieces of information are worth considering, but their behavior across the entire test phase is giving different information than their first choice, and this distinction could be made more explicit.

      In addition, while the bees' first choice is reported, no statistics are presented for their preferences.

      (2) It seemed to me that the bees might not only be using visual feedback but also motor feedback. This would not explain their behavior in the first test choice, but could explain some of their subsequent behavior. For example, bees might learn during training that there is some friction/weight associated with pulling the string, but in cases where the string is separated from the flower, this would presumably feel different to the bee in terms of the physical feedback it is receiving. I'd be interested to see some of these test videos (perhaps these could be shared as supplementary material, in addition to the training videos already uploaded), to see what the bees' behavior looks like after they attempt to pull a disconnected string.

      (3) I think the statistics section needs to be made clearer (more in private comments).

      (4) I think the paper would be made stronger by considering the natural context in which the bee performs this behavior. Bees manipulate flowers in all kinds of contexts and scrabble with their legs to achieve nectar rewards. Rather than thinking that it is pulling a string, my guess would be that the bee learns that a particular motor pattern within their usual foraging repertoire (scrabbling with legs), leads to a reward. I don't think this makes the behavior any less interesting - in fact, I think considering the behavior through an ecological lens can help make better sense of it.
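
      For point (1), one way to attach a statistic to the bees' first choices is an exact two-sided binomial test against chance (0.5). The sketch below is illustrative only: the function name and the counts are hypothetical and are not taken from the manuscript, and the authors may prefer a different test.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null hypothesis.
    """
    def pmf(k):
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so floating-point ties are counted.
    total = sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts, for illustration only:
# 16 of 20 bees choose the connected string on their first choice.
p_value = binomial_two_sided_p(16, 20)
print(f"two-sided p = {p_value:.4f}")
```

      Reporting such a p-value separately for the first choice and for behavior across the whole test would make the distinction raised in point (1) explicit.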

    3. Reviewer #2 (Public Review):

      Summary:

      The authors wanted to see if bumblebees could succeed in the string-pulling paradigm with broken strings. They found that bumblebees can learn to pull strings and that they have a preference to pull on intact strings vs broken ones. The authors conclude that bumblebees use image matching to complete the string-pulling task.

      Strengths:

      The study has an excellent experimental design and contributes to our understanding of what information bumblebees use to solve a string-pulling task.

      Weaknesses:

      Overall, I think the manuscript is good, but it is missing some context. Why do bumblebees rely on image matching rather than causal reasoning? Could it have something to do with their ecology? And how is the task relevant for bumblebees in the wild? Does the test translate to any real-life situations? Is pulling a natural behaviour that bees do? Does image matching have adaptive significance?

    4. Reviewer #3 (Public Review):

      Summary:

      This paper presents bees with varying levels of experience with a choice task where bees have to choose to pull either a connected or unconnected string, each attached to a yellow flower containing sugar water. Bees without experience of string pulling did not choose the connected string above chance (experiment 1), but with experience of horizontal string pulling (as in the right-hand panel of Figure 4) bees did choose the connected string above chance (experiments 2-3), even when the string colour changed between training and test (experiments 4-5). Bees that were not provided with perceptual-motor feedback (i.e., they could not observe that each pull of the string moved the flower) during training still learned to string pull and then chose the connected string option above chance (experiment 6). Bees with normal experience of string pulling then failed to discriminate between connected and unconnected strings when the strings were coiled or looped, rather than presented straight (experiments 7-8).

      Weaknesses:

      The authors have only provided video of some of the conditions where the bees succeeded. In general, I think a video explaining each condition and then showing a clip of a typical performance would make it much easier to follow the study designs for scholars. Videos of the conditions bees failed at would be highly useful in order to compare different hypotheses for how the bees are solving this problem. I also think it is highly important to code the videos for switching behaviours. When solving the connected vs unconnected string tasks, when bees were observed pulling the unconnected string, did they quickly switch to the other string? Or did they continue to pull the wrong string? This would help discriminate the use of perceptual-motor feedback from other hypotheses.

      The experiments are also not described well. For my comments below, I have assumed that different groups of bees were tested for experiments 1-8, and that experiment 6 was run as described in line 331, where bees were given string-pulling training without perceptual feedback, rather than as described in Figure 4B, which describes bees as receiving string-pulling training with feedback.

      The authors suggest the bees' performance is best explained by what they term 'image matching'. However, experiment 6 does not seem to support this without assuming retroactive image matching after the problem is solved. The logic of experiment 6 is described as "This was to ensure that the bees could not see the familiar "lollipop shape" while pulling strings....If the bees prefer to pull the connected strings, this would indicate that bees memorize the arrangement of strings-connected flowers in this task." I disagree with this second sentence: removing perceptual feedback during training would prevent bees from memorising the lollipop shape because, while solving the task, they don't actually see a string connected to a yellow flower, due to the black barrier. At the end of the task, the string is now behind the bee, so unless the bee is turning around and encoding this object retrospectively as the image to match, it seems hard to imagine how the bee learns the lollipop shape.

      Despite this, the authors go on to describe image matching as one of their main findings. For this claim, I would suggest the authors run another experiment, identical to experiment 6 but with a black panel behind the bee, such that the string the bee pulls behind itself disappears from view. There is now no image to match at any point from the bee's perspective so it should now fail the connectivity task.

      Strengths:

      Despite these issues, this is a fascinating dataset. Experiments 1 and 2 show that the bees are not learning to discriminate between connected and unconnected stimuli rapidly in the first trials of the test. Instead, it is clear that experience in string pulling is needed to discriminate between connected and unconnected strings. What aspect of this experience is important? Experiment 6 suggests it is not image matching (when no image is provided during problem-solving, but only afterward, bees still attend to string connectivity) and casts doubt on perceptual-motor feedback (unless from the bee's perspective, they do actually get feedback that pulling the string moves the flower, video is needed here). Experiments 7 and 8 rule out means-end understanding because if the bees are capable of imagining the effect of their actions on the string and then planning out their actions (as hypotheses such as insight, means-end understanding and string connectivity suggest), they should solve these tasks.

      If the authors can compare the bees' performance in a more detailed way to other species, and run the experiment suggested, this will be a highly exciting paper.

    1. eLife assessment

      This study provides a single-cell atlas for syngnathid fishes (seahorses, pipefishes, and seadragons), a valuable new resource to investigate the molecular basis of the many unique characters that define the pipefish embryo. The findings are generally supported by solid arguments, but whereas the single-cell RNA-sequencing analysis appears to be of good quality, the spatiotemporal expression data only incompletely support the authors' arguments. Additional computational analyses on cell identity and developmental trajectories would allow a deeper examination of the current data from these unconventional model organisms, to provide new insights into understanding the extraordinary adaptations of the Syngnathidae family. If appropriately improved, the work could be of broad interest for evolutionary developmental biology, particularly for fishes.

    2. Reviewer #1 (Public Review):

      Syngnathid fishes (seahorses, pipefishes, and seadragons) present very particular and elaborated features among teleosts and a major challenge is to understand the cellular and molecular mechanisms that permitted such innovations and adaptations. The study provides a valuable new resource to investigate the morphogenetic basis of four main traits characterizing syngnathids, including the elongated snout, toothlessness, dermal armor, and male pregnancy. More particularly, the authors have focused on a late stage of pipefish organogenesis to perform single-cell RNA-sequencing (scRNA-seq) completed by in situ hybridization analyses to identify molecular pathways implicated in the formation of the different specific traits.

      The first set of data explores the scRNA-seq atlas composed of 35,785 cells from two samples of gulf pipefish embryos, which the authors have been able to classify into major cell types characterizing vertebrate organogenesis, including epithelial, connective, neural, and muscle progenitors. To affirm identities and discover potential properties of clusters, the authors primarily use KEGG analysis, which reveals enriched genetic pathways in each cell type. While the analysis is informative and could be useful for the community, some interpretations appear superficial and the data must be completed to confirm identities and properties. Notably, supplementary information should be provided to show quality control data corresponding to the final cell atlas, including the UMAP showing the sample source of the cells, violin plots of gene count, UMI count, and mitochondrial fraction for the overall dataset and by cluster, and expression profiles on UMAP of selected markers characterizing cluster identities.

      The second set of data aims to correlate the scRNA-seq analysis with in situ hybridizations (ISH) in two different pipefish (gulf and bay) species to identify and characterize markers spatially, and validate cell types and signaling pathways active in them. While the approach is rational, the authors must complete the data and optimize labeling protocols to support their statements. One major concern is the quality of ISH stainings and images; embryos show a high degree of pigmentation that could hide part of the expression profile, and only subparts and hardly detectable tissues/stainings are presented. The authors should provide clear and good-quality images of ISH labeling on whole-mount specimens, highlighting the magnification regions and all other organs/structures (positive controls) expressing the marker of interest along the axis. Moreover, ISH probes have been designed and produced on gulf pipefish genome and cDNA respectively, while ISH labeling has been performed indifferently on bay or gulf pipefish embryos and larvae. The authors should specify stages and species on figure panels and should ensure sequence alignment of the probe-targeted sequences in the two species to validate ISH stainings in the bay pipefish. Moreover, spatiotemporal gene expression being a very dynamic process during embryogenesis, interpretations based on undefined embryonic and larval stages of pipefish development and compared to 3dpf zebrafish are insufficient to hypothesize on developmental specificities of pipefish features, such as on the absence of tooth primordia that could represent a very discrete and transient cell population. The ISH analyses would require a clean and precise spatiotemporal expression comparison of markers at the level of the entire pipefish and zebrafish specimens at well-defined stages, otherwise, the arguments proposed on teleost innovations and adaptations turn out to be very speculative.

      To conclude, whereas the scRNA-seq dataset in this unconventional model organism will be useful for the community, the spatiotemporal and comparative expression analyses have to be thoroughly pushed forward to support the claims. Addressing these points is absolutely necessary to validate the data and to give new insights to understand the extraordinary evolution of the Syngnathidae family.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors present the first single-cell atlas for syngnathid fishes, providing a resource for future evolution and development studies in this group.

      Strengths:

      The concept here is simple and I find the manuscript to be well written. I like the in situ hybridization of marker genes - this is really nice. I also appreciate the gene co-expression analysis to identify modules of expression. There are no explicit hypotheses tested in the manuscript, but the discovery of these cell types should have value in this organism and in the determination of morphological novelties in seahorses and their relatives.

      Weaknesses:

      I think there are a few computational analyses that might improve the generality of the results.

      (1) The cell types: The authors use marker gene analysis and KEGG pathways to identify cell types. I'd suggest a tool like SAMap (https://elifesciences.org/articles/66747) which compares single-cell data sets from distinct organisms to identify 'homologous' cell types -- I imagine the zebrafish developmental atlases could serve as a reasonable comparative reference.

      (2) Trajectory analyses: The authors suggest that their analyses might identify progenitor cell states and perhaps related differentiated states. They might explore cytoTRACE and/or pseudotime-based trajectory analyses to more fully delineate these ideas.

      (3) Cell-cell communication: I think it's very difficult to identify 'tooth primordium' cell types, because cell types won't be defined by an organ in this way. For instance, dental glia will cluster with other glia, and dental mesenchyme will likely cluster with other mesenchymal cell types. So the histology and ISH is most convincing in this regard. Having said this, given the known signaling interactions in the developing tooth (and in development generally) the authors might explore cell-cell communication analysis (e.g., CellChat) to identify cell types that may be interacting.

    4. Reviewer #3 (Public Review):

      Summary:

      This study established a single-cell RNA sequencing atlas of pipefish embryos. The results obtained identified unique gene expression patterns for pipefish-specific characteristics, such as fgf22 in the tip of the palatoquadrate and Meckel's cartilage, broadly informing the genetic mechanisms underlying morphological novelty in teleost fishes. The data obtained are unique and novel, and potentially important for understanding fish diversity. Thus, I would enthusiastically support this manuscript if the authors improve it to reach stronger and more convincing conclusions than in its current form.

      Weaknesses:

      Regarding the expression of sfrp1a and bmp4 dorsal to the elongating ethmoid plate and surrounding the ceratohyal: are their expression patterns spatially extended or broader compared to the pipefish ancestor? Is there a much closer species available to compare gene expression patterns with pipefish? Did the authors consider using other species closely related to pipefish for ISH? Sfrp1a and bmp4 may be expressed in the same regions of much more closely related species without face elongation. I understand that embryos of such species are not always accessible, but it is also hard to argue that particular genes are responsible for a specific phenotype by only comparing gene expression patterns between distantly related species (e.g., pipefish vs. zebrafish). For the same reason, I would not directly compare gene expression patterns between pipefish and mice, although I admit that mouse gene expression patterns are sometimes helpful for generating hypotheses about fish evolution. Alternatively, can the authors conduct ISH in other species of pipefish? If the expression patterns of sfrp1a and bmp4 are common among fishes with face elongation, the conclusion would become more solid. If these embryos are not available, is it possible to reduce the amount of Wnt and BMP signal using CRISPR/Cas, MO, or a chemical inhibitor? I do think that there are several ways to test the Wnt and/or BMP hypothesis in face elongation.

    1. eLife assessment

      This study makes a connection between cellular metabolism and proteostasis through MAGIC, a previously proposed protein quality control pathway of clearance of cytosolic misfolded and aggregated proteins by importing into mitochondria. The authors reveal the role of Snf1, a yeast AMPK, in preventing the import of misfolded proteins to mitochondria for MAGIC controlled by the transcription factor Hap4, depending on the cellular metabolic status. The key message is important, although the evidence for physiological relevance of MAGIC for overall cellular proteostasis and its molecular regulation by Snf1 remains incomplete.

    1. eLife assessment

      This useful paper addresses a novel exercise mimetic agent on muscle exercise and performance. While the data provided are interesting, the evidence is incomplete, as much of it is correlative.

    1. eLife assessment

      The paper presents valuable insights into the success of the parasitoid Trichopria drosophilae on Drosophila suzukii, elucidating the importance of both molecular adaptations, such as specialized venom proteins and unique cell types, and ecological strategies, including tolerance of intraspecific competition and avoidance of interspecific competition. Through convincing methodological approaches, the authors demonstrate how these adaptations optimize nutrient uptake and enhance parasitic success, highlighting the intricate coordination between molecular and ecological factors in driving parasitization success.

    1. eLife assessment

      The authors discuss an effect, "diffusive lensing", by which particles would accumulate in high-viscosity regions – for instance in the intracellular medium. To obtain these results, the authors rely on agent-based simulations using custom rules performed with the Ito stochastic calculus convention. The "lensing effect" discussed is a direct consequence of choosing the Ito convention without spurious drift, which has been discussed before; the adequacy of this choice for the intracellular medium is insufficiently discussed and relatively doubtful. Consequently, the relevance of the presented results for biology remains unclear and is based on incomplete evidence.

  3. May 2024
    1. eLife assessment

      This important study provides deep insight into a ubiquitous, but poorly understood, phenomenon: synaptic noise (primarily due to failures). Through a combination of theoretical analysis, simulations, and comparison to existing experimental data, this paper makes a compelling case that synapses are noisy because reducing noise is expensive. It touches on probably the most significant feature of living organisms -- their ability to learn -- and will be of broad interest to the neuroscience community.

    2. Reviewer #1 (Public Review):

      Summary:

      Given the cost of producing action potentials and transmitting them along axons, it has always seemed a bit strange that there are synaptic failures: when a spike arrives at a synapse, about half the time nothing happens. This paper proposes a perfectly reasonable explanation: reducing failures (or, more generally, reducing noise) is costly. Four possible mechanisms are proposed, each associated with a different cost, with costs of the form 1/sigma_i^rho where sigma_i is the failure-induced variability at synapse i and rho is an exponent. The four different mechanisms produce four different values of rho.

      What is interesting about the study is that the model makes experimental predictions about the relationship between learning rate, variability and presynaptic firing rate. Those predictions are consistent with experimental data, making it a strong candidate model. The fact that the predictions come from reasonable biological mechanisms makes it a very strong candidate model and suggests several experiments to test it further.

      Interestingly, the predictions made by this model are nearly indistinguishable from the predictions made by a normative model (Synaptic plasticity as Bayesian inference. Aitchison et al., Nature Neurosci. 24:565-571 (2021)). As pointed out by the authors, working out whether the brain is using Bayesian inference to tune learning rules, or it just looks like Bayesian inference but the root cause is cost minimization, will be an interesting avenue for future research.

      Finally, the authors relate their cost of reliability to the cost used in variational Bayesian inference. Intriguingly, the biophysical cost provides an upper bound on the variational cost. This is intellectually satisfying, as it answers a "why" question: why would evolution evolve to produce the kind of costs seen in the brain?

      Strengths:

      This paper provides a strong mix of theoretical analysis, simulations and comparison to experiments. And the extended appendices, which are very easy to read, provide additional mathematical insight.

      Weaknesses:

      None.

    3. Reviewer #2 (Public Review):

      Summary

      This manuscript argues about the similarity between two frameworks describing synaptic plasticity. In the Bayesian inference perspective, due to the noise and the limited available pre- and postsynaptic information, synapses can only have an estimate of what should be their weight. The belief about those weights is described by their mean and variance. In the energy efficient perspective, synaptic parameters (individual means and variances) are adapted such that the neural network achieves some task while penalizing large mean weights as well as small weight variances. Interestingly, the authors show both numerically and analytically the strong link between those two frameworks. In particular, both frameworks predict that (a) synaptic variances should decrease when the input firing rate increases and (b) that the learning rate should increase when the weight variances increase. Both predictions have some experimental support.

      Strengths

      (1) Overall, the paper is very well written and the arguments are clearly presented.

      (2) The tight link between the Bayesian inference perspective and the energy efficiency perspective is elegant and well supported, both with numerical simulations as well as with analytical arguments.

      (3) I also particularly appreciate the derivation of the reliability cost terms as a function of the different biophysical mechanisms (calcium efflux, vesicle membrane, actin and trafficking). Independently of the proposed mapping between the Bayesian inference perspective and the energy efficiency perspective, those reliability costs (expressed as power-law relationships) will be important for further studies on synaptic energetics.

      Weaknesses

      (1) As recognised by the authors, the correspondence between the entropy term in the variational inference description and the reliability cost in the energetic description is strong, but not perfect. Indeed, the entropy term scales as -log(sigma) while reliability cost scales as sigma^(-rho).

      (2) Even though this is not the main point of the paper, I appreciate the effort made by the authors to look for experimental data that could in principle validate the Bayesian/energetic frameworks. A stronger validation will be an interesting avenue for future research.
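
      The bound alluded to in point (1) can be made explicit with the elementary inequality log x ≤ x − 1 for x > 0. The derivation below is a sketch added for illustration, with σ the synaptic variability and ρ > 0 the reliability exponent; the constants may differ from those in the authors' own derivation.

```latex
\begin{align}
\log x &\le x - 1 \qquad (x > 0), \\
\text{with } x = \sigma^{-\rho}: \qquad -\rho \log \sigma &\le \sigma^{-\rho} - 1, \\
-\log \sigma &\le \frac{1}{\rho}\left(\sigma^{-\rho} - 1\right).
\end{align}
```

      Equality holds only at σ = 1, and the right-hand side tends to −log σ as ρ → 0, which is one way to see why the two costs are closely related but not identical.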

    4. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses

      (1) The authors face a technical challenge (which they acknowledge): they use two numbers (mean and variance) to characterize synaptic variability, whereas in the brain there are three numbers (number of vesicles, release probability, and quantal size). Turning biological constraints into constraints on the variance, as is done in the paper, seems somewhat arbitrary. This by no means invalidates the results, but it means that future experimental tests of their model will be somewhat nuanced.

      Agreed. There are two points to make here.

      First, the mean and variance are far more experimentally accessible than n, p and q. The EPSP mean and variance is measured directly in paired-patch experiments, whereas getting n, p and q either requires far more extensive experimentation, or making strong assumptions. For instance, the data from Ko et al. (2013) gives the EPSP mean and variance, but not (directly) n, p and q. Thus, in some ways, predictions about means and variances are easier to test than predictions about n, p and q.

      That said, we agree that in the absence of an extensive empirical accounting of the energetic costs at the synapse, there is inevitably some arbitrariness in how we derive our energetic costs. That was why we considered four potential functional forms for the connection between the variance and energetic cost, which covered a wide range of sensible forms for this energetic cost. Our results were robust to this wide range of functional forms, indicating that the patterns we describe are not specifically due to the particular functional form, but arise in many settings where there is an energetic cost for reliable synaptic transmission.
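
      The point about two numbers versus three can be illustrated with the standard binomial release model, in which n independent sites each release with probability p and contribute quantal size q. Using that model here is an assumption made for illustration, and the parameter values are hypothetical; the sketch shows that different (n, p, q) triples can give the same EPSP mean and variance.

```python
def epsp_mean_and_variance(n, p, q):
    """Mean and variance of the summed response under a binomial release model.

    n: number of independent release sites
    p: release probability per site
    q: quantal size (response per released vesicle)
    """
    mean = n * p * q
    variance = n * p * (1.0 - p) * q**2
    return mean, variance


# Hypothetical parameter sets, chosen only to illustrate the degeneracy.
few_reliable_sites = epsp_mean_and_variance(n=10, p=0.5, q=0.2)
many_unreliable_sites = epsp_mean_and_variance(n=40, p=0.2, q=0.125)

print(few_reliable_sites)
print(many_unreliable_sites)
```

      Both parameter sets give a mean of about 1.0 and a variance of about 0.1, so a constraint expressed on the mean and variance does not pin down n, p and q individually.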

      (2) The prediction that the learning rate should increase with variability relies on an optimization scheme in which the learning rate is scaled by the inverse of the magnitude of the gradients (Eq. 7). This seems like an extra assumption; the energy efficiency framework by itself does not predict that the learning rate should increase with variability. Further work will be needed to disentangle the assumption about the optimization scheme from the energy efficiency framework.

      Agreed. The assumption that learning rates scale with synapse importance is separate. However, it is highly plausible as almost all modern state-of-the-art deep learning training runs use such an optimization scheme, as in practice it learns far faster than other older schemes. We have added a sentence to the main text (line 221), indicating that this is ultimately an assumption.
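
      As a rough illustration of the kind of optimization scheme referred to here, the sketch below scales each parameter's learning rate by the inverse of its root-mean-square gradient, in the spirit of RMSprop or Adam. It does not reproduce the paper's Eq. 7; the function name, base learning rate and gradient values are hypothetical.

```python
import math


def effective_learning_rates(grad_history, base_lr=0.01, eps=1e-8):
    """Per-parameter learning rates scaled by the inverse RMS gradient.

    grad_history: list of gradient vectors (one list of floats per step).
    Returns one effective learning rate per parameter.
    """
    n_steps = len(grad_history)
    n_params = len(grad_history[0])
    rates = []
    for i in range(n_params):
        mean_sq = sum(step[i] ** 2 for step in grad_history) / n_steps
        rates.append(base_lr / (math.sqrt(mean_sq) + eps))
    return rates


# Hypothetical gradients for two synapses over three steps:
# synapse 0 sees large gradients, synapse 1 sees small ones.
grads = [[2.0, 0.1], [-2.0, -0.1], [2.0, 0.1]]
rates = effective_learning_rates(grads)
print(rates)
```

      Under this scheme the synapse with consistently small gradients receives the larger effective learning rate, which is the extra assumption the reviewer asks to be separated from the energy efficiency framework itself.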

      Major

      (1) The correspondence between the entropy term in the variational inference description and the reliability cost in the energetic description is a bit loose. Indeed, the entropy term scales as −log(σ) while reliability cost scales as σ^(−ρ). While the authors do make the point that σ^(−ρ) upper bounds −log(σ) (up to some constant), those two cost terms are different. This raises two important questions:

      a. Is this difference important, i.e. are there scenarios for which the two frameworks would have different predictions due to their different cost functions?

      b. Alternatively, is there a way to make the two frameworks identical (e.g. by choosing a proposal distribution Q(w) different from a Gaussian distribution (and tuneable by a free parameter that could be related to ρ) and therefore giving rise to an entropy term consistent with the reliability cost of the energy efficiency framework)?

      To answer b first, there is no natural way to make the two frameworks identical (unless we assume the reliability cost is proportional to log σsyn, and we don’t think there’s a biophysical mechanism that would give rise to such a cost). Now, to answer a, in Fig. 7 we extensively assessed the differences between the energy efficient σsyn and the Bayesian σpost. In Fig. 7bc, we find that σsyn and σpost are positively correlated in all models. This positive correlation indicates that the qualitative predictions made by the two frameworks (Bayesian inference and energy efficiency) are likely to be very similar. Importantly though, there are systematic differences highlighted by Fig. 7ab. Specifically, the energy efficient σsyn tends to vary less than the Bayesian σpost. This appears in Fig. 7b, which shows the relationship between σsyn (on the y-axis) and σpost (on the x-axis): the plot has a slope that is smaller than one for all our models of the biophysical cost. Further, the pattern also appears in the covariance ellipses in Fig. 7a, in that the Bayesian covariance ellipses tend to be long and thin, while the energy efficient covariance ellipses are rounder. Critically, though, both sets of covariance ellipses show the same pattern in that there is more noise along less important directions (as measured by the Hessian).

      We have added a sentence (line 273) noting that the search for a theoretical link is motivated by our observations in Fig. 7 of a strong, but not perfect link between the pattern of variability predicted by Bayesian and energy-efficient synapses.
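
      The relationship between the two cost terms discussed in this point can be checked numerically. The sketch below assumes the standard inequality log x ≤ x − 1 applied to x = σ^(−ρ), which gives −log σ ≤ (σ^(−ρ) − 1)/ρ. The constants used in the manuscript's own bound are not reproduced here, and the function names are illustrative only.

```python
import math

def entropy_cost(sigma):
    """Entropy-style cost from variational inference, up to a constant: -log(sigma)."""
    return -math.log(sigma)

def reliability_bound(sigma, rho):
    """Shifted and scaled power-law cost: (sigma**(-rho) - 1) / rho."""
    return (sigma ** (-rho) - 1.0) / rho

sigmas = [0.1, 0.5, 1.0, 2.0]
for rho in (2.0, 0.5, 0.01):
    # The gap is non-negative everywhere and shrinks as rho approaches zero.
    gaps = [reliability_bound(s, rho) - entropy_cost(s) for s in sigmas]
    print(f"rho={rho}: gaps={[round(g, 4) for g in gaps]}")
```

      Under these assumptions the power-law cost always lies on or above the entropy cost, and the two nearly coincide for small ρ, which is one way to see why the two frameworks make similar but not identical predictions.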

      (2) Even though I appreciate the effort of the authors to look for experimental evidence, I still find that the experimental support (displayed in Fig. 6) is moderate for three reasons.

      a. First, the experimental and simulation results are not displayed in a consistent way. Indeed, Fig 6a displays the relative weight change |Δw|/w as a function of the normalised variability σ²/|µ| in experiments whereas the simulation results in Fig 5c display the variance σ² as a function of the learning rate. Also, Fig 6b displays the normalised variability σ²/|µ| as a function of the input rate whereas Fig 5b displays the variance σ² as a function of the input rate. As a consequence the comparison between experimental and simulation results is difficult.

      b. Secondly, the actual power-law exponents in the experiments (see Fig 6a resp. 6b) should be compared to the power-law exponents obtained in simulation (see Fig 5c resp. Fig 5b). The difficulty relies here on the fact that the power-law exponents obtained in the simulations directly depend on the (free) parameter ρ. So far the authors precisely avoided committing to a specific ρ, but rather argued that different biophysical mechanisms lead to different reliability exponents ρ. Therefore, since there are many possible exponents ρ (and consequently many possible power-law exponents in simulation results in Fig 5), it is likely that one of them will match the experimental data. For the argument to be stronger, one would need to argue which synaptic mechanism is dominating and therefore come up with a single prediction that can be falsified experimentally (see also point 4 below).

      c. Finally, the experimental data presented in Fig 6 are still “clouds of points". A coefficient of r = 0.52 (in Fig 6a) is moderate evidence while the coefficient of r = −0.26 (in Fig 6b) is weak evidence.

      The key thing to remember is that our paper is not about whether synapses are “really" Bayesian or energy efficient (or both/neither). Instead, the key point of our paper, as expressed in the title, is to show that the experimental predictions of Bayesian synapses are very similar to the predictions from energy efficient synapses. And therefore energy efficient synapses are very difficult to distinguish experimentally from Bayesian synapses. In that context, the two plots in Fig. 6 are not really intended to present evidence in favour of the energy efficiency / Bayesian synapses. In fact, Fig. 6 isn’t meant to constitute a contribution of the paper at all, instead, Fig. 6 serves merely as illustrations of the kinds of experimental result that have (Aitchison et al. 2021) or might (Schug et al. 2021) be used to support Bayesian synapses. As such, Fig. 6 serves merely as a jumping-off point for discussing how very similar results might equally arise out of Bayesian and energy-efficiency viewpoints.

      We have modified our description of Fig. 6 to further re-emphasise that the panels in Fig. 6 are not our contribution, but are taken directly from Schug et al. 2021 and Aitchison et al. 2021 (we have also modified Fig 6 to be precisely what was plotted in Schug et al. 2021, again to re-emphasise this point). Further, we have modified the presentation to emphasise that these plots serve merely as jumping-off points to discuss the kinds of predictions that we might consider for Bayesian and energy efficient synapses.

      This is important, because we would argue that the “strength of support" should be assessed for our key claim, made in the title, that “Signatures of Bayesian inference emerge from energy efficient synapses".

      a) To emphasise that these are previously published results, we have chosen axes to match those used in the original work (Aitchison et al. 2021) and (Schug et al. 2021).

      b) We agree that a close match between power-law exponents would constitute strong evidence for energy-efficiency / Bayesian inference, and might even allow us to distinguish them. We did consider such a comparison, but found it was difficult for two reasons. First, while the confidence intervals on the slopes exclude zero, they are pretty broad. Secondly, while the slopes in a one-layer network are consistent and match theory (Appendix 5), the slopes in deeper networks are far more inconsistent. This is likely to be due to a number of factors such as details of the optimization algorithm and initialization. Critically, if details of the optimization algorithm matter in simulation, they may also matter in the brain. Therefore, it is not clear to us that a comparison of the actual slopes can be relied upon.

      To reiterate, the point of our article is not to make judgements about the strength of evidence in previously published work, but to argue that Bayesian and energy efficient synapses are difficult to distinguish experimentally as they produce similar predictions. That said, it is very difficult to make blanket statements about the strength of evidence for an effect based merely on a correlation coefficient. It is perfectly possible to have moderate correlation coefficients along with very strong evidence of an effect (and e.g. very strong p-values), e.g. if there is a lot of data. Likewise, it is possible to have a very large correlation coefficient along with weak evidence of an effect (e.g. if we only have three or four datapoints, which happen to lie in a straight line). A small correlation coefficient is much more closely related to the effect size, specifically the effect size relative to the “noise", which usually arises from unmeasured factors of variation. Here, we know there are many, many unmeasured factors of variation, so even in the case that synapses are really Bayesian / energy-efficient, the best we can hope for is low correlation coefficients.
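
      The distinction drawn here between the size of a correlation coefficient and the strength of evidence can be illustrated with a rough calculation. The sketch below uses the Fisher z-transform with a normal approximation, which is crude for very small samples; the sample sizes are hypothetical and are not those of the datasets discussed in this response.

```python
import math

def fisher_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r from n points.

    Uses the Fisher z-transform with a normal approximation (requires n > 3).
    This is only an illustration; an exact t-test would differ for small n.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical sample sizes, chosen only to illustrate the argument.
p_weak_r_many_points = fisher_p_value(-0.26, 500)
p_strong_r_few_points = fisher_p_value(0.95, 4)
print(p_weak_r_many_points, p_strong_r_few_points)
```

      With these assumed sample sizes, the modest correlation backed by many points yields a far smaller p-value than the near-perfect correlation based on four points.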

      As mentioned in the public review, a weakness in the paper is the derivation of the constraints on σi given the biophysical costs, for two reasons.

      a. First, it seemed a bit arbitrary whether you hold n fixed or p fixed.

      b. Second, at central synapses, n is usually small – possibly even usually 1: REF(Synaptic vesicles transiently dock to refill release sites, Nature Neuroscience 23:1329-1338, 2020); REF(The ubiquitous nature of multivesicular release Trends Neurosci. 38:428-438, 2015). Fixing n would radically change your cost function. Possibly you can get around this because when two neurons are connected there are multiple contacts (and so, effectively, reasonably large n). It seems like this is worth discussing.

      a) Ultimately, we believe that the “real” biological cost function is very complex, and most likely cannot be written down in a simple functional form. Further, we certainly do not have the experimental evidence now, and are unlikely to have experimental evidence for a considerable period into the future to pin down this cost function precisely. In that context, we are forced to resort to two strategies. First, using simplifying assumptions to derive a functional form for the cost (such as holding n or p fixed). Second, considering a wide range of functional forms for the cost, and ensuring our argument works for all of them.

      b) We appreciate the suggestion that the number of connections could be used as a surrogate where synapses have only a single release site. As you suggest, we can propose an alternative model for this case where n represents the number of connections between neurons. We have added this alternative interpretation to our introduction of the quantal model under the title “Biophysical costs". For a fixed PSP mean we could either have many connections with small vesicles or fewer connections with larger vesicles. Similarly, for the actin cost we would certainly require more actin if the number of connections were increased.

      Minor

      (1) A few additional references could further strengthen some claims of the paper:

      Davis, Graeme W., and Martin Muller. “Homeostatic Control of Presynaptic Neurotransmitter Release." Annual Review of Physiology 77, no. 1 (February 10, 2015): 251-70. https://doi.org/10.1146/annurev-physiol-021014-071740. This paper provides elegant experimental support for the claim (in line 538 now 583) that µ is kept constant and q acts as a compensatory variable.

      Jegminat, Jannes, Simone Carlo Surace, and Jean-Pascal Pfister. “Learning as Filtering: Implications for Spike-Based Plasticity." Edited by Blake A Richards. PLOS Computational Biology 18, no. 2 (February 23, 2022): e1009721. https://doi.org/10.1371/journal.pcbi.1009721.

      This paper also showed that a lower uncertainty implies a lower learning rate (see e.g. in line 232), but in the context of spiking neurons.

      Figure 1 of the first suggested paper indeed shows that quantal size is a candidate for homeostatic scaling (fixing µ). This review also references lots of further evidence of quantal scaling and evidence for both presynaptic and postsynaptic scaling of q, leaving space for speculation on whether vesicle radius or postsynaptic receptor number is the source of a compensatory q. On line 583 we have added a few lines pointing to the suggested review paper.

      The second reference demonstrates Bayesian plasticity in the context of STDP, proposing learning rates tuned to the covariance in spike timing. We have added this as extra support for assuming an optimisation scheme that tunes learning rates to synapse importance and synapse variability (line 232).

      (2) In the numerical simulations, the reliability cost is implemented with a single power-law expression (reliability cost = c σ^(−ρ)). However, in principle, all the reliability costs will play in conjunction, i.e. reliability cost = Σi ci σ^(−ρi). While I do recognise that it may be difficult to estimate the biophysical values of the various ci, it might be still relevant to comment on this.

      Agreed. Limitations in the literature meant that we could only form a cursory review of the relative scale of each cost using estimates by Atwell (2001) and Engl (2015). On line 135 we have added a paragraph explaining the rationale for considering each cost independently.

      (3) In Eq. 8: σ² doesn’t depend on variability in q, which would add another term; barring algebra mistakes, it’s . It seems worth mentioning why you didn’t include it. Can you argue that it’s a small effect?

      Agreed. Ultimately, we dropped this term because we expected it to be small relative to variability in vesicle release, and because it would be difficult to quantify. In practice, the variability is believed to be contributed mostly by variability in vesicle release. The primary evidence for this is histograms of EPSP amplitudes which show classic multi-peak structure, corresponding to one, two, three, etc. EPSPs. Examples of these plots include:

      - “The end-plate potential in mammalian muscle”, Boyd and Martin (1956); Fig. 8.

      - “Structure and function of a neocortical synapse”, Holler-Rickauer et al. (2019); Extended Figure 5.
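
      The size of the omitted term can be gauged with a small calculation. The sketch below assumes the standard binomial quantal model (n release sites, release probability p, quantal size q) with quantal sizes drawn independently around a mean; under that assumption the law of total variance gives a release term n p (1 − p) q̄² plus a quantal-size term n p σq². This form is an illustrative assumption rather than the expression from the manuscript or the review, and all parameter values are hypothetical.

```python
import random
import statistics

def analytic_variance(n, p, q_mean, q_sd):
    """Return (release term, quantal-size term) of the PSP variance."""
    release_term = n * p * (1.0 - p) * q_mean ** 2
    quantal_term = n * p * q_sd ** 2
    return release_term, quantal_term

def simulate_psp(n, p, q_mean, q_sd, trials, seed=0):
    """Draw PSP amplitudes: each of n sites releases with probability p,
    and each released quantum has an independent Gaussian size."""
    rng = random.Random(seed)
    amplitudes = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n):
            if rng.random() < p:
                total += rng.gauss(q_mean, q_sd)
        amplitudes.append(total)
    return amplitudes

# Hypothetical parameter values, for illustration only.
n, p, q_mean, q_sd = 10, 0.3, 1.0, 0.2
release_term, quantal_term = analytic_variance(n, p, q_mean, q_sd)
empirical = statistics.pvariance(simulate_psp(n, p, q_mean, q_sd, trials=50000))
print(release_term, quantal_term, empirical)
```

      In this toy setting the quantal-size term is a small fraction of the release term, but that ratio depends entirely on the assumed spread of q.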

      (3) On pg. 7 now pg. 8, when the Hessian is introduced, why not say what it is? Or at least the diagonal elements, for which you just sum up the squared activity. That will make it much less mysterious. Or are we relying too much on the linear model given in App 2? If so, you should tell us how the Hessian was calculated in general. Probably in an appendix.

      With the intention of maintaining the interest of a wide audience we made the decision to avoid a mathematical definition of the Hessian, opting instead for a written definition i.e. line 192 - “Hii; the second derivatives of the objective with respect to wi.” and later on a schematic (Fig. 4) for how the second derivative can be understood as a measure of curvature and synapse importance. Nonetheless, this review point has made us aware that the estimated Hessian values plotted in Fig. 5a have been insufficiently explained so we have added a reference on line 197 to the appendix section where we show how we estimated the diagonal values of the Hessian.
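
      For the linear case raised in this point, the diagonal of the Hessian can be written down and checked directly. The sketch below assumes a linear model with a squared-error objective, for which the second derivative with respect to each weight is the summed squared activity of the corresponding input. The data are toy values, and the manuscript's own estimator for deeper networks (described in its appendix) may differ.

```python
def loss(weights, inputs, targets):
    """Half the summed squared error of a linear model y = w . x."""
    total = 0.0
    for x, y in zip(inputs, targets):
        prediction = sum(w * xi for w, xi in zip(weights, x))
        total += 0.5 * (prediction - y) ** 2
    return total

def hessian_diagonal_analytic(inputs):
    """For this linear model, H_ii is the summed squared activity of input i."""
    n_weights = len(inputs[0])
    return [sum(x[i] ** 2 for x in inputs) for i in range(n_weights)]

def hessian_diagonal_numeric(weights, inputs, targets, eps=1e-3):
    """Central finite-difference estimate of the second derivatives."""
    base = loss(weights, inputs, targets)
    diagonal = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += eps
        down = list(weights)
        down[i] -= eps
        second = (loss(up, inputs, targets) - 2.0 * base
                  + loss(down, inputs, targets)) / eps ** 2
        diagonal.append(second)
    return diagonal

# Toy data, for illustration only: input 0 is far more active than input 1.
inputs = [[2.0, 0.1], [1.5, 0.0], [2.5, 0.2]]
targets = [1.0, 0.5, 1.5]
weights = [0.3, -0.2]
analytic = hessian_diagonal_analytic(inputs)
numeric = hessian_diagonal_numeric(weights, inputs, targets)
print(analytic, numeric)
```

      The more active input has the larger curvature, which is the sense in which the Hessian diagonal measures synapse importance in this simplified setting.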

      (4) Fig. 5: assuming we understand things correctly, Hessian ∝ |x|². Why also plot σ² versus |x|? Or are we getting the Hessian wrong?

      The Hessian is proportional to . If you assume that time steps are small and neurons spike, then , and . It is difficult to say what timestep is relevant in practice.

      (5) To get Fig. 6a, did you start with Fig. Appendix 1-figure 4 from Schug et al, and then use , drop the q, and put 1 − p on the x-axis? Either way, you should provide details about where this came from. It could be in Methods.

      We have modified Fig. 6 to use the same axes as in the original papers.

      (6) Lines 190-3: “The relationship between input firing rate and synaptic variability was first observed by Aitchison et al. (2021) using data from Ko et al. (2013) (Fig. 6a). The relationship between learning rate and synaptic variability was first observed by Schug et al. (2021), using data from Sjostrom et al. (2003) as processed by Costa et al. (2017) (Fig. 6b)." We believe 6a and 6b should be interchanged in that sentence.

      Thank you. We have switched the text appropriately.

      (7) What is posterior variance? This seems kind of important.

      This refers to the “posterior variance" obtained using a Bayesian interpretation of the problem of obtaining good synaptic weights (Aitchison et al. 2021). In our particular setting, we estimate posterior variances by setting up the problem as variational inference: see Appendix 4 and 5, which is now referred to in line 390.

      (8) Lines 244-5: “we derived the relationships between the optimized noise, σi and the posterior variable, σpost as a function of ρ (Fig. 7b;) and as a function of c (Fig. 7c)." You should tell the reader where you derived this. Which is Eq. 68c now 54c. Except you didn’t actually derive it; you just wrote it down. And since we don’t know what posterior variance is, we couldn’t figure it out.

      If H is the Hessian of the log-likelihood, and if the prior is negligible relative to the likelihood, then we get Eq. 69c. We have added a note on this point to the text.

      (9) We believe Fig. 7a shows an example pair of synapses. Is this typical? And what about Figs. 7b and c. Also an example pair? Or averages? It would be helpful to make all this clear to the reader.

      Fig. 7a shows an illustrative pair of synapses, chosen to best display the relative patterns of variability under energy efficient and Bayesian synapses. We have noted this point in the legend for Fig. 7. Fig. 7bc show analytic relationships between energy efficient and Bayesian synapses, so each line shows a whole continuum of synapses (we have deleted the misleading points at the ends of the lines in Fig. 7bc).

      (10)  The y-axis of Fig 6a refers to the synaptic weight as w while the x-axis refers to the mean synaptic weight as mu. Shouldn’t it be harmonised? It would be particularly nice if both were divided by µ, because then the link to Fig. 5c would be more clear.

      We have changed the y-axis label of Fig. 6a from w to µ. Regarding the normalised variance, we did try this but our Gaussian posteriors allowed the mean to become small in our simulations, giving a very high normalised variance. To remedy this we would likely need to assume a log-posterior, but this was out of scope for the present work.

      (11) Line 250 (now line 281): “Finally, in the Appendix". Please tell us which Appendix. Also, why not point out here that the bound is tightest at small ρ?

      We have added the reference to the section of the appendix with the derivation of the biological cost as a bound on the ELBO. We have also referenced the equation that gives the limit of the biological cost as ρ tends to zero.

      (12) When symbols appear that previously appeared more than about two paragraphs ago, please tell us where they came from. For instance, we spent a lot of time hunting for ηi. And below we’ll complain about undefined symbols. Which might mean we just missed them; if you told us where they were, that problem would be eliminated.

      We have added extra references for the symbols in the text following Eq. 69.

      (13) Line 564, typo (we think): should be σ^(−2).

      Good spot. This has been fixed.

      (14)  A bit out of order, but we don’t think you ever say explicitly that r is the radius of a vesicle. You do indicate it in Fig. 1, but you should say it in the main text as well.

      We have added a note on this to the legend in Fig. 1.

      (15) Eq. 14: presumably there’s a cost only if the vesicle is outside the synapse? Probably worth saying, since it’s not clear from the mechanism.

      Looking at Pulido and Ryan (2021) carefully, it is clear that they are referring to a cost for vesicles inside the presynaptic side of the synapse. (Importantly, vesicles don’t really exist outside the synapse; during the release process, the vesicle membrane becomes part of the cell membrane, and the contents of the vesicle are ejected into the synaptic cleft).

      (16) App. 2: why solve for mu, and why compute the trace of the Hessian? Not that it hurts, but things are sort of complicated, and the fewer side points the better.

      Agreed, we have removed the solution for μ, and the trace, and generally rewritten Appendix 2 to clarify definitions, the Hessian etc.

      (17) Eq. 35: we believe you need a minus sign on one side of the equation. And we don’t believe you defined p(d|w). Also, are you assuming g = ∂ log p(d|w)/∂w? This should be stated, along with its implications. And presumably, it’s not really true; people just postulate that p(d|w) ∝ exp(−log loss)?

      We have replaced p(d|w) with p(y, x|w), and we replaced “overall cost” with log P(y|w, x). Yes, we are also postulating that p(y|w, x) ∝ exp(−log loss), though in our case that does make sense as it corresponds to a squared loss.

      As regards the minus sign, in the original manuscript, we had the second derivative of the cost. There is no minus sign for the cost, as the Hessian of the cost at the mode is positive semi-definite. However, once we write the expression in terms of a log-likelihood, we do need a minus sign (as the Hessian of the log-likelihood at a mode is negative semi-definite).
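
      The sign convention can be made explicit for the squared-loss case. The derivation below is a sketch assuming a linear model with Gaussian output noise of variance s² (a symbol introduced here for illustration, not taken from the manuscript):

```latex
\begin{aligned}
\log p(y \mid w, x) &= -\frac{1}{2 s^2}\,\bigl(y - w^\top x\bigr)^2 + \text{const}, \\
\frac{\partial^2}{\partial w_i\,\partial w_j} \log p(y \mid w, x) &= -\frac{x_i x_j}{s^2}, \\
\frac{\partial^2}{\partial w_i\,\partial w_j} \left[ \frac{1}{2 s^2}\,\bigl(y - w^\top x\bigr)^2 \right] &= +\frac{x_i x_j}{s^2}.
\end{aligned}
```

      Since the matrix x xᵀ is positive semi-definite, the Hessian of the cost is positive semi-definite and the Hessian of the log-likelihood is its negative, which is where the minus sign enters.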

      (18) Eq. 47 now Eq. 44: first mention of CBi;i?

      We have added a note describing CB around these equations.

      (19) The “where" doesn’t make sense for Eqs. 49 and 50; those are new definitions.

      We have modified the introduction of these equations to avoid the problematic “where”.

      (20) Eq. 57 and 58 are really one equation. More importantly: where does Eq. 58 come from? Is this the H that was defined previously? Either way, you should make that clear.

      We have removed the problematic additional equation line number, and added a reference to where H comes from.

      (21) In Eq. 59 now Eq. 60 aren’t you taking the trace of a scalar? Seems like you could skip this.

      We have deleted this derivation, as it repeats material from the new Appendix 2.

      (22) Eq. 66 is exactly the same as Eq. 32. Which is a bit disconcerting. Are they different derivations of the same quantity? You should comment on this.

      We have deleted lots of the stuff in Appendix 5 as, we agree, it repeats material from Appendix 2 (which has been rewritten and considerably clarified).

      (23) Eq. 68 now 54, left column: please derive. We got:

      gai = gradient for weight i on trial

      where the second equality came from Eq. 20. Thus

      Is that correct? If so, it’s a lot to expect of the reader. Either way, a derivation would be helpful.

      We agree it was unnecessary and overly complex, so we have deleted it.

      (24) App 5–Figure 2: presumably the data for panel b came from Fig. 6a, with the learning rate set to Δw/w? And the data for panel c from Fig. 6b? This (or the correct statement, if this is wrong) should be mentioned.

      Yes, the data for panel c came from Fig. 6b. We have deleted the data in panel b, as there are some subtleties in interpretation of the learning rates in these settings.

      (25) line 952 now 946: typo, “and the from".

      Corrected to “and from".

    1. eLife assessment

      This important study reveals the use of an allocentric spatial reference frame in updating the perceived location of a dimly lit target during locomotion. The evidence supporting this claim is compelling, based on a series of cleverly and carefully designed behavioral experiments. The results will be of interest not only to scientists who study perception, action and cognition but also to engineers who work on developing visually guided robots and self-driving vehicles.

    2. Reviewer #1 (Public Review):

      This study conducted a series of experiments to comprehensively support the allocentric rather than egocentric visual spatial reference updating for the path-integration mechanism in the control of target-oriented locomotion. The authors first manipulated the waiting time before walking to tease apart the influence of spatial working memory in guiding locomotion. They demonstrated that the intrinsic bias in perceiving distance remained constant during walking and that the establishment of a new spatial layout in the brain took a relatively longer time beyond the visual-spatial working memory. In the following experiments, the authors then uncovered that the strength of the intrinsic bias in distance perception along the horizontal direction is reduced when participants' attention is distracted, implying that world-centered path integration requires attentional effort. This study also revealed horizontal-vertical asymmetry in a spatial coding scheme that bears a resemblance to the locomotion control in other animal species such as desert ants.

      The revised version of the study effectively situates the research within the broader context of terrestrial navigation, focusing on the movement of land-based creatures and offers a clearer explanation for the potential neurological basis of the human brain's allocentric odometer. Previous feedback has been thoroughly considered, and additional details have been incorporated into the presentation of the results.

    3. Reviewer #3 (Public Review):

      This study investigated what kind of reference frame (allocentric or egocentric) we use for perception in darkness. This question is essential and has not been addressed much before. The authors compared the perception in the walking condition with that in the stationary condition, which successfully separated the contribution of self-movement to the spatial representation. In addition, the authors also carefully manipulated the contribution of the waiting period, attentional load, vestibular input, testing task, and walking direction (forward or backward) to examine the nature of the reference frame in darkness systematically.

      I am a bit confused by Figure 2b. Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer. In Figure 2, however, the authors assumed that the perceived target was located on the intersection between the intrinsic bias curve and the viewing line from the NEW eye position to the target. This suggests that the perceived object depends on the observer's new location, which seems at odds with the allocentric coordinate hypothesis.

      According to Fig 2b, the perceived size should be left-shifted and lifted up in the walking condition compared to that in the stationary condition. However, in Figure 3C and Fig 4, the perceived size was the same height as that in the baseline condition.

      Is the left-shifted perceived distance possibly reflecting a kind of compensation mechanism? Participants could not see the target's location but knew they had moved forward. Therefore, their brain automatically compensates for this self-movement when judging the location of a target. This would perfectly predict the left-shifted but not upward-shifted data in Fig 3C. A similar compensation mechanism exists for size constancy in which we tend to compensate for distance in computing object size.

      According to Fig 2a, the target, perceived target, and eye should be aligned in one straight line. This means that connecting the physical targets and the corresponding perceived target results in straight lines that converge at the eye position. This seems, however, unlikely in Figure 3c.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) Authors need to acknowledge the physical effort in addition to visual information for the spatial coding and may consider the manipulation of physical efforts in the future to support the robustness of constant intrinsic bias in ground-based spatial coding during walking.

      Whether one’s physical effort can affect spatial coding for visual perception is not a settled issue.  Several empirical studies have not been able to obtain evidence to support the claim.  For example, empirical studies by Hutchison & Loomis (2009) and Durgin et al. (2009) did not find wearing a heavy backpack significantly influenced distance perception, in contrast to the findings by Proffitt et al (2003).  We respectfully request not to discuss this issue in our revision since it is not closely related to the focus of the current study.

      (2) Furthermore, it would be more comprehensive and fit into the Neuroscience Section if the authors can add in current understandings of the spatial reference frames in neuroscience in the introduction and discussion, and provide explanations on how the findings of this study supplement the physiological evidence that supports our spatial perception as well.  For instance, world-centered representations of the environment, or cognitive maps, are associated with hippocampal formation while self-centered spatial relationships, or image spaces, are associated with the parietal cortex (see Bottini, R., & Doeller, C. F. (2020). Knowledge Across Reference Frames: Cognitive Maps and Image Spaces. Trends in Cognitive Sciences, 24(8),606-619. https://doi.org/10.1016/j.tics.2020.05.008 for details)

      We have now added this important discussion in the revision on pages 12-13.

      We thank the reviewer for the helpful comments.

      Reviewer 2:

      (1) ….As a result, it is unclear to what extent this "allocentric" intrinsic bias is involved in our everyday spatial perception. To provide more context for the general audience, it would be beneficial for the authors to address this issue in their discussion.

      We have clarified this on pages 3-4.  In brief, our hypothesis is that during self-motion, the visual system constructs an allocentric ground surface representation (reference frame) by integrating the allocentric intrinsic bias with the external depth cues on the natural ground surface.  Supporting this hypothesis, we recently found that when there is a texture cue on the ground, the representation of the ground surface is influenced by the allocentric intrinsic bias (Zhou et al., unpublished results).

      (2) The current findings on the "allocentric" coding scheme raise some intriguing questions as to why such a mechanism would be developed and how it could be beneficial. The finding that the "allocentric" coding scheme results in less accurate object localization and requires attentional resources seems counterintuitive and raises questions about its usefulness. However, this observation presents an opportunity for the manuscript to discuss the potential evolutionary advantages or trade-offs associated with this coding mechanism.

      The revision has discussed these important issues on page 12.

      (3) The manuscript lacks a thorough description of the data analysis process, particularly regarding the fitting of the intrinsic bias curve (e.g., the blue and gray dashed curve in Figure 3c) and the calculation of the horizontal separation between the curves. It would be beneficial for the authors to provide more detailed information on the specific function and parameters used in the fitting process and the formula used for the separation calculation to ensure the transparency and reproducibility of the study's results.

      The results of the statistical analysis were presented in the supplementary materials.  We had stated in the original manuscript that we fitted the intrinsic bias curve by eye (obtained by drawing the curve to transcribe the data points as closely as possible) (page 26).  This is because we do not yet have a formula for the intrinsic bias. A challenge is that the measured intrinsic bias in the dark can be affected by multiple factors.  One factor is related to individual differences, as the intrinsic bias is shaped by the observer’s past experiences and their eye height relative to the ground surface.  However, it is certainly our goal to develop a quantitative model of the intrinsic bias in the future.

      We thank the reviewer for the helpful comments.

      Reviewer 3:

      (1) I am a bit confused by Figure 2b. Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer. In Figure 2, however, the authors assumed that the perceived target was located on the intersection between the intrinsic bias curve and the viewing line from the NEW eye position to the target. This suggests that the perceived object depends on the observer's new location, which seems at odds with the allocentric coordinate hypothesis.

      We respectfully disagree with the Reviewer’s statement that “Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer.”  The statement conflates the definitions of allocentric representation with exocentric representation.  We respectfully maintain that the observer’s body location, as well as observer-object distance, can be represented with the allocentric coordinate system.

      (2) According to Fig 2b, the perceived size should be left-shifted and lifted up in the walking condition compared to that in the stationary condition. However, in Figure 3C and Fig 4, the perceived size was the same height as that in the baseline condition.

      We assume that by “target size” the Reviewer actually meant “target location”.  It is correct that figure 3c and figure 4 showed that judged distance changed as predicted, while the change in judged height was not significant.  One explanation for this is that the magnitude of the height change was much smaller than the distance change and could not be revealed by our blind walking-gesturing method.  Please also note that our figures used different scales for the vertical height and horizontal distance.

      (3) Is the left-shifted perceived distance possibly reflecting a kind of compensation mechanism?  Participants could not see the target's location but knew they had moved forward.  Therefore, their brain automatically compensates for this self-movement when judging the location of a target.  This would perfectly predict the left-shifted but not upward-shifted data in Fig 3C.  A similar compensation mechanism exists for size constancy in which we tend to compensate for distance in computing object size.

      We assume the Reviewer suggested that the path-integration mechanism first estimates the traveled distance in the dark, and then the brain subtracts the estimated distance from the perceived target distance.  We respectfully maintain that this explanation is unlikely because it does not account for our empirical findings.  We found that walking in the dark did not uniformly affect perceived target distance, as the Reviewer’s explanation would predict.  As shown in figures 3 and 4, walking affected the near targets less than the far targets (i.e., the horizontal distance difference between walking and baseline-stationary conditions was smaller for the near target than far target).

      (4) According to Fig 2a, the target, perceived target, and eye should be aligned in one straight line. This means that connecting the physical targets and the corresponding perceived target results in straight lines that converge at the eye position. This seems, however, unlikely in Figure 3c.

      In the revision, we have added the averaged eye positions on the y-axes of figures 3 and 4.  To reveal the impact of the judged angular declination, we also added graphs that plot the estimated angular declination as a function of the physical declination of the target.  In general, the slopes are close to unity.
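
      The geometry behind point (4) can be sketched in a few lines. This is only an illustration, not the authors' analysis: the eye height, the target distances, and the "judged" locations below are made-up values, and the function names are hypothetical. It shows that when each judged location lies on the line of sight from the eye to the physical target, the judged angular declination equals the physical one, so a plot of one against the other has a slope of one.

```python
import math

def angular_declination_deg(eye_height, target_distance, target_height=0.0):
    """Angle below eye level (in degrees) of a point at the given
    horizontal distance and height, seen from an eye at eye_height."""
    return math.degrees(math.atan2(eye_height - target_height, target_distance))

def fit_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical values, chosen only for illustration.
eye_height = 1.6                      # metres (assumed)
physical_distances = [3.0, 4.5, 6.0]  # metres, targets on the floor (assumed)
fraction = 0.8                        # judged point sits 80% of the way along the line of sight

physical = [angular_declination_deg(eye_height, d) for d in physical_distances]
judged = [
    angular_declination_deg(
        eye_height,
        fraction * d,                               # judged horizontal distance
        target_height=eye_height * (1 - fraction),  # judged height above the floor
    )
    for d in physical_distances
]

slope = fit_slope(physical, judged)
```

      Judged locations that depart from the line of sight would show up as a slope different from one or as a non-zero offset in the same plot.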

      We thank the reviewer for the helpful comments.

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) This study is very well-designed and written. One minor comment is that anisotropy usually refers to the perceptual differences along cardinal (horizontal + vertical) and oblique directions. It might be clearer if the authors changed the "horizontal-vertical anisotropy" to "horizontal/vertical asymmetry”.

      The Reviewer is correct, and we have changed it to horizontal/vertical asymmetry (pages 8 and 11).

      Reviewer 2 (Recommendations For The Authors):

      (1) Providing more details about the "path integration mechanism" when it is first introduced in line 44 would be helpful for readers to better understand the concept.

      The revision has expanded on the path integration mechanism (page 4).

      Adding references for the statement starting with "In fact, previous findings" in lines 218 and would be helpful to provide readers with a basis for comparison between the current study and previous studies that reported an egocentric coding system.

      We have added the references and elaborated on this important issue (pages 10-11).

      (2) There appears to be a discrepancy between the Materials and Methods section, which states that 14 observers participated in Experiments 1-4, and the legends of Figures 3 and 4, which indicates a sample size of "n=8." It would be helpful if the authors could clarify this discrepancy and provide an explanation for the difference in the sample size reported.

      We have clarified the number of observers on page 14.

      (3) While reporting statistical significance is essential in the Results section, there are several instances where the manuscript only mentions a "statistically significant separation" with its p-value without providing the mean and standard deviation of the separation values (e.g., lines 100 and 120). This can make it difficult for readers to fully grasp the quantitative nature of the results.

      The statistical analysis and outcomes were presented in the supplementary information document in our original submission.

      Reviewer 3 (Recommendations For The Authors):

      (1) Figure 1 is not significantly related to the current manuscript.

      We feel that retaining figure 1 in the manuscript would help readers to quickly grasp the background literature without having to refer extensively to our previous publications.

      (2) Add eye position to the results figures.

      We have added eye positions in the figures.

      (3) Fig 4c requires a more detailed explanation. The authors stated that Figures 4a and 4c showed consistent results.  However, because 4a and 4c used different horizontal axes, it is difficult to compare them directly.

      We have modified the sentence in the revision (page 8).

    1. Reviewer #2 (Public Review):

      Summary:

      The goal of this study is to clarify how the brain simultaneously represents item-specific temporal information and item-independent boundary information. The authors report spectral EEG data from intracranial patients performing a delayed free recall task. They perform cosine similarity analyses on principal components derived from gamma band power across stimulus duration. The authors find that similarity between items in serial position 1 (SP1) and all other within-list items decreases as a function of serial position, consistent with temporal context models. The authors find that across-list item similarity to SP1 is greatest for SP1 items relative to items from other serial positions, an effect that is greater in medial parietal lobe compared to lateral temporal cortex and hippocampus. The authors conclude that their findings suggest that perceptual boundary information is represented in medial parietal lobe. Despite a robust dataset, the methodological limitations of the study design prevent strong interpretations from being made from these data. The same-serial position across-list similarity may be driven by attentional mechanisms that are distinct from boundary information.
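
      For readers unfamiliar with the similarity measure named in this summary, the following is a minimal sketch of cosine similarity computed on principal-component scores. It is not the authors' pipeline: the feature matrix is random data standing in for gamma-band power, the number of components is arbitrary, and the helper names are hypothetical.

```python
import numpy as np

def pca_project(features, n_components):
    """Project rows of `features` (items x features) onto the top
    principal axes, obtained from an SVD of the mean-centred matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in data: 12 list items x 40 features (e.g. electrodes x time bins).
rng = np.random.default_rng(0)
gamma_power = rng.normal(size=(12, 40))

scores = pca_project(gamma_power, n_components=5)

# Similarity of the first item (serial position 1) to every later item.
sim_to_sp1 = [cosine_similarity(scores[0], scores[i]) for i in range(1, 12)]
```

      With real recordings, a within-list drift effect of the kind described above would appear as `sim_to_sp1` decreasing with serial position; with the random stand-in data no such trend is expected.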

      Strengths:

      (1) The motivation of the study is strong as how both temporal contextual drift and event boundaries contribute to memory mechanisms is an important open question.

      (2) The dataset of spectral EEG data from 99 intracranial patients provides the opportunity for precise spatiotemporal investigation of neural memory mechanisms.

      Weaknesses:

      The goal of reconciling temporal context and event boundary mechanisms is timely and would be of interest; however, an attentional account can still be used to explain the findings. This alternative account is not considered in the manuscript.

      (1) The issue related to interpreting the SP1 similarity effects as reflecting boundary specific representations remains in the revised manuscript. The authors suggest that because cross-list SP1 similarity is found in recalled items that this supports the boundary interpretation. However, the effects could still be explained by variability in attention that is not specific to an event-boundary per se. As both subsequently recalled items and primacy items tend to recruit more gamma power than non-recalled and non-primacy items, recalled items will tend to have greater similarity with one another. It does not necessarily follow though that this similarity is due to a "boundary representation."

      (2) The authors partly addressed my concern regarding the comparison of recalled pairs. How did the authors account for the fact that the same participants do not contribute equally to all ROIs? If only participants who have electrodes in all ROIs are included, are the effects consistent?

    2. eLife assessment

      This valuable study presents a novel analysis of a large human intracranial electrophysiological recording dataset. The study challenges the traditional view that neural responses to word lists exhibit smoothly drifting contexts over time, showing that items just after a boundary have a characteristic response that occurs repeatedly. The evidence is incomplete, however, leaving open the possibility for alternative explanations.

    3. Reviewer #1 (Public Review):

      Summary:

      This study applied pattern similarity analyses to intracranial EEG recordings to determine how neural drift is related to memory performance in a free recall task. The authors compared neural similarity within and across lists, in order to contrast signals related to contextual drift vs. the onset of event boundaries. They find that within-list neural differentiation in the lateral temporal cortex correlates with probability of word recall; in contrast, across-list pattern similarity in the medial parietal lobe correlates with recall for items near event boundaries (early-list serial positions). This primacy effect persists for the first three items of a list. Medial parietal similarity is also enhanced across lists for end-of-list items, however this effect then predicts forgetting. The authors do not find that within- or across-list pattern similarity in the hippocampus is related to recall probability.

      Strengths:

      The authors use a large dataset of human intracranial electrophysiological recordings, which gives them high statistical power to compare neural activity and memory across three important memory encoding regions. In so doing, the authors seek to address a timely and important question about the neural mechanisms that underlie the formation of memories for events.

      The use of both within and across event pattern similarity analyses, combined with linear mixed effects modeling, is a marriage of techniques that is novel and translatable in principle to other types of data.

      Weaknesses:

      In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth in order to reconcile it with the previous literature and with the motivating theoretical model.

      The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). In other words, this is analogous to presenting the same item at the start of every single list, in which case it is not surprising that the parietal (or any neural) representation would be similar to itself at the start of every list. So, a qualitatively unique boundary representation would not be necessary to explain this result. The authors do not include analyses to rule this out, which makes it difficult to interpret a key finding.

      There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors' interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context, however another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances.

      The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue.

    4. Reviewer #3 (Public Review):

      Summary:

      In this study, the authors analyzed data from 99 individuals with implanted electrodes who were performing a word-list recall task. Because the task involves successively encoding and then recalling 25 lists in a row, they were able to measure the similarity in neural responses for items within the same list as well as items across different lists, allowing them to test hypotheses about the impact of between-list boundaries on neural responses. They find that, in addition to slow drift in responses across items within a list and changes across lists, there is boundary-related structure in the medial parietal lobe such that early items in each list show similarity (for recalled items) and late items in each list show similarity (for not recalled items).

      Strengths:

      The dataset used in this paper is substantially larger than most iEEG datasets, allowing for the detection of nuanced differences between item positions and for analyses of individual differences in boundary-related responses. There are excellent visualizations of the similarity structure between items for each region, and this work connects to a growing literature on the role of event boundaries in structuring neural responses.

      Weaknesses:

      (1) The visualization in Fig 1B claims that the prediction of the temporal context model is that nearby items in the presented sequence should have similar representations; that is, nearby items within a list should be similar, and the end of a list should look similar to the beginning of the next list. First, it's unclear to me if this is exactly what TCM would predict for this dataset, since lists are separated by ~60 seconds of distractor and retrieval tasks, rather than simply by a brief event boundary. Second, the authors do not actually test this model of continuous similarity across lists. After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with a "list distance" regressor that predicts discrete changes between lists. The authors state that it is not possible to replace this list distance regressor with an item distance regressor (which would be a straight line in Fig 3D rather than stair-steps) because this would be too collinear with the boundary proximity regressor, but I do not understand why these regressors would be collinear at all (since the boundary proximity regressor does not systematically increase or decrease across items).

      (2) There is no theoretical or quantitative justification for the specific forms of the boundary proximity models. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different linear model of d/#items is used, which seems to have a somewhat different interpretation, since it changes at a constant rate across all items rather than only modeling items near the final boundary. Confusingly, the schematic in Fig 1B shows symmetric effects at initial and final boundaries, despite two different models being used and the authors' assertion in their response that they do not believe these processes are symmetric.

      (3) It is unclear to me whether the authors believe that the observed similarity after boundaries is due to an active process in which "the medial parietal lobe uses drift-resets" to reinstate a boundary-related context, or that this similarity is simply because "the context for the first item may be the boundary itself", and therefore this effect would emerge naturally from a temporal context model that incorporates the full task structure as the "items."
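
      The two boundary proximity forms discussed in weakness (2) can be written out directly. The sketch below is illustrative only: the function names and the `scale` parameter are hypothetical, and the list length of 12 is an assumption based on the SP12 analyses mentioned elsewhere in these reviews. Setting `scale=1` gives the e^(1-d) form and `scale=2` gives the e^((1-d)/2) alternative raised by the reviewer.

```python
import math

def initial_boundary_proximity(serial_position, scale=1.0):
    """Exponential falloff from the list-initial boundary:
    exp((1 - d) / scale), equal to 1 at serial position 1."""
    return math.exp((1 - serial_position) / scale)

def final_boundary_proximity(serial_position, n_items):
    """Linear ramp toward the list-final boundary: d / n_items,
    equal to 1 at the last serial position."""
    return serial_position / n_items

n_items = 12  # assumed list length

# One row per serial position: (d, e^(1-d), e^((1-d)/2), d/n_items)
regressor_table = [
    (
        d,
        initial_boundary_proximity(d),
        initial_boundary_proximity(d, scale=2.0),
        final_boundary_proximity(d, n_items),
    )
    for d in range(1, n_items + 1)
]

for d, fast, slow, ramp in regressor_table:
    print(f"d={d:2d}  e^(1-d)={fast:.4f}  e^((1-d)/2)={slow:.4f}  d/n={ramp:.4f}")
```

      Printing the table makes the asymmetry concrete: the exponential terms are concentrated on the first few serial positions, whereas the linear term changes by the same amount at every position.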

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth to reconcile it with the previous literature and with the motivating theoretical model.

      Figure 2 supports the findings from El-Kalliny and colleagues because it shows the relationship of each list item relative to the first item (El-Kalliny et al. 2019). Items encoded adjacent to SP1 show the highest spectral similarity supporting the idea of overlapping context predicted by the Temporal Context Model. However, our figure characterizes how increasing inter-item distance affects spectral similarity. It shows that two items successfully recalled from temporally distant serial positions show reduced spectral similarity. These findings align with the predictions of the temporal context model because two temporally distant items would lack significant contextual overlap and therefore would have more distinct spectral representations.

      El-Kalliny and colleagues do use a similar experimental set-up; however, the authors define drift differently. They identified patients with a tendency to temporally cluster, and observed that those patients tend to drift less between temporally clustered items. However, they do not specify drift relative to a constant serial position as we do in our analysis. They define drift as spectral change between two adjacent items, which is a more relative measure between any two items rather than in relation to a fixed point like SP1. Finally, our analysis focuses only on gamma activity, while El-Kalliny and colleagues identified drift across a much broader set of frequency bands.

      (2) The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). The authors do not include analyses to rule this out, which undermines one of the main findings. 

      Extensions of the temporal context model (Lohnas et al. 2015) predict context at the beginning of a list will be most similar to the end of the prior list. The theory assumes a single-context state, consisting of a recency-weighted average of prior items, that is updated, even across different encoding periods.

      However, our results show a boundary item representation is most similar to the prior list's first item rather than the last item. Our results conflict with the extension of TCM because the shared similarity of boundary items suggests the context state for the first item in the list is not a recency-weighted average of the items presented immediately prior. The same boundary-sensitive signal is not present in other regions, namely the hippocampus and lateral temporal cortex. Those regions do not show similarity between items at the beginning of each list.

      Our main conclusion from these data was that the medial parietal lobe activity seems to be specifically sensitive to task boundaries, defined by the first event or the get ready prompt, while other regions are not.

      (3) Although several previous studies have linked hippocampal fMRI and electrophysiological activity at event boundaries with memory performance, the authors do not find similar relationships between hippocampal activity, event boundaries, and memory. There are potential explanations for why this might be the case, including the distinction between item vs. associative memory, which has been a prominent feature of previous work examining this question. However, the authors do not address these potential explanations (or others) to explain their findings' divergence from prior work; this makes it difficult to interpret and to draw conclusions from the data about the hippocampus' mechanistic role in forming event memories.

      The following text was added and revised in the discussion to discuss hippocampal activity shown in our results and its lack of sensitivity to boundaries.  

      “Spectral activity in the medial parietal lobe aligned closely with boundaries. Drift between item pairs seemed to reset at each boundary, leading to renewed similarity after each boundary. This observation aligns with previous work suggesting boundaries reset temporal context. In the temporal cortex, our findings extend prior studies which suggest the temporal lobe may play a role in associating adjacently presented items (Yaffe et al. 2014, El-Kalliny et al. 2019). We found items encoded in distant serial positions, but within the same list, drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them. However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ben-Yakov et al. 2018, Ezzyat et al. 2014; Griffiths et al. 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions.”

      (4) There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors’ interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context, however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances. 

      We agree our results could suggest the MPL creates a generalized situational model or schematic of the task. Unfortunately, our behavioral task does not allow us to differentiate between these ideas and pure boundary representation. However, given boundaries are a component in defining situational models, we chose to interpret our results conservatively as a form of boundary representation.  

      (5) The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue. 

      The study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial position specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (6) P2. Line 65 cites Polyn et al (2009b) as an example where ‘random’ boundary insertions improve subsequent memory. However, the boundaries in that study always occurred at the same serial position and were therefore completely predictable and not random.

      The citation was removed from the corresponding sentence.

      (7) P2. Line 74 cites Pu et al. (2022) as an example of medial temporal lobe ‘regional activity’ showing sensitivity to event boundaries; however, this paper reported behavioral and computational modeling results and did not include measurement of neural activity. 

      The citation was removed from the corresponding sentence.

      (8) P.3 Line 117, Hsieh et al (2014) and Hsieh and Ranganath (2015) are cited as evidence that ‘spectral’ relatedness decreases as a function of distance, but neither of these studies examined ‘spectral’ activity (fMRI univariate and multivariate). The manuscript would benefit from a careful review and updating of how the prior literature is cited, which will increase the impact of the findings for readers.

      The text has been updated to reflect this distinction by modifying the statement to:  “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (9) Several previous studies have found hippocampal activity at event boundaries correlates with memory performance (Ben-Yakov et al 2011, 2018; Baldassano et al 2017), yet here the authors do not find evidence for hippocampal activity at event boundaries related to memory. Does this difference reflect something important about how the hippocampus vs. medial parietal cortex vs. lateral temporal cortex contribute to memory formation? Currently, there is not much discussion about how to interpret the differences between brain regions. Previous work has suggested that hippocampal pattern similarity at event boundaries specifically supports associative memory across events (Ezzyat & Davachi, 2014; Griffiths & Fuentemilla, 2020; Heusser et al., 2016), which may help explain their findings. In any case the authors could increase the impact of their paper by further situating their findings within the previous literature. 

      We would not suggest there is no boundary-related activity in the hippocampus. Similar to an earlier point made by the reviewer, to clarify our interpretation of regional differences, the following text has been added to the discussion.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) The authors mention neural fatigue as an alternative theory to explain the primacy effect (Serruya et al., 2014), however there are no analyses or data to suggest that their data is better fit by a boundary mechanism as opposed to neural fatigue. Previous studies have shown that gamma activity in the hippocampus changes with serial position and with encoding history (Serruya et al 2014; Lohnas et al 2020). Here, the authors could compare the reported pattern similarity results to control analyses that replicate this prior work, which would strengthen their argument that there is unique information at boundaries that is distinct from a neural fatigue signal. 

      Serruya and colleagues describe serial position effects in which HFA decreases with increasing serial position in the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global neural fatigue model does not account for our results.

      Notably, the authors do not characterize HFA trends in the MPL. Nevertheless, their findings do not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.  

      Next, the neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial position specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (11) For the analyses that examine cross-list similarity (e.g. the medial parietal analysis in Figure 3), how did the authors choose the number of lists over which similarity was calculated? Was the selection of this free parameter cross-validated to ensure that it is not overfitting the data? Given that there were 25 lists per session, using the three succeeding lists seems arbitrary. Why not use every list across the whole session? 

      Given the volume of data, number of patients, and computational time available at our facility, we extended the analysis as far as we could to characterize the observed trend.

      (12) P4. Line 155 says that Figure 3C shows example subject data, but it looks like it is actually Figure 3D. 

      The text was updated to reference the correct figure.

      (13) The t-tests on P.4 Line 159 have two sets of degrees of freedom but should only have one. 

      The t-tests described by Figure 3B represent the mean parameter estimate of the predictor for boundary proximity contrasted by region for all item pairs. The statistical test in this case was an unpaired t-test between parameter estimates for patients with electrodes in each of the regions. The numbers within parentheses represent the sample size, or number of subjects, contributing electrodes to each region.
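
      To make the exchange in point (13) concrete, here is a minimal sketch of an unpaired t-test between two groups of per-subject parameter estimates. It assumes the pooled-variance (Student) form, which the response does not specify, and the numbers are invented for illustration. Under that assumption the test has two sample sizes but a single degrees-of-freedom value, n_a + n_b - 2.

```python
import math
from statistics import mean, variance

def unpaired_t(sample_a, sample_b):
    """Student's unpaired t-test with pooled variance.
    Returns (t statistic, degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its own degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    return t_stat, df

# Invented parameter estimates for subjects with electrodes in two regions.
region_a_betas = [0.2, 0.4, 0.6]
region_b_betas = [0.1, 0.1, 0.4, 0.2]

t_stat, df = unpaired_t(region_a_betas, region_b_betas)
print(f"t({df}) = {t_stat:.3f}, n_a = {len(region_a_betas)}, n_b = {len(region_b_betas)}")
```

      Reporting the result as t(df) alongside the two group sizes keeps the sample sizes from being read as a second set of degrees of freedom.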

      Reviewer 2:

      (1) Because this is not a traditional event boundary study, the data are not ideally positioned to demonstrate boundary specific effects. In a typical study investigating event boundary effects, a series of stimuli are presented and within that series occurs an event boundary – for instance, a change in background color. The power of this design is that all aspects between stimuli are strictly controlled – in particular, the timing – meaning that the only difference between boundary-bridging items is the boundary itself. The current study was not designed in this manner, thus it is not possible to fully control for effects of time or that multiple boundaries occur between study lists (study to distractor, distractor to recall, recall to study). Each list in a free recall study can be considered its own “mini” experiment such that the same mechanisms should theoretically be recruited across any/all lists. There are multiple possible processes engaged at the start of a free recall study list which may not be specific to event boundaries per se. For example, and as cited by the authors, neural fatigue/attentional decline (and concurrent gamma power decline) may account for serial position effects. Thus, SP1 on all lists will be similar by virtue of the fact that attention/gamma decrease across serial position, which may or may not be a boundary-specific effect. In an extreme example, the analyses currently reported could be performed on an independent dataset with the same design (e.g. 12 word delayed free recall) and such analyses could potentially reveal high similarity between SP1-list1 in the current study and SP1-list1 in the second dataset, effects which could not be specifically attributed to boundaries.

      The neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (2) Comparisons of recalled "pairs" do not account for the lag between those items during study or recall, which based on retrieved context theory and prior findings (e.g. Manning et al., 2011), should modulate similarity between item representations. Although the GLM will capture a linear trend, it will not reveal serial position specific effects. It appears that the betas reported for the SP12 analyses are driven by the fact that similarity with SP12 generally increases across serial position, rather than a specific effect of "high similarity to SP12 in adjacent lists" (Page 5, excluding perhaps the comparison with list x+1). It is also unclear how the SP12 similarity analyses support the statement that "end-list items are represented more distinctly, or less similarly, to all succeeding items" (Page 5). It is not clear how the authors account for the fact that the same participants do not contribute equally to all ROIs or if the effects are consistent if only participants who have electrodes in all ROIs are included.

      In our study, all pairs are defined by the lag between a reference and target item. The results in Figure 3 show the similarity between each serial position in relation to SP1; Figure 4 shows lag between each serial position relative to SP2 and 3; and Figure 5 shows lag relative to SP12. Each statistical model accounts for the lag by ordering the data by increased inter-item distance. Further, our definition of lag is significantly more rigorous than that used by Manning and colleagues. Our similarity results for Figures 3-5 characterize the change in similarity relative to a constant reference point, such as SP1, rather than a relative reference point, such as +1 lag, which aggregates similarity between pairs such as SP1 to SP2 with SP4 to SP5, which may be recalled via different memory mechanisms.

      In Figure 5, we agree that your characterization, ‘similarity with SP12 generally increases across serial position’, is a more accurate description of the trend. The text has been updated to reflect this by changing the interpretation to “later serial positions in adjacent lists shared a gradually increasing similarity to SP12.”

      Next, we clarify the statement "end-list items are represented more distinctly, or less similarly, to all succeeding items". When recalling SP12, the subsequent items recalled exhibit significantly lower similarity to SP12 (see Figure 5D, pink). Consequently, the spectral representation of successfully recalled end-list items appears more distinct from later items in similar serial positions. This stands in contrast to our observations illustrated in Figures 3 and 4, where successfully recalled start-list items demonstrate greater similarity to later items in similar serial positions.

      (3) The authors use the term "perceptual" boundary which is confusing. First, "perceptual boundary" seems to be a specific subset of the broader term "event boundary," and it is unclear why/how the current study is investigating "perceptual" boundaries specifically. Second and relatedly, the current study does not have a sole "perceptual" boundary (as discussed in point 1 above), it is really a combination of perceptual and conceptual since the task is changing (from recalling the words in the previous list to studying the words in the current list OR studying the words in the current list to solving math problems in the current list) in addition to changes in stimulus presentation. 

      We agree with the statement that ‘perceptual’ as a modifier to the boundaries described here does not add significant information. Therefore, we have removed all reference to perceptual boundaries.

      (4) Although the results show that item-item similarity in the gamma band decreases across serial position, it is unclear how the present findings further describe "how gamma activity facilitates contextual associations" (Page 5). As mentioned in point 1 above, such effects could be driven by attentional declines across serial position -- and a concurrent decline in gamma power -- which may be unrelated to, and actually potentially impair, the formation of contextual associations, given evidence from the literature that increased gamma power facilitates binding processes.

      We agree that our study does not elucidate a mechanistic relationship between gamma power and contextual associations. The referenced sentence has been changed to: “how gamma activity is associated with context”.

      Please see our response to point 1 above. In addition, studies demonstrating decreasing gamma power with increasing serial position focus primarily on the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2012). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global attentional decline or neural fatigue model does not account for our results.

      Notably, HFA trends in the MPL are poorly described. Further, gamma power decline does not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.

      (5) Some of the logic and interpretations are inconsistent with the literature. For example, the authors state that "The temporal context model (TCM) suggests that gradual drift in item similarity provides context information to support recovery of individual items" however, this does not seem like an accurate characterization of TCM. According to TCM, context is a recency-weighted average of previous experience. Context "drifts" insofar as information is added to/removed from context. Context drift thus influences item similarity -- it is not that item similarity itself drifts, but that any change in item-item similarity is due to context drift. 

      The current findings do not appear at odds with the conceptualization of drift and context in the current version of the context maintenance and retrieval model. Furthermore, the context representation is posited to include information beyond basic item representations. Two items, regardless of their temporal distance, can be associated with similar contexts if related information is included in both context representations, as predicted and shown for multiple forms of relatedness including semantic relatedness (Manning & Kahana, 2012) and task relatedness (Polyn et al., 2012).

      We revised the sentence and encompassing paragraph to describe the temporal context model more accurately and emphasize how our findings align with the stated version of CMR. The revised text is below:  

      “Next, we asked how gamma spectral activity reflects contextual association between items. In the medial parietal lobe, we observed recurring similarity between items distant in time but adjacent to boundaries. This pattern suggests spectral activity may carry information about an item's relationship to a boundary. These observations align with the Context Maintenance and Retrieval model which extends the predictions of TCM to encompass broader relationships among items. Our results demonstrate boundaries as an important aspect of context and specify the spectral and regional properties of these boundary-related contextual features.”

      (6) Lohnas et al. (2020) Neural fatigue influences memory encoding in the human hippocampus, Neuropsychologia, should be cited when discussing neural fatigue

      Thank you for your suggestion. The citation has been added to the text.

      (7) A within-list, not an across list, similarity analysis should be used to test the interpretation that end-of-list items are more distinct than other list items.

      We believe this recommendation refers to the following line in our text: “These findings suggest end-list items are represented more distinctly, or less similarly, to all succeeding items.” Our statement compares list x, SP12 to all succeeding items (in list x+1, x+2, etc.). Therefore, this statement refers to items in the next lists, which is why we performed an across-list analysis rather than a within-list one.

      (8) It is unclear why it is necessary to use PCA to estimate similarity between items.

      PCA was used to reduce the dimensionality of the time-frequency matrix for the gamma band. This technique allowed us to compare predominant trends in gamma between items. In addition, we added a figure showing 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.
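
      As a rough illustration of this kind of pipeline (not the authors' code), the sketch below flattens a gamma-band time-frequency matrix for each item, projects the items onto the leading principal components, and computes cosine similarity between item pairs. The array sizes, the number of components, the random data, and the function names are all assumptions made for the example.

```python
import numpy as np

def pca_project(features, n_components):
    """Project rows of `features` (items x flattened time-frequency bins)
    onto the top principal components, using an SVD of the centered data."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal axes, ordered by decreasing variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def cosine_similarity_matrix(scores):
    """Pairwise cosine similarity between the rows of `scores`."""
    norms = np.linalg.norm(scores, axis=1, keepdims=True)
    unit = scores / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

rng = np.random.default_rng(0)
n_items, n_freqs, n_times = 12, 8, 20          # hypothetical sizes
tf = rng.normal(size=(n_items, n_freqs, n_times))  # stand-in for gamma power
features = tf.reshape(n_items, -1)             # one flattened row per item
scores = pca_project(features, n_components=5)
similarity = cosine_similarity_matrix(scores)
```

      Because the components are estimated separately for each data matrix, running this per participant would let each participant's dominant time-frequency features differ, which is in the spirit of the response above.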

      (9) Lags are listed as -4, 4 (Page 8), however with a list length of 12, possible lags should be -11, 11.

      The listed parenthetical statement ‘(-4 to 4)’ referred to Figure 1 where Lag CRP is shown for transitions from -4 to 4. However, we did calculate lag CRP for all possible transitions. Therefore, the referenced phrase was changed to: “Lagged CRP was calculated for all possible transitions (-11 to 11).”
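
      For readers unfamiliar with the measure, a minimal sketch of a lag-CRP calculation over the full range of lags for a 12-item list is given below. It follows the standard definition (observed transitions at each lag divided by the transitions that were still available at that lag) and is not taken from the authors' analysis code; the function name and the two toy recall sequences are hypothetical.

```python
from collections import Counter

def lag_crp(recalls_by_list, list_length):
    """Conditional response probability for each lag.

    recalls_by_list: one sequence per list of recalled serial positions
    (1-based), in output order. Repeats and intrusions are assumed to
    have been removed beforehand.
    """
    actual = Counter()
    possible = Counter()
    for recalls in recalls_by_list:
        recalled = set()
        for current, nxt in zip(recalls, recalls[1:]):
            recalled.add(current)
            # Every not-yet-recalled item defines a lag that was available.
            for candidate in range(1, list_length + 1):
                if candidate not in recalled:
                    possible[candidate - current] += 1
            actual[nxt - current] += 1
    lags = [lag for lag in range(-(list_length - 1), list_length) if lag != 0]
    return {lag: (actual[lag] / possible[lag] if possible[lag] else float("nan"))
            for lag in lags}

# Two toy recall sequences from 12-item lists.
crp = lag_crp([[1, 2, 3, 12], [12, 11, 1, 2]], list_length=12)
```

      With a list length of 12 the returned dictionary has one entry for each lag from -11 to 11, excluding 0.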

      (10) Hsieh et al. 2014 and Hsieh & Ranganath (2015) are fMRI studies and as such, do not support the statement "Previous work consistent with temporal context models suggests spectral relatedness reduces as a function of distance between words" (Page 3). 

      The statement has been revised to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (11) Although statistically one can measure "How item-item similarity is affected by recollection" (Page 3), this is logically backwards, given that similarity during study necessarily precedes performance during free recall. Additionally, it is erroneous to assume that recalled words are "recollected" without additional measurements (e.g. Mickes et al. (2013) Rethinking familiarity: Remember/Know judgments in free recall, JML).

      The statement was changed to “item-item similarity is affected based on successful recall” given recollection cannot be determined in our paradigm.

      Reviewer 3:

      (1) My primary confusion in the current version of this paper is that the analyses don't seem to directly compare the two proposed models illustrated in Fig 1B, i.e. the temporal context model (with smooth drifts between items, including across lists) versus the boundary model (with similarities across all lists for items near boundaries). After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with two predictors (boundary proximity and list distance), neither of which is a smoothly drifting context. Therefore there does not appear to be a quantitative analysis supporting the conclusion that in lateral temporal cortex "drift exhibits a relationship with elapsed time regardless of the presences of intervening boundaries" (lines 272-3).

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists.

      However, we agree with the comment that the presented data do not directly support the claim that lateral temporal cortex drift is independent of intervening boundaries. Therefore, we amended the statement to: “We found successfully recalled items encoded in distant serial positions drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them.”

      (2) The feature representation used for the neural response to each item is a gamma power time-frequency matrix. This makes it unclear what characteristics of the neural response are driving the observed similarity effects. It appears that a simple overall scaling of the response after boundaries (stronger responses to initial items during the beginning portion of the 1.6s time window) would lead to the increased cosine similarity between initial items, but wouldn't necessarily reflect meaningful differences in the neural representation or context of these items.

      Our study aims to draw the connection between the neural response after boundaries with neural representation and context of these items. Prior studies (Manning et al. 2011, El Kalliny et al. 2017) have interpreted similarity in neural spectra as a memory relevant phenomenon. We use very similar methods to perform our analysis.  

      In addition, we compare the fit of our boundary similarity model to behavioral performance to show increased boundary representation correlates with improved boundary item recall.

      While our study does not specify which time-frequency components underlie the increased similarity, we do limit our analysis to the gamma band. Traditional analyses include log-scaled, broadband time-frequency data (e.g., 3-100 Hz) from which we specify the relevance of a much narrower spectral band.

      Finally, we tried to study which time–frequency components contributed to the increased similarity, but it varied greatly between patients (see Figure 3 – supplementary figure 2D). Hence, we opted to use principal component analyses to compare the features showing the most variation for each given participant. This added analytical step allows us to detect boundary effects across patients despite individual variability in boundary representation.

      (3) The specific form of the boundary proximity models is not well justified. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different model of d/#items is used, which seems to have a somewhat different interpretation (about drift between boundaries, rather than an effect specific to items near a final boundary). The schematic in Fig 1B appears to show a hypothesis which is not tested, with symmetric effects at initial and final boundaries.

      The boundary proximity models were chosen empirically. Our model was intended to quantify a decreasing relationship across many patients. We acknowledge the constants and variables may not definitively describe underlying neural processes.  

      For start- and end-list boundaries, we used different models because primacy and recency effects are unique phenomena. Primacy memory is classically thought to arise from rehearsal during the encoding time (Polyn et al. 2009, Lohnas et al. 2015). Alternatively, recency memory is thought to arise from strong contextual cues of recency items during recall due to their temporal proximity. Therefore, we have a limited basis on which to assume their spectral representation in relation to task boundaries would be symmetric.
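
      To make the two-regressor model concrete, the sketch below builds a design matrix with the exponential start-list form e^(1-d) quoted in comment (3) alongside a list-distance term, and fits it by ordinary least squares. This is an illustration under stated assumptions, not the authors' implementation: the grid of serial positions and list distances, the coefficients used to generate the synthetic similarities, and the function name are all made up for the example.

```python
import numpy as np

def design_matrix(serial_positions, list_distances):
    """Columns: intercept, boundary proximity e^(1-d), list distance."""
    d = np.asarray(serial_positions, dtype=float)
    k = np.asarray(list_distances, dtype=float)
    boundary_proximity = np.exp(1.0 - d)   # equals 1 at d = 1, decays with d
    return np.column_stack([np.ones_like(d), boundary_proximity, k])

# Hypothetical grid: 12 serial positions in each of 3 succeeding lists.
d_grid, k_grid = np.meshgrid(np.arange(1, 13), np.arange(1, 4))
X = design_matrix(d_grid.ravel(), k_grid.ravel())

# Synthetic, noise-free similarities generated from made-up coefficients.
true_betas = np.array([0.10, 0.30, -0.02])
similarity = X @ true_betas

# Ordinary least squares recovers the generating coefficients here.
betas, *_ = np.linalg.lstsq(X, similarity, rcond=None)
```

      Swapping the exponent for an alternative such as (1-d)/2, as the reviewer suggests, would only require changing the `boundary_proximity` line, which is one way the choice of falloff scale could be compared across candidate forms.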

      (4) The main text description of Fig 2 only describes drift effects in lateral temporal cortex, but Fig 2 - supplement 1 shows that there is also drift and a significant subsequent memory effect in the other two ROIs as well. There is not a significant memory x drift slope interaction in these regions; are the authors arguing that the lack of this interaction (different drift rates for remembered versus forgotten items) is critical for interpreting the roles of lateral temporal cortex versus medial parietal and hippocampal regions?

      Yes. Fig 2- Supplement 1 shows that drift occurs in both the HC and MPL. However, the interaction term is not significant, which suggests that the rate of drift between recalled and non-recalled items is not significantly different.  

      In contrast, Fig 2C shows that recalled pairs drift at a higher rate than non-recalled pairs. For the LTC, the interaction term is negative in magnitude and statistically significant. This suggests successfully encoded item pairs encoded far apart share more distinct spectral representations, specifically in the LTC. These findings lead to our interpretation in the discussion that “elevated drift rate might allow the representations of recalled items to remain distinct but ordered in memory.”

      (5) The parameter fits for the "list distance" regressor are not shown or analyzed, though they do appear to be important for the observed similarity structure (e.g. Fig 3E). I would interpret this regressor as also being "boundary-related" in the sense that it assumes discrete changes in similarity at boundaries.

      Parameter fits for the ‘list distance’ regressor are now shown in the supplementary portion of Figures 3 and Figure 5. The difference between regions is non-significant.

      (6) To make strong claims about temporal context versus boundary models as implied by Fig 1B, these two regressors should be fit within the same model to explain across-list similarity. The temporal context model could be based on the number of intervening items (as in Fig 1B) or actual time elapsed between items. The relationship between the smoothly drifting temporal context model and the discretely-jumping list distance models should also be clarified.

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. A model which included a ‘temporal context regressor’ would not be able to account for the presence of a boundary effect and would not allow us to demonstrate a boundary representation in the presence of drift. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists. These regressors allow the model to differentiate between intra-list changes (the boundary regressor) versus inter-list changes (the list distance regressor).

      (7) The features of the time-frequency matrix that are driving similarity between events could be visualized to provide a better understanding of the boundary-related signals. The analysis could also be re-run with reduced versions of the feature space in order to determine the critical components of this signal; for example, responses could be averaged across time to examine only differences across frequencies, or across frequencies to examine purely temporal changes across the 1.6 second window.

      Figure 3 – supplementary figure 2 A-C has been added to show varying the number of principal components (PCs) does not change the trend of boundary sensitivity in the MPL. In addition, we included 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.

      (8) If the authors are considering a space of multiple models as "boundary proximity models" (e.g. linear models and exponential models with different scale factors), this should be part of the model-fitting process rather than a single model being selected posthoc.

      We agree with the reviewer’s suggestion that the ideal way to fit a model to the trend would be to use a model-fitting process. However, given the size of our dataset and our limited computational resources, we were not able to perform it.

      (9) The interpretation of region differences in the results in Fig 2 and Fig 2 - supplement 1 should be clarified. 

      In discussion, we have added the following text to clarify our interpretation of the regional differences shown in the mentioned figures.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2018). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) Whether there are significant fits for the list distance regressor, and whether these fits vary across regions, could be stated. The list distance regressor could also be directly compared (in the same model) to a temporal-context regressor, which predicts graded changes in similarity between items rather than the discrete changes between lists.

      We have added parameter fits for the ‘list distance’ regressor in the supplementary portion of Figures 3 and Figure 5. The difference between regions is non-significant. Therefore, our results show very similar stepwise decrease in similarity across lists between regions (list distance regressor; Figure 3 —supplementary figure 1B).

      We could not compare these parameters to a separate model which includes a smoothly drifting ‘temporal-context’ regressor due to the regressor's collinearity with any representation of boundary. See our response to Reviewer 3, comment 6.

      (11) The authors should clarify their interpretation of the results, and whether they are proposing a tweak to the temporal context model or a substantially different organizational system. 

      In the discussion, we include the following statements to clarify what we suggest regarding the temporal context model.

      “Our findings suggest a broader scope of contextual association than just prior items, where temporal proximity as well as task structure in the form of boundaries, play intertwined roles in contextual construction. Our data therefore have implications for updated iterations of the temporal context model incorporating (perhaps) specific terms for boundary information. This may in turn provide a more systematic prediction of primacy effects in behavioral data.”  

      (12) Minor typos and corrections: 

      52: using -> use 

      108: patients -> patients'

      156: list -> lists

      The list distance plot is described as "pink" in Fig 3 and Fig 5 - supplement 1, but appears gray in the figures.

      Each of these corrections has been made in the text.

    1. Reviewer #3 (Public Review):

      Summary:

      The authors food-deprived male and female mice and observed a much stronger reduction of leptin levels, energy consumption in the visual cortex, and visual coding performance in males than females. This indicates a sex-specific strategy for the regulation of the energy budget in the face of low food availability.

      Strengths:

      This study extends a previous study demonstrating the effect of food deprivation on visual processing in males, by providing a set of clear experimental results, demonstrating the sex-specific difference. It also provides hypotheses about the strategy used by females to reduce energy budget based on the literature.

      Weaknesses:

      The authors do not provide evidence that visually guided behaviors are unaffected in females, in contrast to the impairment shown in males in the previous study.

    2. Reviewer #1 (Public Review):

      Padamsey et al. followed up on their previous study in which they found that male mice sacrifice visual cortex computation precision to save energy in periods of food restriction (Padamsey et al. 2021, Neuron). In the present study, the authors find that female mice show much lower levels of adaptation in response to food restriction on the level of metabolic signaling and visual cortex computation. This is an important finding for understanding sex differences in adaptation to food scarcity and also impacts the interpretation of studies employing food restriction in behavioral analyses and learning paradigms.

      Strengths:

      The manuscript is, in general, very clear and the conclusions are straightforward. The experiments are performed in the same conditions for males and females and the authors did not find differences in the behavioral states of male and female mice that could explain differences in energy consumption. Moreover, they show that visual cortex in both males and females does not change its baseline energy consumption in the dark, therefore the adjustment of energy budget in males only targets visual processing.

      Weaknesses:

      The number of experiments is insufficient to compare the effects of food restriction in males and females directly, which is discussed by the authors: to address this point they use Bayes factor analysis to provide an estimate of the likelihood that females and males indeed differ in terms of energy metabolism and sensory processing adaptations during food restriction.

    3. Reviewer #2 (Public Review):

      Summary:

      Padamsey et al. build on previous significant work from the same group which demonstrated robust changes in the visual cortex in male mice from long-term (2-3 weeks) food restriction. Here, the authors extend this finding and reveal striking sex-specific differences in the way the brain responds to food restriction. The measures included the whole-body measure of serum leptin levels, and V1-specific measures of activity of key molecular players (AMPK and PPARα), gene expression patterns, ATP usage in V1, and the sharpness of visual stimulus encoding (orientation tuning). All measures supported the conclusion that the female mouse brain (unlike in males) does not change its energy usage and cortical functional properties on comparable food restriction.

      While the effect of food restriction on more peripheral tissue such as muscle and bones has been well studied, this result contributes to our understanding of how the brain responds to food restriction. This result is particularly significant given that the brain consumes a large fraction of the body's energy consumption (20%), with the cortex accounting for half of that amount. The sex-specific differences found here are also relevant for studies using food restriction to investigate cortical function.

      Strengths:

      The study uses a wide range of approaches mentioned above which converge on the same conclusion, strengthening the core claim of the study.

      Weaknesses:

      Since the absence of a significant effect does not prove the absence of any changes, the study cannot claim that the female mouse brain does not change in response to food restriction. However, the authors do not make this claim. Instead, they make the well-supported claim that there is a sex-specific difference in the response of V1 to food restriction.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) For a number of experiments the authors use their new data set on females and compare that with the data set previously published on males. To what extent are these data sets comparable? Have they been performed originally in parallel for example using siblings of different sexes or have the experiments been conducted several years apart from each other? What is the expected variability, if one repeated these experiments with the same sex considering the differences/similarities between experimental setups, housing conditions, interindividual differences, etc.?

      This is an important point. We did our best to collect the data in similar conditions (same set-ups; same animal housing conditions) and in experimental cohorts including both males and females. While some data from males were published first, the acquisition of male and female data was done in the same time period.

      Specifically, all results shown in Figure 1 and Figure 2 (Serum leptin, PPARalpha, AMPK, RNAseq) come from samples (from both males and females) that were processed at the same time and in similar conditions, by the same authors (Z.P. and P. M.).

      For the in vivo data (Figure 3, Supplementary figure 1), the male and female data were collected within a 1–2-year timeframe, in the same setups, by the same two authors (Z.P., D.K.). The males and females were housed under similar conditions (same room, same cage type, in groups of 25). We did not use siblings of different sexes. Independent cohorts (1-12 months apart), including both males and females, went into each data set. The within-cohort variability does not obviously differ from between-cohort variability; however, the number of animals is too small to confirm this with sufficient statistical power.

      Altogether, the differences observed between male and female data cannot be explained by the timing and conditions of data acquisition from both sexes.

      (2) Energy consumption and visual processing may differ between periods in which animals are in different behavioral states. Is there a possibility that male and female mice differed in behavioral state during measurements? Were animals running or resting during visual stimulation and during ATP measurements? 

      We thank the reviewer for this suggestion. We have now edited the text and included a new supplementary figure. All in vivo experiments were done in stationary animals that were resting in a cardboard tube both during 2-photon imaging and ATP measurements. Animals were also well habituated to the setup. In addition, we have imaged pupil diameters during in vivo imaging sessions. We have quantified pupil diameter during visual stimulation and do not find a sex difference (Supplemental Figure 2). Thus, we did not find a significant difference in behavioural or attentional state between sexes, in our experimental conditions.

      We have edited the text to include this information (lines 183-185).

      (3) Related to the previous point: the authors show that ATP consumption was reduced in male mice during visual stimulation. What about visual cortex ATP consumption in the absence of visual stimulation? Do food-deprived males and/or females show lower ATP consumption in the visual cortex e.g. during sleep? 

      We have repeated V1 ATP imaging experiments in the dark, in the absence of visual stimulation, in both males and females (Supplementary figure 1). ATP consumption rates are slower in the dark vs. during visual stimulation. Moreover, we find that in the dark, there is no difference in ATP consumption rate between control and food restricted animals of either sex. Thus, the reduced ATP consumption we found with food restriction in males is related specifically to the active processing of visual information.

      We have edited the text to include this information (lines 158-159).

      Reviewer 2:

      (1) It appears that the authors have the data for doing decoding analysis, similar to Fig 6D in their previous paper. However, this analysis has not been done for this study. This would be good to include.  If the authors have attempted the behavioural discrimination tests on female mice as in the previous study, this would also be useful to include. 

      The first point of the reviewer is about datasets acquired in males that are included in our previous publication (Padamsey et al., 2022) but not compared to female data in the present manuscript.

      Whilst we fully agree that these results would be very useful, we did not have the resources (in terms of skilled researchers and funding) to perform these experiments in female mice. That is why these results are not included in this manuscript.

      (2) There appears to be an inconsistency in the methods of reporting OSI. It states that the OSI of grating-responsive neurons was calculated as 1 - circular variance. But then OSI is defined as simply abs(). Also, it would be good to be consistent about reporting medians as the median without confounding with the average (which is the mean). Sentences such as the following do not make sense: The average OSI for an animal was taken as the median OSI value calculated across neurons. This should be corrected throughout the manuscript, where the average is mentioned but the median is measured. 

      We thank the reviewer for noting this issue and we apologize for the confusion. We have now clarified the above in the manuscript (lines 587-603) and inserted the following reference for the detailed explanation of the OSI and DSI calculations: Mazurek M, Kager M, Van Hooser SD. Robust quantification of orientation selectivity and direction selectivity. Front Neural Circuits. 2014. https://doi.org/10.3389/fncir.2014.00092
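      For readers unfamiliar with the "1 - circular variance" convention discussed in this exchange, the following is a minimal sketch of that calculation in the form commonly described in the selectivity literature. It is not the authors' analysis code: the function names, the clipping of negative responses to zero, and the handling of silent neurons are assumptions made for illustration. The last helper shows the per-animal summary as a median across neurons, which is the quantity the reviewer asks to be named consistently.

```python
import numpy as np

def selectivity_indices(directions_deg, responses):
    """Return (OSI, DSI), each computed as 1 - circular variance.

    directions_deg: stimulus directions in degrees (e.g. 0, 45, ..., 315).
    responses: mean response of one neuron to each direction.
    """
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    # Assumption: negative (below-baseline) responses are clipped to zero.
    r = np.clip(np.asarray(responses, dtype=float), 0.0, None)
    total = r.sum()
    if total == 0:
        # Assumption: a neuron with no response has undefined selectivity.
        return np.nan, np.nan
    # Orientation space doubles the angle so opposite directions add up.
    osi = np.abs(np.sum(r * np.exp(2j * theta))) / total
    # Direction space keeps the angle, so opposite directions cancel.
    dsi = np.abs(np.sum(r * np.exp(1j * theta))) / total
    return float(osi), float(dsi)

def animal_osi(osi_per_neuron):
    """Summarise one animal by the median OSI across its neurons."""
    return float(np.nanmedian(np.asarray(osi_per_neuron, dtype=float)))
```

      Because the angle is doubled in orientation space, the two directions of an orientation are combined by the formula itself, whereas the direction index keeps them separate.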

      In the figure showing the orientation tuning, the authors have collapsed the two directions of each orientation together. However, if I understand correctly, the calculation of OSI does not do this step of collapsing. In this case, and in the interest of revealing more useful features of the data instead of averaging them out, it would be good to show the average tuning curves with and without FR for all directions, not collapsed. 

      As with orientation tuning, we found that direction tuning is reduced with food restriction, and that this is significant in males, but not in females. These results are now included in the text, with statistics (lines 179-180) and in Supplemental Figure 3.

      Reviewer 3:

      l. 183-187 The discussion based on the idea that "The Bayes factor analysis helps to differentiate the absence of evidence from the evidence of absence." does not seem very helpful. Using a statistical criterion makes less sense than providing the reader with an estimate of the largest effect size (if there is any) that is compatible with the observation. If there were a significant effect of a very small size, would it change the authors' conclusion? That seems unlikely. I recommend removing the sentence on line 184, which is in fact not used afterwards. 

      We agree with the reviewer. We have now removed the sentence and rephrased (lines 202-208).  

      Editor's note: 

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      We now provide exact p-values alongside the summary statistics (test statistic and df) and 95% confidence intervals for all key results.

    1. Reviewer #2 (Public Review):

      Summary:

      The study investigates whether speech and music processing involve specific or shared brain networks. Using intracranial EEG recordings from 18 epilepsy patients, it examines neural responses to speech and music. The authors found that most neural activity is shared between speech and music processing, without specific regional brain selectivity. Furthermore, domain-selective responses to speech or music are limited to frequency-specific coherent oscillations. The findings challenge the notion of anatomically distinct regions for different cognitive functions in the auditory process.

      Strengths:

      (1) This study uses a relatively large corpus of intracranial EEG data, which provides high spatiotemporal resolution neural recordings, allowing for more precise and dynamic analysis of brain responses. The use of continuous speech and music enhances ecological validity compared to artificial or segmented stimuli.

      (2) This study uses multiple frequency bands in addition to just high-frequency activity (HFA), which has been the focus of many existing studies in the literature. This allows for a more comprehensive analysis of neural processing across the entire spectrum. The heterogeneity across different frequency bands also indicates that different frequency components of the neural activity may reflect different underlying neural computations.

      (3) This study also adds empirical evidence towards distributed representation versus domain-specificity. It challenges the traditional view of highly specialized, anatomically distinct regions for different cognitive functions. Instead, the study suggests a more integrated and overlapping neural network for processing complex stimuli like speech and music.

      Weaknesses:

      While this study is overall convincing, there are still some weaknesses in the methods and analyses that limit the implication of the work.

      The study's main approach, focusing primarily on the grand comparison of response amplitudes between speech and music, may overlook intricate details in neural coding. Speech and music are not entirely orthogonal with each other at different levels of analysis: at the high-level abstraction, these are two different categories of cognitive processes; at the low-level acoustics, they overlap a lot; at intermediate levels, they may also share similar features. For example, the study doesn't adequately address whether purely melodic elements in music correlate with intonations in speech at the neural level. A more granular analysis, dissecting stimuli into distinct features like pitch, phonetics, timbre, and linguistic elements, could unveil more nuanced shared, and unique neural processes between speech and music. Prior research indicates potential overlap in neural coding for certain intermediate features in speech and music (Sankaran et al. 2023), suggesting that a simple averaged response comparison might not fully capture the complexity of neural encoding. Further delineation of phonetic, melodic, linguistic, and other coding, along with an analysis of how different informational aspects (phonetic, linguistic, melodic, etc) are represented in shared neural activities, could enhance our understanding of these processes and strengthen the study's conclusions.

      While classifying electrodes into 3 categories provides valuable insights, it may not fully capture the complexity of the neural response distribution to speech and music. A more nuanced and continuous approach could reveal subtler gradations in neural response, rather than imposing categorical boundaries. This could be done by computing continuous metrics, like unique variances explained by each category or by each acoustic feature, etc. Incorporating such a continuum could enhance our understanding of the neural representation of speech and music, providing a more detailed and comprehensive picture of cortical processing. This goes back to my first comment that the selected set of stimuli may not fully exploit the entire space of speech and music, and there are possible exemplars that violate the preference map here. For example, this study only considered a specific set of multi-instrumental music, it is not clear to me if other types of music would result in different response profiles in individual channels. It is also not clear if a foreign language that the listeners cannot comprehend would evoke similar response profiles. On the contrary, breaking down into the neural coding of more fundamental feature representations that constitute speech and music, and analyzing the unique contribution of each feature would give a more comprehensive understanding.

      The paper's emphasis on shared and overlapping neural activity, as observed through sEEG electrodes, provides valuable insights. It is probably true that domain-specificity for speech and music does not exist at such a macro scale. However, it's important to consider that each electrode records from a large neuronal population, encompassing thousands of neurons. This broad recording scope might mask more granular, non-overlapping feature representations at the single neuron level. Thus, while the study suggests shared neural underpinnings for speech and music perception at a macroscopic level, it cannot definitively rule out the possibility of distinct, non-overlapping neural representations at the microscale of local neuronal circuits for features that are distinctly associated with speech and music. This distinction is crucial for fully understanding the neural mechanisms underlying speech and music perception that merit future endeavors with more advanced large-scale neuronal recordings.

    2. eLife assessment

      This study presents valuable intracranial findings on how two types of natural auditory stimuli - speech and music - are processed in the human brain, and demonstrates that speech and music largely share network-level brain activities, thus challenging the domain-specific processing view. The evidence supporting the claims of the authors is solid. The work will be of broad interest to speech and music researchers as well as cognitive scientists in general.

    3. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors examined the extent to which processing of speech and music depends on neural networks that are either specific to a domain or general in nature. They conducted comprehensive intracranial EEG recordings on 18 epilepsy patients as they listened to natural, continuous forms of speech and music. This enabled an exploration of brain activity at both the frequency-specific and network levels across a broad spectrum. Utilizing statistical methods, the researchers classified neural responses to auditory stimuli into categories of shared, preferred, and domain-selective types. It was observed that a significant portion of both focal and network-level brain activity is commonly shared between the processing of speech and music. However, neural responses that are selectively responsive to speech or music are confined to distributed, frequency-specific areas. The authors highlight the crucial role of using natural auditory stimuli in research and the need to explore the extensive spectral characteristics inherent in the processing of speech and music.

      Strengths:

      The study's strengths include its high-quality sEEG data from a substantial number of patients, covering a majority of brain regions. This extensive cortical coverage grants the authors the ability to address their research questions with high spatial resolution, marking an advantage over previous studies. They performed thorough analyses across the entire cortical coverage and a wide frequency range of neural signals. The primary analyses, including spectral analysis, temporal response function calculation, and connectivity analysis, are presented straightforwardly. These analyses, as well as figures, innovatively display how neural responses, in each frequency band and region/electrode, are 'selective' (according to the authors' definition) to speech or music stimuli. The findings are summarized in a manner that efficiently communicates information to readers. This research offers valuable insights into the cortical selectivity of speech and music processing, making it a noteworthy reference for those interested in this field. Overall, this research offers a valuable dataset and carries out extensive yet clear analyses, amounting to an impressive empirical investigation into the cortical selectivity of speech and music. It is recommended for readers who are keen on understanding the nuances of selectivity and generality in the processing of speech and music to refer to this study's data and its summarized findings.

      Weaknesses:

      (1) The study employed longer speech and music stimuli, thereby promising improved ecological validity as compared to prior research, a point emphasized by the authors. However, it failed to differentiate between neural responses to the diverse content or local structures within speech and music. The authors considered the potential limitation of treating these extensive speech and music stimuli as stationary signals, neglecting their complex musical or linguistic structural details and temporal variations across local structures such as sentences and phrases. This balanced perspective offered by the authors aids readers in better understanding the context of the study and highlights potential areas for expansion and further considerations.

      (2) In contrast to previous studies that employed short stimulus segments along with various control stimuli to ensure that observed selectivity for speech or music was not merely due to low-level acoustic properties, this study used longer, ecological stimuli. However, the control stimuli used in this study, such as tone or syllable sequences, do not align with the low-level acoustic properties of the speech and music stimuli. This mismatch raises concerns that the differences or selectivity between speech and music observed in this study might be attributable to these basic acoustic characteristics rather than to more complex processing factors specific to speech or music. However, this should not deter readers from recognizing the study's strengths, namely, the use of iEEG recordings that offer high spatial resolution and extensive cortical coverage.

      (3) The concept of selectivity - shared, preferred, and domain-selective - may not present sufficient theoretical accuracy. It is appreciated that the authors put effort into clearly defining their operational measurement on 'selectivity'. Later, the authors further mentioned the specific indication of their analyses. However, the authors' categorization of neural sites/regions as shared, preferred, or domain-selective regarding speech and music processing essentially resembles a traditional ANOVA test with posthoc analysis. While this categorization gives meaningful context to the results, the mere presence of significant differences among control stimuli, a segment of speech, and a piece of music does not present a strong case that a region is specifically selective to a type of stimulus like speech. The narrative of the manuscript could potentially lead to an overgeneralized interpretation of their findings as being broadly applicable to speech or music, if a reader does not delve into the details.

      (4) The authors' approach, akin to mapping a 'receptive field' by correlating stimulus properties with neural responses to ascertain functional selectivity for speech and music, presents potential issues. If cortical regions exhibit heightened responses to one type of stimulus over another, it doesn't automatically imply selectivity or preference for that stimulus. The explanation could lie in functional aspects, such as a region's sensitivity to temporal units of a specific duration, be it music, speech, or even movie segments, and its role in chunking such units (e.g., around 500 ms), which might be more prevalent in music than in speech, or vice versa in the current study. This study does not delve into the functional mechanisms of how speech and music are processed across different musical or linguistic hierarchical levels but merely demonstrates differences in neural responses to various stimuli over a 10-minute span.

    4. Reviewer #3 (Public Review):

      Summary:

      Te Rietmolen et al. investigated the selectivity of cortical responses to speech and music stimuli using neurosurgical stereo EEG in humans. The authors address two basic questions: (1) Are speech and music responses localized in the brain or distributed? (2) Are these responses selective and domain-specific, or rather domain-general and shared? To investigate this, the study proposes a nomenclature of shared responses (speech and music responses are not significantly different), domain-selective responses (one domain is significantly different from baseline and the other is not), and domain-preferred responses (both are significantly different from baseline, but one is larger than the other and they differ significantly from each other). The authors employ this framework using neural responses across the spectrum (rather than focusing on high gamma), providing evidence for a low level of selectivity across spectral signatures. To investigate the nature of the underlying representations, they use encoding models to predict neural responses (low and high frequency) given a feature space of the stimulus envelope or peak rate (by time delay) and find stronger encoding for both in the low-frequency neural responses. The top encoding electrodes are used as seeds for a pair-wise connectivity (coherence) analysis in order to repeat the shared/selective/preferred analysis across the spectrum, again suggesting low selectivity. Spectral power and connectivity are also analyzed at the level of the regional patient population to rule out (and depict) any effects driven by a select few patients. Across analyses, the authors consistently show a paucity of domain-selective responses, and when evident, these selective responses were not represented across the entire cortical region. The authors argue that speech and music mostly rely on shared neural resources.

      Strengths:

      I found this manuscript to be rigorous, providing compelling and clear evidence towards shared neural signatures for speech and music. The use of intracranial recordings provides an important spatial and temporal resolution that lends itself to the power, connectivity and encoding analyses. The statistics and methods employed are rigorous and reliable: estimates are based on permutation approaches, and cross-validation/regularization was employed and reported properly. The analysis of measures across the entire spectrum in power, coherence and encoding models provides a comprehensive view of responses that no doubt will benefit the community as an invaluable resource. Analysis at the level of the patient population (feasible with their high N) per region also supports the generalizability of the conclusions across a relatively large cohort of patients. Last but not least, I believe the framework of selective, preferred, and shared is a welcome lens through which to investigate cortical function.

      Weaknesses:

      I did not find methodological weaknesses in the current version of the manuscript. I do believe that it is important to highlight that the data is limited to passively listening to naturalistic speech and music. The speech and music stimuli are not completely controlled with varying key acoustic features (inherent to the different domains). Overall, I found the differences in stimulus and lack of attentional controls (passive listening) to be minor weaknesses that would not dramatically change the results or conclusions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We have specifically addressed the points of uncertainty highlighted in eLife's editorial assessment, which concerned the lack of low-level acoustics control, limitations of experimental design, and in-depth analysis. Regarding “the lack of low-level acoustics control, limitations of experimental design”, in response to Reviewer #1, we clarify that our study aimed to provide a broad perspective (one that includes both auditory and higher-level processes) on the similarities and distinctions in processing natural speech and music within an ecological context. Regarding “the lack of in-depth analysis”, in response to Reviewers #1 and #2, we have clarified that while model-based analyses are valuable, they pose fundamental challenges when comparing speech and music. Non-acoustic features inherently differ between speech and music (such as phonemes and pitch), making direct comparisons reliant on somewhat arbitrary choices. Our approach mitigates this challenge by analyzing the entire neural signal, thereby avoiding potential pitfalls associated with encoding models of non-comparable features. Finally, we provide some additional analyses suggested by the Reviewers.

      We sincerely appreciate your thoughtful and thorough consideration throughout the review process.

      eLife assessment

      This study presents valuable intracranial findings on how two important types of natural auditory stimuli - speech and music - are processed in the human brain, and demonstrates that speech and music largely share network-level brain activities, thus challenging the domain-specific processing view. The evidence supporting the claims of the authors is solid but somewhat incomplete since although the data analysis is thorough, the results are robust and the stimuli have ecological validity, important considerations such as low-level acoustics control, limitations of experimental design, and in-depth analysis, are lacking. The work will be of broad interest to speech and music researchers as well as cognitive scientists in general.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors examined the extent to which the processing of speech and music depends on neural networks that are either specific to a domain or general in nature. They conducted comprehensive intracranial EEG recordings on 18 epilepsy patients as they listened to natural, continuous forms of speech and music. This enabled an exploration of brain activity at both the frequency-specific and network levels across a broad spectrum. Utilizing statistical methods, the researchers classified neural responses to auditory stimuli into categories of shared, preferred, and domain-selective types. It was observed that a significant portion of both focal and network-level brain activity is commonly shared between the processing of speech and music. However, neural responses that are selectively responsive to speech or music are confined to distributed, frequency-specific areas. The authors highlight the crucial role of using natural auditory stimuli in research and the need to explore the extensive spectral characteristics inherent in the processing of speech and music.

      Strengths:

      The study's strengths include its high-quality sEEG data from a substantial number of patients, covering a majority of brain regions. This extensive cortical coverage grants the authors the ability to address their research questions with high spatial resolution, marking an advantage over previous studies. They performed thorough analyses across the entire cortical coverage and a wide frequency range of neural signals. The primary analyses, including spectral analysis, temporal response function calculation, and connectivity analysis, are presented straightforwardly. These analyses, as well as figures, innovatively display how neural responses, in each frequency band and region/electrode, are 'selective' (according to the authors' definition) to speech or music stimuli. The findings are summarized in a manner that efficiently communicates information to readers. This research offers valuable insights into the cortical selectivity of speech and music processing, making it a noteworthy reference for those interested in this field. Overall, this research offers a valuable dataset and carries out extensive yet clear analyses, amounting to an impressive empirical investigation into the cortical selectivity of speech and music. It is recommended for readers who are keen on understanding the nuances of selectivity and generality in the processing of speech and music to refer to this study's data and its summarized findings.

      Weaknesses:

      The weakness of this study, in my view, lies in its experimental design and reasoning:

      (1) Despite using longer stimuli, the study does not significantly enhance ecological validity compared to previous research. The analyses treat these long speech and music stimuli as stationary signals, overlooking their intricate musical or linguistic structural details and temporal variation across local structures like sentences and phrases. In previous studies, short, less ecological segments of music were used, maintaining consistency in content and structure. However, this study, despite employing longer stimuli, does not distinguish between neural responses to the varied contents or structures within speech and music. Understanding the implications of long-term analyses, such as spectral and connectivity analyses over extended periods of around 10 minutes, becomes challenging when they do not account for the variable, sometimes quasi-periodical or even non-periodical, elements present in natural speech and music. When contrasting this study with prior research and highlighting its advantages, a more balanced perspective would have been beneficial in the manuscript.

      Regarding ecological validity, we respectfully hold a differing perspective from the reviewer. In our view, a one-second music stimulus lacks ecological validity, as real-world music always extends much beyond such a brief duration. While we acknowledge the trade-off in selecting longer stimuli, limiting the diversity of musical styles, we maintain that only long stimuli afford participants an authentic musical listening experience. Conversely, shorter stimuli may lead participants to merely "skip through" musical excerpts rather than engage in genuine listening.

      Regarding the critique that we "did not distinguish between neural responses to the varied contents or structures within speech and music," we partly concur. Our TRF (temporal response function) analyses incorporate acoustic content, particularly the acoustic envelope, thereby addressing this concern to some extent. However, it is accurate to note that we did not model non-acoustic features. In acknowledging this limitation, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to a pitch-melodic (or harmonic) model for music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while assuming equivalence between phonemes and pitches may be reasonable, it in essence relies on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding for one or the other model remains problematic. In summary, because the models for speech and music are different (except for acoustic models), direct comparison is challenging, although still commendable and of interest.

      Finally, we did take into account the reviewer’s remark and did our best to give a more balanced perspective of our approach and previous studies in the discussion.

      “While listening to natural speech and music rests on cognitively relevant neural processes, our analytical approach, extending over a rather long period of time, does not allow to directly isolate specific brain operations. Computational models -which can be as diverse as acoustic (Chi et al., 2005), cognitive (Giordano et al., 2021), information-theoretic (Di Liberto et al., 2020), or self-supervised neural network (Donhauser & Baillet, 2019 ; Millet et al., 2022) models- are hence necessary to further our understanding of the type of computations performed by our reported frequency-specific distributed networks. Moreover, incorporating models accounting for musical and linguistic structure can help us avoid misattributing differences between speech and music driven by unmatched sensitivity factors (e.g., arousal, emotion, or attention) as inherent speech or music selectivity (Mas-Herrero et al., 2013; Nantais & Schellenberg, 1999).”

      (2) In contrast to previous studies that employed short stimulus segments along with various control stimuli to ensure that observed selectivity for speech or music was not merely due to low-level acoustic properties, this study used longer, ecological stimuli. However, the control stimuli used in this study, such as tone or syllable sequences, do not align with the low-level acoustic properties of the speech and music stimuli. This mismatch raises concerns that the differences or selectivity between speech and music observed in this study might be attributable to these basic acoustic characteristics rather than to more complex processing factors specific to speech or music.

      We acknowledge the reviewer's concern. Indeed, speech and music differ on various levels, including acoustic and cognitive aspects, and our analyses do not explicitly distinguish them. The aim of this study was to provide an overview of the similarities and differences between natural speech and music processing in an ecological context. Future work is needed to explore further the different hierarchical levels or networks composing such listening experiences. Of note, however, we report whole-brain results with high spatial resolution (thanks to iEEG recordings), enabling the distinction between auditory, superior temporal gyrus (STG), and higher-level responses. Our findings clearly highlight that both auditory and higher-level regions predominantly exhibit shared responses, challenging the interpretation that our results can be attributed solely to differences in 'basic acoustic characteristics'.

      We have now more clearly pointed out this reasoning in the results section:

      “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”

      (3) The concept of selectivity - shared, preferred, and domain-selective - increases the risks of potentially overgeneralized interpretations and theoretical inaccuracies. The authors' categorization of neural sites/regions as shared, preferred, or domain-selective regarding speech and music processing essentially resembles a traditional ANOVA test with post hoc analysis. While this categorization gives meaningful context to the results, the mere presence of significant differences among control stimuli, a segment of speech, and a piece of music does not necessarily imply that a region is specifically selective to a type of stimulus like speech. The manuscript's narrative might lead to an overgeneralized interpretation that their findings apply broadly to speech or music. However, identifying differences in neural responses to a few sets of specific stimuli in one brain region does not robustly support such a generalization. This is because speech and music are inherently diverse, and specificity often relates more to the underlying functions than to observed neural responses to a limited number of examples of a stimulus type. See the next point.

      Exactly! Here, we present a precise operational definition of these terms, implemented with clear and rigorous statistical methods. It is important to note that in many cognitive neuroscience studies, the term "selective" is often used without a clear definition. By establishing operational definitions, we identified three distinct categories based on statistical testing of differences from baseline and between conditions. This approach provides a framework for more accurate interpretation of experimental findings, as now better outlined in the introduction:

      “Finally, we suggest that terms should be operationally defined based on statistical tests, which results in a clear distinction between shared, selective, and preferred activity. That is, be A and B two investigated cognitive functions, “shared” would be a neural population that (compared to a baseline) significantly and equally contributes to the processing of both A and B; “selective” would be a neural population that exclusively contributes to the processing of A or B (e.g. significant for A but not B); and “preferred” would be a neural population that significantly contributes to the processing of both A and B, but more prominently for A or B (Figure 1A).”

      Regarding the risk of over-generalization, we want to clarify that our manuscript does not claim that a specific region or frequency band is selective to speech or music. Because we test only excerpts of speech and music, we employ the reverse logical reasoning: "if 10 minutes of instrumental music activates a region traditionally associated with speech selectivity, we can conclude that this region is NOT speech-selective." Our conclusions revolve around the absence of selectivity rather than the presence of selective areas or frequency bands. In essence, "one counterexample is enough to disprove a theory." We have now further elaborated on this point in the discussion section:

      “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech (or language) -selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”

      (4) The authors' approach, akin to mapping a 'receptive field' by correlating stimulus properties with neural responses to ascertain functional selectivity for speech and music, presents issues. For instance, in the cochlea, different stimuli activate different parts of the basilar membrane due to the distinct spectral contents of speech and music, with each part being selective to certain frequencies. However, this phenomenon reflects the frequency selectivity of the basilar membrane - an important function, not an inherent selectivity for speech or music. Similarly, if cortical regions exhibit heightened responses to one type of stimulus over another, it doesn't automatically imply selectivity or preference for that stimulus. The explanation could lie in functional aspects, such as a region's sensitivity to temporal units of a specific duration, be it music, speech, or even movie segments, and its role in chunking such units (e.g., around 500 ms), which might be more prevalent in music than in speech, or vice versa in the current study. This study does not delve into the functional mechanisms of how speech and music are processed across different musical or linguistic hierarchical levels but merely demonstrates differences in neural responses to various stimuli over a 10-minute span.

      We completely agree with the last statement, as our primary goal was not to investigate the functional mechanisms underlying speech and music processing. However, the finding of a substantial portion of the cortical network as being shared between the two domains constrains our understanding of the underlying common operations. Regarding the initial part of the comment, we would like to clarify that in the framework we propose, if cortical regions show heightened responses to one type of stimulus over another, this falls into the ‘preferred’ category. The ‘selective’ (exclusive) category, on the other hand, would require that the region be unresponsive to one of the two stimuli.
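
      As an illustration, the operational definitions above can be written as a small decision rule over three significance tests per channel and frequency band. This is a minimal sketch, not the authors' code: the function name `classify_channel`, its boolean inputs, and the "unresponsive" and "unclassified" labels are assumptions introduced here for cases the text does not name.

```python
def classify_channel(speech_vs_baseline: bool,
                     music_vs_baseline: bool,
                     speech_vs_music: bool) -> str:
    """Categorize one channel in one frequency band.

    Each argument is the outcome (significant or not) of one statistical test.
    """
    if not (speech_vs_baseline or music_vs_baseline):
        # Neither domain differs from baseline (label assumed for this sketch).
        return "unresponsive"
    if speech_vs_baseline and music_vs_baseline:
        # Both domains respond: "preferred" if they also differ from each other,
        # "shared" if they do not.
        return "preferred" if speech_vs_music else "shared"
    # Exactly one domain differs from baseline.
    if speech_vs_music:
        return "selective"
    # One domain responds but the two do not differ; the text does not
    # name this case, so the label is an assumption.
    return "unclassified"


labels = [
    classify_channel(True, True, False),   # both respond equally
    classify_channel(True, True, True),    # both respond, one more strongly
    classify_channel(True, False, True),   # only speech responds
]
```

      Under this rule, a heightened response to one stimulus type falls into the "preferred" category as long as the other stimulus also drives the channel; "selective" requires that the other stimulus leave it at baseline.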

      Reviewer #2 (Public Review):

      Summary:

      The study investigates whether speech and music processing involve specific or shared brain networks. Using intracranial EEG recordings from 18 epilepsy patients, it examines neural responses to speech and music. The authors found that most neural activity is shared between speech and music processing, without specific regional brain selectivity. Furthermore, domain-selective responses to speech or music are limited to frequency-specific coherent oscillations. The findings challenge the notion of anatomically distinct regions for different cognitive functions in the auditory process.

      Strengths:

      (1) This study uses a relatively large corpus of intracranial EEG data, which provides high spatiotemporal resolution neural recordings, allowing for more precise and dynamic analysis of brain responses. The use of continuous speech and music enhances ecological validity compared to artificial or segmented stimuli.

      (2) This study uses multiple frequency bands in addition to just high-frequency activity (HFA), which has been the focus of many existing studies in the literature. This allows for a more comprehensive analysis of neural processing across the entire spectrum. The heterogeneity across different frequency bands also indicates that different frequency components of the neural activity may reflect different underlying neural computations.

      (3) This study also adds empirical evidence towards distributed representation versus domain-specificity. It challenges the traditional view of highly specialized, anatomically distinct regions for different cognitive functions. Instead, the study suggests a more integrated and overlapping neural network for processing complex stimuli like speech and music.

      Weaknesses:

      While this study is overall convincing, there are still some weaknesses in the methods and analyses that limit the implication of the work.

      The study's main approach, focusing primarily on the grand comparison of response amplitudes between speech and music, may overlook intricate details in neural coding. Speech and music are not entirely orthogonal with each other at different levels of analysis: at the high-level abstraction, these are two different categories of cognitive processes; at the low-level acoustics, they overlap a lot; at intermediate levels, they may also share similar features. The selected musical stimuli, incorporating both vocals and multiple instrumental sounds, raise questions about the specificity of neural activation. For instance, it's unclear if the vocal elements in music and speech engage identical neural circuits. Additionally, the study doesn't adequately address whether purely melodic elements in music correlate with intonations in speech at a neural level. A more granular analysis, dissecting stimuli into distinct features like pitch, phonetics, timbre, and linguistic elements, could unveil more nuanced shared, and unique neural processes between speech and music. Prior research indicates potential overlap in neural coding for certain intermediate features in speech and music (Sankaran et al. 2023), suggesting that a simple averaged response comparison might not fully capture the complexity of neural encoding. Further delineation of phonetic, melodic, linguistic, and other coding, along with an analysis of how different informational aspects (phonetic, linguistic, melodic, etc) are represented in shared neural activities, could enhance our understanding of these processes and strengthen the study's conclusions.

      We appreciate the reviewer's acknowledgment that delving into the intricate details of neural coding of speech and music was beyond the scope of this work. To address some of the more precise issues raised, we have clarified in the manuscript that our musical stimuli do not contain vocals and are purely instrumental. We apologize if this was not clear initially.

      “In the main experimental session, patients passively listened to ~10 minutes of storytelling (577 secs, La sorcière de la rue Mouffetard; Gripari, 2004) and ~10 minutes of instrumental music (580 secs, Reflejos del Sur; Oneness, 2006) separated by 3 minutes of rest.”

      Furthermore, we now acknowledge the importance of modeling melodic, phonetic, or linguistic features in the discussion, and we have referenced the work of Sankaran et al. (2023) and McCarty et al. (2023) in this regard. However, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to a pitch-melodic (or harmonic) model of music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while treating phonemes and pitches as equivalent may be reasonable, it ultimately relies on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding for one or the other model remains problematic. In summary, because the models for speech and music are different (except for acoustic models), direct comparison is challenging, although still commendable and of interest.

      “These selective responses, not visible in primary cortical regions, seem independent of both low-level acoustic features and higher-order linguistic meaning (Norman-Haignere et al., 2015), and could subtend intermediate representations (Giordano et al., 2023) such as domain-dependent predictions (McCarty et al., 2023; Sankaran et al., 2023).”

      References:

      McCarty, M. J., Murphy, E., Scherschligt, X., Woolnough, O., Morse, C. W., Snyder, K., Mahon, B. Z., & Tandon, N. (2023). Intraoperative cortical localization of music and language reveals signatures of structural complexity in posterior temporal cortex. iScience, 26(7), 107223.

      Sankaran, N., Leonard, M. K., Theunissen, F., & Chang, E. F. (2023). Encoding of melody in the human auditory cortex. bioRxiv. https://doi.org/10.1101/2023.10.17.562771

      The paper's emphasis on shared and overlapping neural activity, as observed through sEEG electrodes, provides valuable insights. It is probably true that domain-specificity for speech and music does not exist at such a macro scale. However, it's important to consider that each electrode records from a large neuronal population, encompassing thousands of neurons. This broad recording scope might mask more granular, non-overlapping feature representations at the single neuron level. Thus, while the study suggests shared neural underpinnings for speech and music perception at a macroscopic level, it cannot definitively rule out the possibility of distinct, non-overlapping neural representations at the microscale of local neuronal circuits for features that are distinctly associated with speech and music. This distinction is crucial for fully understanding the neural mechanisms underlying speech and music perception that merit future endeavors with more advanced large-scale neuronal recordings.

      We appreciate the reviewer's concern, but we do not view this as a weakness for our study's purpose. Every method inherently has limitations, and intracranial recordings currently offer the best possible spatial specificity and temporal resolution for studying the human brain. Studying cell assemblies thoroughly in humans is ethically challenging, and examining speech and music in non-human primates or rats raises questions about cross-species analogy. Therefore, despite its limitations, we believe intracranial recording remains the best option for addressing these questions in humans.

      Regarding the granularity of neural representation, while understanding how computations occur in the central nervous system is crucial, we question whether the single neuron scale provides the most informative insights. Single neurons seem more versatile (e.g., in terms of cell type or layer affiliation) than the local circuitry they contribute to, which appears to be the brain's building block (e.g., the laminar organization; see Mendoza-Halliday et al., 2024). Additionally, the population dynamics of these functional modules appear crucial for cognition and behavior (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023). Therefore, we emphasize the need for multi-scale research, as we believe that a variety of approaches will compensate for each other's individual weaknesses. We clarified this in the introduction:

      “This approach rests on the idea that the canonical computations that underlie cognition and behavior are anchored in population dynamics of interacting functional modules (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023) and bound to spectral fingerprints consisting of network- and frequency-specific coherent oscillations (Siegel et al., 2012).”

      Importantly, we focus on the macro-scale and conclude that, at the anatomical region level, no speech or music selectivity can be observed during natural stimulation. This is stated in the discussion, as follows:

      “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech (or language) -selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”

      References:

      Mendoza-Halliday, D., Major, A.J., Lee, N. et al. A ubiquitous spectrolaminar motif of local field potential power across the primate cortex. Nat Neurosci (2024).

      Safaie, M., Chang, J.C., Park, J. et al. Preserved neural dynamics across animals performing similar behaviour. Nature 623, 765–771 (2023).

      Buzsáki, G., & Vöröslakos, M. (2023). Brain rhythms have come of age. Neuron, 111(7), 922-926.

      While classifying electrodes into 3 categories provides valuable insights, it may not fully capture the complexity of the neural response distribution to speech and music. A more nuanced and continuous approach could reveal subtler gradations in neural response, rather than imposing categorical boundaries. This could be done by computing continuous metrics, like unique variances explained by each category, or ratio-based statistics, etc. Incorporating such a continuum could enhance our understanding of the neural representation of speech and music, providing a more detailed and comprehensive picture of cortical processing.

      To clarify, the metrics we are investigating (coherence, power, linear correlations) are continuous. Additionally, we conduct a comprehensive statistical analysis of these results. The statistical testing, which includes assessing differences from baseline and between the speech and music conditions using a statistical threshold, yields three categories. Of note, ratio-based statistics (a continuous metric) are provided in Figures S9 and S10 (Figures S8 and S9 in the original version of the manuscript).

      Reviewer #3 (Public Review):

      Summary:

      Te Rietmolen et al. investigated the selectivity of cortical responses to speech and music stimuli using neurosurgical stereo EEG in humans. The authors address two basic questions: 1. Are speech and music responses localized in the brain or distributed; 2. Are these responses selective and domain-specific or rather domain-general and shared? To investigate this, the study proposes a nomenclature of shared responses (speech and music responses are not significantly different), domain selective (one domain is significantly different from baseline and the other is not), domain preferred (both are significantly different from baseline but one is larger than the other and they differ significantly from each other). The authors employ this framework using neural responses across the spectrum (rather than focusing on high gamma), providing evidence for a low level of selectivity across spectral signatures. To investigate the nature of the underlying representations they use encoding models to predict neural responses (low and high frequency) given a feature space of the stimulus envelope or peak rate (by time delay) and find stronger encoding for both in the low-frequency neural responses. The top encoding electrodes are used as seeds for a pair-wise connectivity (coherence) in order to repeat the shared/selective/preferred analysis across the spectra, suggesting low selectivity. Spectral power and connectivity are also analyzed on the level of the regional patient population to rule out (and depict) any effects driven by a select few patients. Across analyses the authors consistently show a paucity of domain selective responses and when evident these selective responses were not represented across the entire cortical region. The authors argue that speech and music mostly rely on shared neural resources.

      Strengths:

      I found this manuscript to be rigorous providing compelling and clear evidence of shared neural signatures for speech and music. The use of intracranial recordings provides an important spatial and temporal resolution that lends itself to the power, connectivity, and encoding analyses. The statistics and methods employed are rigorous and reliable, estimated based on permutation approaches, and cross-validation/regularization was employed and reported properly. The analysis of measures across the entire spectra in both power, coherence, and encoding models provides a comprehensive view of responses that no doubt will benefit the community as an invaluable resource. Analysis of the level of patient population (feasible with their high N) per region also supports the generalizability of the conclusions across a relatively large cohort of patients. Last but not least, I believe the framework of selective, preferred, and shared is a welcome lens through which to investigate cortical function.

      Weaknesses:

      I did not find methodological weaknesses in the current version of the manuscript. I do believe that it is important to highlight that the data is limited to passively listening to naturalistic speech and music. The speech and music stimuli are not completely controlled with varying key acoustic features (inherent to the different domains). Overall, I found the differences in stimulus and lack of attentional controls (passive listening) to be minor weaknesses that would not dramatically change the results or conclusions.

      Thank you for this positive review of our work. We added these points as limitations and future directions in the discussion section:

      “Finally, in adopting here a comparative approach of speech and music – the two main auditory domains of human cognition – we only investigated one type of speech and of music also using a passive listening task. Future work is needed to investigate for instance whether different sentences or melodies activate the same selective frequency-specific distributed networks and to what extent these results are related to the passive listening context compared to a more active and natural context (e.g. conversation).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The concepts of activation and deactivation within the study's context of selectivity are not straightforward to comprehend. It would be beneficial for the authors to provide more detailed explanations of how these phenomena relate to the selectivity of neural responses to speech and music. Such elaboration would aid readers in better understanding the nuances of how certain brain regions are selectively activated or deactivated in response to different auditory stimuli.

      The reviewer is right that the reported results are quite complex to interpret. The concepts of activation and deactivation are generally difficult to comprehend, as they are in part defined by the approach (e.g., method and/or metric) and the scale of observation (Pfurtscheller & Lopes da Silva, 1999). The power (or magnitude) of a time-frequency estimate is by definition a positive value. Deactivation (or desynchronization) is therefore relative to the comparison used (e.g., baseline, control, condition). This is further complicated by the scale of the measurement. For instance, during a simple limb movement, some areas of the sensorimotor cortex are activated, yet this phenomenon is accompanied at a finer scale by some desynchronization of the mu activity, and such desynchronization is a relative measure (e.g., before/after movement). At a broader scale, it is not rare to see some form of balance between brain networks, some being ‘inhibited’ to let others be activated, such as the default mode network versus sensory-motor networks. In our case, when estimating selective responses, it is the strength of the signal that matters. The type of selectivity is then defined by the sign/direction of the comparison/subtraction. We now provide additional details about the sign of selectivity between domains and frequencies in the Methods and Results section:

      Methods:

      “In order to explore the full range of possible selective, preferred, or shared responses, we considered both responses greater and smaller than the baseline. Indeed, as neural populations can synchronize or desynchronize in response to sensory stimulation, we estimated these categories separately for significant activations and significant deactivations compared to baseline.”

      Results:

      “We classified, for each canonical frequency band, each channel into one of the categories mentioned above, i.e. shared, selective, or preferred (Figure 1A), by examining whether speech and/or music differ from baseline and whether they differ from each other. We also considered both activations and deactivations, compared to baseline, as both index a modulation of neural population activity, and have been linked with cognitive processes (Pfurtscheller & Lopes da Silva, 1999; Proix et al., 2022). However, because our aim was not to interpret specific increase or decrease with respect to the baseline, we here simply consider significant deviations from the baseline. In other words, when estimating selectivity, it is the strength of the response that matters, not its direction (activation, deactivation).”

      “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”

      References:

      J.P. Lachaux, J. Jung, N. Mainy, J.C. Dreher, O. Bertrand, M. Baciu, L. Minotti, D. Hoffmann, P. Kahane,Silence Is Golden: Transient Neural Deactivation in the Prefrontal Cortex during Attentive Reading, Cerebral Cortex, Volume 18, Issue 2, February 2008, Pages 443–450

      Pfurtscheller, G., & Da Silva, F. L. (1999). Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical neurophysiology, 110(11), 1842-1857

      (2) The manuscript doesn't easily provide information about the control conditions, yet the conclusion significantly depends on these conditions as a baseline. It would be beneficial if the authors could clarify this information for readers earlier and discuss how their choice of control stimuli influences their conclusions.

      We added information in the Results section about the baseline conditions:

      “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”

      Of note, while the choice of different ‘basic auditory stimuli’ as baseline can change the reported results in regions involved in low-level acoustical analyses (auditory cortex), it will have no impact on the results observed in higher-level regions, which predominantly also exhibit shared responses. We have now more clearly pointed out this reasoning in the results section:

      “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”

      (3) The spectral analyses section doesn't clearly explain how the authors performed multiple-comparison correction. The authors' selectivity categorization appears similar to ANOVAs with post hoc tests, implying the need for certain corrections in the p values or categorization. Could the authors clarify this aspect?

      We apologize that this was not in the original version of the manuscript. In the spectral analyses, the selectivity categorization depended on both (1) the difference effects between the domains and the baseline, and (2) the difference effect between domains. Channels were marked as selective when there was (1) a significant difference between domains and (2) only one domain significantly differed from the baseline. All difference effects were estimated using paired-sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014) with 1000 permutations and the built-in tmax method to correct for multiple comparisons over channels (Nichols & Holmes, 2002; Groppe et al. 2011). We have now more clearly explained how we controlled family-wise error in the Methods section:

      “For each frequency band and channel, the statistical difference between conditions was estimated with paired sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014) with 1000 permutations and the tmax method to control the family-wise error rate (Nichols and Holmes 2002; Groppe et al. 2011). In tmax permutation testing, the null distribution is estimated by, for each channel (i.e. each comparison), swapping the condition labels (speech vs music or speech/music vs baseline) between epochs. After each permutation, the most extreme t-scores over channels (tmax) are selected for the null distribution. Finally, the t-scores of the observed data are computed and compared to the simulated tmax distribution, as in parametric hypothesis testing. Because with an increased number of comparisons, the chance of obtaining a large tmax (i.e. false discovery) also increases, the test automatically becomes more conservative when making more comparisons, thereby correcting for multiple comparisons across channels.”
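
      To make the tmax logic concrete, the procedure can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the mne-python implementation: the function names, the toy data, and the p-value convention (counting the observed data as one permutation) are assumptions made here. For paired data, swapping the two condition labels within an epoch is equivalent to flipping the sign of that epoch's difference, which is what the sketch does.

```python
import math
import random
import statistics


def t_scores(diffs):
    """One-sample t-score per channel; `diffs` is a list of epochs x channels."""
    n = len(diffs)
    scores = []
    for channel in zip(*diffs):  # iterate over channels
        sd = statistics.stdev(channel)
        mean = statistics.fmean(channel)
        scores.append(mean / (sd / math.sqrt(n)) if sd > 0 else 0.0)
    return scores


def tmax_permutation_test(cond_a, cond_b, n_permutations=1000, seed=0):
    """Paired permutation test with tmax correction across channels."""
    rng = random.Random(seed)
    # Paired differences, epoch by epoch and channel by channel.
    diffs = [[a - b for a, b in zip(ea, eb)] for ea, eb in zip(cond_a, cond_b)]
    observed = t_scores(diffs)
    null_tmax = []
    for _ in range(n_permutations):
        # Swap labels within an epoch == flip the sign of its differences.
        flipped = []
        for epoch in diffs:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * d for d in epoch])
        # Keep only the most extreme |t| over all channels.
        null_tmax.append(max(abs(t) for t in t_scores(flipped)))
    # Corrected p-value: how often the null tmax reaches the observed |t|.
    p_values = [
        (1 + sum(t0 >= abs(t) for t0 in null_tmax)) / (n_permutations + 1)
        for t in observed
    ]
    return observed, p_values


# Toy data (assumed): 30 epochs x 4 channels, only channel 0 has a real effect.
_rng = random.Random(1)
cond_b = [[_rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
cond_a = [[_rng.gauss(0, 1) + (3.0 if ch == 0 else 0.0) for ch in range(4)]
          for _ in range(30)]
t_obs, p_vals = tmax_permutation_test(cond_a, cond_b)
```

      Because every channel is compared against the same distribution of maxima, adding channels can only raise the threshold an observed t-score must exceed, which is the sense in which the test becomes more conservative with more comparisons.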

      References:

      Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., Parkkonen, L., & Hämäläinen, M. S. (2014). MNE software for processing MEG and EEG data. NeuroImage, 86, 446–460.

      Groppe, D. M., Bickel, S., Dykstra, A. R., Wang, X., Mégevand, P., Mercier, M. R., Lado, F. A., Mehta, A. D., & Honey, C. J. (2017). iELVis: An open source MATLAB toolbox for localizing and visualizing human intracranial electrode data. Journal of Neuroscience Methods, 281, 40–48.

      Nichols, T. E., & Holmes, A. P. (2002). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15(1), 1–25.

      Reviewer #2 (Recommendations For The Authors):

      Other suggestions:

      (1) The authors need to provide more details on how the sEEG electrodes were localized and selected. Are all electrodes included or only the ones located in the gray matter? If all electrodes were used, how to localize and label the ones that are outside of gray matter? In Figures 1C & 1D it seems that a lot of the electrodes were located in depth locations, how were the anatomical labels assigned for these electrodes?

      We apologize that this was not clear in the original version of the manuscript. Our electrode localization procedure was based on several steps described in detail in Mercier et al., 2022. Once electrodes were localized in a post-implant CT-scan and the coordinates projected onto the pre-implant MRI, we were able to obtain the necessary information regarding brain tissues and anatomical region. That is, first, the segmentation of the pre-implant MRI with SPM12 provided both the tissue probability maps (i.e. gray, white, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissues) that allowed us to dismiss electrodes outside of the brain and select those in the gray matter. Second, the individual's brain was co-registered to a template brain, which allowed us to back-project atlas parcels onto the individual's brain and assign anatomical labels to each electrode. The result of this procedure allowed us to group channels by anatomical parcels as defined by the Brainnetome atlas (Figure 1D), which informed the analyses presented in section Population Prevalence (Methods, Figures 4, 9-10, S4-5). Because this study relies on stereotactic EEG, and not electrocorticography, recording sites include both gyri and sulci, while depth structures were not retained.

      We have now updated the “General preprocessing related to electrodes localisation” section in the Methods. The relevant part now states:

      “To precisely localize the channels, a procedure similar to the one used in the iELVis toolbox and in the fieldtrip toolbox was applied (Groppe et al., 2017; Stolk et al., 2018). First, we manually identified the location of each channel centroid on the post-implant CT scan using the Gardel software (Medina Villalon et al., 2018). Second, we performed volumetric segmentation and cortical reconstruction on the pre-implant MRI with the Freesurfer image analysis suite (documented and freely available for download online http://surfer.nmr.mgh.harvard.edu/). This segmentation of the pre-implant MRI with SPM12 provides us with both the tissue probability maps (i.e. gray, white, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissues). This information allowed us to reject electrodes not located in the brain. Third, the post-implant CT scan was coregistered to the pre-implant MRI via a rigid affine transformation and the pre-implant MRI was registered to MNI152 space, via a linear and a non-linear transformation from SPM12 methods (Penny et al., 2011), through the FieldTrip toolbox (Oostenveld et al., 2011). Fourth, applying the corresponding transformations, we mapped channel locations to the pre-implant MRI brain that was labeled using the volume-based Human Brainnetome Atlas (Fan et al., 2016).”

      Reference:

      Mercier, M. R., Dubarry, A.-S., Tadel, F., Avanzini, P., Axmacher, N., Cellier, D., Vecchio, M. D., Hamilton, L. S., Hermes, D., Kahana, M. J., Knight, R. T., Llorens, A., Megevand, P., Melloni, L., Miller, K. J., Piai, V., Puce, A., Ramsey, N. F., Schwiedrzik, C. M., … Oostenveld, R. (2022). Advances in human intracranial electroencephalography research, guidelines and good practices. NeuroImage, 260, 119438.

      (2) From Figures 5 and 6 (and also S4, S5), is it true that aside from the shared response, lower frequency bands show more music selectivity (blue dots), while higher frequency bands show more speech selectivity (red dots)? I am curious how the authors interpret this.

      The reviewer is right in noticing the asymmetric selective response to music and speech in lower and higher frequency bands. However, while this effect is apparent in the analyses wherein we inspected stronger synchronization (activation) compared to baseline (Figures 2 and S1), the pattern appears to reverse when examining deactivation compared to baseline (Figures 3 and S2). In other words, there seems to be an overall stronger deactivation for speech in the lower frequency bands and a relatively stronger deactivation for music in the higher frequency bands.

      We now provide additional details about the sign of selectivity between domains and frequencies in the Results section:

      “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”

      Note, however, that this pattern of results depends on only a select number of patients, i.e. when ignoring regional selective responses that are driven by as few as 2 to 4 patients, the pattern disappears (Figures 5-6). More precisely, ignoring regions explored by a small number of patients almost completely clears the selective responses for both speech and music. For this reason, we do not feel confident interpreting the possible asymmetry in low vs high frequency bands differently encoding (activation or deactivation) speech and music.

      Minor:

      (1) P9 L234: Why only consider whether these channels were unresponsive to the other domain in the other frequency bands? What about the responsiveness to the target domain?

      We thank the reviewer for their interesting suggestion. The primary objective of the cross-frequency analyses was to determine whether domain-selective channels for a given frequency band remain unresponsive (i.e. exclusive) to the other domain across frequency bands, or whether the observed selectivity is confined to specific frequency ranges (i.e. frequency-specific). In other words, does a given channel exclusively respond to one domain and never, in whichever frequency band, to the other domain? The idea behind this question is that, for a channel to be selectively involved in the encoding of one domain, it does not necessarily need to be sensitive to all timescales underlying that domain as long as it remains unresponsive to any timescale in the other domain. However, if the channel is sensitive to information that unfolds slowly in one domain and faster in the other domain, then the channel is no longer globally domain selective, but the selectivity is frequency-specific to each domain.

      The proposed analyses answer a slightly different, albeit also meaningful, question: how many frequencies (or frequency bands) do selective responses span? From the results presented below, the reviewer can appreciate the overall steep decline in selective response beyond the single frequency band, with only a few channels remaining selectively responsive across maximally four frequency bands. That is, selective responses globally span one frequency band.

      Author response image 1.

      Cross-frequency channel selective responses. The top figure shows the results for the spectral analyses (baselined against the tones condition, including both activation and deactivation). The bottom figure shows the results for the connectivity analyses. For each plot, the first (leftmost) value corresponds to the percentage (%) of channels displaying a selective response in a specific frequency band. In the next value, we remove the channels that no longer respond selectively to the target domain for the following frequency band. The black dots at the bottom of the graph indicate which frequency bands were successively included in the analysis.
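      The successive-exclusion procedure described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function name, the boolean input layout (channels by frequency bands), and the assumption that columns are ordered in the sequence in which bands are included are all hypothetical choices made for illustration.

```python
import numpy as np

def cross_frequency_selectivity(selective):
    """Percentage of channels that remain selective as bands are added.

    selective: boolean array of shape (n_channels, n_bands); entry [c, b] is
    True if channel c responds selectively to the target domain in band b.
    Columns are assumed (hypothetically) to follow the order in which bands
    are successively included in the analysis.
    """
    selective = np.asarray(selective, dtype=bool)
    # A channel survives step k only if it was selective in every band up to k.
    still_selective = np.logical_and.accumulate(selective, axis=1)
    # Fraction of surviving channels at each step, expressed in percent.
    return 100.0 * still_selective.mean(axis=0)

# Toy example with four channels and three bands (made-up values).
toy = [[1, 1, 0],
       [1, 0, 0],
       [0, 1, 1],
       [1, 1, 1]]
percentages = cross_frequency_selectivity(toy)  # 75%, 50%, 25%
```

      In this toy case three of four channels are selective in the first band, two remain after the second band, and only one remains after the third, which mirrors the steep decline reported above.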

      (2) P21 L623: "Population prevalence." The subsection title should be in bold.

      Done.

      Reviewer #3 (Recommendations For The Authors):

      The authors chose to use pure tone and syllables as baseline, I wonder if they also tried the rest period between tasks and if they could comment on how it differed and why they chose pure tones, (above and beyond a more active auditory baseline).

      This is an interesting suggestion. The reason for not using the baseline between speech and music listening (or right after) is that it will be strongly influenced by the previous stimulus. Indeed, after listening to the story it is likely that patients keep thinking about the story for a while. Similarly after listening to some music, the music remains in “our head” for some time.

      This is why we did not use rest but other auditory stimulation paradigms. Concerning the choice of pure tones and syllables, these happen to be used for clinical purposes to assess functioning of auditory regions. They also corresponded to a passive listening paradigm, simply with more basic auditory stimuli. We clarified this in the Results section:

      “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”

      Discussion - you might want to address phase information in contrast to power. Your encoding models map onto low-frequency (bandpassed) activity which includes power and phase. However, the high-frequency model includes only power. The model comparison is not completely fair and may drive part of the effects in Figure 7a. I would recommend discussing this, or alternatively ruling out the effect with modeling power separately for the low frequency.

      We thank the reviewer for their recommendation. First, we would like to emphasize that the chosen signal extraction techniques that we used are those most frequently reported in previous papers (e.g. Ding et al., 2012; Di Liberto et al., 2015; Mesgarani and Chang, 2012).

      Low-frequency (LF) phase and high-frequency (HFa) amplitude are also known to track acoustic rhythms in the speech signal in a joint manner (Zion-Golumbic et al., 2013; Ding et al., 2016). This is possibly due to the fact that HFa amplitude and LF phase dynamics have a somewhat similar temporal structure (see Lakatos et al., 2005; Canolty and Knight, 2010).

      Still, the reviewer is correct in pointing out the somewhat unfair model comparison and we appreciate the suggestion to rule out a potential confound. We now report in Supplementary Figure S8, a model comparison for LF amplitude vs. HFa amplitude to complement the findings displayed in Figure 7A. Overall, the reviewer can appreciate that using LF amplitude or phase does not change the results: LF (amplitude or phase) always better captures acoustic features than HFa amplitude.

      Author response image 2.

      TRF model comparison of low-frequency (LF) amplitude and high-frequency (HFa) amplitude. Models were investigated to quantify the encoding of the instantaneous envelope and the discrete acoustic onset edges (peakRate) by either the low frequency (LF) amplitude or the high frequency (HFa) amplitude. The ‘peakRate & LF amplitude’ model significantly captures the largest proportion of channels, and is, therefore, considered the winning model. Same conventions as in Figure 7A.

      References:

      Canolty, R. T., & Knight, R. T. (2010). The functional role of cross-frequency coupling. Trends in Cognitive Sciences, 14(11), 506–515.

      Di Liberto, G. M., O’Sullivan, J. A., & Lalor, E. C. (2015). Low-frequency cortical entrainment to speech reflects phoneme-level processing. Current Biology, 25(19), 2457-2465.

      Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854-11859.

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164.

      Golumbic, E. M. Z., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., ... & Schroeder, C. E. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron, 77(5), 980-991.

      Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., & Schroeder, C. E. (2005). An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology, 94(3), 1904–1911.

      Mesgarani, N., & Chang, E. F. (2012). Selective cortical representation of attended speaker in multi-talker speech perception. Nature, 485(7397), 233-236.

      Similarly, the coherence analysis is affected by both power and phase, which are not dissociated; i.e., if the authors wished, they could repeat the coherence analysis with phase coherence (normalizing by the amplitude). Alternatively, this issue could be addressed in the discussion above.

      We agree with the Reviewer. We have now better clarified our choice in the Methods section:

      “Our rationale to use coherence as functional connectivity metric was threefold. First, coherence analysis considers both magnitude and phase information. While the absence of dissociation can be criticized, signals with higher amplitude and/or SNR lead to better time-frequency estimates (which is not the case with a metric that would focus on phase only and therefore would be more likely to include estimates of various SNR). Second, we chose a metric that allows direct comparison between frequencies. As the phase angle changes more quickly at high frequencies, phase alignment/synchronization is less likely in comparison with lower frequencies. Third, we intend to align to previous work which, for the most part, used the measure of coherence most likely for the reasons explained above.”
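      The distinction between coherence and a phase-only metric discussed in this exchange can be made concrete with a small sketch. This is an illustration only and not the pipeline used in the manuscript: it assumes that complex spectral estimates (one per trial, at a single frequency) are already available for two channels, and the function names are hypothetical.

```python
import numpy as np

def coherence(x, y):
    """Magnitude-squared coherence across trials at one frequency.

    x, y: complex spectral estimates, one value per trial.
    Both amplitude and phase of each trial contribute to the result.
    """
    cross = np.mean(x * np.conj(y))  # trial-averaged cross-spectrum
    power_x = np.mean(np.abs(x) ** 2)
    power_y = np.mean(np.abs(y) ** 2)
    return np.abs(cross) ** 2 / (power_x * power_y)

def phase_locking_value(x, y):
    """Phase-only counterpart: amplitude is discarded before averaging."""
    phase_diff = np.angle(x * np.conj(y))
    return np.abs(np.mean(np.exp(1j * phase_diff)))

# Toy data: constant phase lag of 0.5 rad, but amplitudes that vary
# independently across trials (made-up values).
theta = np.array([0.0, 1.0, 2.0, 3.0])
amp_x = np.array([1.0, 2.0, 1.0, 2.0])
amp_y = np.array([2.0, 1.0, 2.0, 1.0])
x = amp_x * np.exp(1j * theta)
y = amp_y * np.exp(1j * (theta - 0.5))

coh = coherence(x, y)            # below 1 because amplitudes do not co-vary
plv = phase_locking_value(x, y)  # 1 up to rounding, since the lag is constant
```

      With a perfectly constant phase lag, the phase-only metric reaches its ceiling while coherence is reduced by the amplitude mismatch, which is the lack of dissociation between power and phase that the reviewer points to.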

    1. eLife assessment

      This important work substantially advances our understanding of episodic memory in individuals with aphantasia, and sheds light on the neural underpinnings of episodic memory and mental imagery. The evidence supporting the conclusions is convincing, including evidence from a well-established interview paradigm complemented with fMRI to assess neural activation during memory recall. The work will be of broad interest to memory researchers and mental imagery researchers alike.

    2. Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigate whether the connectivity of the hippocampus is altered in individuals with aphantasia - people who have reduced mental imagery abilities and where some describe having no imagery, and others describe having vague and dim imagery. The study investigated this question using a fMRI paradigm, where 14 people with aphantasia and 14 controls were tested, and the researchers were particularly interested in the key regions of the hippocampus and the visual-perceptual cortices. Participants were interviewed using the Autobiographical Interview regarding their autobiographical memories (AMs), and internal and external details were scored. In addition, participants were queried on their perceived difficulty in recalling memories, imagining, and spatial navigation, and their confidence regarding autobiographical memories was also measured. Results showed that participants with aphantasia reported significantly fewer internal details (but not external details) compared to controls; that they had lower confidence in their AMs; and that they reported finding remembering and imagining in general more difficult than controls. Results from the fMRI section showed that people with aphantasia displayed decreased hippocampal and increased visual-perceptual cortex activation during AM retrieval compared to controls. In contrast, controls showed strong negative functional connectivity between hippocampus and the visual cortex. Moreover, resting state connectivity between the hippocampus and visual cortex predicted better visualisation skills. The authors conclude that their study provides evidence for the important role of visual imagery in detail-rich vivid AM, and that this function is supported by the connectivity between the hippocampus and visual cortex. This study extends previous findings of reduced episodic memory details in people with aphantasia, and enables us to start theorising about the neural underpinnings of this finding.

      The data provided good support for the conclusion that the authors draw, namely that there is a 'tight link between visual imagery and our ability to retrieve vivid and detail-rich personal past events'. However, as the authors also point out, the exact nature of this relationship is difficult to infer from this study alone, as the slow temporal resolution of fMRI cannot establish the directionality between the hippocampus and the visual-perceptual cortex. This is an exciting future avenue to explore.

      Strengths:

      A great strength of this study is that it introduces a fMRI paradigm in addition to the autobiographical interview, paralleling work done on episodic memory in cognitive science (e.g. Addis and Schacter, 2007, https://doi.org/10.1016%2Fj.neuropsychologia.2006.10.016 ), which has examined episodic and semantic memory in relation to imagination (future simulation) in non-aphantasic participants as well as clinical populations. Future work could build on this study, and for example use the recombination paradigm (Addis et al. 2009, 10.1016/j.neuropsychologia.2008.10.026 ), which would shed further light on the ability of people with aphantasia to both remember and imagine events. Future work could also build on the interesting findings regarding spatial navigation, which together with previous findings in aphantasia (e.g. Bainbridge et al., 2021, https://doi.org/10.1016/j.cortex.2020.11.014 ) strongly suggests that spatial abilities in people with aphantasia are unaffected. This can shed further light on the different neural pathways of spatial and object memory in general. In general, this study opens up a multitude of new avenues to explore and is likely to have a great impact on the field of aphantasia research.

      Weaknesses:

      A weakness of the study is that some of the questions used are a bit vague, and no objective measure is used, which could have been more informative. For example, the spatial navigation question (reported as 'How difficult is it typically for you to orient you spatially?') could have been more nuanced to tap into whether participants relied mostly on cognitive maps (likely supported by the hippocampus) or landmarks. It would also have been interesting to conduct a spatial navigation task, as participants do not necessarily have insight into their spatial navigation abilities (they could have been overconfident or underconfident in their abilities). Secondly, the question 'how difficult is it typically for you to use your imagination?' could also be more nuanced, as imagination is used in a variety of ways, and we only have reason to hypothesise that people with aphantasia might have difficulties in some cases (i.e. sensory imagination involving perceptual details). It is unlikely that people with aphantasia would have more difficulty than controls in using their imagination to imagine counterfactual situations and engage in counterfactual thought (de Brigard et al., 2013, https://doi.org/10.1016%2Fj.neuropsychologia.2013.01.015) due to its non-sensory nature, but the question used does not distinguish between these types of imagination. Again, this is a ripe area for future research. The general phrasing of 'how difficult is [x]' could also potentially bias participants towards more negative answers, something which ought to be controlled for in future research.

    3. Reviewer #2 (Public Review):

      Summary:

      This study investigates to what extent neural processing of autobiographical memory retrieval is altered in people who are unable to generate mental images ('aphantasia'). Self-report as well as objective measures were used to establish that the aphantasia group indeed had lower imagery vividness than the control group. The aphantasia group also reported fewer sensory and emotional details of autobiographical memories. In terms of brain activity, compared to controls, aphantasics had a reduction in activity in the hippocampus and an increase in the activity in visual cortex during autobiographical memory retrieval. For controls, these two regions were also functionally connected during autobiographical memory retrieval, which did not seem to be the case for aphantasics. Finally, resting-state connectivity between visual cortex and hippocampus was positively related to autobiographical vividness in the control group but negatively in the aphantasia group. The results are in line with the idea that aphantasia is caused by an increase in noise within the visual system combined with a decrease in top-down communication from the hippocampus.

      Recent years have seen a lot of interest in the influence of aphantasia on other cognitive functions and one of the most consistent findings is deficits in autobiographical memory. This is one of the first studies to investigate the neural correlates underlying this difference, thereby substantially increasing our understanding of aphantasia and the relationship between mental imagery and autobiographical memory.

      Strengths:

      One of the major strengths of this study is the use of both self-report as well as objective measures to quantify imagery ability. Furthermore, the fMRI analyses are hypothesis-driven and reveal unambiguous results, with alterations in hippocampal and visual cortex processing seeming to underlie the deficits in autobiographical memory.

      Weaknesses:

      In terms of weaknesses, the control task, doing mathematical sums, also differs from the autobiographical memory task in aspects that are unrelated to imagery or memory, such as self-relevance and emotional salience, which makes it hard to conclude that the differences in activity are reflecting only the cognitive processes under investigation. However, given that the most important comparisons are between groups of participants, this does not diminish the main conclusions about aphantasia.

      Overall, I believe that this is a timely and important contribution to the field and will inspire novel avenues for further investigation.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigate whether the connectivity of the hippocampus is altered in individuals with aphantasia - people who have reduced mental imagery abilities and where some describe having no imagery, and others describe having vague and dim imagery. The study investigated this question using a fMRI paradigm, where 14 people with aphantasia and 14 controls were tested, and the researchers were particularly interested in the key regions of the hippocampus and the visual-perceptual cortices. Participants were interviewed using the Autobiographical Interview regarding their autobiographical memories (AMs), and internal and external details were scored. In addition, participants were queried on their perceived difficulty in recalling memories, imagining, and spatial navigation, and their confidence regarding autobiographical memories was also measured. Results showed that participants with aphantasia reported significantly fewer internal details (but not external details) compared to controls; that they had lower confidence in their AMs; and that they reported finding remembering and imagining in general more difficult than controls. Results from the fMRI section showed that people with aphantasia displayed decreased hippocampal and increased visual-perceptual cortex activation during AM retrieval compared to controls. In contrast, controls showed strong negative functional connectivity between the hippocampus and the visual cortex. Moreover, resting state connectivity between the hippocampus and visual cortex predicted better visualisation skills. The authors conclude that their study provides evidence for the important role of visual imagery in detail-rich vivid AM, and that this function is supported by the connectivity between the hippocampus and visual cortex. This study extends previous findings of reduced episodic memory details in people with aphantasia, and enables us to start theorising about the neural underpinnings of this finding.

      The data provided good support for the conclusion that the authors draw, namely that there is a 'tight link between visual imagery and our ability to retrieve vivid and detail-rich personal past events'. However, as the authors also point out, the exact nature of this relationship is difficult to infer from this study alone, as the slow temporal resolution of fMRI cannot establish the directionality between the hippocampus and the visual-perceptual cortex. This is an exciting future avenue to explore.

      We thank the reviewer for highlighting our contributions and suggesting that the relationship between visual imagery and autobiographical memory recall is an exciting future avenue.

      Weaknesses:

      A weakness of the study is that some of the questions used are a bit vague, and no objective measure is used, which could have been more informative. For example, the spatial navigation question (reported as 'How difficult is it typically for you to orient you spatially?' - a question which is ungrammatical, but potentially reflects a typo in the manuscript) could have been more nuanced to tap into whether participants relied mostly on cognitive maps (likely supported by the hippocampus) or landmarks. It would also have been interesting to conduct a spatial navigation task, as participants do not necessarily have insight into their spatial navigation abilities (they could have been overconfident or underconfident in their abilities).

      Secondly, the question 'how difficult is it typically for you to use your imagination?' could also be more nuanced, as imagination is used in a variety of ways, and we only have reason to hypothesise that people with aphantasia might have difficulties in some cases (i.e. sensory imagination involving perceptual details). It is unlikely that people with aphantasia would have more difficulty than controls in using their imagination to imagine counterfactual situations and engage in counterfactual thought (de Brigard et al., 2013, https://doi.org/10.1016%2Fj.neuropsychologia.2013.01.015) due to its non-sensory nature, but the question used does not distinguish between these types of imagination. Again, this is a ripe area for future research. The general phrasing of 'how difficult is [x]' could also potentially bias participants towards more negative answers, something which ought to be controlled for in future research.

      The main goal of our study was to examine autobiographical memory recall. Therefore, we used the gold-standard Autobiographical Interview, or AI (Levine et al., 2002), and an fMRI paradigm to explore autobiographical memory recall in as standardised, precise, and objective a manner as possible.

      In addition to these experimentally rigorous tasks, we employed some loosely formulated questions with the intention for people to reflect on how they perceive their own abilities to recall autobiographical memories, navigate spatially, and use their imagination. We agree with the reviewer that these questions are vague and do not meet the experimental standard required for an investigation into spatial cognition or imagination associated with aphantasia. Nonetheless, we believe that these questions provide important additional insights into what participants think about their own cognitive abilities. In order to put these questions into perspective, we argue in the discussion that spatial cognition and other cognitive functions should be investigated in more depth in individuals with aphantasia in the future.

      As an additional note, all tasks were conducted in German. Thus, we were able to correct the wording of the debriefing question in our revision. We thank the reviewer for bringing this to our attention.

      Strengths:

      A great strength of this study is that it introduces a fMRI paradigm in addition to the autobiographical interview, paralleling work done on episodic memory in cognitive science (e.g. Addis and Schacter, 2007, https://doi.org/10.1016%2Fj.neuropsychologia.2006.10.016 ), which has examined episodic and semantic memory in relation to imagination (future simulation) in non-aphantasic participants as well as clinical populations. Future work could build on this study, and for example use the recombination paradigm (Addis et al. 2009, 10.1016/j.neuropsychologia.2008.10.026 ), which would shed further light on the ability of people with aphantasia to both remember and imagine events. Future work could also build on the interesting findings regarding spatial navigation, which together with previous findings in aphantasia (e.g. Bainbridge et al., 2021, https://doi.org/10.1016/j.cortex.2020.11.014 ) strongly suggests that spatial abilities in people with aphantasia are unaffected. This can shed further light on the different neural pathways of spatial and object memory in general. In general, this study opens up a multitude of new avenues to explore and is likely to have a great impact on the field of aphantasia research.

      We very much appreciate the acknowledgment of our work on autobiographical memory employing both the Autobiographical Interview and fMRI. Furthermore, we hope that our work inspires future research in the way the reviewer outlines and in the way we describe in our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study investigates to what extent neural processing of autobiographical memory retrieval is altered in people who are unable to generate mental images ('aphantasia'). Self-report as well as objective measures were used to establish that the aphantasia group indeed had lower imagery vividness than the control group. The aphantasia group also reported fewer sensory and emotional details of autobiographical memories. In terms of brain activity, compared to controls, aphantasics had a reduction in activity in the hippocampus and an increase in activity in the visual cortex during autobiographical memory retrieval. For controls, these two regions were also functionally connected during autobiographical memory retrieval, which did not seem to be the case for aphantasics. Finally, resting-state connectivity between the visual cortex and hippocampus was positively related to autobiographical vividness in the control group but negatively in the aphantasia group. The results are in line with the idea that aphantasia is caused by an increase in noise within the visual system combined with a decrease in top-down communication from the hippocampus.

      Recent years have seen a lot of interest in the influence of aphantasia on other cognitive functions and one of the most consistent findings is deficits in autobiographical memory. This is one of the first studies to investigate the neural correlates underlying this difference, thereby substantially increasing our understanding of aphantasia and the relationship between mental imagery and autobiographical memory.

      We thank the reviewer for highlighting the importance of our findings.

      Strengths:

      One of the major strengths of this study is the use of both self-report as well as objective measures to quantify imagery ability. Furthermore, the fMRI analyses are hypothesis-driven and reveal unambiguous results, with alterations in hippocampal and visual cortex processing seeming to underlie the deficits in autobiographical memory.

      Once again, we thank the reviewer for highlighting the quality of our methods and our results.

      Weaknesses:

      In terms of weaknesses, the control task, doing mathematical sums, also differs from the autobiographical memory task in aspects that are unrelated to imagery or memory, such as self-relevance and emotional salience, which makes it hard to conclude that the differences in activity are reflecting only the cognitive processes under investigation.

      We agree with the reviewer that our control task differs from autobiographical memory in many different ways. In fact, for this first investigation of the neural correlates of autobiographical memory in aphantasia, this is precisely the reason why we chose this mental arithmetic (MA) task. We know from previous studies that MA is, as much as possible, not dependent on hippocampal memory processes (Addis et al., 2007; McCormick et al., 2015, 2017; Leelaarporn et al., 2024). The main goal of the current study was to establish whether there are any differences between individuals with aphantasia and controls. In the next investigation, we can now build on these findings to disentangle in more detail what this difference reflects.

      Overall, I believe that this is a timely and important contribution to the field and will inspire novel avenues for further investigation.

      This highly positive conclusion is much appreciated.

      References

      Addis, D. R., Wong, A. T., & Schacter, D. L. (2007). Remembering the past and imagining the future: Common and distinct neural substrates during event construction and elaboration. Neuropsychologia, 45(7), 1363-1377.

      Kriegeskorte, N., Simmons, W., Bellgowan, P. et al. Circular analysis in systems neuroscience: the dangers of double dipping. Nat Neurosci 12, 535–540 (2009). https://doi.org/10.1038/nn.2303

      Leelaarporn, P., Dalton, M. A., Stirnberg, R., Stöcker, T., Spottke, A., Schneider, A., & McCormick, C. (2024). Hippocampal subfields and their neocortical interactions during autobiographical memory. Imaging Neuroscience.

      Levine, B., Svoboda, E., Hay, J. F., Winocur, G., & Moscovitch, M. (2002). Aging and autobiographical memory: dissociating episodic from semantic retrieval. Psychology and Aging, 17(4), 677.

      McCormick, C., St-Laurent, M., Ty, A., Valiante, T. A., & McAndrews, M. P. (2015). Functional and effective hippocampal–neocortical connectivity during construction and elaboration of autobiographical memory retrieval. Cerebral Cortex, 25(5), 1297-1305.

      McCormick, C., Moscovitch, M., Valiante, T. A., Cohn, M., & McAndrews, M. P. (2018). Different neural routes to autobiographical memory recall in healthy people and individuals with left medial temporal lobe epilepsy. Neuropsychologia, 110, 26-36.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting article that makes a substantial contribution to the field of the study of aphantasia as well as the neural mechanisms of autobiographical memory. I would strongly recommend this manuscript to be accepted (with these minor revisions), as it makes a substantial and well-evidenced contribution to the research, and it opens up many interesting avenues for researchers to explore. I was especially excited to see that the Autobiographical Interview had been paired with an fMRI paradigm, something which this field of research highly benefits from, as there are yet so few fMRI studies into aphantasia. I understand that it is the authors' decision whether to accept or reject any of the revisions I recommend here, but I would like to stress that I encourage accepting the recommended revisions, especially as there are some minor inaccuracies in the manuscript as it currently stands. Finally, I would like to stress that though I am based in the area of cognitive science, am not trained in fMRI imaging techniques, and therefore do not stand in a position where I can comment on the methodology pertaining to this part of the study - I encourage the Editors to seek a second reviewer's opinion on this.

      Thank you for the positive evaluation of our manuscript as well as your comments. We have revised our manuscript according to your important suggestions as further explained below.

      Line 33: "aphantasia prohibits people from experiencing visual imagery". This characterisation of aphantasia is too strong, especially as the authors use 32 as a cut-off point on the VVIQ, which represents weak and dim imagery. I would recommend using language like 'people with aphantasia have reduced visual imagery abilities', as this more accurately captures the group of people studied. Please revise throughout the manuscript. Please consult Blomkvist and Marks (2023) on this point who have discussed this problem in the aphantasia literature.

      We agree that aphantasics may experience reduced visual imagery abilities. We have revised our wording throughout the manuscript.

      Line 49: The authors conclude that their results 'indicate that visual mental imagery is essential for detail-rich, vivid AM', but this seems to be a bit too strong, for example since AM can be detail-rich with external (rather than internal) detail, and a person could potentially use mnemonic tricks such as keeping a detail-rich diary in order to boost their memory. That visual imagery is 'essential' implies that it is the only way to achieve detail-rich vivid AM, and this does not seem to be supported by the findings. I would recommend rephrasing it as 'visual mental imagery plays an important role in detail-rich, vivid AM' or 'visual mental imagery mediated detail-rich vivid AM'.

      We altered the sentence in Line 49 using one of the recommended phrases:

      ‘Our results indicate that visual mental imagery plays an important role in detail-rich, vivid AM, and that this type of cognitive function is supported by the functional connection between the hippocampus and the visual-perceptual cortex.’

      Line 69: Blomkvist and Marks (2023) have warned against calling aphantasia a 'condition' and this moreover seems to fit with the authors' previous research (Monzel, 2022). Please consider instead calling aphantasia an 'individual difference' in mental imagery abilities.

      Thank you for the suggestion. We have revised our wording throughout the manuscript, avoiding the term ‘condition’.

      Line 72: Add reference for emotional strength which has also been researched (Wicken et al. 2021, https://doi.org/10.1016/j.cortex.2020.11.014).

      We have added the suggested reference in Line 75:

      ‘Indeed, a handful of previous studies report convergent evidence that aphantasics report less sensory AM details than controls (Bainbridge et al., 2021; Dawes et al., 2020, 2022; Milton et al., 2020; Zeman et al., 2020), which may also be less emotional (Monzel et al., 2023; Wicken et al., 2021).’

      72-73: 'absence of voluntary imagery' - too strong as many people with aphantasia report having weak/dim mental imagery on the VVIQ.

      We agree that aphantasics may experience reduced visual imagery. We have revised this notion throughout the manuscript.

      74: Add reference to Bainbridge study which found a difference between recall of object vs spatial memory. This would be relevant here.

      We have added the suggested reference in Line 76:

      ‘Spatial accuracy, on the other hand, was not found to be impaired (Bainbridge et al., 2021).’

      Lines 94-97: The authors mention 'a prominent theory' but it is unclear which theory is referred to here. The article cited by Pearson (2019) does not suggest the possibility that aphantasia is due to altered connectivity between the hippocampus and visual-perceptual cortices. It suggests that aphantasia is due to impairment in the ventral stream, and in fact says that the hippocampus is unlikely to be affected due to spared spatial abilities in people with aphantasia. Specifically, Pearson claims: "Accordingly, memory areas of the brain that process spatial properties, including the hippocampus, may not be the underlying cause of aphantasia." (page 631). The authors further come back to this point in the discussion section (see comment below), saying that the hypothesis attributed to Pearson is supported by their study. I do not disagree with the point that the hypothesis is supported by the data, but it is unclear to me why the hypothesis is attributed to Pearson.

      Thank you for pointing out this inaccuracy. We have edited the text to spell out our entire train of thought (see Lines 96-102):

      ‘A prominent theory posits that because of this hyperactivity, small signals elicited during the construction of mental imagery may not be detected (Pearson, 2019, Keogh et al., 2020). Pearson further speculates that since spatial abilities seem to be spared, the hippocampus may not be the underlying cause of aphantasia. In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Line 97: Blomkvist reference should be 2022 (when first published online).

      The article ‘Aphantasia: In search of a theory’ by Blomkvist was first published on 1st July 2022. However, a correction was added on 13th March 2023. Therefore, we had cited the corrected version in this manuscript. However, we agree that the first publication date should be used and edited the reference accordingly.

      Line 116: 'one aphantasic' could be seen as offensive. I would suggest 'one aphantasic participant'.

      We have altered the paragraph according to your suggestion.

      Line 138: In line with the recommendations put forward by Blomkvist and Marks (2023), I would suggest removing the word 'diagnosed', as this medicalises aphantasia in a way that is not consistent with its not being a kind of mental disorder (Monzel et al., 2022). I would say that aphantasia is instead operationalised as a score between 16-32. However, note that Blomkvist (2022) and Blomkvist and Marks (2023, https://doi.org/10.1016/j.cortex.2023.09.004 ) point out that there is also a lot of inconsistency in this score and how it is used in different studies. In your manuscript, I would recommend removing all wording that indicates that people with aphantasia have no experience of mental imagery, as you have operationalised for a score up to 32 which indicates vague and dim imagery. Describing vague and dim imagery as no imagery/absence of imagery is inconsistent (but common practice in the literature).

      Thank you for your suggestion. We have revised the entire manuscript to eliminate any ambiguous meanings regarding the definition of aphantasia. Moreover, we replaced the word ‘diagnosed’ with ‘identified’ in Line 146.
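
The score-based operationalisation discussed in this exchange can be sketched in a few lines. This is a minimal illustration, not the authors' scoring code: it assumes the standard 16-item VVIQ with each item rated from 1 to 5 (so totals run from 16 to 80) and uses the cut-off of 32 mentioned by the reviewer. The function names are hypothetical.

```python
def vviq_total(item_ratings):
    """Sum 16 VVIQ item ratings, each on a 1-5 scale (assumed standard format)."""
    if len(item_ratings) != 16:
        raise ValueError("expected 16 item ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in item_ratings):
        raise ValueError("each rating must be an integer from 1 to 5")
    return sum(item_ratings)


def in_aphantasia_range(total, cutoff=32):
    """True if a total falls in the 16-to-cutoff range used to identify aphantasia."""
    return 16 <= total <= cutoff


# A participant who rates every item as 2 ("vague and dim") lands exactly on the cut-off.
dim_everywhere = vviq_total([2] * 16)
print(dim_everywhere, in_aphantasia_range(dim_everywhere))
```

The example makes the reviewer's point concrete: a total of 32 corresponds to dim imagery on every item rather than to no imagery at all.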

      Line 153: maybe 'correlated with imagery strength' rather than 'measures imagery strength'?

      We have altered the sentence according to your suggestion in Line 160:

      ‘Previous studies have shown that the binocular rivalry task validly correlated with mental imagery strength.’

      Line 162: "For participants who were younger than 34 years, the middle-age memory was replaced by another early adulthood memory". Is there precedence for this? Please add one sentence to explain/justify for the reader why a memory from this time period was chosen.

      To keep the data set homogeneous, with five episodic autobiographical memories from five different life periods per individual, we asked participants who were younger than 34 years at the time of the interview to provide another early adulthood memory instead of a middle-age memory, as they had not yet reached middle age. According to Levine et al. (2002), younger adults (age < 34 years old) selected 2 events from the early adulthood period. Hence, all participants provided the last time period with memories from their previous year. We have added an additional explanation in this section in Line 170:

      ‘In order to acquire five AMs in every participant, the middle age memory was replaced by another early adulthood memory for participants who were younger than 34 years old (see Levine et al., 2002). Hence, all participants provided the last time period with memories from their previous year.’

      Line 169: "During the general probe, the interviewer asked the participant encouragingly to promote any additional details." Consider a different word choice, 'promote' sounds odd.

      We have altered the sentence according to your suggestion in Line 180:

      ‘During the general probe, the interviewer asked the participant encouragingly to provide any additional details.’

      Line 196-198: the phrasing of these questions could have biased participants toward reporting it being more difficult. Did the authors control for this possibility in any way? The phrasing ‘How easy is it for you to [x]?’ might also be considered in a future study.

      Thank you for pointing this out. These debriefing questions were thought of as open questions to get people to talk about their experiences. They were not meant as rigorous scientific experiments. Framing it in a positive way is a good idea for future research.

      We have edited the manuscript on Line 394-396:

      ‘The debriefing questions were employed as a way for participants to reflect on their own cognitive abilities. Of note, these were not meant to represent or replace necessary future experiments.’

      Line 197: This question is ungrammatical. Is this a typo, or was this how the question was actually posed? What language was the study conducted in?

      All interviews within this study were conducted in German. Hence, the questions listed in this current manuscript were all translated from German into English. We have added this information in the Materials and Methods section in Line 169 as well as restructured the referred questions from Line 208-210:

      ‘All interviews were conducted in German.’

      ‘(1) Typically, how difficult is it for you to recall autobiographical memories?

      (2) Typically, how difficult is it for you to orient yourself spatially? 

      (3) Typically, how difficult is it for you to use your imagination?’

      Line 211: The authors write that participants were asked to "re-experience the chosen AM and elaborate as many details as possible in their mind's eye" was this the instruction used? I think stating the explicit instruction here would be relevant for the reader. If this is the word choice, it is also interesting as the autobiographical interview does not normally specify to re-experience details 'in one's mind's eye'.

      The instructions given to the participants were to choose an AM and re-experience/elaborate it in their mind with as many details as possible without explaining them out loud. We have clarified this in Lines 221-223.

      ‘For the rest of the trial duration, participants were asked to re-experience the chosen AM and try to recall as many details as possible without speaking out loud.’

      Line 213: Were ‘vivid’ and ‘faint’ the only two options? Why was a 5-point scale (like the VVIQ scale) not used to better be able to compare?

      During the scanning session, the participants were given a button box with two buttons: they pressed one with the index finger for 'vivid' and the other with the middle finger for 'faint'. The 5-point scale was not used to avoid confusion with the buttons during the scanning session. We have clarified this in Line 224:

      ‘We chose a simple two-button response in order to keep the task as easy as possible.’

      Line 347: Do the authors mean the same thing by 'imagery strength' and 'imagery vividness'? This would be good to clarify as it is not clear that these words mean the same thing.

      Imagery strength is often used to describe the results of the Binocular Rivalry Task, whereas vividness of mental imagery is often used to describe the results of the VVIQ. Although both tasks are correlated, the VVIQ measures vividness, whereas the dimension of the Binocular Rivalry Task is not clearly defined. We added this information in a footnote on page 10.

      Lines 353 - 356: When the authors first say that aphantasics described fewer memory details than controls, does this refer to external + internal details? Please clarify.

      Lines 353-360: The authors first say that aphantasics report "internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94)" (line 355). But then they say: "a 2-way interaction was found between the type of memory details and group, F(1, 27)= 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly less internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly less external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b)" (line 358). This seems to first say that aphantasics didn't report fewer details than controls, but then that they did report fewer internal details than controls. Please clarify if this is correct.

      Line 383: Results from controls are not reported in this section.

      We first reported the main effects of the different factors: aphantasics reported fewer details than controls (regardless of memory period and type of memory details), internal details were reported more often than external details (regardless of group and memory period), and more details were reported for recent than remote memories (regardless of group and type of memory details). Subsequently, we report the simple effects for aphantasics and controls separately. To further clarify, we added the following segment in line 360:

      ‘Regarding the AI, we found significant main effects of memory period, F(1, 27) = 11.88, p = .002, ηp2 = .31, type of memory details, F(1, 27) = 189.03, p < .001, ηp2 = .88, and group, F(1, 27) = 9.98, p = .004, ηp2 = .27. When the other conditions were collapsed, aphantasics (M = 26.29, SD = 9.58) described less memory details than controls (M = 38.36, SD = 10.99). For aphantasics and controls combined, more details were reported for recent (M = 35.17, SD = 14.19) than remote memories (M = 29.06, SD = 11.12), and internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94). More importantly, a 2-way interaction was found between type of memory details and group, F(1, 27) = 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly less internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly less external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b).’
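
As a quick consistency check on the effect sizes quoted above, partial eta squared can be recovered from an F statistic and its degrees of freedom via the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the four F tests in the quoted passage. The function and variable names are illustrative and are not taken from the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared) from the quoted passage.
reported = {
    "memory period": (11.88, 1, 27, 0.31),
    "type of memory details": (189.03, 1, 27, 0.88),
    "group": (9.98, 1, 27, 0.27),
    "details x group interaction": (54.09, 1, 27, 0.67),
}

for effect, (f_value, df1, df2, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{effect}: computed {eta:.2f}, reported {eta_reported:.2f}")
```

For each of the four effects, the computed value rounds to the reported one at two decimal places.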

      Overall, the results were reported for aphantasics and controls separately in Lines 368-372.

      Line 386: The question does not specify that it's asking about using imagination in daily life, even though this is what results report. I'm not sure that the question implies the use of imagination in daily life, so I would recommend removing this reference here.

      We have removed the “in daily life” since this was not part of the original debriefing question.

      Line 394: Could this slowness in response reflect uncertainty about the vividness?

      Since the reason for this slowness is not known, we have refrained from adding this to the discussion. However, we added this as a short insertion in line 406:

      ‘Moreover, aphantasics responded slower (M = 1.34 s, SD = 0.38 s) than controls (M = 1.00 s, SD = 0.29 s) when they were asked whether their retrieved memories were vivid or faint, t(28) = 2.78, p = .009, possibly reflecting uncertainty in their response.’

      Line 443: Graph E, significance not indicated on the graph.

      After preprocessing, the fMRI data were statistically analyzed using the GLM contrast AM versus MA. The resulting images were then thresholded at p < 0.001, so that the illuminated voxels in Fig. 3 A, B, C, and D show only voxels in which we already know that there is a statistical difference between our conditions. Graph E illustrates only the descriptive means and variance of the significant differences in Fig. 3 C and D. This display is useful since the reader can more easily assess the difference between two conditions and two groups at a glance. For a general discussion on this topic, please also see circular analysis in fMRI (Kriegeskorte et al., 2009).

      Line 521-522: The authors claim that Pearson (2019) forwards the hypothesis that heightened activity of visual-perceptual cortices hinders aphantasics from detecting small imagery-related signals. However, I find no statement of this hypothesis in Pearson (2019). It is unclear to me why this hypothesis is attributed to Pearson (2019). Please remove this reference or provide a correct citation for where the hypothesis is stated. Further, it is not clear from what is written how the results support this hypothesis as this is rather brief - please elaborate on this.

      We attributed this hypothesis to Pearson (2019) according to his Fig. 4, which states: ‘A strong top-down signal and low noise (bottom left) gives the strongest mental image (square), whereas a high level of neural noise and a weak top-down imagery signal would produce the weakest imagery experience (top right).’

      We have edited our manuscript to reflect Pearson better in Lines 543-550:

      ‘In a prominent review, Pearson synthesizes evidence about the neural mechanism of imagery strength (Pearson, 2019). Indeed, activity metrics in the visual cortex predict imagery strength (Cui et al., 2007; Dijkstra et al., 2017). Interestingly, lower resting activity and excitability result in stronger imagery, and reducing cortical activity in the visual cortex via transcranial direct current stimulation (tDCS) increases visual imagery strength (Keogh et al., 2020). Thus, one potential mechanism of aphantasia-related AM deficits is that the heightened activity of the visual-perceptual cortices observed in our and previous work hinders aphantasics to detect weaker imagery-related signals.’

      Line 575: Consider citing Blomkvist (2022) who has argued that aphantasia is an episodic memory condition

      We added the suggested reference in Line 601.

      Line 585: Consider citing Bainbridge et al (2021) https://doi.org/10.1016/j.cortex.2020.11.014

      We have added the suggested reference in Line 612.

      Line 581: It might be relevant here to also discuss non-visual details, which have indeed been investigated in your present study. E.g. the lower emotional details, temporal details, place details, etc.

      We have edited our discussion to reflect the non-visual details better in Line 605:

      ‘In fact, previous and the current study show that aphantasics and individuals with hippocampal damage report less internal details across several memory detail subcategories, such as emotional details and temporal details (Rosenbaum et al., 2008; St-Laurent et al., 2009; Steinvorth et al., 2005), and these deficits can be observed regardless of the recency of the memory (Miller et al., 2020). These similarities suggest that aphantasics are not merely missing the visual-perceptual details to specific AM, but they have a profound deficit associated with the retrieval of AM.’

      Place details are discussed on page 37 onwards.

      Line 605: I agree with this interesting suggestion for future research. It would also be relevant to reference Bainbridge (2021) here who tested spatial cognition in a drawing task and found that aphantasic participants correctly recalled spatial layouts of rooms but reported fewer objects than controls. It might also be worth pointing out that the present study does not actually test for accuracy in spatial cognition, so it could be the case that people with aphantasia feel confident that they can navigate well, but they might in fact not. Future studies relying on objective measures should test this possibility.

      We have added the suggested reference in Line 625.

      Lines 609-614: Is there any evidence that complex decision-making and complex empathy tasks depend on constructed scenes with visual-perceptual details? This hypothesis seems a bit far-fetched without any supporting evidence. In fact, it seems unlikely to be supported as we also know that people with aphantasia generally live normal lives, and often have careers that we can assume involve complex decision-making (see Zeman 2020 who report aphantasics who work as computer scientists, managers, etc). I would recommend that the authors provide evidence of the role of mental imagery in complex decision-making and complex empathy tasks, mediated by scene construction, to support this hypothesis as viable to test for future research. It is also unclear how this point connects to the argument made by Bergmann and Ortiz-Tudela (2023). In fact, Bergmann and Ortiz-Tudela seem to make the same argument as Pearson (2019) does - that aphantasia results from impairments in the ventral stream, but that the dorsal stream is unaffected. However, Blomkvist (2022) argues that this view is too simplistic to be able to account for the variety of deficits that we see in aphantasia. I would recommend either engaging more fully with this debate or cutting it, as it currently is too vague for a reader to follow.

      We have decided to leave the discussion about scene construction and its connection to complex decision making and empathy out of the current manuscript. We have included the argument of Bergmann & Ortiz-Tudela (2023) in the Introduction (Line 101):

      ‘In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Reviewer #2 (Recommendations For The Authors):

      In general, I really enjoyed reading this paper.

      Thank you very much for the positive evaluation of our manuscript as well as your comments.

      There were only a few things that I had some concerns about. For example, it was unclear to me whether the whole-brain analysis (Figures 3 and 4) was corrected for multiple comparisons or why only a small volume correction was applied for the functional connectivity analysis. If these results are borderline significant, this should be made more explicit in the manuscript. I don't think this is a major issue as the investigation of both the hippocampus and visual cortex was strongly hypothesis-driven, but it would still be good to be explicit about the strength of the findings.

      For the whole-brain analysis, we applied a threshold of p < .001 and a voxel cluster size of 10, but no other multiple-comparisons correction was applied. The peak in the right hippocampus did survive the whole-brain threshold, but we decided to lower this threshold just for display purposes in Figure 3, so that the readers can easily see the cluster.

      We have made the statistical thresholds more easily assessable for the reader on the following pages:

      Figure 3 (Page 27): ‘Images are thresholded at p < .001, cluster size 10, uncorrected, except (D) which is thresholded at p < .01, cluster size 10, for display purposes only (i.e., the peak voxel and adjacent 10 voxels also survived p < .001, uncorrected).’

      Figure 4 (Page 30): ‘Image is displayed at p < .05, small volume corrected, and a voxel cluster threshold of 10 adjacent voxels.’
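
The combination of a voxel-wise p threshold with a minimum cluster size, as described in the figure legends above, can be illustrated with a small sketch. This is not the authors' pipeline: it assumes voxels are given as a dictionary from integer (x, y, z) coordinates to p-values, and it assumes face connectivity (6 neighbours) when forming clusters, whereas fMRI packages differ in the connectivity criterion they use. The function name is hypothetical.

```python
from collections import deque


def cluster_threshold(p_values, p_thresh=0.001, min_cluster=10):
    """Return coordinates of voxels with p < p_thresh that belong to a
    cluster of at least min_cluster face-connected supra-threshold voxels.

    p_values: dict mapping (x, y, z) integer coordinates to p-values.
    """
    candidates = {voxel for voxel, p in p_values.items() if p < p_thresh}
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    survivors, seen = set(), set()
    for start in candidates:
        if start in seen:
            continue
        # Breadth-first search collects one connected cluster.
        cluster, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in neighbours:
                nxt = (x + dx, y + dy, z + dz)
                if nxt in candidates and nxt not in seen:
                    seen.add(nxt)
                    cluster.add(nxt)
                    queue.append(nxt)
        if len(cluster) >= min_cluster:
            survivors |= cluster
    return survivors


# Toy map: a 12-voxel row passes, a 3-voxel blob is too small,
# and one voxel at p = .01 fails the voxel-wise threshold.
toy = {(i, 0, 0): 0.0005 for i in range(12)}
toy.update({(i, 5, 5): 0.0001 for i in range(3)})
toy[(12, 0, 0)] = 0.01
print(len(cluster_threshold(toy)))
```

Note that a cluster-extent criterion of this kind is a display and reporting convention; on its own it does not correct for multiple comparisons, which is why the legends label the maps as uncorrected.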

      I was wondering whether it would be possible to use DCM to investigate the directionality of the connectivity. Given that there are only two ROIs and two alternative hypotheses (top-down versus bottom-up) this seems like an ideal DCM problem.

      We thank the reviewer for this suggestion and will consider testing the effective connectivity between both regions of interest in a future investigation. 

      Line 385: typo: 'great' should be 'greater'.

      We have corrected the typo, changing ‘great’ to ‘greater’ in Line 397.

      Line 400: absence of evidence of an effect is not evidence of absence of an effect.

      We agree with the reviewer that this was unclear. We changed the wording in Line 412:

      ‘In addition, aphantasics and controls did not differ significantly in their time searching for a memory in AM trials, t(19) = 1.03, p = .315.’

      Typo line 623: 'overseas'.

      We have altered the mistyped word from ‘overseas’ to ‘oversees’ in Line 647.

    1. Reviewer #1 (Public Review):

      Summary:

      This is an experimentally soundly designed work and a very well-written manuscript. There is a very clear logic that drives the reader from one experiment to the next, the experimental design is clearly explained throughout and the relevance of the acquired data is well analyzed and supports the claims made by the authors. The authors made an evident effort to combine imaging, genetic, and molecular data to describe previously unknown early embryonic movement patterns and to identify regulatory mechanisms that control several aspects of it.

      Strengths:

      The authors develop a new method to analyze, quantitatively, the onset of movement during the latter embryonic stages of Drosophila development. This setup allows for a high throughput analysis of general movement dynamics based on the capture of variations of light intensity reflected by the embryo. This setup is capable of imaging several embryos simultaneously and provides a detailed measure of movement over time, which proves to be very useful for further discoveries in the manuscript. This setup already provides a thorough and quantifiable description of a process that is little known and identifies two different phases during late embryonic movements: a myogenic phase and a neurogenic phase, which they elegantly prove is dependent on neuronal activity by knocking down action potentials across the nervous system.

      However, in this system, movement is detected as a whole, and no further description of the type of movement is provided beyond frequency and amplitude; it would be interesting to know from the authors if a more precise description of the movements that take place at this stage can be achieved with this method (e.g. motion patterns across the A-P body axis).

      Importantly, this highly quantitative experimental setup is an excellent system for performing screenings of motion regulators during late embryonic development, and its use could be extended to search for different modulators of the process, beyond miRNAs (genetic mutants, drugs, etc.).

      Using their newly established motion detection pipeline, the authors identify miR-2b-1 as required for proper larval and embryonic motion, and identify an overall reduction in the quantity of both myogenic and neurogenic movements, as well as an increased frequency in neurogenic movement "pulses".

      Focusing on the neurogenic movement phenotype the authors use in situ probes and perform RT-PCR on FACS-sorted CNS cells to unambiguously detect miR-2b-1 expression in the embryonic nervous system. The neurogenic motion defects observed in miR-2b-1 mutant embryos and early larvae can be completely rescued by the expression of ectopic miR-2b-1 specifically in the nervous system, providing solid evidence of the requirement and sufficiency of miR-2b-1 expressed in the nervous system to regulate these phases of movement.

      To explore the mechanism through which miR-2b-1 impacts embryonic movement, the authors use a state-of-the-art bioinformatic approach to identify potential targets of miR-2b-1, and find that the expression levels of an uncharacterized gene, CG3638, are indeed regulated by miR-2b-1. Furthermore, they prove that by knocking down the expression of CG3638 in a miR-2b-1 mutant background, the neurogenic embryonic movement defects are rescued, indicating that the repression of CG3638 by miR-2b-1 is necessary for correct motion patterns in wild-type embryos. Therefore, this paper provides the first functional characterization of CG3638, and names this gene Motor.

      Finally, the authors aim to discriminate in which elements of the embryonic motor system miR-2b-1/Motor is required. Using directed overexpression of miR-2b-1 and Motor knockdown in the motor neurons and the chordotonal (sensory) organs, they prove that the miR-2b-1/Motor regulatory axis is specifically required in the sensory organs to promote normal embryonic and larval movement.

      Weaknesses:

      The initial screening to identify miRNAs involved in motion behaviors is performed in early larval movement. The logic presented by the authors is clear - it is assumed that early larval movement cannot proceed normally in the absence of previous embryonic motion - and ultimately helped them identify a miRNA required for modulation of embryonic movement. However, it is possible that certain miRNAs play a role in the modulation of embryonic movement while being dispensable for early L1 behaviors. Such regulators might have been missed with the current screening setup.

    2. Reviewer #2 (Public Review):

      Summary:

      The manuscript, "A microRNA that controls the emergence of embryonic movement" by Menzies, Chagas, and Alonso provides evidence that Drosophila miR-2b-1 is expressed in neurons and controls the expression of the predicted chloride channel CG3638, here named "Motor". Loss of the miRNA leads to movement phenotypes that can be rescued by downregulation of Motor; using specific drivers, the authors show that a larval movement phenotype (slower movement) can be rescued by knockdown of Motor in the chordotonal organs, suggesting that the increase in Motor found in the chordotonal organs is likely the root of the movement defects. Overall, I found the data presented in the manuscript to be of reasonable quality, and the conclusions are well enough supported by the presented data.

      The genetic and phenotypic analysis seems to be correct. The nicest part of the manuscript is the connection between the loss of a miRNA and finding its likely target in generating a phenotype. The authors also develop some protocols for the analysis of the movement phenotypes which may be useful for others.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the Authors:

      Reviewer 1:

      (1) Figure legends are too sparing, and often fail to describe with enough detail and accuracy the experiments presented. Especially in a work like this one, which uses plenty of different approaches and techniques and has a concise main text, description in the figure legends can really help the reader to understand the technical aspects of the experimental design. In my opinion, this will also help highlight the effort the authors put into exploring different and often new technical approaches. 

      We thank Reviewer 1 for highlighting this point and agree with them that the original figure legends lacked detailed information. In this revised version of our paper we edited all figure legends providing higher detail on experiments and information displayed (see Main text p12-16, Supplementary Information p2-5). We hope this change will improve the clarity and accuracy of the description of our experiments. 

      Reviewer 2:

      (1) Is there evidence that the early movement phenotype is actually linked to the larval movement phenotype? I noticed that the chordotonal driver experiment was only examined for larval movement. Is this driver not expressed earlier? Could the authors check the early phenotype using this driver? Are there early drivers that are expressed in chordotonal organ precursors (not panneuronal) and does the knockdown of CG3638 in these specific cells suppress the early phenotype?

      (2) More broadly, I would like to understand the function of the early embryonic movements. My concern is that they may only be a sign that the nervous system is firing up. If the rescue of the late miRNA mutant phenotype with chordotonal organ expression is only through a late change in the expression of CG3638, then the larval phenotype is probably not due to a developmental change, but a change in the immediate functioning of the neurons. Would this suggest that the early pulsing is not required for anything, at least at our level of understanding? If the driver is actually expressed early and late, then perhaps the authors could test later drivers to delimit the early and late functions of the miRNA? 

      The comments by Reviewer 2 in the points above are important and enquire about the biological role of early embryonic movements and whether these movements are linked to later larval activity or are somewhat irrelevant to the behaviour of the animal at later stages. 

      To address this important question, we conducted a new experiment in which we reduced neural activity specifically in the embryo (i.e. from 10 h AEL until the end of embryogenesis) and tested whether this treatment had any impact on larval movement. If – as put by Rev2 – the ‘early pulsing is not required for anything’ and the larval phenotype emerges from an acute change in neuronal physiology, then our experiment should show no effects at the larval stage. The results in Figure S4 (see Supplementary Information, p5) show that this is not the case: artificial reduction of neural activity during embryogenesis leads to a statistically significant reduction in larval speed, similar to that caused by the loss of miR-2b-1. This shows that modifications of embryonic activity impact larval movement.

      Furthermore, earlier work on the biological role of embryonic activity identified an activity-dependent ‘critical period’ during late embryogenesis (Giachello and Baines, 2015; Ackerman et al., 2021): manipulations at or around this critical period result in both locomotor and seizure phenotypes in larvae. We cite these papers in the main text (p7).

      In addition, two recent papers (Zeng et al., 2021; Carreira-Rosario et al., 2021) – which we cite in the main text (p5) – show that inhibition of muscle activity specifically during the embryonic period prevents the generation of normal neural activity patterns in both embryo and larva. Similar results are observed when proprioceptive sensory inputs to the central nervous system are blocked, with larval locomotion also disrupted.

      Altogether, the data already in the literature, together with our new addition to the paper, show that early embryonic movements play a key role in the development of the nervous system and larval locomotion.

      (3) Given the role in the larval chordotonal organs, have the authors also checked the adult movements? 

      The question of whether miR-2b-1 action in chordotonal organs affects behaviour at later stages of the Drosophila life cycle is interesting and was the reason why we assessed different genetic manipulations at the larval stage. However, we believe that assessing adult locomotor phenotypes is beyond the scope of this paper. 

      (4) The authors state that mir-2b-1 is a mirtron. I do not believe this is correct. It is not present in an intron in Btk from what I can see. Also, in the reference that the authors use when stating that mir-2b is a mirtron, I believe mir-2b-1 is actually used as a non-mirtron control miRNA. As mirtrons are processed slightly differently from regular hairpins and often use only the 3' end of the hairpin for miRNA creation, this may not be a trivial distinction. 

      We are grateful to Rev2 for highlighting this point: indeed, as they say, miR-2b-1 is located in the 3’UTR of host gene Btk, rather than in an intron. Accordingly, in this revision we have removed the comment on miR-2b-1 being a mirtron (p6) and deleted the corresponding citation.

      (5) For miRNA detection, the authors use in situ hybridization and QPCR. Both methods show that the gene is expressed but not that the mature miRNA is made. If the authors wanted a truly independent test for the presence of the miRNA, a miRNA sensor might be a better choice and it would hint at which part of the hairpin makes the functional miRNA. This is probably not necessary but could be a nice addition. 

      We thank Rev2 for drawing attention to this point and allowing this clarification. The qPCR protocol we used is based on the method developed by Balcells et al., 2011 (303 citations) (see Materials and Methods section in Supplementary Information, p14), which allows the specific amplification of mature miRNA transcripts, and not their precursors. This method for mature miRNA PCR is so robust that it has even been patented (WO2010085966A2). To ensure that the reader is clear about our methods, we state in the main text (p6) that we perform "RT-PCR for the mature miRNA transcript". [NB: miRNA sensors provide a useful method to assess miRNA expression but can also act as competitive inhibitors of physiological miRNA functions, titrating away miRNA molecules from their real targets in tissue; therefore, results using this method are often difficult to interpret.]

      (6) Curious about mir-2b-1 and any overlap with the related mir2b-2 and the mir2a genes. I am just wondering about the similarity in their sequences/targets and if they might have similar phenotypes or enhance the phenotypes being scored by the authors. 

      This is an interesting point raised by Rev2, and indeed miR-2b-1 does belong to the largest family of microRNAs in Drosophila, the miR-2 family, discussed in detail by Marco et al., 2012. However, we consider that performing tests of additional miRNA mutations, both individually and in combination with miR-2b-1, is beyond the scope of this paper.

      (7) Related to this, the authors show that the reduction of a single miRNA target suppresses the miRNA loss of function phenotype. This indicates that this target is quite important for this miRNA. I wonder if the target site is conserved in the human gene that the authors highlight.

      This is another interesting comment by Rev2. To pursue their idea, we performed a BLAST search for the miR-2b-1 target site in the human orthologs of CG3638 and did not find a match, suggesting that the relationship between miR-2b-1 and CG3638 is not evolutionarily conserved between insects and mammals.
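
      As a rough illustration of what a target-site search involves, the sketch below scans a sequence for an exact match to the reverse complement of a miRNA seed (nucleotides 2-8). It is not the authors' method (they report using BLAST), and everything in it is an assumption made for illustration: the function names are hypothetical and the sequences are made-up placeholders, not the real miR-2b-1 or CG3638 sequences.

```python
# Illustrative only: the sequences below are made-up placeholders, not the
# real miR-2b-1 or CG3638 sequences, and the authors used BLAST, not this scan.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def find_seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions in `utr` that pair perfectly with
    miRNA nucleotides 2-8 (the canonical seed region)."""
    # Accept DNA-style input by converting T to U before slicing the seed.
    seed = mirna.upper().replace("T", "U")[1:8]
    site = reverse_complement(seed)
    target = utr.upper().replace("T", "U")
    width = len(site)
    return [i for i in range(len(target) - width + 1)
            if target[i:i + width] == site]


# Hypothetical toy inputs.
toy_mirna = "AGGCUUACGAUCC"
toy_utr_with_site = "AAAGUAAGCCUUU"
toy_utr_without_site = "AAAAAAAAAAAAA"

print(find_seed_sites(toy_mirna, toy_utr_with_site))
print(find_seed_sites(toy_mirna, toy_utr_without_site))
```

      An exact seed scan of this kind only reports perfect matches; an empty result for an ortholog would be consistent with, but would not by itself prove, loss of the regulatory relationship.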

      Public Reviews:

      Reviewer #1:

      Weaknesses: 

      The authors do not describe properly how the miRNA screening was performed and just claim that only miR-2b-1 mutants presented a defective motion phenotype in early L1. How many miRNAs were tested, and how candidates were selected is never explicitly mentioned in the text or the Methods section.

      We identified miR-2b-1 as part of a genetic screen aimed at detecting miRNAs with impact on embryonic movement, but this full screen is not yet complete. Seeing the clear phenotype of miR-2b-1 in the embryo prompted us to study this miRNA in detail, which is what we report in this paper.

      The initial screening to identify miRNAs involved in motion behaviors is performed in early larval movement. The logic presented by the authors is clear - it is assumed that early larval movement cannot proceed normally in the absence of previous embryonic motion - and ultimately helped them identify a miRNA required for modulation of embryonic movement. However, it is possible that certain miRNAs play a role in the modulation of embryonic movement while being dispensable for early L1 behaviors. Such regulators might have been missed with the current screening setup. Although similar changes to those described for the neurogenic phase of embryonic movement are described for the myogenic phase in miR-2b-1 mutants (reduction in motion amplitude), this phenotype goes unexplored. This is not a big issue, as the authors convincingly demonstrate later that miR-2b-1 is specifically required in the nervous system for proper embryonic and larval movement, and the effects of miR-2b-1 on myogenic movement might as well be the focus of future work. However, it will be interesting to discuss here the implications of a reduced myogenic movement phase, especially as miR-2b-1 is specifically involved in regulating the activity of the chordotonal system - which precisely detects early myogenic movements. 

      We thank Rev1 for their interest in the observation that loss of miR-2b-1 decreases movement during the myogenic phase as well as the neurogenic phase. Indeed, two recent papers (Zeng et al., 2021; Carreira-Rosario et al., 2021) – which we cite in the main text (p5) – show that inhibition of muscle activity during a period that overlaps with the myogenic phase prevents the formation of normal neural activity patterns and larval locomotion. They also observe the same when inhibiting proprioceptive sensory inputs to the central nervous system. This could suggest that the effects of miR-2b-1 on the myogenic phase might have ‘knock-on’ effects upon the later neurogenic phase and larval movement. However, we note that genetic restoration of miR-2b-1 expression specifically to neurons completely rescues the larval speed phenotype (Fig. 3G), suggesting that the dominant effect of miR-2b-1 upon movements is through its action within neurons. To recognise Rev1’s comment we have added a short sentence to the text (p7) suggesting that ‘the effects of miR-2b-1 observed at earlier stages (myogenic phase) are possibly offset by normal neural expression of miR-2b-1’.

      FACS-sorting of neuronal cells followed by RT-PCR convincingly detects the presence of miR-2b-1 in the embryonic CNS. However, control of non-neuronal cells would be required to explore whether miR-2b-1 is not only present but enriched in the nervous system compared to other tissues. This is also the case in the miR-2b-1 and Janus expression analysis in the chordotonal organs: a control sample from the motor neurons would help discriminate whether miR-2b-1/Janus regulatory axis is specifically enriched in chordotonal organs or whether both genes are expressed throughout the CNS but operate under a different regulation or requirements for the movement phenotypes.

      The RNA in situ hybridisation data included in the paper (Fig. 3B) show that RNA probes for miR-2b-1 precursors give a very strong signal in neural tissue – with very low signal detected in other tissues – strongly indicating that expression of miR-2b-1 is highly enriched in the nervous system.

      Reviewer #2:

      Weaknesses: 

      As I mentioned above, I felt the presentation was a bit overstated. The authors present their data in a way that focuses on movement, the emergence of movement, and how their miRNA of interest is at the center of this topic. I only point to the title and name that they wish to give the target of their miRNA to emphasize this point. "Janus" the GOD of movement and change. The results and discussion section starts with a paragraph saying, "Movement is the main output of the nervous system... how developing embryos manage to organise the necessary molecular, cellular, and physiological processes to initiate patterned movement is still unknown. Although it is clear that the genetic system plays a role, how genes control the formation, maturation and function of the cellular networks underlying the emergence of motor control remains poorly understood." While there is nothing inherently untrue about these statements, it is a question of levels of understanding. One can always argue that something in biology is still unknown at a certain level. However, one could also argue that much is known about the molecular nature of movement. Next, I am not sure how much this work impacts the area of study regarding the emergence of movement. The authors show that a reduction of a miRNA can affect something about certain neurons, that affects movement. The early movements, although slightly diminished, still emerge. Thus, their work only suggests that the function of some neurons, or perhaps the development of these neurons may impact the early movements. This is not new as it was known already from early work from the Bate lab.  Later larval movements were also shown to be modified in the miRNA mutants and were traced to "janus" overexpression in the chordotonal organs. As neurons are quite sensitive to the levels of Cl- and Janus is thought to be a Cl- channel, this could lead to a slight dysfunction of the chordotonal neurons. 
      So, based on this, the work suggests that dysfunction of the chordotonal organs could impact larval movement. This was, of course, already known. The novelty of this work is in the genes being studied (important or not). We now know that miR-2b-1 and Janus are expressed in the early neurons and larval chordotonal neurons and their removal is consistent with a role for these genes in the functioning of these neurons. This is not to trivialize these findings, simply to state that these results are not significantly changing our overall understanding of movement and the emergence of movement. I would call it a stretch to say that this miRNA CONTROLS the emergence of movement, as in the title.

      As already mentioned in our provisional response, on this point we politely – but strongly – disagree with Rev2’s suggestion that the findings are inflated by our language. We also note that they criticise our use of the verb ‘control’, yet this is a standard textbook term in molecular biology to describe biological processes regulated by genetic factors: given that miR-2b-1 regulates movement patterns during embryogenesis, to say that miR-2b-1 ‘controls’ embryonic movement in the Drosophila embryo is reasonable and in line with the language used in the field. 

      Finally, the name Janus should be changed as it is already being used. A quick scan of flybase shows that there is a Janus A and B in flies (phosphatases) and I am surprised the authors did not check this. I was initially worried about the Janus kinase (JAK) when I performed the search. While I understand that none are only called Janus, studies of the jan A and B genes refer to the locus as the janus region, which could lead to confusion. The completely different molecular functions of the genes relative to CG3638 add to the confusion. Thus, I ask that the authors change the name of CG3638 to something else.

      Thank you for spotting this omission. In the revised MS we propose a new name – Movement Modulator (Motor) – for the gene previously described as Janus (CG3638) to avoid annotation issues at FlyBase due to other, unrelated genes that include this word as part of their names. All instances where Janus was used are now replaced by Motor (abstract; main text pages 9-10; Figure 4).

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

      The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We will refer to the references in this paper as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.

      The structure of Fusobacterium nucleatum (Fn) that we have reported shows a significant sequence identity with the previously reported Hi structure. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind to the Neu5Ac and the third metal site are conserved. This suggests that mutagenesis and functional studies from other research can be related to the structure presented in our work.

      The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that a mutation of D521A in Fn, followed by transport studies, will provide any new information.

      However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues into Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.

      Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

      Please see the comments above.

      Reviewer #3 (Public Review):

      The manuscript by Goyal et al. reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

      We appreciate your feedback and will make the necessary modifications to the manuscript incorporating most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to enhance the resolution using CryoSPARC or RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.

    1. eLife assessment

      This manuscript is useful to researchers with an interest in cervical cancers because it provides scRNA-seq data from a diverse cohort of 15 early-stage cervical cancer patients. While the dataset could be of use to the research community, the key claims of the paper around the immunosuppressive microenvironment associated with specific tumour cell clusters (and the properties/importance of those clusters) are incomplete. Additional experiments will be required to substantiate these claims.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors in this manuscript performed scRNA-seq on a cohort of 15 early-stage cervical cancer patients with a mixture of adeno- and squamous cell carcinoma, HPV status, and several samples that were upstaged at the time of surgery. From their analyses they identified differential cell populations in both immune and tumour subsets related to stage, HPV status, and whether a sample was adenocarcinoma or squamous cell. Putative microenvironmental signaling was explored as a potential explanation for their differential cell populations. Through these analyses the authors also identified SLC26A3 as a potential biomarker for later stage/lymph node metastasis which was verified by IHC and IF. The dataset is likely useful for the community, however, the strong claims made are not adequately supported by the data and would require additional functional validation.

      Strengths:

      The dataset could be useful for the community.
      SLC26A3 could potentially be a useful marker to predict lymph node metastasis with further study.

      Weaknesses:

      The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      For the sequencing, which kit was used on the Novaseq6000?

      Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis" is incorrect according to the sample distributions, which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3, which appears to be associated with late stage; however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset, as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

    3. Reviewer #2 (Public Review):

      Summary:

      Peng et al. present a study using scRNA-seq to examine phenotypic properties of cervical cancer, contrasting features of both adenocarcinomas (ADC) and squamous cell carcinoma (SCC), and HPV-positive and negative tumours. They propose several key findings: unique malignant phenotypes in ADC with elevated stemness and aggressive features, interactions of these populations with immune cells to promote an immunosuppressive TME, and SLC26A3 as a biomarker for metastatic (≥ Stage III) tumours.

      Strengths:

      This study provides a valuable resource of scRNA-seq data from a well-curated collection of patient samples. The analysis provides a high-level view of the cellular composition of cervical cancers. The authors introduce some mechanistic explanations of immunosuppression and the involvement of regulatory T cells that are intriguing.

      Weaknesses:

      I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell types (e.g. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

    4. Author response:

      Reviewer #1 (Public review):

      (1) The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      Response: upon revision, we plan to rewrite the introduction of the manuscript.

      (2) For the sequencing, which kit was used on the Novaseq6000?

      Response: for sequencing, we used the Chromium Controller and Chromium Single Cell 3’ Reagent Kits (v3 chemistry CG000183) on the Novaseq6000. We apologize for omitting this important information and will add it to the Methods.

      (3) Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      Response: we apologize for the inadequacy of descriptions of data analysis process due to word count limit. We plan to provide more information, and if possible we also would like to provide scripts as supplementary data in the revised manuscript.

      (4) For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      Response: we will add the list of marker genes for cell type annotation in the revised manuscript.

      (5) No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      Response: considering this inadequacy, we plan to use statistical approaches in further analyses to compare the differences between each set of groups upon revision.

      (6) The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      Response: we apologize for the imprecise presentation of histograms such as Fig 2D, and we will add Y-axis labels. As for the width of the bars, we used the histograms as originally generated by the data package; we did not double the width on purpose to strengthen their visual importance. We sincerely apologize for this and will correct similar mistakes throughout the manuscript.

      (7) Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      Response: we agree that many conclusions, which were based on bio-informatic predictions, are written in an over-affirmative way. Upon revision, we will rewrite these conclusions more precisely.

      (8) The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      Response: we are thankful for this suggestion. We think that each cluster of epithelial cells is specified from other clusters and identified by DEGs, but they are not heavily unconnected from others. Upon revision, we plan to add further validation for the existence of Epi_10_CYSTM1.

      (9) Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      Response: from the data of TCGA survival analysis for Epi_10, we found a not-so-slight trend of difference between groups (with a small P value). As a result, we presented this data and hoped to add more strength to the clinical significance of this cluster. However, this indeed caused controversy because the P value is non-significant. We plan to rewrite the conclusion more precisely or delete this data in the revised manuscript.

      (10) The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3 which appears to be associated with late stage, however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      Response: we are thankful for this question. The conclusion “The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis” was indeed stated too definitively given the sample distribution. We will correct the description in the upcoming revised manuscript. As for SLC26A3, we do not think it is “broadly” expressed; rather, it is specific to later tumors. When we presented the IHC data, we showed only the strongly positive area of each slide in order to emphasize the differences, which has caused misunderstanding. Thus, upon revision, we would like to show the other areas of one case, or even a scan of one whole slide, as supplementary data.

      (11) The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      Response: we apologize for neglecting further validation of the cytotoxic T cells. In fig. 4B and 4C, the four different clusters of T cells were identified based on canonical T cell markers. We then focused mainly on the validation and further analysis of Tregs, neglecting the other clusters. In fig. 4D we intended only to show the top DEGs in each T cell cluster, hoping to find potential marker genes for next-step analysis. However, we did not notice that there might be contamination by epithelial cells within the cytotoxic T cell cluster. We will optimize this part of the analysis in our revision.

      (12) Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

      Response: our initial purpose was to use GO analysis as supports for our conclusions. However we know these are only claims but not evidence, which is also the problem of our writing techniques as in question (7). Therefore, in our revised manuscript, we plan to rewrite the conclusion from the GO analysis in a more scientific way or delete these data.

      Reviewer #2 (Public review):

      (1) I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      Response: we understand that many of the conclusions are too sure but lack profound supporting evidence, thus we will optimize the writing in the revised manuscript. More importantly, to strengthen the validity of our data, we will try to use statistical approaches for further analysis.
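
      The reproducibility concern raised in this comment comes down to comparing groups on per-sample values rather than on pooled cell counts. The following is a minimal sketch of that first step, assuming a hypothetical input of (sample ID, cluster label) pairs; the function name and data layout are illustrative and are not taken from the authors' pipeline.

```python
from collections import Counter


def per_sample_fraction(cells, cluster):
    """Return {sample_id: fraction of that sample's cells assigned to `cluster`}.

    `cells` is an iterable of (sample_id, cluster_label) pairs; this layout is
    an assumption made for illustration only.
    """
    totals = Counter()  # number of cells per sample
    hits = Counter()    # number of cells per sample that fall in `cluster`
    for sample_id, label in cells:
        totals[sample_id] += 1
        if label == cluster:
            hits[sample_id] += 1
    # One fraction per sample: these values, not the pooled cells, would be
    # the observations entering any between-group comparison.
    return {sample_id: hits[sample_id] / totals[sample_id] for sample_id in totals}
```

      With one fraction per sample, a cluster contributed by a single individual appears as a single non-zero observation in its group instead of as a large pooled percentage.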

      (2) This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      Response: we are sincerely grateful for the questions on the validity, appropriateness and real potential of SLC26A3. We plan to add more explanation of the importance of SLC26A3 in the discussion. We also apologize for some over-confident conclusions about ADC-specific cell clusters and the marker gene SLC26A3. However, we do not think these conclusions are problematic. Given the heterogeneity among individuals, and even among sampling sites within one individual, we think a “small fraction” does not mean that the population is meaningless. Also, these ADC-specific clusters (including Epi_10_CYSTM1) do account for certain proportions when compared with the “big fraction” groups (Fig. 2D). Furthermore, considering the specificity of the DEGs to ADC but not to SCC, we think these ADC-specific cluster genes might play a central role in distinguishing ADC from SCC, and we used a validation experiment to support this hypothesis. Lastly and most importantly, SLC26A3 came from sample 7, whose clinical stage is FIGO IIIC (late stage) and whose pathological type is ADC. Among the 15 cases, only 4 are late stage (of which 3 are ADC). From this point of view, we think that 1 in 3 (33%) expressing SLC26A3 (or containing cluster Epi_10_CYSTM1) should be considered a potential candidate. Samples from early-stage and SCC patients do not contain Epi_10_CYSTM1 cells, which likewise indicates the specificity of this cell cluster to ADC. Therefore, in our revised manuscript, we plan to add a more in-depth discussion of this question.

      (3) This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell type (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

      Response: do you mean Figure 1B and D? In the revised manuscript, we will list the canonical marker genes to cluster different types of cells to at least support that the clustering of cell types match most of the present published references. To further avoid the contamination of cells in each cluster, we will use quality controls and re-analyze these data upon revision.

    1. eLife assessment

      This important work illuminates the dynamics of BRAF in both its monomeric and dimeric forms, with or without inhibitors, combining traditional techniques and sophisticated computational analyses. The evidence presented is convincing and suggests a potential allosteric effect, though substantiating the exact mechanism will require further studies. The work has implications for understanding kinase signaling and the development of potential drug candidates. This study will be of interest to structural biologists, medicinal chemists, and pharmacologists.

    2. Reviewer #1 (Public Review):

      Summary:

      This manuscript from Clayton and co-authors, entitled "Mechanism of dimer selectivity and binding cooperativity of BRAF inhibitors", aims at clarifying the molecular mechanism of BRAF dimer selectivity. Indeed, first generation BRAF inhibitors, targeting monomeric BRAFV600E, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microseconds MD simulations showed an inward shift of the αC helix in the BRAFV600E mutant dimer. This helped identify a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      Strengths:

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidines protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors are able to stabilize the αC-in state. Noteworthy, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

      Weaknesses:

      Regarding the analyses of the mixed state simulations, the DFG dihedral probability densities for the apo protomer (Fig. 5a right) are highly overlapping. It is not convincing that a slight shift can support the conclusion that the binding in one protomer is enough to shift the DFG motif outward allosterically. Moreover, the DFG dihedral time-series for the apo protomer (Supplementary Figure 9) clearly shows that the measured quantities are affected by significant fluctuations and poor consistency between the three replicates. The apo protomer of the mixed state simulations could be affected by the same problem that the authors pointed out in the case of the apo dimer simulations, where the amount of sampling is insufficient to model the DFG-out/-in transition properly. There is similar concern with the Lys483-Glu501 salt bridge measured for the apo protomers of the mixed simulations. As it can be observed from the probabilities bar plot (Fig. 5a middle), the standard deviation is too high to support a significant role for this interaction in the allosteric modulation of the apo protomer.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors employ molecular dynamics simulations to understand the selectivity of FDA approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify the two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings find support in the analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signalling and the development of future BRAF inhibitor drugs.

      Strengths:

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (All-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

      Weaknesses:

      Although the use of molecular dynamics yields crucial structural insights and outlines a mechanism to elucidate dimer selectivity and cooperativity in these systems, the authors could consider adopting free energy methods to estimate the values of hydrogen bond energies and hydrophobic interactions, thereby enhancing the depth of their analysis.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Comment 1: This manuscript from Clayton and co-authors, entitled “Mechanism of dimer selectivity and binding cooperativity of BRAF inhibitors”, aims to clarify the molecular mechanism of BRAF dimer selectivity. Indeed, first-generation BRAF inhibitors, targeting monomeric BRAFV600E, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microsecond MD simulations showed an inward shift of the αC helix in the BRAFV600E mutant dimer. This helped in identifying a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidine protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors can stabilize the αC-in state. Noteworthy, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

      Response: We thank the reviewer for the positive evaluation of our work.

      Comment 2: One potential weakness in the manuscript is the lack of reported uncertainties related to the analyzed quantities. Providing this information would significantly enhance the clarity regarding the reliability of the analyses and the confidence in the claims presented.

      Response and revision: We agree with the reviewer that reporting uncertainties will clarify and strengthen our arguments. Following this suggestion, we have added error bars to Figures 3 and 5 representing the standard deviation of the K-E salt bridge probability. These show the deviation across replicas in how often the salt bridge is present. This better supports our claim that the salt bridge is promoted by the presence of PHI1, as the deviation of the salt bridge probability is minimal for protomers containing PHI1. In addition to these error bars, we have also added a table to the Supplementary Information (Supplementary Table 2) containing the mean and standard deviation of the αC position, K-E distance, and DFG pseudo dihedral for each protomer in our dimer simulations.

      Reviewer #2 (Public review):

      Comment 1: The authors employ molecular dynamics simulations to understand the selectivity of FDA-approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify the two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings find support in the analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signaling and the development of future BRAF inhibitor drugs.

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (all-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

      Response: We thank the reviewer for the positive evaluation of our work.

      Comment 2: The use of molecular dynamics yields crucial structural insights and outlines a mechanism to elucidate dimer selectivity and cooperativity in these systems. However, the authors could consider the adoption of free energy methods to estimate the values of hydrogen bond energies and hydrophobic interactions, thereby enhancing the depth of their analysis.

      Response: The current free energy methods are capable of giving accurate estimates of the relative binding free energies of similar ligands; however, accurate calculations of the absolute free energies of hydrogen bond and hydrophobic interactions are not feasible yet. Thus, we decided not to pursue the calculations.

      Reviewer #1 (Suggestions to author)

      Comment 1: The general recommendation is to give more details about the procedure for the analyses performed and, when possible, show the uncertainties relative to the analyzed quantities. This would clearly indicate the reliability of the analyses and the confidence of the claims. Moreover, it is not always clear how the analyses were performed.

      Response and revision: As previously mentioned, we have added uncertainties to our bar graphs in Figures 3 and 5 as well as Supplemental Table 2. With regard to the clarity of our analysis, we added more detail on how the probability distributions were created, which we discuss in our response to Comment 3.

      Comment 2: It is not clear why the authors decided to titrate only the histidines without considering the other charged residues. In particular, the authors show in Supplementary Figure 2 a network of which Asp595 (protomer A) is a part and that, given the direct interaction, could affect the protonation state of His477 (protomer B).

      Response: The reviewer is correct in that Asp595 directly interacts with His477 on the opposite protomer. This is exactly the reason why we did not consider titrating Asp595 – the interaction with His477 should further stabilize the charged state of Asp595 and downshift its pKa from the solution value of about 3.8. Thus, Asp595 will be charged at physiological pH and does not need to be titrated in the CpHMD simulations.

      Comment 3: Regarding the probability density plots (Figures 3 and 5), clarify if you used all the data from all the replicas and all the protomers. If possible, show a comparison between each replica in the Supplementary Figures. A Supplementary Table with the probability values for the measured K-E salt bridge could be helpful since the bar plots are hard to compare. Also in this case please report the uncertainty or a comparison between the replicas.

      Response and revision: To clarify how we created the probability density plots, the following line was added to the Methods section:

      On page 15, third paragraph: All probability distributions were created by combining the last three µs of each replica for each system, with each distribution consisting of 50 bins. Unless specified, distributions contain quantities from both protomers in dimeric simulations.
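
      The procedure quoted above (pool the last three µs of every replica, then histogram into 50 bins) can be sketched as follows. This is an illustrative reading and not the authors' script: the function name, the list-based input layout, and the strict "later than end minus window" frame selection are all assumptions.

```python
def pooled_density(replicas, times_us, window_us=3.0, bins=50):
    """Pool the final `window_us` of each replica and return (edges, density).

    replicas : list of per-replica value lists (e.g. a distance per frame)
    times_us : list of per-replica time stamps in microseconds, same shapes
    Both argument layouts are hypothetical, chosen for illustration.
    """
    pooled = []
    for values, times in zip(replicas, times_us):
        t_end = times[-1]
        # Keep only frames inside the trailing window of this replica.
        pooled.extend(v for v, t in zip(values, times) if t > t_end - window_us)

    lo, hi = min(pooled), max(pooled)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in pooled:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1

    # Normalize so that the histogram integrates to one.
    n = len(pooled)
    density = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, density
```

      For a dimer, values from both protomers would simply be passed as additional entries in `replicas`, matching the statement that distributions contain quantities from both protomers unless specified.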

      As previously mentioned, we have included Supplemental Table 2 which contains the mean and standard deviation of the K-E distance across systems. For comparison between replicas, we found the time series of the K-E distance in the inhibitor-bound monomer and dimer systems in Supplemental Figure 7 to be sufficient.

      Comment 4: It would be better to define the claim: “it is clear that the timescale of the DFG-out to DFG-in transition is longer than our simulation timeframe of a few microseconds” (lines 208-209). To me it is not obvious why this should be “clear”.

      Response and revision: Our original statement was to convey that, as DFG-in is sampled very rarely, our simulations cannot accurately represent DFG transitions. We have revised the manuscript to the following:

      On page 6, fourth paragraph: While this does suggest dimerization loosens the DFG motif, our simulations do not appropriately model the DFG-out/-in transition as the DFG-in state is only occasionally sampled.

      Comment 5: In the case of the inhibited monomer simulations, the authors state: “the PHI1-Glu501 interaction can become completely disrupted, with the distance moving beyond 6 Å to as high as 12 Å; correlated with the disruption of the PHI1-Glu501 interaction, the αC position is shifted out to the range of 21 Å-24 Å” (lines 241-244). However, the plot of the PHI1-Glu501 interaction time-series (Supplementary Figure 7) shows that just in one replica of one protomer (Protomer A), the interaction is disrupted, and the αC position never exceeds 21 Å (time-series reported in Supplementary Figure 6). None of the fluctuations of the αC position appear to be correlated with the disruption of the ligand-Glu501 interaction. The time-series reported in Supplementary Figures 6 and 7 suggest that the two events are uncorrelated. Please explain this aspect or quantify the correlation to support your claim.

      Response: We believe the source of this confusion is that we did not include a time series of the αC position for the inhibited monomer simulations; Supplementary Figure 6, mentioned in the comment, is of dimeric BRAF. Thus, we have added Supplementary Figure 8, a time series plot of the αC position for inhibited monomer and dimer protomers.
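
      One way the reviewer's request to "quantify the correlation" could be met is a Pearson coefficient computed per replica between the ligand-Glu501 distance and the αC position. The sketch below is an assumption about how such a check might look, not an analysis reported by the authors; the function name and inputs are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

      Because consecutive MD frames are strongly autocorrelated, a coefficient obtained this way describes co-variation within a trajectory and should not be read as a significance test.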

      Comment 6: Regarding the analyses of the positive cooperativity, the DFG dihedral probability densities for the apo protomer (Figure 5a) are highly overlapping. Thus, it is hard to believe that these small differences support the claim that “PHI1 binding in one protomer can allosterically shift the DFG motif outward, making it favorable for binding a second inhibitor” (lines 300-302). The authors should show that the differences in the DFG distributions (in particular, apo dimer vs PHI1 mixed) are statistically significant. Only in this case, the data could support the claim that PHI1 bound to one protomer modulates the DFG conformation in the second one. In my opinion, the overlap between the DFG dihedral probability (Figure 5a) is too high to support the claim that PHI1 is able to allosterically modulate this region in the second apo protomer. Please provide an appropriate statistical test that demonstrates that those distributions are significantly different.

      Response and revision: We have adjusted this statement based on the new Supplementary Table 2 to read as the following:

      On page 9, third paragraph: Although the shift is small (the difference between means is approximately one standard deviation, see Supplementary Table 2), it suggests that PHI1 binding in one protomer can allosterically shift the DFG motif outward, making it favorable for binding a second inhibitor. In contrast, the DFG dihedral of the apo protomer in the LY-bound mixed dimer appears to be slightly smaller than that of the apo dimer, with a difference between means of approximately one standard deviation (Supplementary Table 2), which is unfavorable for binding the second inhibitor (orange and grey, Figure 5a right).
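
      The "approximately one standard deviation" comparison in the revised text can be expressed as a difference between means scaled by a pooled standard deviation. The sketch below shows that arithmetic under the assumption that each system is summarized by a list of values; the pooling choice and the function name are illustrative, and no values from Supplementary Table 2 are reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_shift(sample_a, sample_b):
    """Difference between means (b minus a) in units of the pooled sample SD."""
    na, nb = len(sample_a), len(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (na - 1) * stdev(sample_a) ** 2 + (nb - 1) * stdev(sample_b) ** 2
    ) / (na + nb - 2)
    return (mean(sample_b) - mean(sample_a)) / sqrt(pooled_var)
```

      This is an effect size rather than the statistical test the reviewer asked for: it reports how far apart the two means are relative to their spread but does not by itself establish significance.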

      Comment 7: Regarding the dimer holo simulations, I agree that in the LY-bound dimer simulations, the hydrogen bond between the ligand and the E501 is weaker, but I do not understand the sentence “as seen from the local density maximum centered at ∼3.4 Å” at line 233, since the 2D density plot (Figure 3h) shows that the highest peak is close to 5 Å. Also, it would be useful to clarify how these 2D density plots reported in Figure 3 were obtained.

      Response and revision: While the highest peak in Figure 3h is close to 5 Å, we were more interested in the local peak close to 3.4 Å. To avoid confusion we have modified the line to separate both peaks:

      On page 7, second paragraph: In the LY-bound dimer simulations, however, the LY–Glu501 h-bond is weaker and less stable than the counterpart of the PHI1-bound dimer, as seen from the local density maximum centered at ∼3.4 Å and the global maximum near ∼4.5 Å (Figure 3g,h).

      Comment 8: I have a comment on the strategy suggested to empirically classify the inhibitors by comparing the Glu501-Lys483 distance and the αC position in the two protomers of the crystal structures (in the Concluding Discussion section). The authors suggest that differences below 1 Å could determine whether the flexibility of these regions is restricted or not (and whether the inhibitor is equipotent or dimer-selective). However, differences below 1 Å, in structures where the average resolution is 2.5 Å, might be highly unreliable. In fact, as the authors pointed out, LY and Ponatinib would be classified (erroneously) as dimer-selective inhibitors according to these criteria.

      Response and revision: We agree that this proposed method could be unreliable; we intend this strategy to be used as a “quick and dirty” method for analyzing future structures in order to assess selectivity for dimeric BRAF. To convey this, we added the following sentence:

      On page 12, second paragraph: Given that the resolution of a resolved structure is often ∼2-3 Å, this proposed assessment is not intended to replace more rigorous tests, i.e. utilizing MD simulations.

      Comment 9: Including representative snapshots of the MD simulation in the GitHub repository could allow the reader to better appreciate the results described in the present study.

      Response and revision: In order to convey the difference between induced effects of PHI1 and LY, we have added a new folder named snapshots to the GitHub repository which contains the snapshots from the simulations of one LY or one PHI1 bound BRAF (visualized in Figure 5c) in the form of PDB files.

    1. Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Tie et al., the authors couple the methodology they have developed to measure the LQ (localization quotient) of proteins within the Golgi apparatus with RUSH-based cargo release to quantify the speed of different cargos traveling through Golgi stacks in nocodazole-induced Golgi ministacks, in order to differentiate between the cisternal progression and stable compartment models of the Golgi apparatus. The debate between the cisternal progression model and the stable compartment model has been intense and ongoing for decades, and it is important for understanding the basic function and organization of the Golgi apparatus. In the stable compartment model, cisternae are stable structures and cargo moves along the Golgi apparatus in vesicular carriers. In the cisternal progression model, Golgi cisternae themselves mature, acquiring new identity from the cis face to the trans face, and act as transport carriers themselves. In this work, the authors provide a missing part regarding the intra-Golgi speed of transport of different cargoes as well as the speed of TGN exit and, based on the differences in transport velocities for the different cargoes tested, favor a stable compartment model. The argument the authors make is that if there is cisternal progression, all the cargoes should have a similar intra-Golgi transport speed, which is essentially the rate at which the Golgi cisternae mature. Furthermore, using a combination of BFA and nocodazole treatments, the authors show that the compartments remain stable in cells for at least 30-60 minutes after BFA treatment.

      Strengths:

      The method to accurately measure the localization of a protein within the Golgi stack was rigorously tested in previous publications from the same authors and, in combination with pulse-chase approaches, has been used to quantify the transport velocities of cargoes through the Golgi. This is a novel aspect of this paper, and the differences in intra-Golgi velocities among the cargoes tested make a case for a stable compartment model.

      Weaknesses:

      Experiments are performed in only one cell line (HeLa cells) and are predominantly derived from an experimental paradigm using RUSH assays, in which a secretory cargo is released in a wave (not the most physiological condition). Additional approaches would therefore make a more compelling case for the model.

    2. eLife assessment

      This important study sheds new light on cargo movement within the Golgi apparatus, challenging the cisternal progression model by providing convincing evidence for a velocity decrease from cis to trans Golgi and variable speeds within cisternae, suggesting a more stable compartmental nature. While these findings propose refinements to the classic model, they prompt further exploration of recent models like rapid partitioning and rim progression, necessitating additional experimental approaches to account for cargo expression variations and HeLa cell-specific effects.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the use of quantitative imaging approaches, which have been a key element of the lab's work over the past years, to address one of the major unresolved discussions in trafficking: intra-Golgi transport. The approach used has been clearly described in the lab's previous papers, and is thus clearly described. The authors clearly address the weaknesses in this manuscript and do not overstate the conclusions drawn from the data. The only weakness not addressed is the concept of blocking COPI transport with BFA, which is a strong inhibitor and causes general disruption of the system. This is an interesting element of the paper, which I think could be improved upon by using more specific COPI inhibitors instead, although I understand that this is not necessarily straightforward.

      I commend the authors on their clear and precise presentation of this body of work, incorporating mathematical modelling with a fundamental question in cell biology. In all, I think that this is a very robust body of work, that provides a sound conclusion in support of the stable compartment model for the Golgi.

      General points:

      The manuscript contains a lot of background in its results sections, and the authors may wish to consider rebalancing the text: The section beginning at Line 175 is about 90% background and 10% data. Could some data currently in supplementary be included here to redress this balance, or this part combined with another?

    4. Reviewer #3 (Public Review):

      The manuscript by Tie et al. provides a quantitative assessment of intra-Golgi transport of diverse cargos. Quantitative approaches using fluorescence microscopy of RUSH-synchronized cargos, namely GLIM and measurement of Golgi residence time, previously developed by the authors' team (publications from 2016 to 2022), are used here.

      Most of the results have been already published by the same team in 2016, 2017, 2020 and 2021. In this manuscript, very few new data have been added. The authors have put together measurements of intra-Golgi transport kinetics and Golgi residence time of many cargos. The quantitative results are supported by a large number of Golgi mini-stacks/cells analyzed. They are discussed with regard to the intra-Golgi transport models being debated in the field, namely the cisternal maturation/progression model and the stable compartments model. However, over the past decades, the cisternal progression model has been mostly accepted thanks to many experimental data.

      The authors show that different cargos have distinct intra-Golgi transport kinetics and that the Golgi residence time of glycosyltransferases is high. From this and the experiment using brefeldin A, the authors suggest that the rim progression model, adapted from the stable compartments model, fits with their experimental data.

      Strengths:

      The major strength of this manuscript is that it puts together many quantitative results that the authors previously obtained and discusses them to give food for thought about the intra-Golgi transport mechanism.

      The analysis of intra-Golgi transport by fluorescence microscopy is tough and is a tour de force by the authors, even if their approach shows limitations, which are clearly stated. Their work is remarkable with regard to the number of Golgi markers and secretory cargos that have been analyzed.

      Weaknesses:

      As previously mentioned, most of the data provided here were already published and are thus accessible to the community. Is there a need to publish them again?

      The authors' discussion of the intra-Golgi transport model is rather simplistic. In the introduction, there is no mention of the most recent models, namely the rapid partitioning and the rim progression models. In my opinion, the tubular connections between cisternae and the diffusion/biochemical properties of cargos are not sufficiently taken into account when interpreting the results. Indeed, tubular connections and the biochemical properties of the cargos may affect their transit through the Golgi and the kinetics with which they reach the TGN for Golgi exit.

      Nocodazole is used to form Golgi mini-stacks, which are necessary to allow intra-Golgi measurement. The use of nocodazole might affect cellular homeostasis, but this is clearly stated by the authors and is acceptable, as we need to perturb the system to conduct this analysis. However, the manual selection of the Golgi mini-stacks being analyzed raises a major concern. As far as I understood, the authors select the mini-stacks where the cargo and the Golgi reference markers are clearly detectable and separated, which might introduce a bias in the analysis.

      The term 'Golgi residence time' is used, but it corresponds to the residence time in the trans-cisterna only, as the cargo has been accumulated in the trans-Golgi by a 20 °C block. The kinetics of disappearance of the protein of interest are then monitored after a switch from 20 °C to 37 °C.

      Another concern lies in the differences that different expression levels of the cargo would introduce into the kinetics of their intra-Golgi transport and of their packaging into post-Golgi carriers.

    1. eLife assessment

      This study presents valuable findings on the ligand- and ion-dependent structural dynamics of a transcriptional riboswitch. The single-molecule data presented are solid and prompt intriguing hypotheses and models, which will undoubtedly stimulate future structural analyses. These findings are of considerable interest to biochemists and biophysicists engaged in the study of RNA structure and riboswitch mechanisms.

    2. Reviewer #1 (Public Review):

      Summary:

      This work presents an in-depth characterization of the factors that influence the structural dynamics of the Clostridium botulinum guanidine-IV riboswitch (riboG). Using single-molecule FRET, the authors demonstrate that riboG undergoes ligand- and Mg2+-dependent conformational changes consistent with dynamic formation of a kissing loop (KL) in the aptamer domain. Formation of the KL is attenuated by Mg2+ and Gua+ ligand at physiological concentrations as well as the length of the RNA. Interestingly, the KL is most stable in the context of just the aptamer domain compared to longer RNAs capable of forming the terminator stem. To attenuate transcription, binding of Gua+ and formation of the KL must occur rapidly after transcription of the aptamer domain but before transcription of the rest of the terminator stem.

      Strengths:

      (1) Single-molecule FRET microscopy is well suited to unveil the conformational dynamics of KL formation and the authors provide a wealth of data to examine the effect of the ligand and ions on riboswitch dynamics. The addition of complementary transcriptional readthrough assays provides further support for the authors' proposed model of how the riboswitch dynamics contribute to function.

      (2) The single-molecule data strongly support that the Gua+ ligand and Mg2+ influence the RNA structure differently for varying lengths of the RNA. The authors also demonstrate that this is specific for Mg2+, as Na+ and K+ ions have little effect.

      (3) The PLOR method utilized is clever and well adapted for both dual labeling of RNAs and examining RNA at various lengths to mimic co-transcriptional folding. Using PLOR, they demonstrate that a change in the structural dynamics and ligand binding can occur after extension of the RNA transcript by a single nucleotide. Such a tight window of regulation has intriguing implications for kinetically controlled riboswitches.

      (4) In the revised version, the authors utilized multiple destabilizing and compensatory mutations to strengthen their structural interpretation of the KL structure and dynamics and to cement their conclusions.

    3. Reviewer #2 (Public Review):

      Summary:

      Gao et al. used single-molecule FRET and step-wise transcription methods to study the conformations of the recently reported guanidine-IV class of bacterial riboswitches that upregulate transcription in the presence of elevated guanidine. Using three riboswitch lengths, the authors analyzed the distributions and transitions between different conformers in response to different Mg2+ and guanidine concentrations. These data led to a three-state kinetic model for the structural switching of this novel class of riboswitches whose structures remain unavailable. Using the PLOR method that the authors previously invented, they further examined the conformations, ligand responses, and gene-regulatory outcomes at discrete transcript lengths along the path of vectorial transcription. These analyses uncover that the riboswitch exhibits differential sensitivity to ligand-induced conformational switching at different steps of transcription, and identify a short window where the regulatory outcome is most sensitive to ligand binding.

      Strengths:

      Dual internal labeling of long RNA transcripts remains technically very challenging, but essential for smFRET analyses of RNA conformations. The authors should be commended for achieving very high quality and purity in their labelled RNA samples. The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality. The findings are significant because the paradigm uncovered here for this relatively simple riboswitch class is likely also employed in numerous other kinetically regulated riboswitches. The ability to quantitatively assess RNA conformations and ligand responses at multiple discrete points along the path towards the full transcript provides a rare and powerful glimpse into co-transcriptional RNA folding, ligand-binding, and conformational switching.

      Weaknesses:

      The use of T7 RNA polymerase instead of a near cognate bacterial RNA polymerase in the termination/antitermination assays is a significant caveat. It is understandable as T7 RNA polymerase is much more robust than its bacterial counterparts, which probably will not survive the extensive washes required by the PLOR method. The major conclusions should still hold, as the RNA conformations are probed by smFRET at static, halted complexes instead of on the fly. However, potential effects of the cognate RNA polymerase cannot be discerned here, including transcriptional rates, pausing, and interactions between the nascent transcript and the RNA exit channel, if any. The authors should refrain from discussing potential effects from the DNA template or the T7 RNA polymerase, as these elements are not cognate with the riboswitch under study.

    4. Reviewer #3 (Public Review):

      Summary:

      In this article, Gao et al. use single-molecule FRET (smFRET) and position-specific labelling of RNA (PLOR) to dissect the folding and ligand-sensing behavior of the Guanidine-IV riboswitch in the presence and absence of the ligand guanidine and the cation Mg2+. The results provided valuable information on the mechanistic aspects of the riboswitch, including confirmation that the kissing loop present in the structure is essential for folding and riboswitch activity. Co-transcriptional investigations of the system provided key information on the ligand-sensing behavior and ligand-binding window of the riboswitch. A plausible folding model of the Guanidine-IV riboswitch was proposed as a final result. The evidence presented here sheds additional light on the mode of action of transcriptional riboswitches.

      Strengths:

      The investigations were very thorough, providing data that support the conclusions. The use of smFRET and PLOR to investigate RNA folding has been shown to be a valuable tool for understanding the folding and behavior of these structured RNA molecules. The co-transcriptional analysis brought important information on how the riboswitch works, including the ligand-sensing and the binding window that promotes the structural switch. The fact that investigations were done with the aptamer domain, the aptamer domain + terminator/anti-terminator region, and the full-length riboswitch was essential to show how each domain contributes to the final structural state in the presence of the ligand and Mg2+.

      Weaknesses:

      The system has its own flaws when compared to physiological conditions. The RNA polymerase used (the study uses T7 RNA polymerase) is different from the bacterial RNA polymerase, not only in complexity, but also in transcriptional speed, which can directly interfere with folding and ligand-sensing. Additionally, rNTP concentrations were much lower than physiological concentrations during transcription, likely causing a change in the polymerase transcriptional speed. These aspects, and how they could interfere with the results, should be addressed for the broad audience. Another point to be aware of is that the bulky fluorophores attached to the nucleotides can interfere with folding to some extent.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work presents an in-depth characterization of the factors that influence the structural dynamics of the Clostridium botulinum guanidine-IV riboswitch (riboG). Using single-molecule FRET, the authors demonstrate that riboG undergoes ligand- and Mg2+-dependent conformational changes consistent with the dynamic formation of a kissing loop (KL) in the aptamer domain. Formation of the KL is attenuated by Mg2+ and Gua+ ligand at physiological concentrations as well as the length of the RNA. Interestingly, the KL is most stable in the context of just the aptamer domain compared to longer RNAs capable of forming the terminator stem. To attenuate transcription, binding of Gua+ and formation of the KL must occur rapidly after transcription of the aptamer domain but before transcription of the rest of the terminator stem.

      Strengths:

      (1) Single-molecule FRET microscopy is well suited to unveil the conformational dynamics of KL formation and the authors provide a wealth of data to examine the effect of the ligand and ions on riboswitch dynamics. The addition of complementary transcriptional readthrough assays provides further support for the authors' proposed model of how the riboswitch dynamics contribute to function.

      (2) The single-molecule data strongly support that the Gua+ ligand and Mg2+ influence the RNA structure differently for varying lengths of the RNA. The authors also demonstrate that this is specific for Mg2+, as Na+ and K+ ions have little effect.

      (3) The PLOR method utilized is clever and well adapted for both dual labeling of RNAs and examining RNA at various lengths to mimic co-transcriptional folding. Using PLOR, they demonstrate that a change in the structural dynamics and ligand binding can occur after the extension of the RNA transcript by a single nucleotide. Such a tight window of regulation has intriguing implications for kinetically controlled riboswitches.

      Weaknesses:

      (1) The authors use only one mutant to confirm that their FRET signal indicates the formation of the KL. Importantly, this mutation does not involve the nucleotides that are part of the KL interaction. It would be more convincing if the authors used mutations in both strands of the KL and performed compensatory mutations that restore base pairing. Experiments like this would solidify the structural interpretation of the work, particularly in the context of the full-length riboG RNA or in the cotranscriptional mimic experiments, which appear to have more conformational heterogeneity.

      We thank the reviewer for describing our work as an “in-depth characterization” of riboG. We agree with the reviewer and have added two more mutants, G71C and U72C, with the mutations located at the KL (Figure 2– figure supplement 8A, 8B, 9A, 9B, Figure 3– figure supplement 6A, 6B, 7A, 7B, and Figure 4– figure supplement 6A, 6B, 7A, 7B). Furthermore, we have performed compensatory mutations, C30G-G71C and A29G-U72C, that restore base pairing in the KL (Figure 2– figure supplement 8C, 8D, 9C, 9D, Figure 3– figure supplement 6C, 6D, 7C, 7D, and Figure 4– figure supplement 6C, 6D, 7C, 7D). We added the experimental results to the revised manuscript accordingly as “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2– figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which features a single nucleotide mutation at site 71 (with 97% nucleotide conservation) in the KL (Figure 2– figure supplement 8A and 8B). It is worth noting that the C30G-G71C mutant, which was initially expected to restore a base pair in the KL, did not successfully bring about the anticipated peak of EFRET ~ 0.8 (Figure 2– figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion of the state with EFRET ~ 0.8 than riboG-apt. However, the A29G and U72C mutations restored a base pair in the KL, as well as the formation of the KL (Figure 2– figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3– figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base-pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3– figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild-type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4– figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to those of their counterparts in riboG-term (Figure 4– figure supplement 7)” (page 9).

      (2) The existence of the pre-folded state (intermediate FRET ~0.5) is not well supported by their data and could be explained by an acquisition artifact. The dwell times are very short, often only a single frame, indicating that there could be a very fast transition (< 0.1 s) from low to high FRET that averages to a FRET efficiency of 0.5. To firmly demonstrate that this intermediate FRET state is metastable and not an artifact, the authors need to perform measurements with a faster frame rate and demonstrate that the state is still present.

      We thank the reviewer for the great comment. We added smFRET experiments at a higher time resolution, 20 ms, as well as at a lower time resolution (Figure 2– figure supplement 3). Based on our experimental results, the intermediate state (EFRET ~0.5) is present in smFRET data collected at 20 ms, 100 ms and 200 ms.
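
      The reasoning behind this exchange can be illustrated with a minimal, noise-free sketch. It is not the authors' analysis pipeline; the function names, the low-FRET value of 0.2, the dwell times, and the tolerance are all assumptions chosen for illustration. A direct low-to-high jump that happens inside one camera frame is averaged to ~0.5 for at most one frame at any frame rate, whereas a genuine intermediate state with a 100 ms dwell occupies more frames as the frame time shortens.

```python
# Hypothetical, noise-free illustration of camera-frame averaging of a FRET trace.
# LOW = 0.2 and all dwell times are assumptions, not values from the study.

def bin_trace(trace, frame_ms):
    """Average a trace sampled at 1 ms into camera frames of frame_ms each."""
    return [sum(trace[i:i + frame_ms]) / frame_ms
            for i in range(0, len(trace) - frame_ms + 1, frame_ms)]

def count_mid_frames(frames, center=0.5, tol=0.1):
    """Count frames whose averaged FRET lies within tol of the intermediate value."""
    return sum(1 for v in frames if abs(v - center) < tol)

LOW, MID, HIGH = 0.2, 0.5, 0.8

# Direct jump from low to high FRET at t = 250 ms (no real intermediate).
direct = [LOW] * 250 + [HIGH] * 250
# Genuine intermediate state that lasts 100 ms between low and high FRET.
stepwise = [LOW] * 200 + [MID] * 100 + [HIGH] * 200

for frame_ms in (100, 20):
    n_direct = count_mid_frames(bin_trace(direct, frame_ms))
    n_stepwise = count_mid_frames(bin_trace(stepwise, frame_ms))
    print(f"{frame_ms} ms frames: direct={n_direct}, stepwise={n_stepwise}")
```

      In this toy case both traces show a single intermediate frame at 100 ms, but only the trace with a real intermediate shows several consecutive intermediate frames at 20 ms, which is why acquisition at a faster frame rate can distinguish the two scenarios.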

      (3) The PLOR method employs a non-biologically relevant polymerase (T7 RNAP) to mimic transcription elongation and folding near the elongation complex. T7 RNAP has a shorter exit channel than bacterial RNAPs and therefore, folding in the exit channel may be different between different RNAPs. Additionally, the nascent RNA may interact with bacterial RNAP differently. For these reasons, it is not clear how well the dynamics observed in the T7 ECs recapitulate riboswitch folding dynamics in bacterial ECs where they would occur in nature. 

      We thank the reviewer for the comment. We agree with the reviewer that the bacterial and T7 RNAPs may behave differently due to their differences in transcriptional speed, dynamics, and interactions. We added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (pages 13–14).

      Reviewer #2 (Public Review):

      Summary:

      Gao et al. used single-molecule FRET and step-wise transcription methods to study the conformations of the recently reported guanidine-IV class of bacterial riboswitches that upregulate transcription in the presence of elevated guanidine. Using three riboswitch lengths, the authors analyzed the distributions and transitions between different conformers in response to different Mg2+ and guanidine concentrations. These data led to a three-state kinetic model for the structural switching of this novel class of riboswitches whose structures remain unavailable. Using the PLOR method that the authors previously invented, they further examined the conformations, ligand responses, and gene-regulatory outcomes at discrete transcript lengths along the path of vectorial transcription. These analyses uncover that the riboswitch exhibits differential sensitivity to ligand-induced conformational switching at different steps of transcription, and identify a short window where the regulatory outcome is most sensitive to ligand binding.

      Strengths:

      Dual internal labeling of long RNA transcripts remains technically very challenging but essential for smFRET analyses of RNA conformations. The authors should be commended for achieving very high quality and purity in their labelled RNA samples. The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality. The findings are significant because the paradigm uncovered here for this relatively simple riboswitch class is likely also employed in numerous other kinetically regulated riboswitches. The ability to quantitatively assess RNA conformations and ligand responses at multiple discrete points along the path towards the full transcript provides a rare and powerful glimpse into cotranscriptional RNA folding, ligand-binding, and conformational switching.

      Weaknesses:

      The use of T7 RNA polymerase instead of a near-cognate bacterial RNA polymerase in the termination/antitermination assays is a significant caveat. It is understandable as T7 RNA polymerase is much more robust than its bacterial counterparts, which probably will not survive the extensive washes required by the PLOR method. The major conclusions should still hold, as the RNA conformations are probed by smFRET at static, halted complexes instead of on the fly. However, potential effects of the cognate RNA polymerase cannot be discerned here, including transcriptional rates, pausing, and interactions between the nascent transcript and the RNA exit channel, if any. The authors should refrain from discussing potential effects from the DNA template or the T7 RNA polymerase, as these elements are not cognate with the riboswitch under study.

      We thank the reviewer for describing our work as “The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality”. We agree with the reviewer that the bacterial and T7 RNAPs may behave differently due to their differences in transcriptional speed, dynamics, and interactions. We added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14).

      Reviewer #3 (Public Review):

      Summary:

      In this article, Gao et al. use single-molecule FRET (smFRET) and position-specific labelling of RNA (PLOR) to dissect the folding and ligand-sensing behavior of the Guanidine-IV riboswitch in the presence and absence of the ligand guanidine and the cation Mg2+. The results provided valuable information on the mechanistic aspects of the riboswitch, including confirmation that the kissing loop present in the structure is essential for folding and riboswitch activity. Co-transcriptional investigations of the system provided key information on the ligand-sensing behavior and ligand-binding window of the riboswitch. A plausible folding model of the Guanidine-IV riboswitch was proposed as a final result. The evidence presented here sheds additional light on the mode of action of transcriptional riboswitches.

      Strengths:

      The investigations were very thorough, providing data that support the conclusions. The use of smFRET and PLOR to investigate RNA folding has been shown to be a valuable tool for understanding the folding and behavior of these structured RNA molecules. The co-transcriptional analysis brought important information on how the riboswitch works, including the ligand-sensing and the binding window that promotes the structural switch. The fact that investigations were done with the aptamer domain, the aptamer domain + terminator/anti-terminator region, and the full-length riboswitch was essential to show how each domain contributes to the final structural state in the presence of the ligand and Mg2+.

      Weaknesses:

      The system has its own flaws when compared to physiological conditions. The RNA polymerase used (the study uses T7 RNA polymerase) is different from the bacterial RNA polymerase, not only in complexity, but also in transcriptional speed, which can directly interfere with folding and ligand-sensing. Additionally, rNTP concentrations were much lower than physiological concentrations during transcription, likely causing a change in the polymerase transcriptional speed. These aspects, and how they could interfere with the results, should be addressed for the broad audience. Another point to be aware of is that the bulky fluorophores attached to the nucleotides can interfere with folding to some extent.

      We thank the reviewer for describing our work as “The investigations were very thorough, providing data that supports the conclusions”. We agree with the reviewer that the bacterial and T7 RNAPs may behave differently due to their differences in transcriptional speed, dynamics, and interactions. We added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the cotranscriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14). We also agree with the reviewer that the lower NTP concentrations may affect the transcriptional speed. Regarding the fluorophores, we purposely placed them away from the KL to avoid their influence on the formation of the KL.

      Reviewer #1 (Recommendations For The Authors):

      Related to weakness 1

      - The authors cite a paper that investigated mutations in the KL duplex but do not include these mutations in their analysis. It is unclear why the authors chose the G77C mutation and not the other mutants previously tested. Can the authors explain their choice of mutation in detail in the text? I also did not see the proposed secondary structure for the G77C mutant shown in Figure 2 -supp 3A in the cited paper, is this a predicted structure? Please explain how this structure was determined. 

      We thank the reviewer for the comment. We chose the G77C mutation based on a previous report that G77C can disturb the formation of the KL, as we stated in the manuscript: “Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7). The secondary structure of the G77C mutant was predicted by Mfold, which is cited in the manuscript and has been added to the reference list as “Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13), 3406-3415”.

      - It is not clear to me that the structural interpretation of their FRET states is correct and that the FRET signal reports on the base pairing of the KL in only the high FRET state. The authors should perform experiments with additional mutations in the KL duplex to confirm that their construct reports on KL duplex formation alone and not other structural dynamics. 

      We thank the reviewer for the comment. We have included additional mutations to establish a connection between the high-FRET state and the formation of the KL. The results have been added to the manuscript as “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2– figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which features a single nucleotide mutation at site 71 (with 97% nucleotide conservation) in the KL (Figure 2– figure supplement 8A and 8B). It is worth noting that the C30G-G71C mutant, which was initially expected to restore a base pair in the KL, did not successfully bring about the anticipated peak of EFRET ~ 0.8 (Figure 2– figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion at the state with EFRET ~ 0.8 than riboG-apt. However, the A29G and U72C mutations restored a base pair in the KL, as well as the formation of the KL (Figure 2– figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and the predicted secondary structure of the G77C mutation by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3– figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at the site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base-pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3– figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild-type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4– figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to that of their counterparts in riboG-term (Figure 4– figure supplement 7)” (page 9).

      - For the full-length riboG-136 (Cy3Cy5 riboG in Figure 4), the authors have clearly defined peaks at 0.6 and 0.4. However, the authors do not explain their structural interpretation of these states. Do the authors believe that the KL is forming in these states? It would be helpful to have data on mutations in the KL in the context of the full-length riboG to better understand the structural transitions of these intermediate states. 

      Based on our mutation studies, we propose that the peak with EFRET ~0.8 corresponds to the conformation with the KL, while the states with EFRET ~0.4 and 0.6 are states without a stable KL.

      Related to weakness 2:

      - For the riboG-apt and riboG-term RNAs, the proposed intermediate FRET state (EFRET = 0.5) is poorly fit by a Gaussian and the dwell times in the state are almost entirely single-frame dwells. It is likely that this state is the result of a camera blurring artifact, in which RNAs undergo a FRET transition between two frames giving an apparent FRET efficiency which is between that of the two transitioning states. This artifact arises when the average dwell times of the true states (Elow and Ehigh) are comparable to the frame duration (within a factor of ~5-10; see https://doi.org/10.1021/acs.jpcb.1c01036). To confirm the presence of the intermediate state, the authors should perform at least a few experiments with higher time resolution to support the existence of the 0.5 state with a lifetime of 0.1 s. Alternatively, the data should be refit to a two-state HMM and the authors could explain in the text that the density in the FRET histogram between the two states is likely due to transitions that are faster than the time resolution of the experiment. 

      We thank the reviewer for the great comment. Taking the suggestion into consideration, we performed smFRET experiments with a higher time resolution of 20 ms. As a result, we still detected the intermediate state, supporting that it is not an artifact. The new data has been included in the revised manuscript (Figure 2-figure supplement 3).  
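
      The camera-blurring concern raised above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the analysis used in the study: the two FRET levels (0.2 and 0.8), the rate constants, and the 0.35-0.65 "intermediate" band are hypothetical, and shot noise is ignored. It shows that a purely two-state system produces more frames with an apparent intermediate efficiency when the state lifetimes approach the frame duration.

```python
import random


def simulate_frames(k_low_to_high, k_high_to_low, frame_s, n_frames,
                    e_low=0.2, e_high=0.8, seed=0):
    """Return the time-averaged apparent FRET of each camera frame
    for a two-state system with exponential dwell times (no noise)."""
    rng = random.Random(seed)
    in_high = False
    time_left = rng.expovariate(k_low_to_high)  # time left in current state
    frames = []
    for _ in range(n_frames):
        remaining = frame_s
        time_high = 0.0
        while remaining > 0:
            step = min(time_left, remaining)
            if in_high:
                time_high += step
            remaining -= step
            time_left -= step
            if time_left <= 0:
                # Switch state and draw a new dwell time for it.
                in_high = not in_high
                rate = k_high_to_low if in_high else k_low_to_high
                time_left = rng.expovariate(rate)
        frames.append(e_low + (e_high - e_low) * time_high / frame_s)
    return frames


def blurred_fraction(frames, lo=0.35, hi=0.65):
    """Fraction of frames whose apparent FRET falls between the true states."""
    return sum(lo < e < hi for e in frames) / len(frames)


# Hypothetical rates (per second) with a 100 ms frame.
fast = blurred_fraction(simulate_frames(5.0, 5.0, 0.1, 5000))
slow = blurred_fraction(simulate_frames(0.2, 0.2, 0.1, 5000))
print(f"intermediate frames: fast kinetics {fast:.3f}, slow kinetics {slow:.3f}")
```

      Rerunning the same function with a shorter frame_s, as in a higher-time-resolution experiment, reduces the blurred fraction for a true two-state system; an intermediate population that persists at the shorter frame time is harder to attribute to blurring alone.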

      Related to weakness 3:

      - The authors depict the polymerase footprint differently in some of the figures and it is unclear if this is part of their model. Is the cartoon RNAP supposed to indicate the RNA:DNA hybrid or the footprint of T7 RNAP on the RNA? For example, in Figure 8a there are 8 nts (left) and 9 nts (right) covered by RNAP, and only 6nts in Figure 6 - supp 2A. This is particularly misleading for the EC-87 and EC-88 in Figure 6 - supp 2, where it is likely that this stem is not formed at all and the KL strand is single-stranded. The authors should clarify and at least indicate in the figure legend if the RNAP cartoon is part of the model or only a representation. 

      We thank the reviewer for bringing the issues to our attention. Due to space limitations, we chose to represent the polymerase footprint differently in Figure 8. However, we have included the statement “DNA templates from EC-87 to EC-105 are not displayed in the model” in the legend of Figure 8 to avoid the confusion.

      Moreover, we have corrected the 6-nt error in Figure 6–figure supplement 2.

      - With a correct 9 bp RNA:DNA hybrid, the EC-88 construct would not be able to form the top part of the P2 stem and the second half of the KL RNA would be single-stranded. In this case, an interaction between the KL nucleotides would resemble a pseudoknot and not a kissing loop interaction. Can the authors explain if this could explain the heterogeneity they observe in the EC-88 construct compared to the riboG-apt RNA?

      We thank the reviewer for the comment. We have added the following statement to the revised manuscript: “The T7 RNA polymerase (RNAP) sequestered about 8 nt of the nascent RNA, preventing the EC-88 construct from forming the P2 stem (Durniak et al., 2008; Huang & Sousa, 2000; Lubkowska et al., 2011; Tahirov et al., 2002; Wang et al., 2022; Yin & Steitz, 2002). Consequently, a pseudoknot structure potentially formed instead of the expected KL. This distinction may account for the observed heterogeneity between EC-88 and riboG-apt” (page 11).

      Other comments:

      (1) It appears that the FRET histograms in the PLOR experiments (Figure 6 and related figures) only show the fits presumably to highlight the overlays. However, this makes it impossible to determine the goodness of the fit. The authors should instead show the outline of the raw histogram with the fit, or at least show the raw histograms with fits in the supplement. 

      We have replaced Figure 6- figure supplements 2-4 to enhance the clarity of the raw and fitted smFRET histograms.  

      (2) The authors should consider including a concluding paragraph to put the results into a larger context. How does the kinetic window compare to other transcriptional riboswitches? Would the authors comment on how the transcription speed compares to the kinetics for the formation of the KL? 

      We thank the reviewer for the comment. We have added the comparison of riboG to other transcription riboswitches to the manuscript as “Nevertheless, the ligand-sensitive windows of riboswitches during transcription vary. In a study conducted by Helmling et al. using NMR spectroscopy, they proposed a broad transcriptional window for deoxyguanosine-sensing riboswitches, whereby the ligand binding capability gradually diminishes over several nucleotide lengths (Helmling et al., 2017). However, more recent research by Binas et al. and Landgraf et al. on riboswitches sensing ZMP, c-di-GMP, and c-GAMP revealed a narrow window with a sharp transition in binding capability, even with transcript lengths differing by only one or three nucleotides (Binas et al., 2020; Landgraf et al., 2022). In line with the findings for the c-GAMP-sensing riboswitch, our study on the guanidine-IV riboswitch also demonstrated a sharp transition in binding capability with just a single nucleotide extension” ( page 14). 

      We appreciate the reviewer’s comment in comparing the transcription speed to the kinetics of the KL formation. However, we must acknowledge that we have limited kinetic data in this study to confidently make such a comparison.

      (3) Cy3Cy5 RiboG is a confusing name because it implies that the others are not also Cy3Cy5 labeled. The authors should consider changing the names and being consistent throughout. I suggest full-length riboG or riboG-136. 

      We have changed “Cy3Cy5 riboG” to “Cy3Cy5-full-length riboG” (pages 15 and 16).

      (4) The transcriptional readthrough experiment should be explained when first mentioned in line 109. 

      We have added the citation (Chien et al., 2023) of the transcriptional readthrough experiment to the manuscript as “we noted that the transcriptional read-through of the guanidine-IV riboswitch during the single-round PLOR reaction was sensitive to Gua+, exhibiting an apparent EC50 value of 68.7 ± 7.3 μM (Figure 1D) (Chien et al., 2023)” (page 5).

      (5) Kd values in text should have uncertainties, and the way these uncertainties are obtained should be explained.

      We have added the uncertainties of Kd values in the revised manuscript (page 6) and the legend of Figure 2-supplement 6 as “The percentages of the folded state (EFRET ~ 0.8) of Cy3Cy5-riboG-apt were plotted with the concentrations of Gua+ at 0.5 mM Mg2+, with an apparent Kd of 286.0 ± 18.1 μM in three independent experiments”.
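
      As a rough illustration of how an apparent Kd can be extracted from a titration of the folded-state fraction, the sketch below fits a one-site binding isotherm by grid search. The one-site model, the concentrations, the plateau value, and the function names are assumptions made for illustration; the response does not state which model or fitting procedure the authors used. An uncertainty of the kind requested could then be reported as the standard deviation of the Kd values fitted separately to each independent experiment.

```python
def folded_fraction(conc_uM, kd_uM, f_max=1.0):
    """One-site binding isotherm: fraction folded at a ligand concentration."""
    return f_max * conc_uM / (kd_uM + conc_uM)


def fit_apparent_kd(concs_uM, fractions, kd_grid_uM):
    """Least-squares fit of Kd by grid search.

    For each candidate Kd the plateau f_max has a closed-form
    least-squares solution, so only Kd needs to be scanned.
    Returns (best_kd, best_f_max).
    """
    best = None
    for kd in kd_grid_uM:
        basis = [c / (kd + c) for c in concs_uM]
        f_max = (sum(b * y for b, y in zip(basis, fractions))
                 / sum(b * b for b in basis))
        sse = sum((y - f_max * b) ** 2 for b, y in zip(basis, fractions))
        if best is None or sse < best[0]:
            best = (sse, kd, f_max)
    return best[1], best[2]


# Synthetic, noise-free titration generated from a known Kd (hypothetical values).
concs = [10, 50, 100, 300, 1000, 3000]
data = [folded_fraction(c, kd_uM=300, f_max=0.7) for c in concs]
kd_fit, f_max_fit = fit_apparent_kd(concs, data, range(50, 1001, 10))
print(kd_fit, round(f_max_fit, 3))
```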

      (6) The authors mention "strategies" on line 306, but it is unclear what they are referring to. Are the strategies referring to the constructs (EC-87, etc) or Steps 1-8 in the supplemental figure? Please clarify. 

      We have clarified the confusion by adding “The detailed procedures of strategies 1-8 were shown in Figure 7–figure supplement 1” to the manuscript ( page 12).

      (7) What are the fraction of dynamic traces versus static traces in the cases for the full-length riboG? This would help depict the structural heterogeneity in the population. 

      We have added the fractions of dynamic single-molecule traces of the full-length riboG to Figure 4-supplements 1-5. 

      (8) The labels in Figure 4 (A-E) don't match the caption (A-H). 

      We have corrected the error. 

      (9) The coloring of the RNA strands in Figure 4A should be explained in the figure legend. It could be interpreted as multiple strands annealed instead of a continuous strand. 

      We have revised the legend of Figure 4A by adding “The full-length riboG contains the aptamer domain (black), terminator (red) and the extended sequence (blue). Cy3 and Cy5 are shown by green and red sparkles, respectively”.

      (10) Reported quantities and uncertainties should have the same number of decimal places. In many places, the uncertainties likely have too many significant figures, for example, in Figure 5 and related figures. 

      We have corrected the significant figures of the uncertainties. 

      (11) In Figure 5, A and B should have the same vertical scale to facilitate comparison. 

      We have adjusted Figure 5A to match the vertical scale of Figure 5B in the revised manuscript.

      (12) In Figure 5C-D, the construct from which those trajectories come should be indicated in the legend. 

      We have added the construct to the legend of Figures 5C and D.  

      (13) In Figure 6J, the splines between data points are confusing and can be misleading. They suggest that the data has been fit to a model, but I am not sure if it represents a model. The data points should be colored instead and lines removed. 

      We thank the reviewer for the comment. We have changed Figure 6J by coloring the data points and removing the lines to avoid confusion. 

      (14) Line 330 mentions a P2 structure in Figure 8, but there is no such label in Figure. Please clarify. 

      We thank the reviewer for the comment and have added P2 to Figure 8. 

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1B. The authors don't seem to address the role of the blue stem-loop following Stems 1 and 2. Is this element needed at all for gene regulation? Does it impact the conformations or folding of the preceding Stems 1 and 2? It seems feasible to disrupt the stem and see whether there is an impact on riboswitch function. 

      We thank the reviewer for the comment. The presence of the sequence that forms the blue stem-loop indicates the formation of an anti-terminator conformation in riboG during transcription. Our smFRET data show that the inclusion of the stem-loop sequence induces additional peaks in the full-length riboG compared to riboG-term. This indicates that the stem-loop influences the folding of the kissing loop (KL) and potentially also affects stems 1 and 2.

      (2) Figure 7 supplement 1, C &D. Maybe I am missing something, but it seems to me in reaction #8 (EC-105, last two lanes), the readthrough percentage is close to 50% based on the gel but plotted in D as 20%. Further, there is a strong effect of guanidine in reaction #8 but that is not reflected in the quantitation in panel D. 

      We thank the reviewer for the comment. The observed discrepancy between reaction 8 in (C) and (D) is from the differential handling of the crude product at the last step (step 17) in gel loading for (C), contrasted with the combination of crude products from steps 16 and 17 to calculate the read-through percentage in (D). We have corrected the discrepancy by replacing Figure 7-Supplement figure 1C (now Figure 7C), and revised the legend to include the following clarification: “Taking into consideration that the 17 step-PLOR reaction exhibited a pause within the terminator region, resulting in a significant amount of terminated product at step 16, crude products from steps 16 and 17 were collected for (C) and (D) of the 17 step-PLOR reaction (Lanes 15 and 16 in C)”.

      (3) Figure 7C is a control that shows the quality of the elongation complexes, which probably should be in the supplement. Instead, in Figure 7 supplement 1, panels C and D are actual experiments and could be moved into the main figure.  

      We thank the reviewer for the comment. We made the adjustment.  

      (4) Figure S7D. I would suggest not labelling the RNA polymerase halt/stoppage sites due to NTP deprivation as "pausing sites" because transcriptional pausing has previously been defined as natural sites where the RNA polymerase transiently halts itself, but not due to the lack of the next NTPs. In this case, the elongating complexes were artificially halted, which is technically not "pausing", as it will not restart/resume on its own without intervention. 

      We have changed the “pausing” to “halting”.  

      (5) Figure 7 is titled "In vitro transcriptional performance of riboG." But the data is actually not about the performance of the riboswitch, or how well it functions. I would suggest the authors revise the title. This is mostly about the observed sensitivity window of the riboswitch to ligand-mediated conformational switching. 

      We have changed the title of Figure 7 to “Ligand-mediated conformational switching of riboG during transcription”.

      (6) Figure 7A, the illustration gives the visual impression that there are multiple RNA polymerases on the same DNA template, which is not the case. 

      We have revised Figure 7A by adding arrows between RNA polymerases to illustrate the movement of a single RNAP, rather than multiple RNAPs on the same template.

      (7) It could be informative to compare the guanidine-IV riboswitch with the first three classes (I, II, III), to see how their architectures or gene regulatory mechanisms are similar or different. 

      We thank the reviewer for the comment. We have added a comparison of the guanidine-IV riboswitch with the other three guanidine riboswitches to the manuscript as “The guanidine-IV riboswitch exhibits similarities to the guanidine-I riboswitch in gene regulatory mechanism, functioning as a transcriptional riboswitch. Structurally, it resembles the guanidine-II riboswitch through the formation of loop-loop interactions upon binding to guanidine (Battaglia & Ke, 2018; L. Huang et al., 2017; Lin Huang et al., 2017; Lenkeit et al., 2020; Nelson et al., 2017; Reiss & Strobel, 2017; Salvail et al., 2020)” (page 12).

      Reviewer #3 (Recommendations For The Authors):

      In addition to the public review items, I provide the following recommendations:

      (1) As a second language speaker, I understand that writing a compelling and concise story may be hard, and we tend to write more than needed or more repetitively. That being said, I do think that the writing could be improved to make it more concise, clear, and avoid repetitions.

      We thank the reviewer for the comment. We re-wrote the abstract and some sentences in the manuscript.

      (2) In the abstract, instead of saying that "...This lack of understanding has impeded the application of this riboswitch", which makes the statement too strong, perhaps, stating something along the lines of "this understanding would assist the application of this riboswitch", would be a better fit. 

      We have rewritten the abstract and revised the sentence.

      (3) Methods should state which RNA polymerase was used. PLOR uses T7 RNA pol, so I assume it was the same. 

      We have added the statement “T7 RNAP was utilized in the PLOR and in vitro transcription reactions except noted” in the Methods ( page 15). 

      (4) The impact statement says comprehensive structure-function, where perhaps comprehensive folding-function would be more appropriate. We are still missing a lot of structural information about this particular riboswitch. 

      We agree with the reviewer, and changed “comprehensive structure-function” to “folding-function” in Impact statement ( page 2).

      (5) Higher Mg2+ concentrations implicated in a lesser extent of the switch of RiboGapt, a sentence talking about it would be useful (how Mg2+ could have promiscuous interaction and interfere with folding). 

      We have added the role of higher Mg2+ to the manuscript as “However, at a higher concentration of 50.0 mM Mg2+, the pre-folded and unfolded conformations were more prevalent than at 20.0 mM Mg2+. This suggests that an excess of Mg2+ may promote the pre-folded and even unfolded conformations” (page 6).

      (6) In the investigations of RiboG-term and RiboG, seems like that monovalents from the buffer are sufficient to promote secondary structure. A statement commenting on this would benefit the paper and the audience. 

      We agree with the reviewer and have revised the manuscript accordingly by adding “This indicates that monovalent ions in the buffer can facilitate the formation of stable guanidine-IV riboswitch” (page 8).

      (7) Figure 3. Figure goes to panel E and legend to panel H. G and H colors do not correspond to actual figure colors. 

      We made the correction.  

      (8) Figure 4. The same as Figure 3, the panels and figures are divergent.  

      We made the correction.  

      (9) During the discussion, stating that the DNA and RNA pol play a role in folding and ligand binding may be excessive. This could be an indirect effect of the transcriptional bubble hindering part of the nascent RNA from folding, which is something intrinsic to any transcription and not specific to this system. 

      We agree with the reviewer and have deleted the statement that the DNA and RNAP play a role in folding and ligand binding.

      (10) PLOR is not properly cited. When introduced in the manuscript, please cite the original PLOR paper (Liu et. al. Nature 2015) and additional related papers. 

      We cited the original PLOR paper (Liu et al., Nature 2015) and the related papers (Liu et al., Nature Protocols 2018) (pages 4 and 15).

      (11) The kinetics race of folding and binding could be a little more emphasized in discussion, particularly from the perspective of its physiological importance. 

      We agree with the reviewer and deleted the kinetics race of folding and binding from the Discussion part.

    1. eLife assessment

      This study presents a valuable finding on the relationship between brain activity related to sustained attention and substance use in adolescence/early adulthood with a large longitudinal dataset. The evidence supporting the claims of the authors is solid, although the inclusion of more details of methods, results, and data analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists, psychologists, and clinicians working on substance use or addiction.

    2. Reviewer #1 (Public Review):

      This study explored the relationship between sustained attention and substance use from ages 14 to 23 in a large longitudinal dataset. They found behaviour and brain connectivity associated with poorer sustained attention at age 14 predicted subsequent increase in cannabis and cigarette smoking from ages 14-23. They concluded that the brain network of sustained attention is a robust biomarker for vulnerability to substance use. The big strength of the study is a substantial sample size and validation of the generalization to an external dataset. In addition, various methods/models were used to prove the relationship between sustained attention and substance use over time.

    3. Reviewer #2 (Public Review):

      Weng and colleagues investigated the relationship between sustained attention and substance use in a large cohort across three longitudinal visits (ages 14, 19, and 23). They employed a stop signal task to assess sustained attention and utilized the Timeline Followback self-report questionnaire to measure substance use. They assessed the linear relationship between sustained attention-associated functional connections and substance use at an earlier visit (age 14 or 19). Subsequently, they utilized this relationship along with the functional connection profile at a later age (age 19 or 23) to predict substance use at those respective ages. The authors found that connections in association with reduced sustained attention predicted subsequent increases in substance use, a conclusion validated in an external dataset. Altogether, the authors suggest that sustained attention could serve as a robust biomarker for predicting future substance use.

      This study by Weng and colleagues focused on an important topic of substance use prediction in adolescence/early adulthood. While the study largely achieves its aims, several points merit further clarification:

      (1) Regarding connectome-based predictive modeling, an assumption is that connections associated with sustained attention remain consistent across age groups. However, this assumption might be challenged by observed differences in the sustained attention network profile (i.e., connections and related connection strength) across age groups (Figures 2 G-I, Figure 3 G-I). It's unclear how such differences might impact the prediction results.

      (2) Another assumption of the connectome-based predictive modeling is that the relationship between sustained attention network and substance use is linear, and remains linear over development. Such linear evidence from either the literature or their data would be of help.

      (3) Heterogeneity in results suggests individual variability that is not fully captured by group-level analyses. For instance, Figure 1A shows decreasing ICV (better-sustained attention) with age on the group level, while there are both increasing and decreasing patterns on the individual level via visual inspection. Figure 7 demonstrates another example in which the group with a high level of sustained attention has a lower risk of substance use at a later age compared to that in the group with a low level of sustained attention. However, there are individuals in the high sustained attention group who have substance use scores as high as those in the low sustained attention group. This is important to take into consideration and could be a potential future direction for research.

      The above-mentioned points might partly explain the significant but low correlations between the observed and predicted ICV as shown in Figure 4. Addressing these limitations would help enhance the study's conclusions and guide future research efforts.

    4. Reviewer #3 (Public Review):

      Summary:

      Weng and colleagues investigated the association between attention-related connectivity and substance use. They conducted a study with a sizable sample of over 1,000 participants, collecting longitudinal data at ages 14, 19, and 23. Their findings indicate that behaviors and brain connectivity linked to sustained attention at age 14 forecasted subsequent increases in cigarette and cannabis use from ages 14 to 23. However, early substance use did not predict future attention levels or attention-related connectivity strength.

      Strengths:

      The study's primary strength lies in its large sample size and longitudinal design spanning three time-points. A robust predictive analysis was employed, demonstrating that diminished sustained attention behavior and connectivity strength predict substance use, while early substance use does not forecast future attention-related behavior or connectivity strength.

      Weaknesses:

      It's questionable whether the prediction approach (i.e., CPM), even when combined with longitudinal data, can establish causality. I recommend removing the term 'consequence' in the abstract and replacing it with 'predict'. Additionally, the paper could benefit from enhanced rigor through additional analyses, such as testing various thresholds and conducting lagged effect analyses with covariate regression.
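
      For readers unfamiliar with connectome-based predictive modeling (CPM), the sketch below outlines its generic steps: select the edges whose strength correlates with behaviour above a threshold, sum the selected edges into a single network-strength score per subject, and fit a linear model. It is a simplified illustration with hypothetical toy data and function names, not the pipeline used in the study; published CPM implementations typically threshold on p-values and evaluate predictions with cross-validation. The r_threshold parameter is the kind of setting that could be varied when testing various thresholds, and the single linear fit is where the linearity assumption enters.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def train_cpm(edges, behaviour, r_threshold):
    """edges: one list of edge strengths per subject; behaviour: one score each.

    Keeps edges positively correlated with behaviour above r_threshold,
    then regresses behaviour on the summed strength of those edges.
    """
    n_edges = len(edges[0])
    selected = [j for j in range(n_edges)
                if pearson([s[j] for s in edges], behaviour) > r_threshold]
    if not selected:
        raise ValueError("no edge passed the threshold")
    strength = [sum(s[j] for j in selected) for s in edges]
    mx, my = mean(strength), mean(behaviour)
    sxx = sum((x - mx) ** 2 for x in strength)
    if sxx == 0:
        raise ValueError("network strength is constant across subjects")
    slope = sum((x - mx) * (y - my) for x, y in zip(strength, behaviour)) / sxx
    return selected, slope, my - slope * mx


def predict_cpm(model, subject_edges):
    """Predict behaviour for one unseen subject from the trained model."""
    selected, slope, intercept = model
    return intercept + slope * sum(subject_edges[j] for j in selected)


# Toy data: 5 subjects, 3 edges (all values hypothetical).
behaviour = [1, 2, 3, 4, 5]
edges = [[0.1, 0.5, 0.3], [0.2, 0.4, 0.1], [0.3, 0.3, 0.3],
         [0.4, 0.2, 0.1], [0.5, 0.1, 0.3]]
model = train_cpm(edges, behaviour, r_threshold=0.5)
print(model[0], predict_cpm(model, [0.6, 0.0, 0.2]))
```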

    1. eLife assessment

      This interesting study reports that muscle contains fibro-adipogenic progenitor cells (FAPs) that promote regeneration following injury of peripheral neurons. These novel results indicate that several known growth factors are involved in the process of regeneration. This is an important contribution, however the analysis is incomplete since additional experimental data is needed to support the main conclusions.

    2. Reviewer #1 (Public Review):

      In this manuscript, Yoo et al describe the role of a specialized cell type found in muscle, Fibro-adipogenic progenitors (FAPs), in promoting regeneration following sciatic nerve injury. Using single-cell transcriptomics, they characterize the expression profiles of FAPs at various times after nerve crush or denervation. Their results reveal that a population of these muscle-resident mesenchymal progenitors up-regulate the receptors for GDNF, which is secreted by Schwann cells following crush injury, suggesting that FAPs respond to this growth factor. They also find that FAPs increase expression of BDNF, which promotes nerve regeneration. The authors demonstrate FAP production of BDNF in vivo is upregulated in response to injection of GDNF and that conditional deletion of BDNF in FAPs results in delayed nerve regeneration after crush injury, primarily due to lagging remyelination. Finally, they also find reduced BDNF expression following crush injury in aged mice, suggesting a potential mechanism to explain the decrease in peripheral nerve regenerative capability in aged animals. These results are very interesting and novel and provide important insights into the mechanisms regulating peripheral nerve regeneration, which has important clinical implications for understanding and treating nerve injuries. However, there are a few concerns that the authors need to address.

      Given that only a fraction of the FAPs express BDNF after injury, the authors need to demonstrate the specificity of the Prrx1-Cre for FAPs. This is particularly important because muscle stem cells also express GDNF receptors (Fig. 3C & D) and myogenic progenitors/satellite cells produce BDNF after nerve injury (Griesbeck et al., 1995 (PMID 8531223); Omura et al., 2005 (PMID 16221288)). Moreover, as the authors point out, there are multipotent mesenchymal precursor cells in the nerve that migrate into the surrounding tissue following nerve injury and contribute to regeneration (Carr et al, PMID 30503141). Therefore, there are multiple possible sources of BDNF, highlighting the need to clearly demonstrate that FAP-derived BDNF is essential.

      Similarly, the authors should provide some evidence that BDNF protein is produced by FAPs. All of their data for BDNF expression is based on mRNA expression and that appears to only be increased in a small subset of FAPs. Perhaps an immunostaining could be done to demonstrate up-regulation of BDNF in FAPs after injury.

      The suggestion that Schwann cell-derived GDNF is responsible for up-regulation of BDNF in the FAPs is indirect, based largely on the data showing that injection of GDNF into the muscle is sufficient to up-regulate BDNF (Fig. 4F & G). However, to more directly connect the 2 observations in a causal way, the authors should inject a Ret/GDNF antagonist, such as a Ret-Fc construct, then measure the BDNF levels.

      In assessing the regeneration after nerve crush, the authors focus on remyelination, for example, assessing CMAP and g-ratios. However, they should also quantify axon regeneration, which can be done distal to the crush injury at earlier time points, before the 6 weeks scored in their study. Evaluating axon regeneration, which occurs prior to remyelination, would be especially useful because BDNF can act on both Schwann cells, to promote myelination, and axons, enhancing survival and growth. They could also evaluate the stability of the neuromuscular junctions, particularly if a denervation was done with the conditional knock outs, although that may be a bit beyond the scope of this study.
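
      For context on the remyelination readout mentioned above, the g-ratio is the ratio of the inner axon diameter to the outer diameter of the myelinated fiber, so thinner myelin gives a value closer to 1. A minimal sketch with hypothetical diameters (not data from the study):

```python
def g_ratio(axon_diameter_um, fiber_diameter_um):
    """g-ratio = inner axon diameter / outer (axon + myelin) fiber diameter."""
    if not 0 < axon_diameter_um <= fiber_diameter_um:
        raise ValueError("expected 0 < axon diameter <= fiber diameter")
    return axon_diameter_um / fiber_diameter_um


# Hypothetical fibers with the same axon calibre but different myelin thickness.
well_myelinated = g_ratio(6.0, 10.0)   # thicker myelin sheath
thinly_myelinated = g_ratio(6.0, 7.5)  # thinner sheath, e.g. delayed remyelination
print(well_myelinated, thinly_myelinated)
```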

    3. Reviewer #2 (Public Review):

      Summary:

      Yoo and colleagues studied the cellular mechanism allowing fibro-adipogenic progenitors (FAPs), muscle resident mesenchymal progenitors, to contribute to nerve regeneration upon regenerative injury. In addition to their expected role in the maintenance of muscle tissue, FAPs also contribute to the maturation and maintenance of neural tissue. After nerve injury, they prevent dying back loss of motor neurons. Consistently, muscle denervation activates FAPs, suggesting that FAPs can sense the injured distal peripheral nerve.

      A transcriptomic database was established using flow cytometry protocols and single-cell RNA-seq. FAPs were isolated from sciatic nerve crush (SNC), considered a regenerative condition, and compared to a non-regenerative condition consisting of denervation-affected muscles (DEN) at different time points after injury: early (3 and 7 days post-injury, dpi) and late (14 and 28 dpi), when the regeneration process has started to resolve. Transcriptome changes of the nine different conditions were compared: non-injured, 3, 7, 14, and 28 days after injury. Bioinformatic analysis and other filters were applied, including UMAP plots, hierarchical clustering analysis using differentially expressed genes (DEGs), volcano plots, and RNA velocity analysis. In addition to most of the supplementary material, the first three and a half central figures consist of the analysis of the transcriptome changes comparing the different conditions. Overall, the data indicate similar DEGs after both types of injury at early stages. Still, only after SNC does the gene expression pattern return to levels similar to those of non-injured muscle, meaning the injury process is resolved. For example, the Interleukin6/Stat3 pathway is upregulated in both injury models but downregulated at 28 days only in SNC. The comparison of the two types of injury at 28 dpi indicates a role of FAPs in the resolution of inflammation in SNC and participation of FAPs in fibrosis and inflammation in DEN. Genes related to wound healing were enriched in both.

      With the question in mind of how FAPs sense injury, the authors identified a subset of FAPs relevant to regeneration in the SNC model. Unsupervised clustering of FAPs across the nine different types of samples resulted in seven clusters. Cluster one was exclusive to non-injured animals or regenerated samples. Clusters two and three were exclusive to the early injured or denervated nerve, suggesting that cluster one senses injury and clusters two and three are derived from it. Among the highest DEGs in cluster one were the GDNF receptors Ret and Gfra1. It is known from the literature that GDNF is released by Schwann cells after nerve injury. Also, gene expression analysis in clusters two and three predicts RTK involvement and GDNF signaling. Altogether, the transcriptomic data suggest that GDNF is the mechanism by which FAPs sense nerve injury.

      On the other hand, they found BDNF expression limited to cluster two of injured FAPs, suggesting that FAPs respond to GDNF by secreting BDNF. Although the specific role of BDNF secreted by FAPs in nerve regeneration is unknown, BDNF is known to have a regenerative influence on injured sciatic nerves by promoting both axonal growth and myelination. Consistent with their hypothesis, the analysis of gene expression in Schwann cells (sorted using the Plp1CreER Rosatd tomato mouse) and FAPs after injury indicates an initial increase in GDNF gene expression in Schwann cells at early time points after injury, followed by increased expression of BDNF in FAPs. Using conditional knock-out of BDNF in lower limb FAPs (Prrx1Cre; Bdnffl/fl), they were able to demonstrate that nerve regeneration is impaired in Prrx1Cre; Bdnffl/fl mice, with delayed myelination of axons.

      Strengths:

      I found the article well-written; it cleverly maximizes the interpretation and analysis of single-cell transcriptome data. Their findings illuminate how growth factors allow communication between cells responding to injury to promote regeneration. I find the data generated by the authors sufficient to support their model and claims.

      Weaknesses:

      Although I find the data the authors generated sufficient for their claims, I do see them as relatively poor, and a complementary analysis of protein expression, through immunostaining of the different genes mentioned for FAPs and Schwann cells, would strengthen the paper. The model is entirely supported by measuring mRNA levels and negative regulation of gene expression in specific cells. Additionally, what happens to the structure of the neuromuscular junction after regeneration when GDNF or BDNF expression is reduced? The finding that FAP BDNF mRNA levels decrease during aging is interesting; does a gain of BDNF expression in FAPs revert the phenotype?

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Kyusang Yoo et al., "Muscle-resident mesenchymal progenitors sense and repair peripheral nerve injury via the GDNF-BDNF axis", investigates the role and mechanisms of fibro-adipogenic progenitors (FAPs), which are muscle-resident mesenchymal progenitors, in the maturation and maintenance of the neuromuscular system. There is earlier evidence that the absence of FAPs, or their functional decline with age, causes smaller regenerated myofibers. The role of FAPs in peripheral nerve regeneration is very poorly studied. This study has translational importance because traumatic injury to the peripheral nerve can cause lifelong paralysis of the injured limb.

      This manuscript provides data indicating that the GDNF-BDNF axis plays an important role in peripheral nerve regeneration and function.

      Strengths:

      Because the role of FAPs in peripheral nerve regeneration is very poorly studied, this investigation is a major step towards understanding the mechanisms underlying that role. The use of scRNA-seq, animal models, and cKO mice is also important. This study has translational importance because traumatic injury to the peripheral nerve can cause lifelong paralysis of the injured limb.

      This is an interesting and original study focusing on the role of FAPs and indicating that the GDNF-BDNF axis plays an important role in peripheral nerve regeneration and function.

      Weaknesses:

      In Figs. 1 and 2, the authors provide scRNA-seq data, and this is important information reporting the finding of RET and GFRa1 transcripts in a subpopulation of FAP cells. However, the authors provide no data on the expression of RET and GFRa1 proteins in FAP cells.

      Another problem is the lack of information showing that GDNF secreted by Schwann cells can activate RET and its downstream signaling in FAP cells.

      There is no direct experimental proof that GDNF activating GFRa1-RET signaling triggers BDNF upregulation in FAP cells.

      The data indicating that GDNF signaling induces the synthesis and secretion of BDNF are also not conclusive.

    1. eLife assessment

      Cancer treatments are not just about the tumor - there is an ever-increasing need for treating pain, fatigue, and anhedonia resulting from the disease. Using an implantable oral tumor model in the mouse, the authors provide valuable information showing that nerve fibers are transmitting sensory signals to the brain that reduce pleasure and motivation. These findings are in part supported by anatomical and transcript changes in the tumor that suggest sensory innervation, neural tracing, and neural activity measurements; however, the study is incomplete in its current form.

    2. Reviewer #1 (Public Review):

      Summary:

      Using a mouse model of head and neck cancer, Barr et al show that tumor-infiltrating nerves connect to brain regions via the ipsilateral trigeminal ganglion, and they demonstrate the effect this has on behavior. The authors show that there are neurites surrounding the tumors using a WGA assay and show that the brain regions that are involved in this tumor-containing circuit have elevated Fos and FosB expression and increased calcium response. Behaviorally, tumor-bearing mice have decreased nest building and wheel running and increased anhedonia. The behavior, Fos expression, and heightened calcium activity were all decreased in tumor-bearing mice following nociceptor neuron elimination.

      Strengths:

      This paper establishes that sensory neurons innervate head and neck cancers and that these tumors impact select brain areas. This paper also establishes that behavior is altered following these tumors and that drugs to treat pain restore some but not all of the behavior. The results from the experiments (predominantly gene and protein expression assays, cFos expression, and calcium imaging) support their behavioral findings both with and without drug treatment.

      Weaknesses:

      The study suggests that the effects of the tumor model on mouse behavior are largely non-specific to the tumor, as most behaviors are rescued by analgesic treatment. So, most of the changes were likely due to site-specific pain and not a unique signal from the tumor.

    3. Reviewer #2 (Public Review):

      Summary:

      Cancer treatments are not just about the tumor - there is an ever-increasing need for treating pain, fatigue, and anhedonia resulting from the disease as patients are undergoing successful but prolonged bouts with cancer. Using an implantable oral tumor model in the mouse, Barr et al describe neural infiltration of tumors, and posit that these nerve fibers are transmitting pain and other sensory signals to the brain that reduce pleasure and motivation. These findings are in part supported by anatomical and transcriptional changes in the tumor that suggest sensory innervation, neural tracing, and neural activity measurements. Further, the authors conduct behavior assays in tumor-bearing animals and inhibit/ablate pain sensory neurons to suggest the involvement of local sensory innervation of tumors in mediating cancer-induced malaise.

      Strengths:

      • This is an important area of research that may have implications for improving the quality of life of cancer patients.

      • The studies use a combination of approaches (tracing and anatomy, transcriptional, neural activity recordings, behavior assays, loss-of-function) to support their claims.

      • Tracing experiments suggest that tumor-innervating afferents are connected to brain nuclei involved in oral pain sensing. Consistent with this, the authors observed increased neural activity in those brain areas of tumor-bearing animals. It should be noted that some of these brain nuclei have also been implicated in cancer-induced behavioral alterations in non-head and neck tumor models.

      • Experiments are for the most part well-controlled, and approaches are validated.

      • The paper is well-written and the layout was easy to follow.

      Weaknesses:

      • The main claim is that tumor-infiltrating nerves underlie cancer-induced behavioral alterations, but the experimental interventions are not specific enough to support this. For example, all TRPV1 neurons, including those innervating the skin and internal organs, are ablated to examine sensory innervation of the tumor. Within the context of cancer, behavioral changes may be due to systemic inflammation, which may alter TRPV1 afferents outside the local proximity of tumor cells. A direct test of the claims of this paper would be to selectively inhibit/ablate nerve fibers innervating the tumor or mouth region.

      • Behavioral results from TRPV1 neuron ablation studies are in part confounded by differing tumor sizes in ablated versus control mice. Are the differences in behavior potentially explained by the ablated animals having significantly smaller tumors? The differences in tumor sizes are not negligible. One way to examine this possibility might be to correlate behavioral outcomes with tumor size.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors have tested for and demonstrated a physical (i.e., sensory nerves to the brain) connection between tumors and parts of the brain. This can explain why there is an increase in depressive disorders in HNSCC patients. While connections such as this have been suspected, this is a novel demonstration pointing to sensory neurons that is accompanied by a remarkable amount of complementary data.

      Strengths:

      There is substantial evidence provided for the hypotheses tested. The data are largely quite convincing.

      Weaknesses:

      The authors mention in their Discussion the need for additional experiments. Could they also include / comment on the potential impact on the anti-tumor immune system in their model?

      Minor:

      The authors mention the importance of inflammation contributing to pain in cancer but do not clearly highlight how this may play a role in their model. Can this be clarified?

      The tumor model apparently requires isoflurane injection prior to tumor growth measurements. This is different from most other transplantable types of tumors used in the literature. Was this treatment also given to control (i.e., non-tumor) mice at the same time points? If not, can the authors comment on the impact of isoflurane (if any) in their model?

      The authors emphasize in several places that this is a male mouse model. They mention this as a limitation in the Discussion. Was there an original reason why they only tested male mice?

    1. eLife assessment

      The authors show that short bouts of chemical ischemia lead to presynaptic changes in glutamate release and long-term potentiation, whereas longer bouts of chemical ischemia lead to synaptic failure and presumably cell death (which could be confirmed experimentally). This solid work relies on rigorous electrophysiology/imaging experiments and data analysis. It is valuable as it provides new mechanistic details on chemical ischemia, though its implications for ischemic stroke in vivo remain to be determined.

    2. Reviewer #1 (Public Review):

      Summary:

      This work by Passlick and colleagues set out to reveal the mechanism by which short bouts of ischemia perturb glutamate signalling. This manuscript builds upon previous work in the field that reported a paradoxical increase in synaptic transmission following acute, transient ischemia, termed ischemic or anoxic long-term potentiation. Despite these observations, how this occurs and whether glutamate release and uptake mechanisms are involved remain unanswered.

      Here the authors employed two distinct chemical ischemia models, one lasting 2 minutes, the other 5 minutes. Recording evoked field excitatory postsynaptic potentials in acute brain slices, the authors revealed that shorter bouts of ischemia resulted in a transient decrease in postsynaptic responses followed by an overshoot and long-term potentiation. Longer bouts of chemical ischemia (5 minutes), however, resulted in synaptic failure that did not return to baseline levels over 50 minutes of recording (Figure 1).

      Two-photon imaging of the fluorescent glutamate sensor iGluSnFR expressed in astrocytes matched the postsynaptic responses: shorter ischemia resulted in a transient dip followed by an increase in extracellular glutamate, which was not the case with prolonged ischemia (Figure 2).

      Mechanistically, the authors show that these increased glutamate levels and postsynaptic responses were not due to changes in glutamate clearance (Figure 3). Next, using a competitive antagonist for postsynaptic AMPA receptors, the authors show that synaptic glutamate release was enhanced by 2-minute chemical ischemia.

      Taken together, these data reveal the underlying mechanism regarding ischemic long-term potentiation, highlighting presynaptic release as the primary culprit. Additionally, the authors show relative insensitivity of glutamate uptake mechanisms during ischemia, highlighting the resilience of astrocytes to this metabolic challenge.

      Strengths:

      This manuscript uses robust and modern techniques to address the mechanism by which ischemia influences synaptic transmission in the hippocampus.

      The data are of high quality, with adequately powered sample sizes to address their hypotheses.

      Weaknesses:

      The question of the physiological relevance of short bouts of ischemia remains.

      The precise mechanisms underlying the shift between ischemia-induced long-term potentiation and long-term failure of synaptic responses were not addressed. Could this be cell death?

      Sex differences are not addressed or considered.

    3. Reviewer #2 (Public Review):

      Summary:

      To investigate the impact of chemical ischemia induced by blocking mitochondrial function and glycolysis, the authors measured extracellular field potentials, performed whole-cell patch-clamp recordings, and measured glutamate release with optical techniques. They found that shorter two-minute-lasting blockade of energy production initially blocked synaptic transmission but subsequently caused a potentiation of synaptic transmission due to increased glutamate release. In contrast, longer five-minute-lasting blockage of energy production caused a sustained decrease of synaptic transmission. A correlation between the increase of intracellular potassium concentration and the response upon chemical ischemia indicates that the severity of the ischemia determines whether synapses potentiate or depress upon chemical ischemia. A subsequent mechanistic analysis revealed that the speed of uptake of glutamate is unchanged. An increase in the duration of the fiber volley reflecting the extracellular voltage of the action potentials of the axon bundle was interpreted as an action potential broadening, which could provide a mechanistic explanation. In summary, the data convincingly demonstrate that synaptic potentiation induced by chemical ischemia is caused by increased glutamate release.

      Strengths:

      The manuscript is well-written and the experiments are carefully designed. The results are exciting, novel, and important for the field. The main strength of the manuscript is the combination of electrophysiological recordings and optical glutamate imaging. The main conclusion of increased glutamate release was furthermore supported with an independent approach relying on a low-affinity competitive antagonist of glutamate receptors. The data are of exceptional quality. Several important controls were carefully performed, such as the stability of the recordings and the size of the extracellular space. The number of experiments is sufficient for the conclusions. The careful data analysis justifies the classification of two types of responses, namely synaptic potentiation and depression after chemical ischemia. Except for the duration of the presynaptic action potentials (see below weaknesses) the data are carefully discussed and the conclusions are justified.

      Weaknesses:

      The weaknesses are minor and only relate to the interpretation of some of the data regarding the presynaptic mechanisms causing the potentiation of release. The authors measured the fiber volley, which reflects the extracellular voltage of the compound action potential of the fiber bundle. The half-duration of the fiber volley was increased, which could be due to action potential broadening in the individual axons but could also be due to differences in conduction velocity. We are therefore skeptical whether the conclusion of action potential broadening is justified.

    4. Reviewer #3 (Public Review):

      Summary:

      This valuable study shows that shorter episodes (2 minutes duration) of energy depletion, as it occurs in ischemia, could lead to long-lasting dysregulation of synaptic transmission with presynaptic alterations of glutamate release at the CA3-CA1 synapses. A longer duration of chemical ischemia (5 minutes) permanently suppresses synaptic transmission. By using electrophysiological approaches, including field and patch clamp recordings, combined with imaging studies, the authors demonstrated that 2 minutes of chemical ischemia leads to a prolonged potentiation of synaptic activity with a long-lasting increase of glutamate release from presynaptic terminals. This was observed as an increase in iGluSnFR fluorescence, a sensor for glutamate expressed selectively on hippocampal astrocytes by viral injection. The increase in iGluSnFR fluorescence upon 2-minute chemical ischemia could not be ascribed to an altered glutamate uptake, which is unaffected by both 2-minute and 5-minute chemical ischemia. The presynaptic increase in glutamate release upon short episodes of chemical ischemia is confirmed by a reduced inhibitory effect of the competitive antagonist gamma-D-glutamylglycine on AMPA receptor-mediated postsynaptic responses. Fiber volley durations in field recording are prolonged in slices exposed to 2 min chemical ischemia. The authors interpret this data as an indication that the increase in glutamate release could be ascribed to a prolongation of the presynaptic action potential possibly due to inactivation of voltage-dependent K+ channels. However, more direct evidence is needed to support this hypothesis fully. This research highlights an important mechanism by which altered ionic homeostasis underlying metabolic failure can impact on neuronal activity. Moreover, it also showed a different vulnerability of mechanisms involved in glutamatergic transmission with a marked resilience of glutamate uptake to chemical ischemia.

      Strengths:

      (1) The authors use a variety of experimental techniques ranging from electrophysiology to imaging to study the contribution of several mechanisms underlying the effect of chemical ischemia on synaptic transmission.

      (2) The experiments are appropriately designed and clearly described in the figures and in the text.

      (3) The controls are appropriate.

      Weaknesses:

      - The data on fiber volley duration should be supported by more direct measurements to prove that chemical ischemia increases presynaptic Ca2+ influx due to a presynaptic broadening of action potentials. Given the influence that positioning of the stimulating and recording electrode can have on the fiber volley properties, I found this data insufficient to support the assumption of a relationship between increased iGluSnFR fluorescence, action potential broadening, and increased presynaptic Ca2+ levels.

      - The results are obtained in an ex vivo preparation; it would be interesting to assess whether they could be replicated in in vivo models of cerebral ischemia.

      Impact:

      This study provides a more comprehensive view of the long-term effects of energy depletion during short episodes of experimental ischemia leading to the notion that not only post-synaptic changes, as reported by others, but also presynaptic changes are responsible for long-lasting modification of synaptic transmission. Interestingly, the direction of synaptic changes is bidirectional and dependent on the duration of chemical ischemia, indicating that different mechanisms involved in synaptic transmission are differently affected by energy depletion.

    1. eLife assessment

      This important work provides interesting datasets of myofiber differentiation. The evidence supporting the involvement of SRSF2 in selected biological processes is convincing; however, additional evidence to pinpoint the major action of SRSF2 during muscle differentiation would be appreciated. The work will be of broad interest to developmental biologists in general and molecular biologists in the field of gene regulation.

    2. Reviewer #1 (Public Review):

      Summary

      The work by She et al. investigates the role of SRSF2 in MyoD+ progenitor cells during development. Deletion of SRSF2 in MyoD+ progenitor cells resulted in a defect in the directional migration of these cells and in the presence of MyoD+ progenitors in both nonmuscle and muscle tissues. Using scRNA-seq, the authors showed defects in the gene programs regulating ECM, cell migration, cytoskeletal organization, and skeletal muscle differentiation. The authors further showed that many of these processes are regulated by a downstream target of SRSF2, the serine-threonine kinase Aurka. Finally, the authors showed that SRSF2 acts as a splicing factor and could contribute to differentiation by controlling the splicing of muscle-specific transcripts. This study addresses an important question in skeletal muscle development by focusing on the pathways and factors that regulate the migration of MyoD+ progenitors and the impact of this process on skeletal muscle differentiation. This work is interesting but requires experimental evidence to support the findings.

      Strengths

      The regulators of MyoD+ progenitor migration during skeletal muscle development are not completely understood. This work demonstrates that SRSF2 and Aurora kinase are key players in the process. Combining knockout and reporter lines in mice, the authors perform a detailed analysis of skeletal muscle cells to demonstrate the specific defects caused by loss of SRSF2 in skeletal muscle development.

      Weaknesses

      This work explores an interesting question on the regulation of MyoD+ progenitors by SRSF2 and the resulting defects in skeletal muscle differentiation, but it spreads out in many directions rather than focusing on the key defects. A number of approaches are used, but they lack a robust mechanistic analysis of the defects that impair muscle differentiation. Specifically, the role of SRSF2 in splicing appears to be a misfit here and does not explain the primary defects in the migration of MyoD+ progenitors. There are concerns about the scRNA-seq data and about the many muscle biology transcripts discussed that are not expressed in muscle cells. Focusing on the main defects, and adding experimental evidence to distinguish a fusion defect from precocious differentiation or reduced differentiation, will strengthen this work.

      (1) The analysis of RNA-seq data (Figure 2) is limited, and it is unclear how it relates to the work presented in this MS. The GO enrichment analysis combines both up- and down-regulated DEGs, making it difficult to understand the impact in each direction. Stac2 is predominantly a neuronal isoform (while Stac3 is the muscle isoform), and the Symm gene is not found in the HGNC or other databases. Could the authors provide the approved name for this gene? The premise of this work is based on defects in ECM processes resulting in the mis-targeting of the muscle progenitors to the nonmuscle regions. Which ECM proteins are differentially expressed?

      (2) Could the authors quantify the muscle progenitors dispersed in nonmuscle regions before their differentiation? In which nonmuscle tissues are MyoD+ progenitors seen? Most of the tdT staining in the enlarged sections appears to be punctate, without any nuclear staining seen in these cells (Figure 3 B, D, E-F). Could the authors provide high-resolution images? Also, in the diaphragm cross-sections in mutants, tdT labeling appears to be missing in some areas within the myofibers, defined as cavities by the authors (marked by white arrows, Figure 3H). Could this polarized localization of tdT be contributing to specific defects?

      (3) Is there a difference in the levels of tdT between the MyoD+ muscle progenitors that are mis-targeted and the others that are present in the muscle tissues?

      (4) scRNA-seq is unsuitable for myotubes and myofibers because their size excludes them from microfluidics. Could the authors explain the basis for choosing scRNA-seq over snRNA-seq in this work? How are SKM defined in the scRNA-seq data in Figure 4? As the myofibers are small in the KO, could the increased level of late differentiation markers be due to the enrichment of these small myotubes/myofibers in scRNA-seq? A different approach, such as ISH/IF with the myogenic markers at E9.5-10.5, may be able to resolve whether these markers are prematurely induced.

      (5) TNC is a marker for tenocytes and is absent in skeletal muscle cells. The authors mention a downregulation of TNC in the KO SKM-derived clusters. This suggests contamination of the control cells with tenocytes. In spite of the downregulation of multiple ECM genes shown by the scRNA-seq data, the ECM staining by laminin in the KO in Figure 3 appears to be similar to controls.

      (6) The expression of many fusion genes, such as myomaker and myomerger, is reduced in the KO, suggesting a primary fusion defect rather than a primary differentiation defect. Many mature myofiber proteins exhibit increased expression in disease states, suggesting a compensatory mechanism. The authors need to provide additional experimental evidence supporting precocious differentiation as the primary defect.

      (7) The fusion defects in KO are also evident in siRNA knockdown for SRSF2 and Aurka in C2C12, which mostly exhibits mononucleated myocytes in knockdowns. Also, a fusion index needs to be provided.

      (8) The last section, on the role of SRSF2 in splicing, appears to be a misfit in this study. The authors describe the Bin1 isoforms in centronuclear myopathy, but exon17 is not involved in myopathy. Is exon17 exclusion seen in other diseases or splicing studies?

    3. Reviewer #2 (Public Review):

      Summary:

      This study aimed to investigate the role of SRSF2 in directing MyoD+ progenitors to specific muscle regions. The results confirmed the role of SRSF2 in controlling myogenic differentiation through the regulation of targeted genes and alternative splicing during skeletal muscle development.

      Strengths:

      The study used different methods and techniques to achieve its aims and support the conclusions, such as RNA sequencing analysis, Gene Ontology analysis, and immunostaining analysis.

      This study provides novel findings that SRSF2 controls the myogenic differentiation of MyoD+ progenitors, using a transgenic mouse model and in vitro studies.

      Weaknesses:

      Although unbiased sequencing methods were used, the findings that SRSF2 serves as a transcriptional regulator and functions in alternative splicing events are not novel.

      The introduction and discussion are not clearly written. The authors did not raise clear scientific questions in the introduction. Its last paragraph is only a copy-paste of the abstract. The discussion mainly repeats the results without clear discussion.

    1. Reviewer #3 (Public Review):

      Summary:

      This study employs an optogenetics approach aimed at activating oncogene (KRASG12V) expression in a single somatic cell, with a focus on following the progression of the activated cell to examine tumourigenesis probabilities under altered tissue environments. The research explores the role of stemness factors (VENTX/NANOG/OCT4) in facilitating oncogenic RAS (KRASG12V)-driven malignant transformations. Although the evidence provided is incomplete, the authors propose an important mechanism whereby reactivation of reprogramming factors correlates with an increased likelihood of a mutant cell undergoing malignant transformation.

      Strengths:

      · Innovative Use of Optogenetics: The application of optogenetics for precise activation of KRAS in a single cell is valuable to the field of cancer biology, offering an opportunity to uncover insight into cellular responses to oncogenic mutations.

      · Important Observations: The findings concerning stemness factors' role in promoting oncogenic transformation are important, contributing data to the field of cancer biology.

      Weaknesses:

      Lack of Methodological Clarity: The manuscript lacks detailed descriptions of methodologies, making it difficult to fully evaluate the experimental design and reproducibility and leaving the evidence for the conclusions incomplete. Improving methodological transparency and data presentation will substantially strengthen the paper's contributions to understanding the complex processes of tumourigenesis.

      Sub-optimal Data Presentation and Quality:

      The resolution of images throughout the manuscript is too low. The images presented in Figure 2 and Figure 4 are of very low resolution. It is very hard to distinguish individual cells and the tissue in which they might reside.

      The lack of quantitative data, and of control condition data obtained from images of higher magnification, limits the ability to robustly support the conclusions.

      Here are some details:
      · Tissue specificity of the cells expressing the KRASG12V oncogene: In this study, the ubiquitin promoter was used to drive oncogenic KRASG12V expression. Despite this, the authors claim to activate KRAS in a single brain cell based on their localized photo-activation strategy. However, the methods section states that 'Localized uncaging was performed by illumination for 7 minutes on a Nikon Ti microscope equipped with a light source peaking at 405 nm, Figure 1. The size of the uncaging region was controlled by an iris that defines a circular illumination with a diameter of approximately 80 μm.' It is surprising that an epi-fluorescent microscope with an illumination diameter of around 80 μm can induce activation in a single brain cell beneath skin tissue. Additionally, given that the half-life for mTFP maturation is around 60 minutes, it is likely that more cells from a variety of different lineages could be activated, but the fluorescence would not be visible until more than 1 hour post-illumination. The authors might want to provide more evidence to support their claim of single-cell KRAS activation.
      · Stability of cCYC: The manuscript does not provide information on the half-life and stability of cCYC. Understanding these properties is crucial for evaluating the system's reliability and the likelihood of leakiness, which could significantly influence the study's outcomes.
      · Metastatic dissemination claim: Typically, metastatic cancer cells migrate to and proliferate within specific niches that are conducive to outgrowth, such as the caudal hematopoietic tissue (CHT) or liver. Figure 3A shows mTFP-expressing cells in both the head and tail regions of the larva, with additional positive dots located at the fin fold. The authors interpret this as "metastasis". However, the absence of a supportive cellular compartment within the fin-fold tissue makes the presence of mTFP-positive metastatic cells there particularly puzzling. This distribution raises concerns about the spatial specificity of the optogenetic activation protocol.
      The unexpected locations of these signals suggest potential ectopic activation of the KRAS oncogene, which could be occurring alongside or instead of targeted activation. This issue is critical as it could affect the interpretation of whether the observed mTFP signal expansion over time is due to actual cell proliferation and infiltration, or merely a result of ectopic RAS transgene activation.
      · Image resolution concerns: The cells depicted in Figure 3C β appear to be near the surface of the yolk sac and not within the digestive system as suggested in the manuscript, which underscores the necessity for higher-resolution imaging. Without clearer images, it is challenging to ascertain the exact locations and states of these cells, which complicates the assessment of the experimental results.
      · The cell transplantation experiment is lacking protocol details: The manuscript does not adequately describe the experimental protocols used for cell transplantation, particularly concerning the origin and selection of cells used for injection into individual larvae. This omission makes it difficult to evaluate the reliability and reproducibility of the results. In particular, the source of the transplanted cells is unclear:
      • If the cells are derived from hyperplastic growths in larvae where RAS and VX (presumably VENTX) were locally activated, the manuscript fails to mention any use of fluorescence-activated cell sorting (FACS) to enrich mTFP-positive cells. Such a method would be crucial for ensuring the specificity of the cells being studied and the validity of the results.
      • If the cells are obtained from whole larvae with induced RAS + VX expression, it is notable and somewhat surprising that the larvae survived up to six days post-induction (6dpi) before cells were harvested for transplantation. This survival rate and the subsequent ability to obtain single-cell suspensions raise questions about the heterogeneity of the RAS + VX expressing cells that were transplanted.
      · Unclear experimental conditions in Figure S3B: The images in Figure S3B lack crucial details about the experimental conditions. It is not specified whether the activation of KRAS was targeted to specific cells or involved whole-body exposure. This information is essential for interpreting the scope and implications of the results accurately.
      · Contrasting data in Figure S3C compared to the literature: The graph in Figure S3C indicates that KRAS or KRAS + DEX induction did not result in any form of hyperplastic growth. This observation starkly contrasts with previous literature where oncogenic KRAS expression in zebrafish led to significant hyper-proliferation and abnormal growth, as evidenced by studies such as those published in Neoplasia (2018), DOI: 10.1016/j.neo.2018.10.002; Molecular Cancer (2015), DOI: 10.1186/s12943-015-0288-2; Disease Models & Mechanisms (2014), DOI: 10.1242/dmm.007831. The lack of expected hyperplasia raises questions about the experimental setup or the specific conditions under which KRAS was expressed. The authors should provide detailed descriptions of the conditions under which the experiments in Figure S3B were conducted, clarify the reasons for the discrepancies observed in Figure S3C, and discuss potential reasons for the deviation from previous reports.

      Further comments:

      Throughout the study, KRAS-activated cell expansion and metastasis are two key phenotypes discussed that Ventx is promoting. However, the authors did not perform any experiments to directly show that KRAS+ cells proliferate only in Ventx-activated conditions. The authors also did not show any morphological features or time-lapse videos demonstrating that KRAS+ cells are motile, even though zebrafish is an excellent model for in vivo live imaging. This seems to be a missed opportunity for providing convincing evidence to support the authors' conclusions.

      Minimal experimental details were provided for the qPCR data presented in supplementary figures S5 and S6; it is therefore hard to evaluate the results obtained.

    2. eLife assessment

      This study provides valuable initial characterization of a vertebrate embryonic system that demonstrates aspects of an optogenetically inducible hyperplasia model. Although the evidence provided is incomplete to conclude that the system is demonstrating tumor initiation from a single cell that is metastasizing that can be quantitatively assessed, the authors propose a mechanism whereby reactivation of re-programming factors correlates with the increased likelihood of a mutant cell undergoing malignant transformation. This work will be of interest to developmental and cancer biologists mainly for the novel genetic tools described.

    3. Reviewer #1 (Public Review):

      Scerbo et al. developed an approach based on the oncogene kRasG12V and a reprogramming factor to induce deterministic and reproducible malignant transformation in a single cell. The activation of kRasG12V alone is not sufficient in their hands to initiate carcinogenesis, but when combined with the transient activation of a reprogramming factor (such as Ventx, Nanog, or Oct4), it significantly increases the probability of malignant transformation. This combination of oncogene and reprogramming factor may alter the epigenetic and functional state of the cell, leading to the development of tumors within a short period of time. The use of these two factors allows for the controlled manipulation of a single cell to study the cellular and molecular events involved in the early stages of tumorigenesis. The authors then performed allotransplantations of allegedly single fluorescent TICs in recipient larvae and found a large number of fluorescent cells in distant locations, claiming that these cells have all originated from the single transplanted TIC and migrated away. The number of fluorescent cells shown in the recipient larvae after just two days is not compatible with a normal cell cycle length and more likely represents the progeny of more than one transplanted cell. The ability to migrate from the injection site should be documented by time-lapse microscopy. Then, the authors conclude that "By allowing for specific and reproducible single cell malignant transformation in vivo, their optogenetic approach opens the way for a quantitative study of the initial stages of cancer at the single cell level". However, the evidence for these claims is weak and further characterization should be performed to:

      (1) show that they are actually activating the oncogene in a single cell (the magnification is too low and it is difficult to distinguish a single nucleus; labelling of the cell membrane may help to demonstrate that they are effectively activating the oncogene in, or transplanting, a single cell)
      (2) the expression of the genes used as markers of tumorigenesis is measured in whole larvae, with only a few transformed cells in them. Changes should be confirmed in FACS-sorted fluorescent cells
      (3) the histology of the so-called "tumor masses" is not showing malignant transformation, but at most just hyperplasia. In the brain, the sections are not perfectly symmetrical and the increase of cellularity on one side of the optic tectum is compatible with this asymmetry.
      (4) The number of fluorescent cells found dispersed in the larvae transplanted with one single TIC after 48 hours would require a very fast cell cycle to generate over 50 cells. Do we have an idea of the cell cycle features of the transplanted TICs?
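
As a rough check on point (4), the following sketch estimates the doubling time that would be needed to go from one cell to the reported cell number. It assumes purely exponential growth with no cell death or lag phase; the function name and the two-cell alternative are illustrative assumptions, not values from the manuscript.

```python
import math

def required_doubling_time(final_cells, hours, initial_cells=1):
    """Doubling time (hours) needed to reach final_cells from initial_cells.

    Assumes ideal exponential growth: N(t) = N0 * 2**(t / T).
    """
    doublings = math.log2(final_cells / initial_cells)
    return hours / doublings

# One transplanted cell giving rise to 50 cells within 48 hours
t_single = required_doubling_time(50, 48)

# Hypothetical alternative: two cells were transplanted instead of one
t_double = required_doubling_time(50, 48, initial_cells=2)

print(f"one founder cell:  {t_single:.1f} h per division")
print(f"two founder cells: {t_double:.1f} h per division")
```

Under these assumptions a single founder cell would need to divide about every 8.5 hours without interruption, which is the constraint the comment above asks the authors to address.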

    4. Reviewer #2 (Public Review):

      Summary:

      In the work by Scerbo et al, the authors aim to better understand the open question of what factors constrain cells that are genetically predisposed to form cancer (e.g. those with a potentially cancer-causing mutation like activated Ras) to only infrequently undergo this malignant transformation, with a focus on the influence of embryonic or pluripotency factors (e.g. VENTX/NANOG). Using genetically defined zebrafish models, the authors can inducibly express the KRASG12V oncogene using a combination of Cre/Lox transgenes further controlled by optogenetically inducible Cre activation (a CreER fusion that becomes active with light-induced uncaging of a tamoxifen analogue in a targeted region of the zebrafish embryo). They further show that transient expression and activation of a pluripotency factor (e.g. Ventx fused to a GR receptor that is activated with addition of dexamethasone) must occur in the model in order for overgrowth of cells to occur. This paper describes a genetically tractable and modifiable system for studying the requirements for inducing cellular hyperplasia in a whole organism by combining overexpression of canonical genetic drivers of cancer (like Ras) with epigenetic modifiers (like specific transcription factors), which could be used to study an array of combinations and temporal relationships of these cancer drivers/modifiers.

      Strengths:

      The combination of Cre/lox inducible gene expression with potentially localized optogenetic induction (CreER and uncaging of tamoxifen analogues) of recombination, as well as inducible activation of a transcription factor expressed via mRNA injection (GR-fusion to the TF and dex induction), offers a flexible system for manipulating cell growth, identity, and transcriptional programs. With this system, the authors establish that Ras activation and at least transient Ventx overexpression are together required to induce a hyperproliferative phenotype in zebrafish tissues.

      The ability to live image embryos over the course of days with inducible fluorophores indicating recombination events and transgene overexpression offers a tractable in vivo system for studying hyperplastic cells in the context of a whole organism.

      The transplant experiments demonstrate the ability of the induced hyperplastic cells to grow upon transfer to a new host.

      Weaknesses:

      There is minimal quantitation of key aspects of the system, most critically of the efficiency of activation of the Ras-TFP fusion (Fig 1) in, purportedly, a single cell. The authors note "On average the oncogene is then activated in a single cell, identified within ~1h by the blue fluorescence of its nuclear marker" but no additional quantitative information is provided. For a system that is aimed at "a statistically relevant single-cell tracking and characterization of the early stages of tumorigenesis", such information seems essential.

      The authors indicate that a single cell is "initiated" (Fig 2) using the laser optogenetic technique, but without definitive genetic lineage tracing, it is not possible to conclude that cells expressing TFP distant from the target site near the ear are daughter cells of the claimed single "initiated" cell. Plausible alternative explanations are 1) that the optogenetic targeting is more diffuse (i.e. some of the light of the appropriate wavelength hits other cells nearby due to reflection/diffraction), so these adjacent cells are additional independent "initiated" cells, or 2) that the uncaged tamoxifen analogue can diffuse to nearby cells and allow for CreER activation and recombination. In Fig 2B, the claim is made that "the activated cell has divided, giving rise to two cells" - unless continuously imaged or genetically traced, this is unproven. In addition, it appears that Figures S3 and S4 are showing that hyperplasia can arise in many different tissues (including intestine, pancreas, and liver, S4C) with broad Ras + Ventx activation (while unclear from the text, it appears these embryos were broadly activated and were not "single cell activated" using the set-up in Fig 1E; this should be clarified in the manuscript). In Fig S7, where single cell activation and potential metastasis is discussed, similar gut tissues have TFP+ cells that are called metastatic, but this seems consistent with the possibility that multiple independent sites of initiation are occurring even when focal activation is attempted.

      Although the hyperplastic cells are transplantable (Fig 4), the use of the term "cells of origin of cancer" or metastatic cells should be viewed with care in the experiments showing TFP+ cells (Fig 1, 2, 3) in embryos with targeted activation for the reasons noted above.

    1. eLife assessment

      This study describes the application of machine learning and Markov state models to characterize the binding mechanism of alpha-Synuclein to the small molecule Fasudil. The results suggest that entropic expansion can explain such binding. However, the simulations and analyses in their present form are inadequate.

    2. Reviewer #1 (Public Review):

      Summary:

      This is a well-conducted study about the mechanism of binding of a small molecule (fasudil) to a disordered protein (alpha-synuclein). Since this type of interaction has puzzled researchers for the last two decades, the results presented are welcome as they offer relevant insight into the physical principles underlying this interaction.

      Strengths:

      The results show convincingly that the mechanism of entropic expansion can explain the previously reported binding of fasudil to alpha-synuclein. In this context, the analysis of the changes in the entropy of the protein and of water is highly relevant. The combined use of machine learning for dimensional reduction and of Markov State Models could become a general procedure for the analysis of other systems where a compound binds a disordered protein.

      Weaknesses:

      It would be important to underscore the computational nature of the results, since the experimental evidence that fasudil binds alpha-synuclein is not entirely clear, at least to my knowledge.

    3. Reviewer #2 (Public Review):

      The manuscript by Menon et al describes a set of simulations of alpha-Synuclein (aSYN) and analyses of these and previous simulations in the presence of a small molecule.

      While I agree with the authors that the questions addressed are interesting, I am not sure how much we learn from the present simulations and analyses. In parts, the manuscript reads more like an attempt to apply a whole range of tools rather than with a goal of answering any specific questions.

      There's a lot going on in this paper, and I am not sure it is useful for the authors, readers or me to spell out all of my comments in detail. But here are at least some points that I found confusing/etc

      Major concerns

      p. 5 and elsewhere:
      The manuscript lacks a serious discussion of convergence and of the statistics of the differences between the two sets of simulations. On p. 5 it is described how the authors ran multiple simulations of the ligand-free system for a total of 62 µs; that is about 25 times less than for the ligand system. I acknowledge that running 1.5 ms is unfeasible, but at a bare minimum the authors should discuss and analyse the consequences of the relatively small amount of sampling. Here it is important to say that while 62 µs may sound like a lot, it is probably not enough to sample the relevant properties of a 140-residue long disordered protein.

      p. 7:
      The authors make it sound like a bad thing that some methods are deterministic. Why is that the case? What kind of uncertainty in the data do they mean? One can certainly have deterministic methods and still deal with uncertainty. Again, this seems like a somewhat ad hoc argument for the choice of the method used.

      p. 8:
      The authors should make it clear (i) what the reconstruction loss and KL is calculated over and (ii) what the RMSD is calculated over.

      p. 9/figure 1:
      The authors select a beta value that may be the minimum, but is then just below a big jump in the cross-validation error. Why does the error jump so much, and isn't it slightly dangerous to pick a value close to such a large jump?

      p. 10:
      Why was a 2-dimensional representation used in the VAE? What evidence do the authors have that the representation is meaningful? The authors state "The free energy landscape represents a large number of spatially close local minima representative of energetically competitive conformations inherent in αS" but they do not say what they mean by "spatially close". In the original space? If so, where is the evidence?

      p. 10:
      It is not clear from the text whether the VAEs are the same for both aSYN and aSYN-Fasudil. I assume they are. Given that the Fasudil dataset is 25x larger, presumably the VAE is mostly driven by that system. Is the VAE an equally good representation of both systems?

      p. 10/11:
      Do the authors have any evidence that the latent space representation preserves relevant kinetic properties? This is a key point because the entire analysis is built on this. The choice of using z1 and z2 to build the MSM seems somewhat ad hoc. What do the auto-correlation functions of z1 and z2 look like? Are they related to the dynamics of some key structural properties like Rg or transient helical structure?
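
To make the request concrete, here is a minimal sketch of how a normalized auto-correlation function of a latent coordinate could be estimated with NumPy. The series `z1` below is a synthetic AR(1) process used purely as a stand-in (an assumption); in practice it would be the time series of a latent coordinate, Rg, or helical content from the trajectory.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(lag) for lag = 0..max_lag of a 1D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean before correlating
    n = len(x)
    var = np.dot(x, x) / n                # variance, so that C(0) = 1
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # average product of the series with itself shifted by `lag` frames
        acf[lag] = np.dot(x[: n - lag], x[lag:]) / ((n - lag) * var)
    return acf

# Synthetic stand-in for a latent coordinate: AR(1) process with memory phi
rng = np.random.default_rng(0)
phi = 0.9
noise = rng.normal(size=20000)
z1 = np.empty_like(noise)
z1[0] = noise[0]
for t in range(1, len(noise)):
    z1[t] = phi * z1[t - 1] + noise[t]

acf = autocorrelation(z1, max_lag=50)
print("C(1) =", round(acf[1], 3), " C(10) =", round(acf[10], 3))
```

Comparing the decay of such curves for z1 and z2 with those of Rg or secondary-structure content would indicate whether the latent coordinates capture the slow motions of the protein.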

      p. 11:
      What's the argument for not building an MSM with states shared for aSYN +- Fasudil?

      p. 12:
      Fig. 3b/c show quite clearly that the implied timescales are not converged at the chosen lag time (incidentally, it would have been useful to show the timescales in physical time). The CK test is stated to be validated with "reasonable accuracy", though it is unclear what that means.

      p. 12:
      In Fig. 3d, what are the authors bootstrapping over? What are the errors if the authors analyse sampling noise (e.g. bootstrap over simulation blocks)?
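
One possible form of the block bootstrap mentioned here is sketched below: whole trajectories are resampled with replacement and the state populations recomputed for each resample. This is an illustration only. It uses raw frame counts rather than MSM stationary probabilities, and the synthetic state assignments, the number of states, and all names are assumptions.

```python
import numpy as np

def bootstrap_populations(trajs, n_states, n_boot=200, seed=0):
    """Bootstrap state populations by resampling whole trajectories.

    trajs: list of 1D integer arrays of state labels (one per simulation block).
    Returns the bootstrap mean and standard deviation of each population.
    """
    rng = np.random.default_rng(seed)
    # per-trajectory histogram of visited states
    counts = np.array([np.bincount(t, minlength=n_states) for t in trajs])
    samples = np.empty((n_boot, n_states))
    for b in range(n_boot):
        # draw trajectories with replacement, keeping each block intact
        idx = rng.integers(0, len(trajs), size=len(trajs))
        total = counts[idx].sum(axis=0)
        samples[b] = total / total.sum()
    return samples.mean(axis=0), samples.std(axis=0)

# Synthetic example: 23 short trajectories assigned to 3 states
rng = np.random.default_rng(1)
trajs = [rng.integers(0, 3, size=1000) for _ in range(23)]
pop_mean, pop_err = bootstrap_populations(trajs, n_states=3)
print("populations:", pop_mean)
print("bootstrap errors:", pop_err)
```

Resampling whole blocks rather than individual frames preserves the time correlation within each simulation, which is why it gives a more realistic estimate of sampling noise.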

      p. 13:
      I appreciate that the authors build an MSM using only a subset of the fasudil simulations. Here, it would be important that this analysis includes the entire workflow so that the VAE is also rebuilt from scratch. Is that the case?

      p. 18:
      I don't understand the goal of building the CVAE and DCVAE. Am I correct that the authors are building a complex ML model using only 3/6 input images? What is the goal of this analysis? As it stands, it reads a bit like simply wanting to apply some ML method to the data. Incidentally, the table in Fig. 6C is somewhat opaque.

      p. 22:
      "Our results indicate that the interaction of fasudil with αS residues governs the structural features of the protein."
      What results indicate this?

      p. 23:
      The authors should add some (realistic) errors to the entropy values quoted. Fig. 8 has some error bars, though they seem unrealistically small. Also, is the water value quoted from the same force field and conditions as for the simulations?

      p. 23:
      Has PDB2ENTROPY been validated for use with disordered proteins?

      p. 23/24:
      It would be useful to compare (i) the free energies of the states (from their populations), (ii) the entropies (as calculated) and (iii) the enthalpies (as calculated e.g. as the average force field energy). Do they match up?
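
For item (i), relative free energies follow from the state populations via G_i = -RT ln p_i. A minimal sketch is given below; the temperature of 300 K and the example populations are assumptions for illustration and are not taken from the manuscript.

```python
import numpy as np

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)

def free_energies(populations, temperature=300.0):
    """Relative free energies (kcal/mol) of states from their populations.

    Uses G_i = -R T ln(p_i), shifted so the most populated state is at zero.
    """
    p = np.asarray(populations, dtype=float)
    p = p / p.sum()                       # normalize in case of rounding
    g = -R_KCAL * temperature * np.log(p)
    return g - g.min()

# Hypothetical two-state example with populations 0.9 and 0.1
dg = free_energies([0.9, 0.1])
print("relative free energies (kcal/mol):", dg)
```

If the entropies and enthalpies reported for each macrostate are consistent, G_i - G_j should roughly equal (H_i - H_j) - T (S_i - S_j) within the statistical errors.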

      p. 31:
      It is unclear which previous simulation the new aSYN simulations were launched from. What is the size of the box used?

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript Menon, Adhikari, and Mondal analyze explicit solvent molecular dynamics (MD) computer simulations of the intrinsically disordered protein (IDP) alpha-synuclein in the presence and absence of a small molecule ligand, Fasudil, previously demonstrated to bind alpha-synuclein by NMR spectroscopy without inducing folding into more ordered structures. In order to provide insight into the binding mechanism of Fasudil the authors analyze an unbiased 1500us MD simulation of alpha-synuclein in the presence of Fasudil previously reported by Robustelli et al. (Journal of the American Chemical Society, 144(6), pp.2501-2510). The authors compare this simulation to a very different set of apo simulations: 23 separate 1-4us simulations of alpha-synuclein seeded from different apo conformations taken from another simulation previously reported by Robustelli et al. (PNAS, 115 (21), E4758-E4766), for a total of ~62us.

      To analyze the conformational space of alpha-synuclein - the authors employ a variational auto-encoder (VAE) to reduce the dimensionality of Ca-Ca pairwise distances to 2 dimensions, and use the latent space projection of the VAE to build Markov state models. The authors utilize k-means clustering to cluster the sampled states of alpha-synuclein in each condition into 180 microstates on the VAE latent space. They then coarse grain these 180 microstates into a 3-macrostate model for apo alpha-synuclein and a 6-macrostate model for alpha-synuclein in the presence of fasudil using the PCCA+ coarse graining method. Few details are provided to explain the hyperparameters used for PCCA+ coarse graining and the rationale for selecting the final number of macrostates.

      The authors analyze the properties of each of the alpha-synuclein macrostates from their final MSMs - examining intramolecular contacts, secondary structure propensities, and in the case of alpha-synuclein:Fasudil holo simulations - the contact probabilities between Fasudil and alpha-synuclein residues.

      The authors utilize an additional variational autoencoder (a denoising convolutional VAE) to compare denoised contact maps of each macrostate, and project onto an additional latent space. The authors conclude that their apo and holo simulations are sampling distinct regions of the conformational space of alpha-synuclein projected on the denoising convolutional VAE latent space.

      Finally, the authors calculate water entropy and protein conformational entropy for each microstate. To facilitate water entropy calculations - the authors take a single structure from each macrostate - and run a 20ps simulation at a finer timestep (4 femtoseconds) using a previously published method (DoSPT), which computes thermodynamic properties of water from MD simulations using autocorrelation functions of water velocities. The authors report that water entropy calculated from these individual 20ps simulations is very similar.

      For each macrostate the authors compute protein conformational entropy using a previously published Maximum Information Spanning tree approach based on torsion angle distributions - and observe that the estimated protein conformational entropy is substantially more negative for the macrostates of the holo ensemble.

      The authors calculate mean first passage times from their Markov state models and report a strong correlation between the protein conformational entropy of each state and the mean first passage time from each state to the highest populated state.

      As the authors observe that the conformational entropy estimated from macrostates of the holo alpha-synuclein:Fasudil ensemble is greater than that estimated from the apo alpha-synuclein macrostates - they suggest that the driving force of Fasudil binding is an increase in the conformational entropy of alpha-synuclein. No consideration/quantification of the enthalpy of alpha-synuclein Fasudil binding is presented.

      Strengths:

      The authors utilize MD simulations run with an appropriate force field for IDPs (a99SB-disp and a99SB-disp water; Robustelli et al., PNAS, 115 (21), E4758-E4766) - which has previously been used to perform MD simulations of alpha-synuclein that have been validated with extensive NMR data.

      The contact probability between Fasudil and each alpha-synuclein residue observed in the previously performed 1500us MD simulation of alpha-synuclein in the presence of Fasudil (Robustelli et. al., Journal of the American Chemical Society, 144(6), pp.2501-2510) was previously found to be in good agreement with experimental NMR chemical shift perturbations upon Fasudil binding - suggesting that this simulation is a reasonable choice for understanding IDP:small molecule interactions.

      Weaknesses:

      Major Weakness 1: Simulations of apo alpha-synuclein and holo simulations of alpha-synuclein and fasudil are not comparable.

      The most robust way to determine how the presence of Fasudil affects the conformational ensemble of alpha-synuclein is to run apo and holo simulations of the same length from the same starting structures using the same simulation parameters.

      The 23 independent 1-4 us simulations of apo alpha-synuclein and the long unbiased 1500us simulation of alpha-synuclein in the presence of fasudil are not directly comparable. The starting structures of the simulations used to build a Markov state model to describe apo alpha-synuclein were taken from a previously reported 73us MD simulation of alpha-synuclein run with the a99SB-disp force field and water model with 100mM NaCl (Robustelli et al., PNAS, 115 (21), E4758-E4766). As the holo simulation of alpha-synuclein and Fasudil was run in 50mM NaCl, snapshots from the original apo alpha-synuclein simulation were resolvated with 50mM NaCl - and new simulations were run.

      No justification is offered for how starting structures were selected. We have no sense of the conformational variability of the starting structures selected and no sense of how these conformations compare to the alpha-synuclein conformations sampled in the holo simulation in terms of standard structural descriptors such as tertiary contacts, secondary structure, radius of gyration (Rg), solvent exposed surface area etc. (we only see a comparison of projections on an uninterpretable non-linear latent space and average contact maps). Additionally, 1-4 us is a relatively short timescale for a simulation of a 140 residue IDP - and one is unlikely to see substantial evolution for many structural properties of interest (i.e. secondary structure, radius of gyration, tertiary contacts) in simulations this short. Without any information about the conformational space sampled in the 23 apo simulations (aside from a projection on an uninterpretable latent space) - we have no way to determine if we observe transitions between distinct states in these short simulations, and therefore if it is possible to construct a meaningful MSM from these simulations.

      If the structures used for apo simulations are on average more compact or contain more tertiary contacts - then it is unsurprising that in short independent simulations they sample a smaller region of conformational space. Similarly, if the starting structures have similar dimensions - but we only observe extremely local sampling around starting structures in apo simulations in the short simulation times - it would also not be surprising that we sample a smaller amount of conformational space. By only presenting comparisons of conformational states on an uninformative VAE latent space - it is not possible for a reader to ask simple questions about how the conformational ensembles compare.

      It is noted that the authors attempt to address questions about sampling by building an MSM of single contiguous 60us portion of the holo simulation of alpha-synuclein and Fasudil - noting that:

      "the MSM built using lesser data (and same amount of data as in water) also indicated the presence of six states of alphaS in presence of fasudil, as was observed in the MSM of the full trajectory. Together, this exercise invalidates the sampling argument and suggests that the increase in the number of metastable macrostates of alphaS in fasudil solution relative to that in water is a direct outcome of the interaction of alphaS with the small molecule."

      However, the authors present no data to support this assertion - and readers have no sense of how the conformational space sampled in this portion of the trajectory compares to the conformational space sampled in the independent apo simulations or the full holo simulation. As the analyzed 60us portion of the holo trajectory may have no overlap with conformational space sampled in the independent apo simulations - it is unclear if this control provides any information. There is no quantification of the conformational entropy of the 6 states obtained from this portion of the holo trajectory or the full conformational space sampled. No information is presented to determine if we observe similar states in the shorter portion of the holo trajectory. Furthermore - as the authors provide almost no justification for the criteria used to select the final number of macrostates for any of the MSMs reported in this work - and the number of macrostates is effectively a free parameter in the PCCA+ method, arriving at an MSM with 6 macrostates does not convey any information about the conformational entropy of alpha-synuclein in the presence or absence of ligands. Indeed - the implied timescale plot for the 60us holo MSM (Figure S2) shows that at least 10 processes are resolved in the 120 microstate model - and there is no information provided explaining/justifying how a final 6-macrostate model was determined. The authors also do not project the conformations sampled in this sub-trajectory onto the latent space of the final VAE.

      One certainly expects that an MSM built with 1/20th of the simulation data should have substantial differences from an MSM built from the full trajectory - so failing additional information and hyperparameter justification - one wonders if the emergence of a 6-state model could be the direct result of hardcoded VAE and MSM construction hyperparameter choices.

      Required Controls For Supporting the Conclusions of the Study: The authors should initiate apo and holo simulations from the same starting structures - using the same simulation software and parameters. This could be done by adding a Fasudil ligand to the apo structures - or by removing the Fasudil ligand from a subset of holo structures. This would enable them to make apples-to-apples comparisons about the effect of Fasudil on alpha-synuclein conformational space.

      Failing such direct apples-to-apples comparisons, which would be required to truly support the study's conclusions, the authors should at least compare the conformational space sampled in the independent apo simulations and holo simulations using standard interpretable IDP order parameters (i.e. Rg, end-to-end distance, secondary structure order parameters) and/or principal components from PCA or tICA obtained from the holo simulation. The authors should quantify the number of transitions observed between conformational states in their apo simulations. The authors could also perform more appropriate holo controls, without additional calculations, by taking batches of short 1-4us segments of the holo simulation, similar in number to the simulations used to compute the apo MSMs, and examining how the parameters/macrostates of the holo MSMs vary across random selections of the input.
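
As an example of one of the interpretable order parameters listed above, the radius of gyration can be computed per frame directly from coordinates. The sketch below uses plain NumPy and assumes coordinates are already loaded as an array of shape (n_frames, n_atoms, 3); the array layout and function name are assumptions, and a trajectory analysis package could be used instead.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Per-frame radius of gyration.

    coords: array of shape (n_frames, n_atoms, 3).
    masses: optional array of shape (n_atoms,); equal weights if omitted.
    """
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(coords.shape[1])
    w = np.asarray(masses, dtype=float)
    w = w / w.sum()                                   # normalized weights
    com = np.einsum("a,fad->fd", w, coords)           # center of mass per frame
    sq_dist = ((coords - com[:, None, :]) ** 2).sum(axis=2)
    return np.sqrt(sq_dist @ w)                       # weighted RMS distance

# Sanity check: two equal-mass atoms 2 units apart have Rg = 1
toy = np.array([[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]])
rg_toy = radius_of_gyration(toy)
print(rg_toy)
```

Histograms of Rg for the apo and holo ensembles, computed this way on the same atoms, would give readers a direct comparison that does not depend on the VAE latent space.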

      Major Weakness 2: There is little justification of how the hyperparameters MSMs were selected. It is unclear if the results of the study depend on arbitrary hyperparameter selections such as the final number of macrostates in each model.

      It is unclear what criteria were used to determine the appropriate number of microstates and macrostates for each MSM. Most importantly - as all analyses of water entropy and conformational entropy are restricted to the final macrostates - the criteria used to select the final number of macrostates with PCCA+ are extremely important to the conclusions of the study. From examining the ITS plots in Figure 3 - it seems both MSMs show the same number of resolved processes (at least 11) - suggesting that a 10-state model could be appropriate for both systems. If one were to simply select a large number of macrostates for the 20x longer holo simulation - do these states converge to the same conformational entropy as the states seen in the short apo simulations? Is there some MSM quality metric used to determine what number of macrostates is more appropriate?

      Required Controls For Supporting the Conclusions of the Study: The authors should specify the criteria used to determine the appropriate number of microstates and macrostates for their MSMs and present controls that demonstrate that the conformational entropies calculated for their final states are not simply a function of the ratio of the number of macrostates chosen to represent very disparate amounts of conformational sampling.

      Major Weakness 3: The use of variational autoencoders (VAEs) obscures insights into the underlying conformational ensembles of apo and holo alpha-synuclein rather than providing new ones.

      No rationale is offered for the selection of the VAE architecture or hyperparameters used to reduce the dimensionality of alpha-synuclein conformational space.

      It is not clear whether the VAEs employed in this study are providing any new insight into the conformational ensembles and binding mechanisms of Fasudil to alpha-synuclein, or whether the underlying latent space of the VAEs is more informative or kinetically meaningful than standard linear dimensionality reduction techniques like PCA and tICA. The initial VAE is used to reduce the dimensionality of alpha-synuclein conformational ensembles to 2 degrees of freedom - but it is unclear if this projection is structurally or kinetically meaningful. It is not clear why the authors chose to use a 2-dimensional projection instead of a higher number of dimensions to build their MSMs. Can they produce a more kinetically and structurally meaningful model using a higher dimensional VAE latent space?

      Additionally - it is not clear what insights are provided by the Denoising Convolutional Variational Autoencoder. The authors appear to be noising-and-denoising the contact maps of each macrostate, and then projecting the denoised values onto a new latent space - and commenting that they are different. Does this provide additional insight that looking at the contact maps in Figures 4&5 does not? Is this more informative than examining the distribution of the Radii of gyration or the secondary structure propensities of each ensemble? It is not clear what insight this analysis adds to the manuscript.

      Suggested controls to improve the study: The authors should project interpretable IDP structural descriptors (i.e., radius of gyration, secondary structure content, # of intramolecular contacts, # of intermolecular contacts between alpha-synuclein and Fasudil) onto this latent space to illustrate whether any of these properties are meaningfully separated by the VAE projection. The authors should compare these projections, and MSMs built from these projections, to projections and MSMs built using standard linear dimensionality reduction techniques like PCA and tICA.

      Major Weakness 4: The MSMs produced in this study have large discrepancies with MSMs previously produced on the same dataset by the same authors that are not discussed.

      Previously - two of the authors of this manuscript (Menon and Mondal) authored a preprint titled "Small molecule modulates α-synuclein conformation and its oligomerization via Entropy Expansion" (https://www.biorxiv.org/content/10.1101/2022.10.20.513005v1.full) that analyzed the same 1500 µs holo simulation of alpha-synuclein binding Fasudil. In that study - they utilized the variational approach to Markov processes (VAMP) to build an MSM using a 1D order parameter as input (the radius of gyration), first discretizing the conformational space into 300 microstates before similarly building a 6 macrostate model. From examining the contact maps and secondary structure propensities of the holo MSMs from the current study and the previous study - some of the macrostates appear similar; however, there appear to be orders of magnitude differences in the timescales of conformational transitions between the two models. The timescales of conformational transitions in the previous MSM are on the order of 10s of microseconds, while the timescales of transitions in this manuscript are 100s-1000s of microseconds. In the previous manuscript, a 3 state MSM is built for apo α-synuclein from a continuous 73ms unbiased MD simulation of alpha-synuclein run at a different salt concentration (100mM) and an additional 33 ms of shorter simulations. The apo MSM from the previous study similarly reports very fast timescales of transitions between apo states (on the order of ~1ms) - while the timescales in the MSM reported in the current study (Figure 9) are on the order of 10s-100s of microseconds.

      These discrepancies raise further concerns that the properties of the MSMs built on these systems are extremely sensitive to the chosen projection methods, MSM modeling choices, and hyperparameters, and that neither model may be an accurate description of the true underlying dynamics.

      Suggestions to improve the study: The authors should discuss the discrepancies with the MSMs reported in their previous studies.

    1. eLife assessment

      This valuable study establishes a method for live-cell imaging, tracking, and quantification of Alu elements marking euchromatic regions of the nucleus. The method will help characterize the relationship between chromatin dynamics and transcriptional activity. While the findings are largely consistent with previous reports, characterization of the technique is incomplete and could benefit from additional controls.

    2. Reviewer #1 (Public Review):

      The manuscript from Chang et al. presents a new technique to track chromatin locus mobility in live cells, by specifically tracking Alu rich sequences using a CRISPR based technique. The experiments in Fig. 1-2 provide extensive validation of the reagent, and the experiments in Figs. 3-4 yield new insights into chromatin dynamics and its relationship to transcription. While the findings in this manuscript are interesting, some points need to be addressed to support the central claims.

      One item of consideration is the use of bulk PIV methods to monitor chromatin mobility. While these whole genome methods certainly are useful for studying chromatin mobility at a diffraction limited (or higher scale) as well as tracking correlations at the micron scale, these methods obscure dynamics at the TAD/nucleosomal level (~200 nm). Since the studies use fluorescently labeled H2B to study chromatin dynamics, some consideration should be given to using Halo-tagged variants of H2B to get a single molecule view within specific chromatin contexts. A few recent studies (Saxton et al. 2023, Daugird et al. 2023) have used these methods to show how histone dynamics at the single molecule level depends on the chromatin context.

      Secondly, there should be additional discussion of how the mean-squared network displacement relates to single locus and histone mobility at the sub-diffraction level. While it is reassuring to see that MSND and single particle tracking MSD exponents roughly agree at the sub-second time scale, how these relate at longer time scales is not clear. Figure S5A shows MSD for individual loci, but only time lags up to 1 s are shown. It should be possible to track loci considerably longer than that. MSD exponents in the literature are quite varied beyond the second time-scale, and the authors have an excellent system to shed light on this question.

      Finally, some additional discussion about why the transcriptional inhibition results shown here differ from other studies in the literature (e.g. Daugird et al. 2023) would better place these findings in context.

    3. Reviewer #2 (Public Review):

      Summary:

      Chromatin organization and dynamics are critical for eukaryotic genome functions, but how are they related to each other? To address this question, Chang et al. developed a euchromatic labeling method using CRISPR/dCas9 targeting Alu elements. These elements are highly enriched in the A compartment, which is closely associated with transcriptionally active and gene-rich regions. Labeling Alu elements allowed live-cell imaging of the gene-rich A compartment (euchromatin). Using the developed system, Chang et al. found that while Alu-rich chromatin is depleted in regions with high chromatin density (putative heterochromatin), Alu density and chromatin density are not correlated in the euchromatin. Combining the live-cell imaging of Alu elements with bulk chromatin labeling (fluorescent histone H2B), the authors showed that transcriptionally active chromatin (A compartment) has an increased mobility. Treatment with the transcription inhibitors flavopiridol and 𝛼-amanitin increased the mobility of Alu-rich chromatin, and ActD had the opposite effect on chromatin mobility.

      Strengths:

      Alu labeling is a valuable euchromatin labeling method, and measuring its mobility would contribute to a comprehensive understanding of the relationship between chromatin dynamics and transcriptional activity.

      Weaknesses:

      Some of the findings are consistent with the previous reports and not new. There are some issues to be addressed. My specific comments are the following:

      Line 58. "these methods generally lack information regarding the local chromatin environment (e.g., epigenetic state) and genomic context (e.g., A/B compartments and TADs)." This description is not accurate because Nozaki et al. (2023) performed euchromatin-specific nucleosome labeling/imaging (Hi-C contact domains with active histone marks, A-compartment). More recently, Semeigazin et al. (2024)(https://www.researchsquare.com/article/rs-3953132/v1) also did euchromatic-specific nucleosome labeling/imaging in living cells.

      Line 154. "we defined the euchromatin regions in our images by excluding heterochromatin (top 5% pixel intensity) and nucleolar areas." I am not so sure that this definition is reasonable. How were the top 5% H2B intensity regions distributed? Did they include the nuclear periphery region, which is also heterochromatin-rich? Could the authors show the ΔPCC between whole H2B (including both euchromatin and heterochromatin) and dCas9-sgAlu?

      Line 214. "our data suggests that Alu-rich (gene-rich) regions have increased chromatin mobility compared to Alu-poor (gene-poor) regions." A similar finding on nucleosome motion has already been published by Nozaki et al. 2023 and Semeigazin et al. 2024 (described above).

      Line 282. A recent important paper on the relationship between histone acetylation, transcription initiation, and nucleosome mobility (PMID: 37792937) is missing and should be discussed.

      Line 303. "Alu-rich chromatin may be more sensitive upon flavopiridol and 𝛼-amanitin treatments compared to Alu-poor chromatin (Figure 5)." Nagashima et al. (2019) also revealed that 𝛼-amanitin treatment did not increase the chromatin dynamics in heterochromatin-rich nuclear periphery regions.

    4. Reviewer #3 (Public Review):

      The manuscript by Chang, Quinodoz and Brangwynne describes the results of live cell imaging of fluorescently labeled Alu element genomic sites in combination with H2B-GFP marked chromatin in human cancer cells. The study includes dCas9 based genomic engineering for Suntag enhanced Alu element labeling. The motion of Alu elements and chromatin was analyzed in real time at 500 ms intervals over 1 min at high resolution. Advanced image analysis algorithms were developed.

      The main objective of the study is to understand how motion of euchromatin or active chromatin relates to chromatin density. Alu elements, which are spread throughout the genome, are used as a proxy for euchromatin or A compartments. The study finds that Alu-rich chromatin is more mobile than Alu-poor chromatin and that actinomycin, but not flavopiridol or alpha-amanitin, causes some decrease in the determined mobility. The authors emphasize the heterogeneity of motion, Alu clustering and chromatin density, underscoring the complexity of the problem.

      Although the topic is important and the imaging well performed, the study lacks depth and does not provide any truly new insights into our understanding of the link between genome activity and mobility nor diffusive behavior of the chromatin fiber in situ. Although the approach to record context dependent dynamics based on segmentation of pixels of varying intensity is elegant, the analysis of the trajectories requires further explanation and justification to be able to interpret the results. Important information on the biology of the engineered cell lines is lacking. Presented results are not discussed with respect to existing literature and knowledge.

      Major concerns:

      - Are Alu elements a good proxy for A compartments? What consequences do massive dCas9 tags have on the genome and the engineered cells? How does the bulky dCas9-Suntag label impact mobility and transcription of Alu elements themselves? How many off target sites are potentially labeled?

      (1) The authors should state the size of the dCas9-Suntag construct and perform FRAP analysis to compare the tag's behavior to that of H2B-GFP.
      (2) dCas9 locally unwinds DNA and is strongly bound to its target sequence, impeding polymerase progression.
      (3) The authors need to check if DNA breaks are induced. An immunofluorescence using a gH2AX antibody is a minimum in all conditions tested. DNA breaks largely affect chromatin mobility, which is a topic of major debate (see PMC5769766, PMID: 33061931).
      (4) The authors need to confirm that in dCas/sgAlu cells Alu elements are still transcribed similarly to wt cells (transcriptome or at least some qPCR).
      (5) Please compare H2B-GFP mobility of sgAlu tagged and untagged cells.
      (6) Figure 1D shows significant background in the Cut&run sgAlu line compared to H3K4me3 line. Are these off target sites? Was the H3K4me3 Cut&run performed in the engineered cell line? Did the authors test another guide RNA? Non-specific binding could also contribute to the observed heterogeneity in the measured dynamics.
      (7) Figure 3G shows that H2B MSND at tau=5s is high for high H2B density independently of Alu density, questioning the value of using Alu sg tagging as a proxy for euchromatin.

      - What are the physical principles of the measured motion? What is the rationale for the MSND analyses deployed in this study?
      (1) Please provide the equation used for MSND (seems to be different from the standard MSD one).
      (2) Figure 3: all MSD curves have a slope suggesting an alpha exponent significantly smaller than 0.5, reminiscent of subdiffusion (for example, in panels A and E compare the thick line to the slope of the triangle at the bottom right). Please explain. Is it Gaussian noise? Confinement? This was seen before for faster acquisition rates, but still requires explanation and interpretation.
      (3) What is the rationale for choosing the value at τ = 5 s? Figure 3 panel E shows large variations in the MSND at all time points, and the curves do not start at the same lag time.
      (4) Figure S5 shows that for Alu elements, alpha is close to 0.5 at τ ≤ 1 s but lower for larger tau, and the relationship to intensity is inverse as well. Please explain.
      (5) It would be important to show the D values of your estimations. Plots of MSD curves on a non-log scale should be presented to show whether there are different diffusion regimes (such as in Figure 4).
      (6) It is mentioned that "Our measurements of total chromatin dynamics at lag time τ = 5 s are typically on the order of 10^-2 μm^2 (Figure 3 A, B), in agreement with past studies (Shaban et al., 2020; Zidovska et al., 2013)". This is inaccurate as both cited studies were performed at a different time lag (0.2 s). A change in time lag is expected to show different diffusion behaviour. For consistency, the comparison should be done at the same time lag and the same number of analyzed video frames.
      (7) The study applies the MSND analysis for different time lags starting from 0.5 s to 11 s for videos of 60 s. The change in the number of data points affects the accuracy of the calculated diffusion coefficient. What is the impact of this uncertainty on the results and conclusions?
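
      To make the MSD questions above concrete, here is a minimal sketch of the standard time-averaged MSD, MSD(τ) = ⟨|r(t+τ) - r(t)|²⟩, and of a log-log least-squares estimate of the anomalous exponent alpha in MSD(τ) ≈ K·τ^alpha. This is the textbook single-trajectory definition, not the MSND used in the manuscript, and the function names, track, and frame interval are hypothetical.

```python
import math


def time_averaged_msd(track, lag):
    """Time-averaged MSD of a 2D track at an integer frame lag.

    The average runs over len(track) - lag overlapping windows, so
    longer lags are estimated from fewer displacements.
    """
    n_windows = len(track) - lag
    if n_windows <= 0:
        raise ValueError("lag must be shorter than the track")
    total = 0.0
    for i in range(n_windows):
        dx = track[i + lag][0] - track[i][0]
        dy = track[i + lag][1] - track[i][1]
        total += dx * dx + dy * dy
    return total / n_windows


def fit_power_law(lag_times, msds):
    """Fit log(MSD) = log(K) + alpha * log(tau); return (alpha, K)."""
    xs = [math.log(t) for t in lag_times]
    ys = [math.log(m) for m in msds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, math.exp(mean_y - alpha * mean_x)


# Assumed toy data: a ballistic track sampled every 0.5 s (alpha should be 2).
frame_interval = 0.5
track = [(0.1 * i, 0.0) for i in range(50)]
frame_lags = [1, 2, 4, 8]
lag_times = [lag * frame_interval for lag in frame_lags]
msds = [time_averaged_msd(track, lag) for lag in frame_lags]
alpha_demo, k_demo = fit_power_law(lag_times, msds)
```

      Because the number of averaging windows shrinks as the lag grows, the uncertainty of the MSD increases with lag time, which bears directly on the choice of a single reference lag and on the fitted exponent.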

      - Inhibition of polymerase 2 activity increases mobility as was shown before.
      (1) Figure 4: the changes in motion following alpha-amanitin and flavopiridol treatments recapitulate results from the Maeshima group (Nagashima 2019). Data shown for actinomycin treated cells appear extreme, with a huge drop in H2B MSND (panels B and D). Please ensure that the cells are still alive after 4-6h exposure to ActD. ActD also affects cytoskeleton and replication, so different conclusions may be drawn if cells are still alive.
      (2) Treatment effects could also be enhanced should dCas9/sgAlu induce massive DNA damage (see above). Check H2B-GFP motion in cells (both treated and not) not labeled with sgAlu.

      - Positioning with respect to the literature:
      (1) The introduction, first paragraph is oversimplified; please review the literature citing work performed by many groups in the field using H2B-GFP, telomere or single site labeling in the past 10 years. Give details on the cell type used (mouse or human normal or cancer cells, amplified signals or single genes, same cell or cells at different stages of development, methodologies from whole genome to single particle tracking etc.).
      (2) The manuscript claims to introduce a novel mapping of the spatiotemporal dynamics of the A compartment in living cells. However, the authors did not discuss other previous approaches that were developed for the same purpose. The dynamic motion of active transcription chromatin domains/A compartment over the whole nucleus was investigated in different studies that used Mintbody labeling; please check PMCID: PMC7926250, PMCID: PMC8647360, PMID: 27534817, PMCID: PMC8491620.
      (3) PIV applies a relatively large interrogation window size of micrometers to estimate the displacement vectors. Dynamic changes within the set window can include both A and B compartments, where the contribution of genomic processes to local chromatin motion, typically taking place at the nanometer scale, is missed. The Hi-D method (PMCID: PMC7168861) introduced an Optical Flow approach to overcome this limitation of PIV (PMCID: PMC6061878). Could the authors test whether analyzing the movies recorded in this study with the Hi-D method confirms their conclusions?

      Heterogeneity of chromatin dynamics independent of chromatin density was shown by previous studies such as PMCID: PMC7775763 and PMCID: PMC7168861. Could the authors discuss their findings in the context of these studies?

    5. Author response:

      We thank the reviewers for their positive feedback and helpful suggestions for improving our manuscript.

      We appreciate the reviewers highlighting areas where we can improve clarity, particularly in the analysis methodologies and details. We agree that additional control experiments and expansion on single-molecule tracking analysis will provide additional support for our interpretations. 

      We acknowledge the reviewers' suggestion to describe our work's relationship to other studies. While some of our findings are similar to those in past studies, our work introduces a new approach for labeling euchromatin with direct sequence specificity on a genome-wide scale, enabling a deeper understanding of euchromatin organization and dynamics. We will provide more context on the novelty of our work and incorporate a more comprehensive discussion of our work’s relation to other studies in the manuscript.

    1. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use the model organism Drosophila to explore the sex and age impacts of a TBI method. They find age and sex differences: older age is susceptible to mild TBI and females are also more susceptible. In particular, they pursue a finding that virgin vs mated females show different responses: virgins are protected but mated females succumb to TBI with climbing deficits. In fact, virgin females compared to mated females are largely protected. They discover that this is associated with exposure of the females to Sex Peptides in the reproductive neurons of the female reproductive tract. When they extend to RNAseq of brains, they show that there are very few genes in common between males, mated females, virgins and females mated with males lacking Sex Peptide. The few chronic genes associated with mated females seem associated with the immune system. These findings suggest that mated females have a compromised immune system, which might make them more vulnerable.

      Strengths:

      This is an interesting paper that allows a detailed comparison of sex and age in TBI which is largely only possible in such a simple model, where large numbers and many variations can be addressed. Overall the findings are interesting.

      Weaknesses:

      Although the findings beyond Sex Peptide are observational, the work sets the stage for more detailed studies to pursue the role of the genes they find by RNAseq and whether for example, boosting the innate immune system would protect the mated females, among other experiments.

    2. eLife assessment

      In the current study, the authors describe how sex and age affect the consequence of traumatic brain injury in Drosophila. They find that females are more sensitive than males, and mated females are sensitive whereas virgin females are not. This fundamental work substantially advances our understanding of how sex-dependent response to traumatic brain injury occurs, by identifying the Sex Peptide and the immune system as modulators of sex differences. The authors provide a compelling set of results, showing that female Sex Peptide signaling in Drosophila adversely affects late-life neurodegeneration after early-life exposure to repetitive mild head injury.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors use the Drosophila model system to study the impact of mild head trauma on sex-dependent brain deficits. They identify Sex Peptide as a modulator of greater negative outcome in female flies. Additionally, they observe that increased age at the time of injury results in worse outcomes, especially in females, and that this is due to chronic suppression of innate immune defense networks in mated females. The results demonstrate a novel signaling pathway that promotes age- and sex-dependent outcomes after head injury.

      Strengths:

      The authors have modified their previously reported TBI model in flies to mimic mild TBI, which is novel. Methods are explained in detail, allowing for reproducibility. Experiments are rigorous with appropriate statistics. A number of important controls are included. The work tells a complete mechanistic story and adds important data to increase our understanding of sex-dependent differences in recovery after TBI. The discussion is comprehensive and puts the work in the context of the field.

      Weaknesses:

      A very minor weakness is that exact n values should be included in the figure legends. There should also be confirmation of knockdown by RNAi in female flies either by immunohistochemistry or qRT-PCR if possible.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors used a Drosophila model to show that exposure to repetitive mild TBI causes neurodegenerative conditions that emerge late in life and disproportionately affect females. In addition to well-known age-dependent impact, the authors identified Sex Peptide (SP) signaling as a key factor in female susceptibility to post-injury brain deficits.

      Strengths:

      The authors have presented a compelling set of results showing that female Sex Peptide signaling adversely affects late-life neurodegeneration after early-life exposure to repetitive mild head injury in Drosophila. They have (1) compared the phenotypes of adult male and female flies sustaining TBI at different ages, and the phenotypes of virgin females and mated females, (2) compared the phenotypes of eliminating SP signaling in mating females and introducing SP-signaling into virgin females, (3) compared transcriptomic changes of different groups in response to TBI. The results are generally consistent and robust.

      Weaknesses:

      The authors have made their claims largely based on assaying climbing index and vacuole formation as the only indicators of late-life neurodegeneration after TBI. However, these phenotypes are not really specific to TBI-related neurodegeneration, and the significance and mechanisms of vacuole formation in particular are not clear. The authors should perform additional analyses on TBI-related neurodegeneration in flies, which have been shown before (Genetics. 2015 Oct; 201(2): 377-402). Furthermore, it is also really surprising to see so few DEGs even in wild-type males and mated females, and to see that none of the DEGs overlapped among groups or are even related to SP signaling. This raises questions about the validity of the RNA-seq analysis. It is critical to independently verify their RNA-sequencing results and to add some more molecular evidence to support their conclusion. Finally, it is unknown what the implication of female fly mating and its associated Sex Peptide signaling would be for mammals or humans, and what mechanisms underlie the sexual dimorphism.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use the model organism Drosophila to explore the sex and age impacts of a TBI method. They find age and sex differences: older age is susceptible to mild TBI and females are also more susceptible. In particular, they pursue a finding that virgin vs mated females show different responses: virgins are protected but mated females succumb to TBI with climbing deficits. In fact, virgin females compared to mated females are largely protected. They discover that this is associated with exposure of the females to Sex Peptides in the reproductive neurons of the female reproductive tract. When they extend to RNAseq of brains, they show that there are very few genes in common between males, mated females, virgins and females mated with males lacking Sex Peptide. The few chronic genes associated with mated females seem associated with the immune system. These findings suggest that mated females have a compromised immune system, which might make them more vulnerable.

      Strengths:

      This is an interesting paper that allows a detailed comparison of sex and age in TBI which is largely only possible in such a simple model, where large numbers and many variations can be addressed. Overall the findings are interesting.

      Weaknesses:

      Although the findings beyond Sex Peptide are observational, the work sets the stage for more detailed studies to pursue the role of the genes they find by RNAseq and whether for example, boosting the innate immune system would protect the mated females, among other experiments.

      We thank the reviewer for their time and effort in evaluating our manuscript. We agree that future studies are needed to further determine the role of the genes that we have identified through RNA sequencing in the late life emergence of neurodegenerative conditions after the exposure to mild head trauma. We would like to investigate whether elevating mated female immunity can mitigate the risk for age-dependent neurodegeneration after mild head trauma.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors use the Drosophila model system to study the impact of mild head trauma on sex-dependent brain deficits. They identify Sex Peptide as a modulator of greater negative outcome in female flies. Additionally, they observe that increased age at the time of injury results in worse outcomes, especially in females, and that this is due to chronic suppression of innate immune defense networks in mated females. The results demonstrate a novel signaling pathway that promotes age- and sex-dependent outcomes after head injury.

      Strengths:

      The authors have modified their previously reported TBI model in flies to mimic mild TBI, which is novel. Methods are explained in detail, allowing for reproducibility. Experiments are rigorous with appropriate statistics. A number of important controls are included. The work tells a complete mechanistic story and adds important data to increase our understanding of sex-dependent differences in recovery after TBI. The discussion is comprehensive and puts the work in the context of the field.

      Weaknesses:

      A very minor weakness is that exact n values should be included in the figure legends. There should also be confirmation of knockdown by RNAi in female flies either by immunohistochemistry or qRT-PCR if possible.

      We thank the reviewer for the evaluation of our manuscript and for the suggestion to include the exact n values in the figure legends. We will include the n values in our revision.

      Regarding RNAi knockdown of sex peptide receptors (SPRs), we agree that confirmation of the knockdown by IHC or qRT-PCR will further strengthen our findings. It should be noted, however, that the RNAi line we used has been extensively validated by Yapici et al., 2007 and several subsequent publications. Importantly, the effectiveness of SPR knockdown is evident in female flies, as they exhibit dramatically reduced egg laying and lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in wild-type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We will revise the manuscript to make these points clear.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors used a Drosophila model to show that exposure to repetitive mild TBI causes neurodegenerative conditions that emerge late in life and disproportionately affect females. In addition to well-known age-dependent impact, the authors identified Sex Peptide (SP) signaling as a key factor in female susceptibility to post-injury brain deficits.

      Strengths:

      The authors have presented a compelling set of results showing that female Sex Peptide signaling adversely affects late-life neurodegeneration after early-life exposure to repetitive mild head injury in Drosophila. They have (1) compared the phenotypes of adult male and female flies sustaining TBI at different ages, and the phenotypes of virgin females and mated females, (2) compared the phenotypes of eliminating SP signaling in mating females and introducing SP-signaling into virgin females, (3) compared transcriptomic changes of different groups in response to TBI. The results are generally consistent and robust.

      Weaknesses:

      The authors have made their claims largely based on assaying climbing index and vacuole formation as the only indicators of late-life neurodegeneration after TBI. However, these phenotypes are not really specific to TBI-related neurodegeneration, and the significance and mechanisms of vacuole formation in particular are not clear. The authors should perform additional analyses on TBI-related neurodegeneration in flies, which have been shown before (Genetics. 2015 Oct; 201(2): 377-402). Furthermore, it is also really surprising to see so few DEGs even in wild-type males and mated females, and to see that none of the DEGs overlapped among groups or are even related to SP signaling. This raises questions about the validity of the RNA-seq analysis. It is critical to independently verify their RNA-sequencing results and to add some more molecular evidence to support their conclusion. Finally, it is unknown what the implication of female fly mating and its associated Sex Peptide signaling would be for mammals or humans, and what mechanisms underlie the sexual dimorphism.

      We thank the reviewer for the thorough evaluation of our manuscript. The reviewer raised a very important question: whether the neurodegeneration observed in our model is specific to TBI. As the reviewer rightly pointed out, the neurodegenerative phenotypes are unlikely to be specific to TBI-related neurodegeneration. Throughout the manuscript, we have tried to convey the notion that the mild physical impacts to the head represent one form of environmental insult, which in combination with other risk factors such as aging can lead to the emergence of neurodegenerative conditions. It should be noted that the negative geotaxis assay and vacuolation quantification are two well-established approaches to assess sensorimotor deficits and frank brain degeneration in fly brains.

      It is important to emphasize that the head-specific impacts delivered to the flies in our study are much milder than those used in previous studies. As we showed in Figure 1, this very mild form of head trauma (referred to as vmHT) did not cause any death, nor did it affect the lifespan of the injured flies. Our supplemental data also show very minimal structural neuronal damage and essentially no acute or chronic apoptosis induced by vmHT exposure. Consistently, we did not observe any exoskeletal or eye damage immediately following injuries, nor did we observe any retinal degeneration or pseudopupil loss at the chronic stage in these flies. We will incorporate these important points in the revision.

      We agree that future studies are needed to independently validate our RNA sequencing results. We believe that the small number of DEGs is likely due to two unique features of our study: (1) the very mild nature of our injury paradigm and (2) the chronic examination timepoint that was long after the head injury and SP exposure, both of which distinguish our study from previous fly TBI studies. As pointed out in the manuscript, our study aimed to understand how early-life exposure to repetitive head traumatic insults could lead to the late-life onset of neurodegenerative conditions. We hope to further validate our results in our next phase of experiments using single-cell RNA sequencing and RT-qPCR.

      As the reviewer pointed out, it would be very interesting to explore the possible roles of sex peptide-signaling in other animals and humans. As far as we know, there is no known mammalian ortholog to the insect sex peptide, so it would be difficult to study SP or an SP-like molecule in mammalian models. However, we believe that prolonged post-mating changes associated with reproduction in female fruit flies contribute to their elevated vulnerability to neurodegeneration.  In this regard, drastic changes within the biology of female mammals associated with reproduction can potentially lead to vulnerability to neurodegeneration. We agree that this demands further study, which may be done with future collaborators using rodent or large animal models.  We have discussed this point in the manuscript, but will revise it to further clarify the discussion.

    1. eLife assessment

      The authors show in vitro that TAK1 overexpression reduces tumor cell migration and invasion, while TAK1 knockdown promotes a mesenchymal phenotype and enhances migration and invasion. The work is a valuable addition to the field of tumor biology of esophageal squamous cell carcinoma. Although minor limitations exist, the overall evidence is solid. The data aligns with previous findings by the same researchers and others.

    2. Reviewer #1 (Public Review):

      Summary:

      In previously published work, the authors found that Transforming Growth Factor β Activated Kinase 1 (TAK1) may regulate esophageal squamous cell carcinoma (ESCC) tumor cell proliferation via the RAS/MEK/ERK axis. They explore the mechanisms for TAK1 as a possible tumor suppressor, demonstrating phospholipase C epsilon 1 as an effector of tumor cell migration, invasion and metastatic potential.

      Strengths:

      The authors show in vitro that TAK1 overexpression reduces tumor cell migration and invasion while TAK1 knockdown promotes a mesenchymal phenotype (epithelial-mesenchymal transition) and enhances migration and invasion. To explore possible mechanisms of action, the authors focused on phospholipase C epsilon 1 (PLCE1) as a potential effector, having identified this protein in co-immunoprecipitation experiments. Further, they demonstrate that TAK1-mediated phosphorylation of PLCE1 is inhibitory. Each of the observations is supported by different experimental strategies, e.g. use of different approaches for knockdown (pharmacologic, RNA inhibition, CRISPR/Cas). Xenograft experiments showed that suppression/loss of TAK1 is associated with more frequent metastases and conversely that PLCE1 is associated positively with xenograft metastases. A considerable amount of experimental data is presented for review, including supplemental data, that show that TAK1 regulation may be important in ESCC development.

      Weaknesses:

      As noted by the authors, immunoprecipitation (IP) experiments identified a number (24) of proteins as potential targets for the TAK1 ser/thr kinase. Prior work (cited as Shi et al, 2021) focused on a different phosphorylation target for TAK1, Ras association domain family 9 (RASSF9), but a more comprehensive discussion of the co-IP experiments would help place this work in a better context.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, Ju Q et al. performed both in vitro and in vivo experiments to test the effect of TAK1 on cancer metastasis. They demonstrated that TAK1 is capable of directly phosphorylating PLCE1 and that this modification represses its enzyme activity, leading to suppression of PIP2 hydrolysis and, subsequently, of signal transduction in the PKC/GSK-3β/β-Catenin axis.

      Strengths:

      The quality of data is good, and the presentation is well organized in a logical way.

      Weaknesses:

      The study is missing some key links connecting the effect of TAK1 on cancer metastasis to its phosphorylation of PLCE1.

    4. Reviewer #3 (Public Review):

      Summary:

      The research by Qianqian Ju et al. found that the knockdown of TAK1 promoted ESCC migration and invasion, whereas overexpression of TAK1 resulted in the opposite outcome. These in vitro findings could be recapitulated in a xenograft metastasis mouse model.

      Mechanistically, TAK1 phosphorylates PLCE1 S1060 in the cells, decreasing PLCE1 enzyme activity and repressing PIP2 hydrolysis. This reduces DAG and inositol trisphosphate (IP3), thereby suppressing signal transduction of PKC/GSK-3β/β-Catenin. Consequently, cancer metastasis-related genes were impeded by TAK1.

      Overall, this study offers some intriguing observations, providing a potential druggable target for developing agents to treat ESCC.

      The strengths of this research are:

      (1) The research consistently uses different experimental approaches to address each question. The experiments are largely convincing and appear to be well executed.
      (2) The phenotypes were observed from different angles: in the mouse model, at the cellular level, and at the molecular level.
      (3) The molecular mechanism was narrowed down to a single amino acid modification on PLCE1.

      The weaknesses of this research are:

      (1) Most of the phenotypes are only observed in the ECA-109 cell line. Whether TAK1-PLCE1 S1060 is a common pathway in other ESCC cells or just specific to the ECA-109 cell line is unclear. Using more cell lines to see whether this is a common mechanism of ESCC metastasis would greatly amplify the impact of this finding.
      (2) Most of the experiments were done under protein overexpression conditions, with the protein level increasing several hundred-fold in the cell, producing an artificial environment that can sometimes generate false-positive results.
      (3) Whether TAK1 can directly phosphorylate PLCE1 S1060 needs more tests, especially in vitro biochemical evidence.

    1. eLife assessment

      This manuscript describes an AI-automated microscopy-based approach to characterize both bacterial and host cell responses associated with Shigella infection of epithelial cells. The methodology is compelling and should be helpful for investigators studying a variety of intracellular pathogens. The authors have acquired valuable findings regarding host and bacterial responses in the context of infection, which should be followed up with further mechanistic-based studies.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration, using cell biology-based approaches, that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are made with an epithelial cell line of intestinal, preferably colonic, origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

    3. Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacterial motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. The authors perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts); that could help to make the flow of the manuscript better.

      Strengths:

      The high-content analysis, including the innovative analytical workflows, is very promising and could be used by a large number of scientists working on intracellular bacteria.

      The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

    5. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration, using cell biology-based approaches, that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      We thank the Reviewer for their enthusiasm for the technical aspects of this paper, regarding both the automated microscopy pipeline coupled with artificial intelligence and the click-chemistry-based approaches to dissect DNA replication and protein synthesis by microscopy.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are made with an epithelial cell line of intestinal, preferably colonic, origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The immortalized cell line HeLa is widely regarded as a paradigm to study infection by Shigella and other intracellular pathogens. However, we agree that future studies beyond the scope of this work should include other cell lines (e.g., epithelial cells of colonic origin, macrophages, primary cells).

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

      We appreciate the Reviewer’s concern about the lack of follow-up work on observations of host DNA and protein synthesis arrest upon Shigella infection, which will be the focus of future studies. We acknowledge the recent work of Zhang et al. (Cell Reports, 2024) considering their similar results on protein translation arrest, and we fully agree that this reference should be more fully discussed in a revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacterial motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      We thank the Reviewer for their positive comments, and for highlighting the strength of our imaging and analysis pipeline to analyse Shigella-septin interactions.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

      The main objective of this manuscript is the development of imaging and analysis tools to study Shigella infection, and in particular, Shigella interactions with the septin cytoskeleton. In future work we will provide more mechanistic insight with novel experiments and broader applicability, using different cell lines (in agreement with Reviewer 1), mutants or clinical isolates of Shigella and different bacterial species (e.g., Listeria, Salmonella, mycobacteria).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. The authors perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts); that could help to make the flow of the manuscript better.

      Strengths:

      The high-content analysis, including the innovative analytical workflows, is very promising and could be used by a large number of scientists working on intracellular bacteria. The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      We thank the Reviewer for their constructive feedback and the excitement for our results, including our findings on T3SS activity and Shigella-septin interactions. In accordance with the Reviewer’s comments, we agree to carefully re-edit our manuscript to avoid overselling our data in a future version of the manuscript. We will also consider rearranging figures depending on new results.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      We agree that the manuscript is mostly technical and therefore some of our experimental observations would benefit from follow-up mechanistic studies in the future. We highlight our vision for broader applicability in response to weaknesses raised by Reviewer 2.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

      We agree with the Reviewer that characterizing 3D data using 2D projected images has limitations.

      We observe an increase in cell and nuclear surface that does not strictly imply a change in volume. This is why we measure Hoechst intensity in the nucleus using SUM-projection (as it can be used as a proxy of DNA content of the cell). However, we agree that future use of other markers (such as fluorescently labelled histones) would make our conclusions more robust.

      Regarding the different orientation of intracellular bacteria, we agree that investigation of septin recruitment is more challenging when bacteria are placed perpendicular to the acquisition plane. In a first step, we trained a Convolutional Neural Network (CNN) using 2D data, as it is easier/faster to train and requires fewer annotated images. In doing so, we already managed to correctly identify 80% of Shigella interacting with septins, which enabled us to observe higher T3SS activity in this population. In future studies, we will maximize the 3D potential of our data and retrain a CNN that will allow more precise identification of Shigella-septin interactions and in-depth characterization of volumetric parameters.

    1. eLife assessment

      This valuable work characterized a new set of small molecules targeting the interaction between ELF3-MED23, with one of the reported compounds representing a promising novel therapeutic strategy. The evidence supporting the conclusions is solid, although including characterization with breast and lung cancer cell models would strengthen the study. This article will be of interest to medical and cell biologists working on cancer and, particularly, on HER2-overexpressing cancers.

    2. Reviewer #1 (Public Review):

      Summary:

      Soo-Yeon Hwang et al. synthesized and characterized a new set of small molecules targeting the interaction between ELF3-MED23, the transcription factor, and a coactivator for HER2 transcription, respectively. The authors used a combination of biochemical analysis, cell-based assays, and an in vivo xenograft model to prove that the lead compound 10 inhibits HER2 transcription and protein expression levels, subsequently inducing anticancer activity in the gastric cancer cell line and the xenograft model, particularly in the trastuzumab-resistant cell line. The experimental data is solid and supports the model for the anticancer potency of the compound in the HER2+ gastric cancer model. Although the compound showed promising data for its potential antitumor activity in HER2+ cancers, the scope is somewhat narrow for the HER2+ cancer field, since the most relevant HER2+ cancer model is HER2+ breast cancer and Herceptin resistance; indeed, the authors also discussed this point in the manuscript. Therefore, additional data with a HER2+ breast cancer cell model would increase the impact of the work in the field.

      Strengths:

      The current manuscript proposed a potential alternative strategy targeting HER2 overexpression cancers by attenuating HER2 transcription levels. The study provides solid evidence that the lead compound 10 can interrupt the binding of ELF3 to MED23, leading to the inhibition of HER2 transcription. Remarkably, the following cell-based assays and xenograft model revealed the promising antitumor activity of the compound in the gastric cancer model.

      Weaknesses:

      While the novel compound showed promising potency against HER2-positive gastric cancer cells and the xenograft model, it would be valuable to also evaluate it in HER2-positive breast cancer cell models. The authors did not compare the current compounds with other therapeutic strategies targeting HER2 expression at the genetic level. It is unclear whether the EGFR inhibitors gefitinib and canertinib, but not HER2-specific inhibitors (e.g., tucatinib), were used as a control in the manuscript.

    3. Reviewer #2 (Public Review):

      Summary:

      The findings highlight the importance of targeting the ELF3-MED23 protein-protein interaction (PPI) as a potential therapeutic strategy for HER2-overexpressing cancers, notably gastric cancers, as an alternative to trastuzumab. The evidence, including the strong potency of compound 10 in inhibiting ELF3-MED23 PPI, its capacity to lower HER2 levels, induce apoptosis, and impede proliferation both in laboratory settings and animal models, indicates that compound 10 holds promise as a novel therapeutic option, even for cases resistant to trastuzumab treatment.

      Strengths:

      The experiments conducted are robust and diverse enough to address the hypothesis posed.

      Weaknesses:

      The rationale behind the proposed structural modifications for the three groups of compounds is not clear.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors synthesized a compound that can inhibit the ELF3-MED23 interaction, which leads to inhibition of HER2 expression in gastric cancer.

      Strengths:

      Sufficient evidence shows the potency of compound 10 in inhibiting the ELF3-MED23 interaction.

      Weaknesses:

      The potency of compound 10 as a PPI inhibitor has been shown in only one cell line, NCI-N87.

    1. eLife assessment

      This manuscript describes a method for genetic manipulation of Leishmania species which should be sufficiently efficient to enable genome-wide genetic screens. The authors improved numerous aspects of their previously described method, which is based on sequence-specific genome editing to introduce premature stop codons using a CAS9-cytidine deaminase variant. The work is thoroughly described, with convincing data, and will be very important for Leishmania researchers, as well as perhaps suggesting the use of similar approaches in other organisms in which genetic manipulation is challenging.