  1. Feb 2024
    1. Author Response

      eLife assessment

      This manuscript provides useful information about the lipid metabolite 15d-PGJ2 as a potential regulator of myoblast senescence. The authors provide experimental evidence that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas. However, the manuscript is incomplete in its current form, as it lacks robust support from the data regarding the main conclusions related to senescence and technical concerns related to the senescence models used in this study.

      Authors' Response: We are grateful to the editors and the reviewers for their time and for comments that sharpened the science and the writing of the manuscript. We have attached a detailed response to emphasize that the manuscript does include robust evidence for its claims, which could have been missed during the review process. We have now provided better context for these points.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show these genes get upregulated in C2C12 myoblasts when treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modified Cys184 of HRas, which is required for its activation as indicated by the FRET analysis with RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through the electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not C181 or C184 mutant). They also showed that expressing the WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarification and experiments will help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers for inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (inflammatory cells vs senescent cells). The expression of beta Gal and other markers of senescence in the tissue sections will help to delineate these.

      As pointed out, Doxo induces systemic inflammation along with DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence. We observed an increase in the mRNA levels of the cell cycle and senescence-associated proteins p16 and p21 (Fig. 1C). We also observed increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B). We will characterize other markers of senescence, including senescence-associated β-galactosidase, in the revised manuscript.

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We only observed the death of cells at higher concentrations of 15d-PGJ2 (5 µM and 10 µM) (Fig. S2A), but not significantly at the 4 µM concentration used in Figure 2. This is the reason 4 µM was used, and we should have clarified this. We will include viability data for the low concentration of 15d-PGJ2 (4 µM) in the revised manuscript.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The treatment with Doxorubicin is irreversible as the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death.

      Even at such a high concentration of 15d-PGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This is reproducible and significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than the modification with 15d-PGJ2.

      Our data does not suggest higher levels of C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to the HRas levels in the plasma membrane were similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. The inhibition of palmitoylation of C181, either by mutation (C181S) or by treatment with the protein palmitoyl transferase inhibitor 2-Bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data do not support this hypothesis, as modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, unlike the inhibition of cysteine palmitoylation by the C181S mutation.

      To test if the inhibition of myoblast differentiation depends on HRas, they overexpressed the HRas and mutants in the C2C12 lines. However, this experiment does not take the endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment to test this would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we only observe a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, since HRas is expressed under the high-expression CMV promoter and in the absence of other regulatory elements, the overexpressed constructs do show a dominant effect over the endogenous HRas, showing cysteine-mutant-dependent inhibition of differentiation of myoblasts after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs after treatment with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the corresponding constructs and treated with vehicle (DMSO). mRNA levels in C2C12 cells treated with vehicle were not shown, as they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Since the hypothesis was to study the effect of HRas cysteine mutations on the differentiation of myoblasts after treatment with 15d-PGJ2, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.
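      The normalization described above can be illustrated with a minimal sketch. It assumes each treated measurement is divided by the mean of the matching vehicle (DMSO) measurements, so that the vehicle group averages 1 by construction. The function name and the numbers are hypothetical and are not data or code from the study.

```python
from statistics import mean

def fold_change_vs_vehicle(treated, vehicle):
    """Express each treated value relative to the mean of the vehicle values."""
    baseline = mean(vehicle)
    if baseline == 0:
        raise ValueError("vehicle mean must be non-zero")
    return [value / baseline for value in treated]

# Hypothetical expression values in arbitrary units (illustration only).
vehicle_wt = [1.9, 2.1, 2.0]   # cells expressing HRas WT, treated with DMSO
treated_wt = [1.0, 1.2, 0.8]   # same construct, treated with 15d-PGJ2

normalized_treated = fold_change_vs_vehicle(treated_wt, vehicle_wt)
normalized_vehicle = fold_change_vs_vehicle(vehicle_wt, vehicle_wt)
```

      Under this scheme the vehicle group carries no information of its own after normalization, which is why a plot can omit it.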

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to the general toxicity of 15d-PGJ2 in cells. We show that the inhibition of differentiation of myoblasts after 15d-PGJ2 treatment depends on modification of HRas at C184: failure of 15d-PGJ2 to modify HRas at C184 (Fig. 3A), and thereby to activate it (Fig. 3B), rescues this inhibition of differentiation of C2C12 cells (Fig. 4D and E). This separates the inhibition of myoblast differentiation by 15d-PGJ2 from the general toxic effects of 15d-PGJ2 on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On the one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024). On the other hand, 15d-PGJ2 also exerts pro-carcinogenic effects by inducing epithelial to mesenchymal transition in MCF7 breast cancer cells and by inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      The novelty of the study is compromised as the activation of PGD and 15d-PGJ2, as well as the regulation of HRas and cell proliferation, have been previously reported.

      The literature does not support this statement, and it is important to correct this misimpression for the field as a whole.

      Let us clarify:

      Covalent modification of HRas by 15d-PGJ2 has been reported only twice in the literature (Luis Oliva et al., 2003; Yamamoto et al., 2011), in fibroblasts and neurons respectively.

      Interaction between HRas and 15d-PGJ2 in skeletal muscles has not been shown before, even though both HRas and 15d-PGJ2 are shown to be key regulators of muscle homeostasis.

      Activation of HRas by 15d-PGJ2 was first reported by Luis Oliva et al. (2003). However, that study does not comment on the functional implications of the activation of HRas signaling.

      Recently, our lab contributed to a study where the functional implication of activation of HRas signaling due to covalent modification by 15d-PGJ2 was shown in the maintenance of senescence phenotype (Wiley et al., 2021).

      15d-PGJ2 was shown to inhibit the differentiation of myoblasts by Hunter et al. (2001). That study hypothesized that the inhibition of myoblast differentiation occurs via 15d-PGJ2-mediated activation of PPARγ signaling. However, it also showed inhibition of myoblast differentiation independent of PPARγ activity, suggesting the presence of other mechanisms.

      This is the first study to show a molecular mechanism where activation of HRas signaling in skeletal myoblasts due to covalent modification by 15d-PGJ2 at C184 of HRas inhibits the differentiation of skeletal myoblasts.

      Additionally, there are major technical concerns related to the senescence models, limiting data interpretation regarding the relevance to senescent cells.

      Major concerns:

      (1) The C2C12 cell line is not an ideal model for senescence study due to its immortalized nature and lack of normal p16 expression. A more suitable myoblasts model is recommended, with a more comprehensive characterization of senescence features.

      C2C12 is a good model for the DNA damage-based senescence used in this manuscript. It is not a model for replicative senescence, since it is immortalized. In this study we show that C2C12 cells undergo DNA damage-mediated senescence after treatment with Doxo. We also observe a similar phenotype in MCF7 breast cancer cells and IMR90 lung fibroblasts after treatment with Doxo (data will be added to Supplementary Figure 1). Also, several reports in the literature have shown induction of senescence in C2C12 cells. Moiseeva et al. (2023) show induction of senescence in C2C12 cells after etoposide-mediated DNA damage. Moustogiannis et al. (2021) show induction of replicative senescence in C2C12 cells.

      (2) The source of increased PGD or its metabolites in the conditioned medium is unclear. Including other senescence models, such as replicative or oncogene-induced senescence, would strengthen the study.

      Fig. 1E shows a time-dependent increase in the expression of PGD2 biosynthetic enzymes in senescent C2C12 cells. Fig. 1F shows an increase in the levels of 15d-PGJ2 secreted by senescent C2C12 cells into the conditioned medium. These data show that senescent C2C12 cells are the source of PGD and its metabolites in the conditioned medium.

      Again, C2C12 is not suitable for replicative senescence due to its immortalized status.

      We and others have shown that C2C12 cells undergo senescence, and this manuscript only used DNA damage induced senescence.

      (3) In the in vivo part, it is unclear whether the increased expression of PTGS1, PTGS2, and PTGDS is due to senescence or other side effects of DOXO.

      We concur that this is a limitation of this study and the subsequent work will demonstrate the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (4) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of a conditioned medium.

      Figure 2A tests the effect of prostaglandin PGD2 and its metabolites secreted by the senescent cells on the differentiation of myoblasts. Therefore, we inhibited the synthesis of PGD2 in senescent cells by treatment with AT-56, and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment, whereas differentiation of C2C12 cells without any treatment serves as a positive control.

      There is no explanation of how differentiation was quantified or how the fusion index was calculated.

      The fusion index was calculated using a published myotube analyzer software (Noë et al., 2022). Appropriate info will be added to the materials and methods section in the revised manuscript.
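      For readers unfamiliar with the metric, a fusion index is commonly defined as the percentage of all nuclei that lie inside multinucleated myotubes. The sketch below illustrates that common definition only. It is not the algorithm of the cited Myotube Analyzer software, and the `min_nuclei` threshold of 2 is an assumption (some studies require 3 or more nuclei per myotube).

```python
def fusion_index(nuclei_per_myotube, total_nuclei, min_nuclei=2):
    """Percentage of all nuclei found in myotubes with >= min_nuclei nuclei.

    nuclei_per_myotube: nuclei counted inside each MHC-positive cell.
    total_nuclei: all nuclei in the field, fused or not.
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    fused = sum(n for n in nuclei_per_myotube if n >= min_nuclei)
    if fused > total_nuclei:
        raise ValueError("fused nuclei cannot exceed total nuclei")
    return 100.0 * fused / total_nuclei

# Hypothetical field of view: four MHC-positive cells and 40 nuclei in total.
# The cell with a single nucleus does not count as fused.
example_index = fusion_index([3, 5, 1, 2], total_nuclei=40)
```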

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript offers a commendable exploration into the relationship between plasma omega-6/omega-3 fatty acid ratios and mortality outcomes.

      Strengths:

      The chosen study design and analytical techniques align well with the research objectives, and the results resonate with existing literature.

      Weaknesses:

      Lack of information on the selection criteria for participants; The analysis of individual PUFAs is not appropriate; The definition of comorbidities is vague; The rationale of conducting the mediation analysis of blood biomarkers is not given.

      Thank you for your insightful feedback and for acknowledging the strengths of our manuscript, particularly regarding the alignment of our study design and analytical methods with our research objectives. Your recognition of how our results resonate with existing literature is greatly appreciated.

      Addressing the concerns you've raised:

      Selection Criteria for Participants: In the “Methods-Study population” section, we have outlined the exclusion criteria for participant selection. This information provides comprehensive insight into our methodology for selecting the study cohort.

      Analysis of Individual PUFAs: We acknowledge your concern regarding the analysis of individual PUFAs due to the inter-correlations of their plasma levels. However, the correlations between omega-3% and omega-6% (r = -0.12) and between DHA% and LA% (r = 0.03) are low. Because DHA is one of the omega-3 PUFAs, we did not include DHA and total omega-3 in the same model. Similar considerations apply to LA and omega-6. We believe that exploring the effects of individual fatty acids adds valuable depth to our research. Both DHA and LA have been included in the same model due to their low correlation, with careful adjustments for confounding factors to provide a nuanced understanding of their individual associations with mortality.

      Definition of Comorbidities: The definition of comorbidities, including hypertension, diabetes, and longstanding illness, is elaborated under the Methods section. These conditions were identified through self-reported data collected via the Assessment Centre Environment (ACE) touchscreen questionnaire, allowing us to capture a broad range of chronic conditions as reported by participants.

      Rationale for Mediation Analysis: Initially, our approach to mediation analysis included various blood biomarkers available in the UK Biobank database to explore the potential underlying pathways. However, upon considering your feedback regarding the overlap of fatty acids with lipid classes or lipid particles in plasma, we have decided to remove these elements from our mediation analysis.

      Reviewer #2 (Public Review):

      Summary:

      This study utilized a large sample from the UK Biobank which enhanced statistical robustness, employed a prospective design to establish clear temporal relationships, used objective biomarkers for assessing plasma omega-6/omega-3 ratio, and investigated various mortality causes including CVD and cancer for a holistic health understanding.

      Strengths:

      The authors used a large sample size, employed a prospective design, and investigated various causes of mortality.

      Weaknesses:

      Analyzing n-3 and n-6 PUFAs separately might be less instructive. It might not be methodologically sound to treat TG, HDL, LDL, and apolipoproteins as mediators. It's imperative to exercise caution when drawing causal conclusions from the observed correlations. The manuscript might propose potential research trajectories.

      We are grateful for your thoughtful analysis of our study's strengths and for your constructive feedback on areas for improvement.

      Response to Weaknesses:

      Analyzing n-3 and n-6 PUFAs Separately: We recognize the challenge in analyzing n-3 and n-6 PUFAs separately due to their correlations. However, the correlation between n-3% and n-6% in UK Biobank was actually relatively low (r = -0.12). We include them in one model to test if both are associated with the outcomes after controlling for the effects of the other. Indeed, both were negatively associated with the mortality outcomes in our analysis. We believe our supplemental analysis of n-3 and n-6 PUFAs provides useful information to the readers, in addition to our findings based on the n-6/n-3 ratio.

      Mediation Analysis of TG, HDL, LDL, and Apolipoproteins: We appreciate your insight on the methodological considerations of treating these biomarkers as mediators. After careful review and in line with suggestions from another reviewer, we have removed these elements from our mediation analysis. This revision improves the net scientific rigor of our work, ensuring that our conclusions are drawn from the most robust and methodologically sound of our analyses.

      Causal Conclusions from Correlations: We fully agree with the need for caution in interpreting correlations in observational studies. To this end, we have avoided implying causality in our manuscript. Terms suggesting causality, like "protective effects," have been replaced with "inverse associations" to more accurately represent our findings. This adjustment enhances the clarity and accuracy of our conclusions.

      Proposing Future Research Trajectories: Recognizing the importance of advancing causal and mechanistic understanding in this field, we have called for future studies to further examine causality and characterize molecular mechanisms of the observed associations in our study.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find out whether the levels of omega-6 and omega-3 fatty acids in the blood are linked to the likelihood of dying from anything, of dying from cancer and of dying from cardiovascular disease. They use a large dataset called UK Biobank where fatty acid levels were measured in blood at the start of the study and what happened to the participants over the following years (average of 12.7 years) was followed. They find that both omega-6 AND omega-3 fatty acids were linked with less likelihood of dying from anything, from cancer and from cardiovascular disease. The effects of omega-3s were stronger. They then made a ratio of omega-6 to omega-3 fatty acids and found that as that ratio increased, the risk of dying also increased. This supports the idea that omega-3s have stronger effects than omega-6s.

      Strengths:

      This is a large study (over 85,000 participants) with a good follow up period (average 12.7 years). Using blood levels of fatty acids is superior to using estimated dietary intakes. The authors take account of many variables that could interfere with the findings (confounding variables) - they do this using statistical methods.

      Weaknesses:

      There are several omega-6 and omega-3 fatty acids - it is not clear which ones were actually measured in this study.

      Thank you for recognizing the strengths of our study, including the large sample size, the duration of follow-up, and our methodological approach to using blood levels of fatty acids and addressing potential confounders. Regarding the weakness you've highlighted, we understand the importance of specifying which omega-6 and omega-3 fatty acids were analyzed in our study. We have revised the Methods section to provide detailed information about how the exposures were measured.

      Recommendations for the author:

      Reviewer #1 (Recommendations for the Authors):

      To elevate the manuscript's scholarly rigor, I propose the following refinements:

      (1) The manuscript lacks information on the selection criteria for participants and the representativeness of the UK Biobank cohort. It is important to provide details on how participants were selected and whether it is representative of the general population, which is crucial for assessing the generalizability of the findings.

      We appreciate the opportunity to clarify the participant selection criteria and the representativeness of the UK Biobank cohort within our manuscript. In the “Methods-Study population” section, we delineated the exclusion criteria: "Participants with cancer (n=37,736) or CVD (n=100,972), those who withdrew from the study (n=879), and those with incomplete data on the plasma omega-6/omega-3 ratio (n=277,372) were excluded from this study, leaving 85,425 participants, 6,461 died during follow-up, including 2,794 from cancer and 1,668 from CVD." To further address representativeness, we performed a sensitivity analysis, comparing the baseline characteristics of participants included in our study with those of participants omitted due to lack of exposure information. This analysis, presented in Additional file 2: Table S13, indicates comparable baseline characteristics across both participant groups, bolstering confidence that our study sample is representative of UK Biobank participants in general.
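      As a quick arithmetic aid, the counts quoted above can be turned into crude percentages. This is a minimal sketch using only the numbers in the quoted passage. The variable and function names are illustrative, and these crude proportions are not the hazard ratios or incidence rates estimated in the study.

```python
# Counts taken from the quoted "Methods-Study population" passage.
participants = 85_425
deaths_all_cause = 6_461
deaths_cancer = 2_794
deaths_cvd = 1_668

def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

# Share of the analysis sample who died during follow-up.
crude_mortality = percent(deaths_all_cause, participants)
# Shares of all deaths attributed to cancer and to CVD.
cancer_share = percent(deaths_cancer, deaths_all_cause)
cvd_share = percent(deaths_cvd, deaths_all_cause)

print(f"died during follow-up: {crude_mortality:.2f}% of participants")
print(f"cancer: {cancer_share:.1f}% of deaths, CVD: {cvd_share:.1f}% of deaths")
```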

      Regarding the representativeness of the UK Biobank relative to the general population, we acknowledge that the cohort does not mirror the broader UK demographic in terms of socioeconomic and health profiles. Participants in the UK Biobank generally exhibit better health and higher socioeconomic status than the average UK resident, potentially influencing the disease prevalence and incidence rates. Nonetheless, the UK Biobank's extensive sample size and comprehensive exposure data enable the generation of valid estimates for exposure-disease associations. These estimates have been corroborated by findings from more demographically representative cohorts, as highlighted in the studies by Batty et al. and Fry et al.

      We recognize the importance of this aspect and will incorporate a discussion on the implications of these factors for the generalizability of our findings in the “Discussion-Limitations” section of our manuscript. We are grateful for this insightful comment and believe that this addition will enhance the manuscript's contribution to the field.

      Here is what we added in the “Discussion-Limitations” section of our manuscript: “Third, we acknowledged that the cohort did not mirror the broader UK demographic in terms of socioeconomic and health profiles. Participants in the UK Biobank generally exhibited better health and higher socioeconomic status than the average UK resident, potentially influencing the disease prevalence and incidence rates. Nonetheless, the UK Biobank's extensive sample size and comprehensive exposure data enable the generation of valid estimates for exposure-disease associations. These estimates have been corroborated by findings from more demographically representative cohorts47,48.”

      References:

      Batty, G. D., Gale, C. R., Kivimäki, M., et al. Comparison of risk factor associations in UK Biobank against representative, general population based studies with conventional response rates: prospective cohort study and individual participant meta-analysis. BMJ. 2020; 368: m131.

      Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am J Epidemiol. 2017;186(9):1026–34.

      (2) The study sample included different ancestries which may introduce confounding from genetic background. As over 90% of the participants were of European ancestry, I recommend excluding individuals of non-European ancestry in the main analysis.

      Thank you for raising the concern regarding the inclusion of different ancestries in our study sample and the potential confounding. In our research, we have adhered to the widely accepted practice of including all participants in the study to ensure a comprehensive analysis. Recognizing the predominance of European ancestry within our cohort, which exceeds 90%, we have proactively incorporated ethnicity as a covariate in our statistical models to mitigate confounding influences.

      We also considered the feasibility of conducting a stratified analysis for non-European participants. However, the small sample sizes of non-European subgroups do not provide sufficient statistical power to yield reliable or meaningful separate analyses. Consequently, to maintain the integrity and robustness of our findings, we opted to include all participants in the main analysis, adjusting for ethnicity to account for potential confounders.

      (3) I noted that a large proportion of participants were excluded due to the lack of data on plasma PUFAs. Were the characteristics of these participants similar to the current analysis sample?

      Thank you for raising this very important point. According to UK Biobank, “The EDTA plasma samples were picked randomly and are therefore representative of the 502,543 participants in the full cohort.” (As detailed in Julkunen et al.) Moreover, as noted in our reply to comment #1 above, we performed a sensitivity analysis, examining the baseline characteristics of participants included in our study relative to those omitted due to lack of exposure information.

      The results of this analysis are detailed in Additional file 2: Table S13. They demonstrate that the baseline characteristics—such as age, gender, ethnicity, socioeconomic status, and lifestyle habits—are indeed similar between the two groups. This similarity supports the representativeness of our analysis sample and suggests that the exclusion of participants without plasma PUFA data does not introduce a bias that would undermine the validity of our study's findings.

      References:

      Julkunen H, Cichońska A, Tiainen M, et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat Commun. 2023 Feb 3;14(1):604. doi: 10.1038/s41467-023-36231-7.

      (4) The methods section should include a detailed description of the measurement of plasma omega-6/omega-3 fatty acid ratio. It is important to provide information on the analytical techniques used and any quality control measures implemented to ensure the accuracy and reliability of the measurements. Importantly, were repeated measurements done?

      Thank you for raising this important point. The details of the metabolomic profiling have been described in previous UK Biobank publications. In this revision, we added a brief description of the measurement process and provided references to previous publications.

      Here is what we added in the “Methods- Ascertainment of exposure” section of our manuscript: “Metabolomic profiling of plasma samples was performed with high-throughput nuclear magnetic resonance (NMR) spectroscopy. At the time of this analysis (15 Mar 2023), UK Biobank released the Phase 1 metabolomic dataset, which covered a random selection of 118,461 plasma samples from the baseline recruitment. These samples were collected between 2007 and 2010 and had been stored in −80 °C freezers, while the NMR measurements took place between 2019 and 2020. Detailed descriptions could be found in previous publications about plasma sample preparation, NMR spectroscopy setup, quality control protocols, correction for sample dilution, verification with duplicate samples and internal controls, and comparisons with independent measurements from clinical chemistry assays20-22.”

      (5) The analysis of individual PUFAs is not appropriate because plasma levels of these PUFAs, including n-3 PUFAs and n-6 PUFAs, EPA, DHA and AA, are usually correlated. It is hard to differentiate these correlated FAs in Cox model. Whereas the ratio of n-6/n-3 is indeed more comprehensive, and the current analysis demonstrated this ratio as a good marker of mortality. Therefore, the analyses of individual PUFAs can be removed and only focus on the ratio of n-6/n-3.

      We agree with the Reviewer on the importance of focusing on the n-6/n-3 ratio. Indeed, the ratio is the focus of this manuscript. We also acknowledge the Reviewer's concern regarding the inclusion of correlated covariates in one statistical model. In that specific analysis, the correlations between omega-3% and omega-6% (r = -0.12) and between DHA% and LA% (r = 0.03) are relatively low. We also checked the models for multicollinearity and found that the variance inflation factors (VIFs) were within acceptable ranges. In the fully adjusted model that included omega-3% and omega-6%, all variables had VIFs below 1.13, with omega-3% at a VIF of 1.06 and omega-6% at a VIF of 1.12. Similarly, in the model including DHA% and LA%, all variables had VIFs under 1.13, with DHA% at a VIF of 1.07 and LA% at a VIF of 1.10. Because DHA is one of the omega-3 PUFAs, we did not include DHA and omega-3 in the same model. For the same reason, we did not include LA and omega-6 in the same model. Because the ratio has two components and each component is the sum of multiple individual PUFAs, it is natural to ask which component is more important (omega-6 or omega-3?) and which specific fatty acid is driving the effect of omega-3 PUFAs (ALA, or the marine omega-3 PUFAs EPA and DHA?). We frequently received such questions when we presented this research previously. To address them, we performed analyses of omega-3, omega-6, DHA, and LA. While we understand the complexities involved in differentiating the effects of individual fatty acids in a Cox model, we believe there is intrinsic value in exploring these relationships further. In our analysis, we investigated the effects of individual PUFAs on mortality by including both DHA and LA within the same model, given their low correlation, and adjusting for confounding factors (as detailed in Additional file 2: Table S9).

      Our findings indicate significant inverse associations of both DHA and LA with all-cause, cancer, and cardiovascular disease (CVD) mortality. We agree with the Reviewer that the focus of our manuscript should be the ratio, but we also hope the Reviewer will agree that keeping the results for individual PUFAs provides additional useful information to readers.
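
As a rough illustration of why correlations this low raise little multicollinearity concern, the sketch below uses the textbook identity that, in a model with exactly two predictors, each predictor's VIF equals 1 / (1 - r^2), where r is their Pearson correlation. This is an assumption-laden simplification and not the authors' computation: the function name `pairwise_vif` is hypothetical, and the VIFs reported above come from fully adjusted models with additional covariates, so they are expected to be somewhat larger than these two-predictor values.

```python
def pairwise_vif(r):
    """VIF of either predictor in a model with exactly two predictors.

    r is the Pearson correlation between the two predictors.
    With more covariates in the model, the true VIF can only be
    equal to or larger than this value.
    """
    return 1.0 / (1.0 - r * r)


# Correlations quoted in the response above.
vif_omega = pairwise_vif(-0.12)   # omega-3% vs omega-6%
vif_dha_la = pairwise_vif(0.03)   # DHA% vs LA%

print(f"omega-3% / omega-6%: {vif_omega:.4f}")
print(f"DHA% / LA%:          {vif_dha_la:.4f}")
```

Both values sit just above 1, which is consistent with the reported full-model VIFs of 1.06 to 1.12 being driven mostly by the other covariates rather than by the PUFA pair itself.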

      (6) The definition of comorbidities (including hypertension, diabetes, and longstanding illness) is vague. Please clarify what diseases longstanding illness includes.

      We appreciate the request for clarification regarding the definition of comorbidities in our study, including the categorization of longstanding illness. The information regarding longstanding illnesses was obtained via the Assessment Centre Environment (ACE) touchscreen questionnaire. Participants were asked, "Do you have any long-standing illness, disability, or infirmity?" with the response options being “Yes,” “No,” “Do not know,” and “Prefer not to answer.” For the purposes of our analysis, participants who selected “Yes” were categorized as having a longstanding illness, while the remaining options were grouped as not having a longstanding illness.

      This method of classification aligns with our detailed explanation in the “Methods-Ascertainment of covariates” section of the manuscript, where we state that “Comorbidities, including hypertension, diabetes, and longstanding illness, were self-reported at baseline. Longstanding illness refers to any long-standing illness, disability, or infirmity, without other specific information.” It is important to note that this approach is consistent with established precedents in the field. Specifically, the paper by Li et al. in the BMJ utilized a similar definition for comorbidities, reinforcing the validity of our methodology.

      References:

      Li ZH, Zhong WF, Liu S, et al. Associations of habitual fish oil supplementation with cardiovascular outcomes and all cause mortality: evidence from a large population based cohort study. BMJ. 2020 Mar 4;368:m456.

      (7) The rationale of conducting the mediation analysis of blood biomarkers is not given. Since fatty acids can be formed as TG or bound with apolipoproteins in plasma, there is a large overlap of FAs with these biomarkers and thus it is not appropriate to analyze TG, HDL, LDL, and apolipoproteins as mediators.

      We are grateful for the insightful feedback regarding the mediation analysis of blood biomarkers. Our mediation analysis aimed to explore the possible biomarkers and biological processes that explain the effects of PUFAs on mortality. Upon reflection, we recognize the complexities introduced by the inherent overlap of fatty acids with different lipid particles and lipid classes in plasma. Considering the potential confounding this overlap presents, and in agreement with your recommendation, we have decided to remove the mediation analyses involving cholesterol, TG, HDL-C, LDL-C, Lp(a), ApoA, and ApoB from our study. We appreciate your guidance on this matter and have updated our manuscript accordingly to reflect these changes.

      Reviewer #2 (Recommendations for the Authors):

      (1) Analyzing n-3 and n-6 PUFAs separately might be less instructive given the inherent correlations among plasma levels of n-3 PUFAs and n-6 PUFAs. Also, some important specific PUFAs, such as ALA, AA, EPA, etc. were not available in the UK Biobank data though the authors tried to analyze LA and DHA. The n-6/n-3 ratio, as evidenced by the current analysis, offers a more holistic perspective and might be a superior mortality marker. Thus, I recommend shifting the focus solely to this ratio.

      Thank you for the thoughtful comment. Reviewer #1 raised a similar point (comment #5 above). We are glad that both reviewers recognized the importance of the omega-6/omega-3 ratio and agreed with us that the ratio should be the focus of the paper. Please also see our more detailed response above. Briefly, our manuscript centers on the ratio, while the supplemental analysis of omega-3%, omega-6%, DHA%, and LA% provides additional useful information. We included omega-3% and omega-6% in the same model because their correlation was relatively low (r = -0.12). We also checked the model for multicollinearity and found that the variance inflation factors (VIFs) for n-3 PUFAs and n-6 PUFAs were within acceptable ranges. In the fully adjusted model that included omega-3% and omega-6%, all variables had VIFs below 1.13, with omega-3% at a VIF of 1.06 and omega-6% at a VIF of 1.12. Similarly, in the model including DHA% and LA%, all variables had VIFs under 1.13, with DHA% at a VIF of 1.07 and LA% at a VIF of 1.10. Therefore, we decided to keep the content for omega-3 and omega-6 PUFAs. We hope that the Reviewer will agree that this content provides additional useful information to readers.

      (2) It might not be methodologically sound to treat TG, HDL, LDL, and apolipoproteins as mediators. Since the model included comorbidities as covariates, hypercholesteremia and hypertriglyceridemia seemed to have been adjusted in the analysis. Thus, further adjusting these blood biomarkers for mediation analysis which overlapped with comorbidities is redundant.

      We appreciate your critical evaluation of our methodological approach. Your point is well-taken, especially in light of the fact that comorbidities such as hypercholesterolemia and hypertriglyceridemia have been accounted for as covariates in our model. This overlap, as you correctly identified, could indeed render the mediation analysis redundant. In concordance with your recommendation, and incorporating the comments of another reviewer, we have now omitted the mediation analysis involving these blood biomarkers from our study. We believe this adjustment strengthens the methodological soundness of our research and are thankful for your contribution to this refinement. We have updated our manuscript to reflect these changes and ensure our analysis remains robust and free from redundancy.

      (3) It's imperative to exercise caution when drawing causal conclusions from the observed correlations. The inherent constraints of observational studies, coupled with potential residual confounding or reverse causality, should be acknowledged.

      We concur with the caution against implying causality from correlations observed in our study. As such, we have carefully refrained from claiming any causal relationships within our paper. We acknowledge that the term "protective effects" could suggest a causal inference, and we have revised our language to describe these observations as "inverse associations" to more accurately reflect the nature of our findings.

      We have also addressed the inherent limitations of observational research in the Discussion section under 'limitations' of our manuscript. There, we recognize that while we have accounted for many confounders, the possibility of residual confounding cannot be entirely excluded. We also agree that reverse causality is a concern in observational studies. To mitigate this, we performed a sensitivity analysis excluding participants who died within the first year of follow-up. The results from this analysis, which are provided in Additional file 2: Table S12, show consistency with our main findings, suggesting that the observed associations are less likely to be predominantly driven by reverse causation. We are grateful for your insights, which have guided us in strengthening our manuscript and ensuring that our conclusions are presented with the appropriate scientific rigor.
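
The exclusion rule behind this sensitivity analysis can be sketched as a simple filter. The record layout (`died`, `followup_years`) and the toy cohort below are hypothetical and only illustrate the rule as stated in the response, namely dropping participants who died within the first year of follow-up; it is not the authors' code.

```python
def exclude_early_deaths(participants, min_followup_years=1.0):
    """Drop participants who died before min_followup_years of follow-up.

    Participants who were censored early without dying are kept,
    because the rule targets early deaths only.
    """
    return [
        p for p in participants
        if not (p["died"] and p["followup_years"] < min_followup_years)
    ]


# Hypothetical toy cohort for illustration only.
cohort = [
    {"id": 1, "died": True, "followup_years": 0.4},    # early death: excluded
    {"id": 2, "died": True, "followup_years": 6.2},    # later death: kept
    {"id": 3, "died": False, "followup_years": 12.0},  # alive: kept
    {"id": 4, "died": False, "followup_years": 0.8},   # censored early: kept
]

kept_ids = [p["id"] for p in exclude_early_deaths(cohort)]
print(kept_ids)
```

The main model would then be refitted on the filtered cohort and its estimates compared with the primary results.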

      (4) To guide subsequent scholarly endeavors, the manuscript might propose potential research trajectories, such as spearheading randomized controlled trials to delve deeper into the causal nexus between plasma omega-6/omega-3 ratios and mortality outcomes or probing the mechanistic underpinnings of the observed correlations.

      We agree that conducting randomized controlled trials could illuminate the potential causal relationships between plasma PUFA biomarkers and mortality outcomes. While the primary focus of our manuscript is to report on associations, we acknowledge the importance of causal analysis in advancing the field. In our secondary analysis, we touched upon mediation effects of blood biomarkers, which could serve as a preliminary step towards establishing causality. Although our current work did not delve deeply into causal mechanisms, the results we have presented may indeed stimulate further exploration. By reporting our mediation analysis results, we aim to provide a foundation that other researchers might build upon. We hope that our work will act as a catalyst for more in-depth studies, such as RCTs or mechanistic investigations, to pursue the questions we have begun to explore.

      Following this recommendation, we have revised our Conclusion paragraph and added: “Our findings support the active management of a high circulating level of omega-3 fatty acids and a low omega-6/omega-3 ratio to prevent premature death. Future research is warranted to further test the causality, such as Mendelian randomization and randomized controlled trials. Mechanistic research, including comprehensive mediation analysis, in-depth experimental characterization in animal models or cell lines, and intervention studies, is also needed to unravel the molecular and physiological underpinnings.”

      Reviewer #3 (Recommendations for the Authors):

      (1) Line 32. Delete "a balanced" because a balanced o6:o3 cannot be defined.

      Thank you for pointing out the issue with the term "a balanced". Most authors agree with your observation that defining what constitutes a 'balanced' ratio can be ambiguous and potentially misleading. One author, JTB, disagrees that “balance” as a concept is unacceptably ambiguous or misleading. In response, we have removed the words from our manuscript.

      (2) In the abstract you should present the findings for omega-6 and omega-3 PUFAs first and then the findings for the ratio.

      We appreciate your suggestion to present the findings for omega-6 and omega-3 PUFAs prior to those for the ratio in the abstract. As laid out in the Background section, the ratio was our primary exposure of interest. So, we organized our manuscript by centering on the ratio. We are glad that both Reviewer #1 and #2 expressed a particular interest in the ratio findings and urged us to keep the ratio as the focus. We believe that this emphasis reflects the novel aspects of our research and aligns with the thematic structure of our manuscript.

      (3) Line 80. controversial should read uncertain.

      Thank you for the suggestion. We have changed “controversial” to “uncertain”.

      (4) It is unclear which fatty acids are included in total PUFAs, omega-6 PUFAs and omega-3 PUFAs. It is vital that this is specified.

      Thank you very much for your suggestion. We agree that it is important to clarify the specific fatty acids included in the analysis. In the revised manuscript, we emphasized that we analyzed “total omega-6 PUFAs” and “total omega-3 PUFAs”, while “LA is one type of omega-6 PUFAs” and “DHA is one type of omega-3 PUFAs”. We also revised the “Ascertainment of exposure” section of the Methods to provide more information about how the exposures were measured. Here is what we added to that section of our manuscript: “Five PUFAs-related biomarkers were directly measured in absolute concentration units (mmol/L), including total PUFAs, total omega-3 PUFAs, total omega-6 PUFAs, docosahexaenoic acid (DHA), and linoleic acid (LA). Of note, DHA is one type of omega-3 PUFAs, and LA is one type of omega-6 PUFAs. Our primary exposure of interest, the omega-6/omega-3 ratio, was calculated based on their absolute concentrations. We also performed supplemental analysis for four exposures, the percentages of omega-3 PUFAs, omega-6 PUFAs, DHA, and LA in total fatty acids (omega-3%, omega-6%, DHA%, and LA%), which were calculated by dividing their absolute concentrations to that of total fatty acids.”
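
The derived exposures described in the quoted Methods text can be sketched as follows. The function name, argument names, and all concentration values are hypothetical placeholders chosen for easy arithmetic, not UK Biobank data; only the formulas (ratio of absolute concentrations, and each concentration divided by total fatty acids) come from the text above.

```python
def derive_exposures(omega3, omega6, dha, la, total_fatty_acids):
    """Compute the omega-6/omega-3 ratio and the four percentage exposures.

    All inputs are absolute concentrations in mmol/L.
    """
    def as_percent(concentration):
        return 100.0 * concentration / total_fatty_acids

    return {
        "omega6_omega3_ratio": omega6 / omega3,
        "omega3_pct": as_percent(omega3),
        "omega6_pct": as_percent(omega6),
        "dha_pct": as_percent(dha),
        "la_pct": as_percent(la),
    }


# Hypothetical concentrations (mmol/L), for illustration only.
exposures = derive_exposures(
    omega3=0.5, omega6=4.5, dha=0.25, la=3.0, total_fatty_acids=12.0
)
for name, value in exposures.items():
    print(f"{name}: {value:.2f}")
```

Note that the ratio uses absolute concentrations directly, so it does not depend on the total fatty acid measurement, whereas the four percentages do.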

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The signaling pathway upstream of Maf1 remains unknown. In eukaryotes, Maf1 is a negative regulator of RNA pol III and is regulated by external signals via the TORC pathway. Since TORC components are absent in the apicomplexan lineage, one central question that remains open is how Maf1 is regulated in P. falciparum. Magnesium is probably not the sole stimulus involved, as suggested by the observation that Ile deprivation also down-regulates RNA pol III activity.

      We agree that there is still much to uncover relating to the PfMaf1 signaling pathway. While we still do not know each component, we have been able to link external factors (not limited to magnesium) to the increased nuclear occupancy of PfMaf1. Other protein interactors that potentially regulate PfMaf1, while not confirmed, have been identified in plasma sample as candidates for future experiments to validate their potential involvement in RNA Pol III inhibition.

      The study does not address why MgCl2 levels vary depending on the clinical state. It is unclear whether plasma magnesium is increased during asymptomatic malaria or decreased during symptomatic infection, as the study does not include control groups with non-infected individuals. Along the same line, MgCl2 supplementation in parasite cultures was done at 3mM, which is higher than the highest concentrations observed in clinical samples.

      This reviewer raised a valid point. The plasma magnesium levels for the wet symptomatic samples (averaging [0.79mM]) were within the normal range of a healthy individual (between [0.75-0.95mM]) while the dry asymptomatic levels were above the normal range (averaging [1.13mM]). Ideally, we would have liked to have control uninfected plasma samples from individuals from The Gambia. Unfortunately, field studies and human volunteer studies do not always have all the ideal controls that in vitro studies have. We recognize that [3mM] is higher than the normal range for magnesium levels, which is why we included a revised Supplementary Figure 3A. This figure shows that magnesium concentrations as low as [1mM] (similar to the levels found in dry asymptomatic samples) reduced the expression of RNA Pol III-transcribed genes.

      Although the study provides biochemical evidence of Maf1 accumulation in the parasite nuclear fraction upon magnesium addition, this is not fully supported by the immunofluorescence experiments.

      We agree that the resolution of the IFA images is not sufficient to support the WB data. We believe that the importance of the IFA Supplementary Figure is to show that PfMaf1 clusters together in foci, which has not been previously reported.

      Reviewer #2 (Public Review):

      Weaknesses:

      However, most analyses are rather preliminary as only very few (3-5) candidate genes are analyzed by qPCR instead of carrying out comprehensive analyses with a large qPCR panel or RNA-seq experiments with GO term analyses. Data presentation lacks clarity, the number of biological replicates is rather low and the statistical analyses need to be largely revised. Although the in vivo data from wet (mildly symptomatic) and dry (asymptomatic) season parasites with different expression levels of Pol III-regulated genes, var genes, and MgCl2 are interesting, the link between the in vitro data and the in vivo virulence of P. falciparum, which is made in many sections of the manuscript, should be toned down. Especially since (i) the only endothelial receptor studied is CD36, which is associated with parasite binding during mild malaria, and (ii) several studies provide contradictory data on MgCl2 levels during malaria and in different disease states, which is not further discussed, but the authors mainly focused on this external stimulus in their experiments.

      We agree that, ideally, we would have liked to do full RNA-seq on The Gambia samples. However, that was outside the scope of this project. The RNA samples were limited, which is why we did not use more primers. We believe that an appropriate number of replicates was done for the experiments. The wet symptomatic samples from this study were from mildly symptomatic individuals, as stated in the manuscript. Therefore, CD36 was a relevant receptor to use for our studies.

      We agree that the published studies about magnesium levels in infected individuals are not always consistent. What these studies do not consider is the time of year, that is, whether the infection occurred during the dry or wet season. These studies were also done in different regions of the world using different technologies. For this reason, we only highlight the difference observed in our field study data from The Gambia.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) The signals upstream of Maf1 remain rather a black box. 4 are tested - heat shock and low-glucose, which seem to suppress ALL transcription; low-Isoleucine and high magnesium, which suppress Pol3. Therefore the authors use Mg supplementation throughout as a 'starvation type' stimulus. They do not discuss why they didn't use amino acid limitation, which could be more easily rationalised physiologically. It may be for experimental simplicity (no need for dropout media) but this should be discussed, and ideally, sample experiments with low-IsoLeu should be done too, to see if the responses (e.g. cytoadhesion) are all the same.

      We agree that deprivation of isoleucine would have been another experimental assay for our study, but it also would not have been as novel as magnesium. While understanding the exact mechanism or involvement of magnesium as a stress condition was not within the scope of this manuscript, we believe that our data will be valuable in demonstrating that external stimuli act on P. falciparum virulence gene expression via RNA Pol III inhibition. Since we also had plasma level data for magnesium, and not isoleucine, we believed it made for a better external factor to use for our in vitro studies.

      (2) The proteomics, conducted to seek partners of Maf1, is probably the weakest part. From Figure S3: the proteins highlighted in the text are clearly highly selected (as ones that might be relevant, e.g. phosphatases), but many others are more enriched. It would be good to see the whole list, and which GO terms actually came top in enrichment.

      We apologize if the reviewer did not see the attached supplementary Co-IP MS data. The file includes all proteins found in each sample as well as GO term analysis. For the purpose of this work, we highlight proteins potentially involved in the canonical role of Maf1 that have been shown in model organisms to reversibly inhibit RNA Pol III (phosphatases, RNA Pol III subunits).

      (3) Figure 3 shows the Maf1-low line has very poor growth after only 5 days but it is stated that no dead parasites are seen even after 8 cycles and the merozoites number is down only ~18 to 15... is this too small to account for such poor growth (~5-fold reduced in a single cycle, day 3-5)? It would additionally be interesting to see a cell-cycle length assessment and invasion assay, to see if Maf1-low parasites have further defects in growth.

      We agree with the reviewer that the observed reduction in merozoite numbers may not be the only cause of the reduced growth rate. Other factors in the PfMaf1 knock-down line may contribute to the observed poor growth.

    1. Author Response

      Our answer to reviewer #1 comments:

      We attempted to perform structural characterization of the ASK1 complex with TRX1, but were unable to prepare a sufficiently stable ASK1:TRX1 complex for cryo-EM analysis, probably due to their relatively weak interactions. Therefore, we subsequently decided to use HDX-MS to characterize the structural changes of ASK1 induced by interactions with TRX1.

      Detailed information about cryo-EM data processing including 2D classification averages, local resolution of the EM map and FSC figure are shown in Supporting Information, Supplementary Table S1 and Figures S1-S3.

      We fully agree with the reviewer that the presence of hydrogen bonding cannot be reliably described at this resolution. However, if there is a sufficient electron density in a given region and a corresponding hydrogen bond donor-acceptor pair in the model, this suggests the possible presence of such an interaction.

      Our answer to reviewer #2 comments:

      We are fully aware that the use of a C-terminally truncated construct limits this study due to the presumed role of the C-terminus in ASK1 dimerization. A C-terminally truncated construct consisting of TBD, CRR, and KD (residues 88-973) was used due to the low expression yield and solubility of full-length human ASK1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you and the two reviewers for the thorough review of our manuscript. We found the reviewers’ comments highly valuable and addressed them with the following additional experiments and changes to the text and figures:

      (1) We measured the effect of ROCK MASOs on ROCK expression by immunostaining and observed a reduction in ROCK signal, supporting the downregulation of ROCK protein levels under ROCK MASOs (new Fig. S3).

      (2) We measured the effect of a lower concentration of the ROCK inhibitor, Y27632 (10µM), and observed the same phenotypes of skeletal loss, skeletal reduction and ectopic branching at this concentration (Fig. 2, S4). Importantly, these phenotypes were not observed when directly inhibiting PKA and PKC, in whole sea urchin embryos (1) and in skeletogenic cell cultures (2), further supporting the specificity of the ROCK inhibitor.

      (3) We added a time course of Pl-ROCK expression and immunostaining of ROCK in the fertilized egg, which show that this gene is maternal and that the protein is present in the egg (Fig. S2A-C).

      (4) We recorded F-actin in ROCK MASO-injected embryos and demonstrated that it is still detected around the spicules and their tips, similarly to ROCK-inhibited embryos (new Fig. S3).

      (5) We revised the paper text and figures to provide a better description of our results, distinguish clearly between our data and our interpretations and emphasize the novelty of our findings.

      This paper demonstrates that ROCK, F-actin polymerization and actomyosin contractility play critical roles in biomineral growth and in shaping biomineral morphology in the sea urchin embryo, and that ROCK activity affects skeletogenic gene expression. Our findings, together with previous reports of the role of actomyosin in eukaryote biomineralization, suggest that this molecular machinery is part of the common molecular tool-kit used in biomineralization. The identification of a common molecular mechanism within the diverse gene regulatory networks, organic scaffolds and minerals that eukaryotes use to build their biominerals will be of high interest to the fields of biomineralization and evolutionary biology. Furthermore, our paper portrays the interplay between the cellular and the genetic machinery that drives morphogenesis. We believe it will be of great interest to the broad readership of eLife and particularly to the fields of biomineralization, cell, developmental and evolutionary biology.

      Thank you very much for the helpful review of our paper.

      Reviewer #1 (Public Review):

      We thank the reviewer for the appreciation of our work and for the helpful comments that guided us to strengthen the experimental evidence for our conclusions and increase the paper’s clarity. Below are our responses to the specific comments:

      Major comments

      One MASO led to reduced skeleton formation while the other one additionally induced ectopic branching. How was the optimum concentration for the MASOs determined? Did the authors perform a dose-response curve? What is the reason for this difference? Which of the two MASOs can be validated by reduced ROCK protein abundance? Since the ROCK antibody works, I would like to see a control experiment on Rock protein abundance in control and ROCK MO injected larvae which is the gold-standard for validating the knock-down.

      We tested several MASO concentrations to identify a concentration at which the control embryos injected with Random MASO were overall healthy and the ROCK MASOs showed clear phenotypes.

      To test the effect of ROCK MASOs on ROCK protein levels, we performed immunostaining experiments that are now presented in new Fig. S3. We could not perform Western blots on injected embryos because the ROCK antibody requires thousands of embryos per Western blot, which is not feasible for injected embryos. Therefore, we used immunostaining to test the effect of the two translation ROCK MASOs on ROCK abundance compared to uninjected and Random MASO-injected embryos. We observed a reduction of ROCK signal, supporting the downregulation of ROCK protein levels in these genetic perturbations (new Fig. S3).

      L212 "Together, these measurements show that ROCK is not required for the uptake of calcium into cells." But what about trafficking and exocytosis? As mentioned earlier, I think this is a really important point that needs to be confirmed to understand the function of ROCK in controlling calcification. In their previous study (reference 45) the authors demonstrated that they have superior techniques in measuring vesicle dynamics in vivo. Here an acute treatment with the ROCK inhibitor would be sufficient to test if calcein-positive vesicle motion, including the observed reduction in velocity close to the tissue skeleton interface, is affected by the inhibitor.

      We thank the reviewer for the appreciation of our previous work, in which we studied calcium vesicle dynamics in whole embryos (Winter et al, PLoS Comput Biol 2021). We agree with the reviewer that the best way to directly test the effect of ROCK on mineral deposition and vesicle kinetics is to observe it in live skeletogenic cells. However, in Winter et al 2021, we found that the skeleton (spicules) does not grow when the embryos are immobilized, in either control or treated embryos. We have to immobilize the embryos to record live time-lapses of whole embryos. This means that we cannot determine the role of ROCK, or the effect of any other perturbation, in vesicle trafficking and exocytosis based on experiments conducted in immobilized whole embryos, since skeletogenesis is arrested. We believe that we can do this in skeletogenic cell cultures, and we are currently developing this assay for vesicle tracking, but this is beyond the scope of the current work.

      Is there a colocalization of ROCK and f-actin in the tips of the spicules? This would support the mechano-sensing-hypothesis by ROCK.

      Our studies show that F-actin is localized around the spicule cavity and in the cortex of the cells (Figs. 5 and 6), while ROCK is enriched in the skeletogenic cell bodies, with some localization near the skeletogenic cell membranes (Fig. 1). To directly address the reviewer's question, we immunostained ROCK and F-actin in the same embryos and showed that their sub-cellular localizations do not show a strong overlap (Fig. S3 Q-T). However, ROCK does not bind F-actin directly: ROCK activates another kinase, LimK, which phosphorylates Cofilin, which in turn interacts with F-actin. Therefore, the fact that ROCK is not colocalized with F-actin neither supports nor contradicts the possible role of ROCK in mechano-sensing.

      L 283. "F-actin is enriched at the tips of the spicules independently of ROCK activity" The results of this paragraph clearly demonstrate that ROCK inhibition has no effect on the localization of f-actin at the tips of the growing spicules. In addition, the new cell culture experiments underline this observation. Still, the central question that remains is, what is the interaction between ROCK, f-actin, and the mineralization process, that leads to the observed deformations? What does the f-actin signal look like in a branched phenotype or in larvae that failed to develop a skeleton (inhibition from Y20)?

      As we report in Fig. 6, and now in new Fig. S3, under late ROCK inhibition or in ROCK morphants, we still detect F-actin around the spicule and enriched at the tips. When ROCK is inhibited and the embryo fails to develop a skeleton, we observe F-actin accumulation in the skeletogenic cells, but the F-actin is not organized (Fig. 5). As the spicule is absent in this condition, it is hard to conclude whether the effect on F-actin organization is direct or due to the absence of the spicule. We state this explicitly in the current version, in the results, lines 324-326, and in the discussion, lines 405-408.

      Immunohistochemical analyses on f-actin localization and abundance should be additionally performed with ROCK knock-down phenotypes to confirm the pharmacological inhibition.

      We did this in our new Figure S3 and showed that ROCK morphants show the same F-actin localization at the tips as control and ROCK-inhibited embryos.

      L 365 "...supporting its role in mineral deposition..." "...Overall, our studies indicate that ROCK activity....is essential for the formation of the spicule cavity......which could be essential for mineral deposition..." I think the authors need to do a better job in clearly separating between the potential processes impacted by ROCK perturbation. Is it stabilization and mechano-sensing in the spicule tip or the intracellular trafficking and deposition of the ACC? If the dataset does not allow for a definite conclusion, I suggest clearly separating the different possibilities combined with thorough discussion-based findings from other mineralizing systems where the interaction between ROCK and F-actin has been described.

      We thank the reviewer for this important comment. We believe that ROCK and the actomyosin network are involved both in mechano-sensing of the rigid biomineral and in the transport and exocytosis of mineral-bearing vesicles. In the current version, we provide explicit explanations of these two hypotheses in the discussion section. The possible role in exocytosis and the experiments required to assess this role are described in lines 427-439, and the possible mechano-sensing role and its effect on gene expression are described in lines 440-453.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      L185 "These SR-µCT measurements show that the rate of mineral deposition is significantly reduced under ROCK inhibition." To correctly support this statement I would suggest to calculate the real growth rates (µm3 time-1). For example, an increase in volume from 6,850 µm3 at 48 hpf to 14,673 µm3 at 72 hpf would result in a growth rate of 7823 µm3 24h-1.

      We thank the reviewer for this suggestion. We calculated the rate of spicule growth as the reviewer suggested and added this information in lines 218-221.
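
The growth-rate arithmetic in the reviewer's example can be reproduced with a short sketch. The function name is hypothetical and the volumes are the reviewer's illustrative numbers from the comment above, not necessarily the values reported in the revised manuscript.

```python
def growth_rate_per_hour(volume_start_um3, volume_end_um3,
                         t_start_hpf, t_end_hpf):
    """Average volumetric growth rate in cubic micrometres per hour."""
    return (volume_end_um3 - volume_start_um3) / (t_end_hpf - t_start_hpf)


# Volumes quoted in the reviewer's example (48 hpf and 72 hpf).
rate_per_hour = growth_rate_per_hour(6850, 14673, 48, 72)
rate_per_24h = rate_per_hour * 24

print(f"{rate_per_hour:.1f} um^3 per hour")
print(f"{rate_per_24h:.0f} um^3 per 24 h")
```

Expressing the result per hour as well as per 24 h makes it easier to compare intervals of different lengths across treatments.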

      L343: "This implies that....within the skeletogenic lineage." This concluding sentence is very speculative and therefore misplaced in the results section.

      We moved this sentence from the results section to the discussion, lines 443-445.

      L382: "The participation of F-actin and ROCK in polarized tip-growth and vesicle exocytosis has been observed in both, animals and plants." L407-409: "...F-actin could be regulating the localized exocytosis of mineral-bearing vesicles...." I think this is exactly the core question that remains unresolved in this study. To reduce speculations I strongly recommend addressing the effect of ROCK inhibition on vesicle trafficking and exocytosis (Monitoring of calcein-positive Vesicles in PMCs).

      We agree with the reviewer that this is a critical question that we would have liked to address but, as we explained above, it is beyond the scope of this study.

      Figure 5: The values below the scale bars in the newly added figures U+V are extremely small. Also, the Legend for this figure sounds incorrect. Should read: "...and skeletogenic cell cultures that were treated with 30µM ROCK inhibitor that was added at 48hpf and recorded at 72hpf."

      We increased the font near the scale bars and corrected the figure caption. Thanks for this and your other helpful comments!

      Reviewer #2 (Public Review):

      We thank the reviewer for raising the important issue of inhibitor concentration, which led us to do additional experiments with a lower concentration that were valuable and strengthened the manuscript. We also thank the reviewer for asking us to be clearer with the interpretation of the results. Below are our responses to the specific comments:

      My concerns are the interpretation of the experiments. The main overriding concern is a possible over-interpretation of the role of ROCK. In the literature, ROCK participates in many biological processes with a major contribution to the actin cytoskeleton. And when a function is attributed to ROCK, it is usually based on the determination of a protein that is phosphorylated by this kinase. Here that is not the case. The observation here is in most cases stunted growth of the spicule skeleton and some mis-patterning occurs or there is an absence of skeleton if the inhibitor is added prior to initiation of skeletal growth. They state in the abstract that ROCK impairs the organization of F-actin around the spicules. The evidence for that as a direct role is absent.

      We agree with the reviewer that since the spicule doesn’t form under continuous ROCK inhibition, it is unclear whether the absence of F-actin around the spicule in this condition is a direct outcome of the lack of ROCK activation of F-actin polymerization, or an indirect outcome due to the lack of a spicule to coat. We therefore deleted this line from the abstract and explicitly stated that we cannot conclude whether the impaired F-actin organization is directly due to the effect of ROCK on actin polymerization, in the results, lines 324-326, and in the discussion, lines 405-408.

      They use morpholino data and ROCK inhibitor data to draw their conclusion. My main concern is the concentration of the inhibitor used since at the high concentrations used, the inhibitor chosen is known to inhibit other kinases as well as ROCK (PKA and PKC). They indicate that this inhibition is specifically in the skeletogenic cells based on the isolation of skeletogenic cells in culture and spicule production either under control or ROCK inhibition and they observe the same - stunting and branching or absence of skeletons if treated before skeletogenesis commences. Again, however, the high concentrations are known to inhibit the other kinases.

      In the previous version of the paper we used the range of 30-80µM Y-27632 to block ROCK activity. These concentrations are commonly used in mammalian systems and in Drosophila to block ROCK activity (3-8). The reviewer is correct in stating that at high concentrations this inhibitor can block PKA and PKC. However, the affinity of the inhibitor for these kinases is roughly 100 times lower than its affinity for ROCK, as indicated by the biochemical Ki values reported in the manufacturer's datasheet: 0.14-0.22 μM for ROCK1, 0.3 μM for ROCK2, 25 μM for PKA and 26 μM for PKC.
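
      The fold difference in affinity implied by these Ki values can be checked with a short calculation. This is a minimal sketch that uses only the datasheet values quoted above; the dictionary layout and function name are illustrative assumptions, and the ratios describe the purified-protein assays only, not inhibitor behavior in cells or embryos.

```python
# Ki values in µM as quoted from the datasheet: (lowest, highest) reported value.
KI_UM = {
    "ROCK1": (0.14, 0.22),
    "ROCK2": (0.3, 0.3),
    "PKA": (25.0, 25.0),
    "PKC": (26.0, 26.0),
}


def fold_selectivity(off_target, target):
    """Return the (min, max) ratio Ki(off_target) / Ki(target).

    A larger ratio means the inhibitor binds the target more tightly
    than the off-target kinase. Hypothetical helper for illustration.
    """
    target_low, target_high = KI_UM[target]
    off_low, off_high = KI_UM[off_target]
    return off_low / target_high, off_high / target_low


for off_target in ("PKA", "PKC"):
    for target in ("ROCK1", "ROCK2"):
        low, high = fold_selectivity(off_target, target)
        print(f"{off_target} vs {target}: {low:.0f}- to {high:.0f}-fold")
```

      With these values the ratio falls between roughly 80-fold (relative to ROCK2) and 180-fold (relative to the lowest ROCK1 value).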

      Importantly, these Ki values are based on biochemical assays in which the activity of the inhibitor is tested in vitro with the purified protein. Therefore, these concentrations are not relevant to cell or embryo cultures, where the inhibitor has to penetrate the cells and affect ROCK activity in vivo. Y-27632 activity was studied both in vitro and in vivo in Narumiya, Ishizaki and Uehata, Methods in Enzymology 2000 (9). This paper reports concentrations similar to those indicated in the manufacturer's datasheet for the in vitro experiments, but shows that concentrations of 10µM or higher are effective in cell cultures. We therefore tested the effect of 10µM Y-27632 added at 0hpf (continuous inhibition) and at 25hpf (late inhibition) and added this information to Figs. 2 and S3. Continuous inhibition at this concentration resulted in three major phenotypes: skeletal loss, spicule initiations and small spicules with ectopic branching. This result supports our conclusion that ROCK activity is necessary for spicule formation, elongation and prevention of branching. Late inhibition at this concentration resulted in the majority of the embryos developing branched spicules, which is very similar to the effect of MyoII inhibition with Blebbistatin. This result again supports the inference that ROCK activity is required for normal skeletal growth and the prevention of ectopic branching. Importantly, there are two papers in which PKA and PKC were directly inhibited in whole sea urchin embryos (1) and in skeletogenic cell cultures (2). In both assays, PKC inhibition resulted in a mild reduction of spicule length while PKA inhibition did not affect skeletal formation. Neither skeletal loss nor ectopic branching was ever observed under PKC or PKA inhibition, supporting the specific inhibition of ROCK by Y-27632. Furthermore, both genetic and pharmacological perturbations of ROCK resulted in a significant reduction of skeletal growth and in the enhancement of ectopic branching. Therefore, we believe we provide convincing evidence for the role of ROCK in spicule formation, growth and prevention of branching. We revised Figs. 2 and S3 to include the 10µM Y-27632 data and the text describing the inhibition to include the explanations and references we provided here.

      They use blebbistatin and latrunculin and show that these known inhibitors of actin cytoskeleton lead to abnormal spiculogenesis. This coincidence is suggestive but is not proof that ROCK acts on the actomyosin cytoskeleton given the specificity concerns.

      As stated above, we believe that in the current version we overcame the specificity concerns and provided solid evidence that ROCK activity is necessary for spicule formation, growth and prevention of branching. Furthermore, the skeletogenic phenotypes of late 10µM Y-27632 are highly similar to those of MyoII inhibition (Blebbistatin) while the phenotypes of higher concentrations resemble the inhibition of actin polymerization by Latrunculin. We agree with the reviewer that: “This coincidence is suggestive but is not proof that ROCK acts on the actomyosin cytoskeleton” and we revised the discussion paragraph to differentiate between our solid findings and our speculations (lines 421-426): “These correlative similarities between ROCK and the actomyosin perturbations lead us to the following speculations: the low dosage of late ROCK inhibition is perturbing mostly ROCK activation of MyoII contractility while the higher dosage affects factors that control actin polymerization (Fig. 8F). Further studies in higher temporal and spatial resolution of MyoIIP activity and F-actin structures in control and under ROCK inhibition will enable us to test this.”

      Reviewer #2 (Recommendations For The Authors):

      The following areas require attention:

      (1) You begin and end the abstract with statements on evolution in which the actomyosin cytoskeleton is associated with skeletogenesis despite different GRNs, different contributing proteins, etc. You then move to ROCK and claim to reveal that ROCK is a central player in the process. As above, in the judgement of this reviewer, you fail to establish a direct role of ROCK to the actomyosin role in skeletogenesis. Sure, the ROCK inhibitors suggest that ROCK plays some kind of role in the process but you also indicate that ROCK could act on many processes, none of which you directly associate with the necessary activity of ROCK.

      We agree that our paper provides correlative similarities between the phenotypes of ROCK perturbation and those of direct perturbations of the actomyosin network, and lacks evidence of a causal relationship. We made this point clear throughout the current version of the manuscript.

      (2) In the abstract you report that ROCK inhibition impairs the actin cytoskeleton around the skeleton. In examining your images in Fig. 5 that is not the case. Based on Phalloidin staining, actin surrounds both the control and the ROCK-inhibited skeleton. The distribution of actin is the same in both cases. Myosin is also stained in this figure and it too shows similar staining both in experimental and control. So, to this reviewer, there is insufficient evidence to suggest that the actin cytoskeleton is impaired, and there is no evidence directly relating ROCK with that cytoskeleton. I'm not questioning the observation that inhibition of ROCK causes stunting and mispatterning of the skeleton. That you show and quantify well. The issue is the precise target of ROCK. Your data does not establish the specific cause. It could be the actin cytoskeleton but your experiments do not directly address that.

      Fig. 5 shows a clear difference between F-actin in control and under ROCK inhibition. In control, F-actin is enriched around the spicule, whereas under ROCK inhibition the spicule doesn’t form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (3) In parts of the manuscript you use the term filopodia and in other parts I think you use pseudopodia to refer to the same structure. Since Ettensohn has provided the most evidence on the organization of the skeletogenic syncytia, I suggest you use the same term he used for those cellular extensions.

      The filopodia and the pseudopodia are two distinct structures generated by the skeletogenic cells. The filopodia are the common cellular extensions described in many cells, while the term “pseudopodia cable” describes the specific structure that forms between the skeletogenic cells in which the spicule cavity forms, in agreement with Prof. Ettensohn's terminology.

      (4) In trying to find relationships you cite a number of previous papers at the end of the introduction. I went back to those papers and they describe (from your work) calcium exocytosis, plus filopodia formation, plus planar cell polarity, plus CDC42, any one of which could involve an actin cytoskeleton. You even cite a paper saying that perturbations of ROCK prevent spicule formation. I went back to that paper and that isn't the case. You then summarize the Introduction by relating ROCK and the actin cytoskeleton, thereby raising reader expectation that the two will be connected. As above, in reality, your evidence here does not connect the two.

      We thank the reviewer for giving us credit for all these works, but only the paper on vesicle kinetics is from our lab (Winter et al., 2021). As for Croce et al., 2006, which the reviewer refers to: in Fig. 9A, 75µM of Y-27632 is used to inhibit ROCK in the same sea urchin species that we use, and the phenotype is identical to what we observe: the skeletogenic cells are there, but the spicule is not formed. As mentioned above, in the current version we distinguished clearly between our solid findings and our interpretations.

      (5) You emphasize in Fig. 1 the inhibition of ROCK in the presence of VEGFR inhibition. However, at no place in the manuscript do you say anything about how VEGFR is inhibited, when it is inhibited, or how you know it is inhibited. That oversight must be corrected. You mention axitinib but don't say anything about what it does. Some readers may know its activity but many will not.

      We now indicate that we use Axitinib to block VEGFR in the results section (line 104) and in the methods section (lines 470-471).

      (6) Fig. 2. The use of Y27632 as a selective inhibitor of ROCK. According to data sheets from the manufacturer, at the levels used in your experiments, 120 µM, 80 µM and 30 µM, those levels of inhibitor also inhibit the activity of PKA and PKC (both inhibited at around 25 µM). This is concerning because of the literature indicating that activation of the VEGFR operates through PKA. Inhibition of PKA, then, would inhibit the activity of VEGF signaling. Thus, the inhibitory effects of Y27632 may actually not be attributed specifically to ROCK. Furthermore, the heading of this section states that ROCK activity controls initiation, growth, and morphology of the spicule. Yet, even in high levels of inhibitor spicule production is initiated. Yes, the growth and the morphology are compromised, but the initiation doesn't seem to be.

      The spicule fails to form under continuous ROCK inhibition at all concentrations (Fig. 2). Also, as we explained in detail above, these Ki values are based on biochemical experiments with purified proteins and are not relevant to in vivo use of the inhibitor. Yet, these Ki values demonstrate that the affinity of the inhibitor for ROCK is about 100 times higher than its affinity for PKA and PKC. Regarding the reviewer's specific suggestion here: direct inhibition of PKA does not have skeletogenic phenotypes, neither in whole embryos (1) nor in skeletogenic cell culture (2). Since we see the same skeletogenic phenotypes at low Y-27632 concentration, and the genetic and pharmacological perturbations of ROCK agree, we believe that these phenotypes can be attributed directly to ROCK.

      (7) The synchrotron study is very nice with two points that should be addressed. Again, a high concentration of Y27632 was used giving a caveat on ROCK specificity. And second, the blue and green calcein pulses are very nice but the recent paper by the Bradham group should be cited.

      We added a reference to the Bradham group's recent paper on two calcein pulses (10).

      (8) Fig. 5 is where an attempt is made to associate ROCK inhibition to alterations in actomyosin. Again, a high concentration of the inhibitor is used casting doubt on whether it specifically inhibits ROCK. However, even if the inhibition is specific to ROCK the images do not provide convincing evidence that ROCK activity normally is directed toward actomyosin. This is crucial to the manuscript.

      As stated above, we addressed the specificity in this version and we modified the text to emphasize correlation rather than causation: Fig. 5 shows a clear difference between F-actin in control and under ROCK inhibition. In control, F-actin is enriched around the spicule, whereas under ROCK inhibition the spicule doesn’t form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (9) Again in Fig. 6 the inhibitor is used with the same concern about whether the effects noted are due to ROCK.

      Fig. 6 is now Fig. 7 (the effect of ROCK on gene expression) and, as explained above, we addressed the specificity in this version.

      (10) Lines 350-358. This interpretation falls apart without showing that the inhibitor is specific for ROCK as indicated above. Also, Fig. 5 is unconvincing in showing a difference in actin or myosin distribution in control vs ROCK inhibited embryos. Yes, the spicules are stunted, but whether actin or myosin have anything to do with that as a result of lack of ROCK activity is not demonstrated.

      As stated above, we addressed the specificity in the revised version and we modified the text to emphasize correlation rather than causation: Fig. 5 shows a clear difference between F-actin in control and under ROCK inhibition. In control, F-actin is enriched around the spicule, whereas under ROCK inhibition the spicule doesn’t form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (11) Throughout, the manuscript spelling, grammar, and sentence structure will require extensive editing. The mistakes are numerous.

      We did our best to correct the spelling and grammar. If we still missed some mistakes, we would be happy to further correct them.

      References

      (1) Mitsunaga K, Shinohara S, Yasumasu I. Probable Contribution of Protein Phosphorylation by Protein Kinase C to Spicule Formation in Sea Urchin Embryos: (sea urchin/protein kinase C/spicule formation/H-7/HA1004). Dev Growth Differ. 1990;32(3):335-42.

      (2) Mitsunaga K, Shinohara S, Yasumasu I. Does Protein Phosphorylation by Protein Kinase C Support Pseudopodial Cable Growth in Cultured Micromere-Derived Cells of the Sea Urchin, Hemicentrotus pulcherrimus?: (sea urchin/protein kinase C/spicule formation/phorbol ester/H-7). Dev Growth Differ. 1990;32(6):647-55.

      (3) Su Y, Huang H, Luo T, Zheng Y, Fan J, Ren H, et al. Cell-in-cell structure mediates in-cell killing suppressed by CD44. Cell Discov. 2022;8(1):35.

      (4) Kagawa H, Javali A, Khoei HH, Sommer TM, Sestini G, Novatchkova M, et al. Human blastoids model blastocyst development and implantation. Nature. 2022;601(7894):600-5.

      (5) Canellas-Socias A, Cortina C, Hernando-Momblona X, Palomo-Ponce S, Mulholland EJ, Turon G, et al. Metastatic recurrence in colorectal cancer arises from residual EMP1(+) cells. Nature. 2022;611(7936):603-13.

      (6) Becker KN, Pettee KM, Sugrue A, Reinard KA, Schroeder JL, Eisenmann KM. The Cytoskeleton Effectors Rho-Kinase (ROCK) and Mammalian Diaphanous-Related (mDia) Formin Have Dynamic Roles in Tumor Microtube Formation in Invasive Glioblastoma Cells. Cells. 2022;11(9).

      (7) Segal D, Zaritsky A, Schejter ED, Shilo BZ. Feedback inhibition of actin on Rho mediates content release from large secretory vesicles. J Cell Biol. 2018;217(5):1815-26.

      (8) Fischer RS, Gardel M, Ma X, Adelstein RS, Waterman CM. Local cortical tension by myosin II guides 3D endothelial cell branching. Curr Biol. 2009;19(3):260-5.

      (9) Narumiya S, Ishizaki T, Uehata M. Use and properties of ROCK-specific inhibitor Y-27632. Methods Enzymol. 2000;325:273-84.

      (10) Descoteaux AE, Zuch DT, Bradham CA. Polychrome labeling reveals skeletal triradiate and elongation dynamics and abnormalities in patterning cue-perturbed embryos. Dev Biol. 2023;498:1-13.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The OSCA/TMEM63 channels have recently been identified as mechanosensitive channels. In a previous study, the authors found that OSCA subtypes (1, 2, and 3) respond differently to stretch and poke stimuli. For example, OSCA1.2 is activated by both poke and stretch, while OSCA3.1 responds strongly to stretch but poorly to poke stimuli. In this study, the authors use cryo-EM, mutagenesis, and electrophysiology to dissect the mechanistic determinants that underlie the channels' ability to respond to poke and stretch stimuli.

      The starting hypothesis of the study is that the mechanical activation of OSCA channels relies on the interactions between the protein and the lipid bilayer and that the differential responses to poke and stretch might stem from variations in the lipid-interacting regions of OSCA proteins. The authors specifically identify the amphipathic helix (AH), the fenestration, and the Beam Like Domain (BLD) as elements that might play a role in mechanosensing.

      The strength of this paper lies in the technically sound data - the structural work and electrophysiology are both very well done. For example, the authors produce a high-resolution OSCA3.1 structure which will be a useful tool for many future studies. Also, the study identifies several interesting mutants that seemingly uncouple the OSCA1.2 poke and stretch responses. These might be valuable in future studies of OSCA mechanosensation.

      However, the experimental approach employed by the authors to dissect the molecular mechanisms of poke and stretch falls short of enabling meaningful mechanistic conclusions. For example, we are left with several unanswered questions surrounding the role of AH and the fenestration lipids in mechanosensation: Is the AH really important for the poke response if mutating residues conserved between OSCA1.2 and OSCA3.1 disrupts the OSCA1.2 ability to respond to poke but mutating the OSCA1.2 AH to resemble that of OSCA3.1 results in no change to its "pokability"? Similar questions arise in response to the study of the fenestration-lining residues.

      We thank the reviewer for their feedback. We believe that the different OSCA1.2 mutants on their own suggest an involvement of the AH and fenestration-lining residues in its mechanosensitive response. We attribute the inability to restore the poke response of OSCA3.1 with similar mutations to its inherent high threshold to this particular stimulus and perhaps other structural differences, or a combination of them, that we did not probe in this study. We agree more work is required in the field to address these remaining questions and further dissect the difference between poke and stretch responses.

      Reviewer #2 (Public Review):

      Summary:

      Jojoa-Cruz et al. determined a high-resolution cryo-EM structure of the Arabidopsis thaliana (At) OSCA3.1 channel. Based on a structural comparison between OSCA3.1 and OSCA1.2 and the difference between these two paralogs in their mechanosensitivity to poking and membrane stretch, the authors performed structure-guided mutagenesis and tested the roles of three structural domains, including an amphipathic helix, a beam-like domain, and a lipid fenestration site at the pore domain, for mechanosensation of OSCA channels.

      Strengths:

      The authors successfully determined a structure of the AtOSCA3.1 channel reconstituted in lipid nanodiscs by cryo-EM to a high resolution of 2.6 Å. The high-resolution EM map enabled the authors to observe putative lipid EM densities at various sites where lipid molecules are associated with the channel. Overall, the structural data provides the information for comparison with other OSCA paralogs.

      In addition, the authors identified OSCA1.2 mutants that exhibit differential responses to mechanical stimulation by poking and membrane stretch (i.e., impaired response to poke assay but intact response to membrane stretch). This interesting behavior will be useful for further study on differentiating the mechanisms of OSCA activation by distinct mechanical stimuli.

      Major weakness:

      The major weaknesses of this study are the mutagenesis design and the functional characterization of the three structural domains - an amphipathic helix (AH), a beam-like domain (BLD), and the fenestration site at the pore, in OSCA mechanosensation.

      (1) First of all, it is confusing to the reviewer, whether the authors set out to test these structural domains as a direct sensor(s) of mechanical stimuli or as a coupling domain(s) for downstream channel opening and closing (gating). The data interpretations are vague in this regard as the authors tend to interpret the effects of mutations on the channel 'sensitivity' to different mechanical stimuli (poking or membrane stretch). The authors ought to dissect the molecular bases of sensing mechanical force and opening/closing (gating) the channel pore domain for the structural elements that they want to study.

      We agree with the reviewer that our data are unable to distinguish between the transduction of a mechanical stimulus and channel gating. We set out to determine whether these features were involved in the mechanosensitive response. However, as the reviewer points out, evaluating whether they work as direct sensors or coupling domains would require a more involved experimental design that lies beyond the scope of this work. Thus, we do not claim in our study whether these features act as direct sensors of mechanical stimuli or as coupling domains, only that they are involved.

      Furthermore, the authors relied on the functional discrepancies between OSCA1.2 (sensitive to both membrane poking and stretch) and OSCA3.1 (little or weak sensitivity to poking but sensitive to membrane stretch). But the experimental data presented in the study are not clear to address the mechanisms of channel activation by poking vs. by stretch, and why the channels behave differently.

      We had hoped that when we switched regions of the OSCA1.2 and OSCA3.1 channels we would abolish poke-induced responses in OSCA1.2 and confer poke-induced sensitivity to OSCA3.1. We agree with the reviewer that we were not able to pinpoint the reason, or multiple reasons (it could be a compounded effect of several differences), for OSCA3.1's higher threshold, and thus we could not confer on it an OSCA1.2-like phenotype. Yet, we shed some light on some of the structural differences that appear to contribute to OSCA3.1 behavior, as mutagenesis of OSCA1.2 to resemble this channel led to an OSCA3.1-like phenotype.

      (2) The reviewer questions if the "apparent threshold" of poke-induced membrane displacement and the threshold of membrane stretch are good measures of the change in the channel sensitivity to the different mechanical stimuli.

      The best way to determine an accurate measure of sensitivity to mechanical stimuli is stretch applied to a patch of membrane. There are more complicating factors that influence the determination of "apparent threshold" in the whole cell poking assay, including visualizing when the probe first hits the cell (very difficult to see). With that said, the stretch assay has its own issues such as the creep of the membrane into the pipette glass which we try to minimize with positive pressure between tests.

      (3) Overall, the mutagenesis design in the various structural domains lacks logical coherence and the interpretation of the functional data is not sufficient to support the authors' hypothesis. Essentially the authors mutated several residues on the hotspot domains, observed some effects on the channel response to poking and membrane stretch, then interpreted the mutated residues/regions are critical for OSCA mechanosensation. Examples are as follows.

      In the section "Mutation of key residues in the amphipathic helix", the authors mutated W75 and L80, which are located on the N- and C-terminal of the AH in OSCA1.2, and mutated Pro in the OSCA1.2 AH to Arg at the equivalent position in OSCA3.1 AH. W75 and L80 are conserved between OSCA 1.2 and OSCA3.1. Mutations of W75 and/or L80 impaired OSCA1.2 activation by poking, but not by membrane stretch. In comparison, the wildtype OSCA3.1 which contains W and L at the equivalent position of its AH exhibits little or weak response to poking. The loss of response to poking in the OSCA1.2 W/L mutants does not indicate their roles in poking-induced activation.

      Besides, the P2R mutation on OSCA1.2 AH showed no effect on the channel activation by poking, suggesting Arg in OSCA3.1 AH is not responsible for its weak response to poking. Together the mutagenesis of W75, L80, and P2R on OSCA1.2 AH does not support the hypothesis of the role of AH involved in OSCA mechanosensation.

      Mutagenesis of OSCA1.2 in the amphipathic helix for residues W75 and L80 suggests a role of the helix in the poke response in OSCA1.2, regardless of OSCA3.1 having the same residues. Furthermore, the lack of alteration in the response for mutant P77R suggests that specific residues of the helix are involved in this response and is not a case where any mutation in the helix will lead to a loss of function.

      OSCA3.1 WT exhibits a high-threshold response (near membrane rupture) in the poke assay without any mutations, and this could be due to other features, for example, the residues lining the membrane fenestration, as well as features not identified/probed in this study. We agree with the reviewer that the differences in the AH do not explain the different response to poke in OSCA1.2 and OSCA3.1, and we have added this statement explicitly in the discussion for clarification (line #251-252).

      In the section "Replacing the OSCA3.1 BLD in OSCA1.2", the authors replaced the BLD in OSCA 1.2 with that from OSCA3.1, and only observed slightly stronger displacement by poking stimuli. The authors still suggest that BLD "appears to play a role" in the channel sensitivity to poke despite the evidence not being strong.

      We agree with the reviewer that the experiments carried out show little difference between the response of OSCA1.2 WT and OSCA1.2 with OSCA3.1 BLD, and we have stated so (line #259: “Substituting the BLD of OSCA1.2 for that of OSCA3.1 had little effect on poke- or stretch-activated responses. Although these results suggest that the BLD may not be involved in modulating the MA response of OSCA1.2…”). However, the section of the discussion that the reviewer points out also considers evidence provided by recent reports from Zheng, et al. (Neuron, 2023) and Jojoa-Cruz, et al. (Structure, 2024), and we suggest a hypothesis to reconcile our findings with this new evidence.

      OSCA1.2 has four Lys residues in TM4 and TM6b at the pore fenestration site, which were shown to interact with the lipid phosphate head group, whereas two of the equivalent residues in OSCA3.1 are Ile. In the section "Substitution of potential lipid-interacting lysine residues", the authors made K435I/K536I double mutant for OSCA1.2 to mimic OSCA3.1 and observed poor response to poking but an intact response to stretch. Did the authors mutate the Ile residues in OSCA3.1 to Lys, and did the mutation confer channel sensitivity to poking stimuli resembling OSCA1.2? The reviewer thinks it is necessary to perform such an experiment, to thoroughly suggest the importance of the four Lys residues in lipid interaction for channel mechanoactivation.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are no longer able to perform such experiments.

      Reviewer #3 (Public Review):

      Summary:

      Jojoa-Cruz et al. provide a new structure of At-OSCA3.1. The structure of OSCA 3.1 is similar to previous OSCA cryo-EM structures of both OSCA3.1 and other homologues validating the new structure. Using the novel structure of OSCA3.1 as a guide they created several point mutations to investigate two different mechanosensitive modalities: poking and stretching. To investigate the ability of OSCA channels to gate in response to poking they created point mutations in OSCA1.2 to reduce sensitivity to poking based on the differences between the OSCA1.2 and 3.1 structures. Their results suggest that two separate regions are responsible for gating in response to poking and stretching.

      Strengths:

      Through a detailed structure-based analysis, the authors identified structural differences between OSCA3.1 and OSCA1.2. These subtle structural changes identify regions in the amphipathic helix and near the pore that are essential for the gating of OSCA1.2 in response to poking and stretching. The use of point mutations to understand how these regions are involved in mechanosensation clearly shows the role of these residues in mechanosensation.

      Weaknesses:

      In general, the point mutations selected all show significant alterations to the inherent mechanosensitive regions. This often suggests that any mutation would disrupt the function of the region, additional mutations that are similar in function to the WT channel would support the claims in the manuscript. Mutations in the amphipathic helix at W75 and L80 show reduced gating in response to poking stimuli. The gating observed occurs at poking depths similar to cellular rupture, the similarity in depths suggests that these mutations could be a complete loss of function. For example, a mutation to L80I or L80Q would show that the addition of the negative charge is responsible for this disruption not just a change in the steric space of the residue in an essential region.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several questions regarding some of the aspects of your study:

      Mutation of the hydrophobic W75 and L80 in OSCA1.2 to charged residues significantly decreases the poke response in OSCA1.2 without affecting the stretch response. However, W75 and L80 are also present in OSCA3.1, which does not respond efficiently to poke. You conclude that these two residues are important for the poke response, but do not delve into why, if these residues are important, OSCA3.1 is not poke-sensitive.

      In addition, mutation of the OSCA1.2 AH to resemble that of OSCA3.1 does not produce channels that are less poke-sensitive. Given the data presented, if AH were a universal "poke sensor", one could also expect WT OSCA3.1 to exhibit a robust poke response, like OSCA1.2. Here I think it would be important to explain in more detail how this data might fit together.

      We thank the reviewer for bringing up this issue. We decided to test the importance of the AH due to the presence of similar structures in other mechanosensitive channels. Our data showed that single and double mutants of the AH of OSCA1.2 affected its poke response but not its stretch response. This supports the idea that the AH is involved in the poke response. Yet, we agree that the differences in the AH between OSCA1.2 and OSCA3.1 (P77R mutation) do not explain the higher threshold of OSCA3.1, and we have explicitly added this in line #255. The particular OSCA3.1 phenotype may be due to other differences in the structure, for example, differences in the membrane fenestration area, or a combined effect of several differences, which we believe is more likely.

      I also have some questions about the protein-lipid interactions in the fenestration. A lipid has been observed in this location in both OSCA1.2 and OSCA3.1 structures. Mutation of the two OSCA1.2 lysines to isoleucines results in channels that are resistant to poke which leads to the conclusion that the interactions between the fenestration lysines and lipids are important for the poke response.

      Here, there are several questions that arise but are not answered:

      It is not shown what happens when OSCA3.1 isoleucines are mutated to lysines - do these mutants result in poke-able channels? Is the OSCA3.1 mechanosensing altered?

      We performed a preliminary test on the OSCA3.1 I423K/I525K double mutant (n = 3). However, we did not see an increase in poke sensitivity. We attributed this to other unexplored differences in OSCA3.1 having an effect on channel mechanosensitivity.

      It is implied that the poke response is predicated on the lysine-lipid interaction. However, lipid densities are present in both OSCA1.2 and OSCA3.1 structures, indicating that both fenestrations interact with lipids. How can we be certain that the mutation of lysine to isoleucine does not disrupt an inter-protein interaction rather than a protein-lipid one? For example, the K435I mutation might disrupt interactions with D523 or the backbone of G527?

      The reviewer brings up a good point. We believe the phenotype seen is due to a different strength in the interaction between lipids and proteins, however, disrupted interaction with other residues is a valid alternative explanation. We agree that the suggested experiments will further clarify the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      Similarly, the effects of single lysine-to-isoleucine (K435I or K536I) mutations are not explored.

      The observed effect might be caused by only one of these substitutions.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      I also wanted to take this opportunity to ask a couple of philosophical (?) questions about using a mammalian system to study ion channels that have evolved to function in plants. Your study highlights the intimate relationship between the lipid bilayer and protein function/mechanosensitivity. Plant cells contain high levels of sterols and cerebrosides that would significantly affect both cell stiffness and the specific interactions that can be formed between the protein and the lipid bilayer. I wonder if the properties of the lipid bilayer might shift the thresholds for poke and/or stretch stimuli and if structural elements that do not appear to have a major role in mechanosensation in a mammalian cell (e.g., BLD) might be very influential in a lipid environment that more closely resembles that of a plant?

      Conversely, is it possible that OSCA channels are not poke-sensitive in plant cells? These questions are beyond the scope of your study, but they might be a nice addition to your discussion.

      The reviewer poses a great question. Electrophysiological approaches for studying plant mechanosensitive channels suffer the limitation of not being able to fully reconstitute the environment of a plant cell. To be able to patch the cell, the cell wall needs to be disposed of, which eliminates the tension generated from this structure onto the membrane. In that sense, performing these assays in plant cells or another system would not give us a fully accurate picture of the physiological thresholds of these channels. Given this limitation, we performed our study with mammalian cells given our expertise with them. Like the reviewer, we are also intrigued by the effect of different membrane compositions on the behavior of OSCA channels and how these channels will behave under physiological conditions, but we agree with the reviewer that these questions are out of the scope of our work. To address this point, in line #294 we have added: “It is also important to note that the membrane of a plant cell contains a different lipid composition than that of HEK293 cells used in our assays, and thus these lipids, or the plant cell wall, may alter how these channels respond to physiological stimuli.”

      Line 313 For structural studies, human codon-optimized OSCA3.1. Could you please clarify what this means?

      We have changed the phrase to “For structural studies, the OSCA3.1 (UniProt ID: Q9C8G5) coding sequence was synthesized using optimized codons for expression in human cells and subsequently cloned into the pcDNA3.1 vector” in line #327 to clarify this sentence.

      As a final comment, in the methods you use references to previously published work. I would strongly encourage you to replace these with experimental details.

      We understand the reviewer’s argument. However, this article falls under eLife’s Research Advances and will be linked to the original published work whose methods we reference. As suggested in the guidelines for this type of article, we only described the methods that differ from those in the original paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) In line 85, provide C-alpha r.m.s.d. values for the structural alignment among OSCA3.1, OSCA1.1, and OSCA1.2 protomers.

      As requested, we have added the C-alpha RMSD in line #86.

      (2) In line 90, should the figure reference to Fig. 1d be Fig. 1e?

      We thank the reviewer for catching this error. We have corrected it in the manuscript.

      (3) In lines 89-94, what putative lipid is resolved in the OSCA3.1 pore? Can the authors assign the lipid identity? Is this the same or different from the lipids resolved in OSCA1.2, OSCA1.1, and TMEM63?

      In the model, we have built the lipid as palmitic acid to represent a lipid tail, but the resolution in this area makes it difficult to ascertain the identity of said lipid, hence we cannot compare to lipids in other orthologs.

      (4) In lines 115-121, the authors describe the presence of AHs and their functional roles in MscL and TMEM16. It will be more informative if the authors can add figures to show the structure of MscL and highlight the analogous AH. In addition, the current Supplementary Fig. 6 is not informative so it should be improved. It is not clear to the reviewer why that stretch of helix in TMEM16 is equivalent or analogous to the AH in OSCAs, either sequence alignment or a detailed structural alignment is helpful to address this point. Also, in lines 120-121, it says this helix in TMEM16 "does not present amphipathic properties", please show the sequence or amphipathicity of the helix.

      We thank the reviewer for the feedback on this figure. Supplementary Fig. 6 has been thoroughly modified to address the reviewer’s concerns. We now include a panel showing the structure of MscL and its amphipathic helix. We have modified the alignment of OSCA3.1 to a TMEM16 homolog to make clearer the homologous positioning of the helices in question and zoom in to show their sequences.

      (5) In discussion, lines 249-257, the authors referred to a recent study that suggested three evolutionarily coupled residue pairs located on BLD and TM6b. The authors speculate that the reason they did not observe a significant effect of channel response to poke/stretch stimuli in the BLD swapping between OSCA1.2 and 3.1 is due to the 2 of 3 salt bridges remaining for the residue pairs. To test the importance of these residue pairs and their coupling for channel gating, instead of swapping the entire BLD, can the authors systematically mutate the residue pairs, disrupt the salt-bridge interactions, and analyze the effect on channel response to mechanical force?

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      (6) The reviewer suggests the authors tone down the elaboration of polymodal activation of OSCA by membrane poking and stretch.

      We believe the idea of polymodal activation is sufficiently toned down, as we only postulate it as a possibility and then give an alternative explanation based on methodological limitations: “Nonetheless, the discrepancy could be due to inherent methodological differences between these two assays, as whole-cell recordings during poking involve channels in inaccessible membranes (at the cell-substrate interface) and channel interactions with extracellular and intracellular components, while the stretch assay is limited to recording channels inside the patch.”

      (7) In lines 81-83, the authors described the BLD as showing increased flexibility, and the EM map at this region is less well resolved for registry assignment. In the method for cryo-EM image processing and Supplementary Fig. 1, the authors only carried out 3D refinement and classification at the full channel level. Have the authors attempted to do focus refinement or classification at the BLD domain in order to improve the local resolution or to sort out conformational heterogeneity? The reviewer suggests doing so because the BLD domain is a hot spot that the authors have proposed to play an important role in OSCA mechanosensation. Conformational changes identified in this region might provide insights into its role in the channel function.

      We thank the reviewer for this suggestion. We have performed focused classification on the BLD with and without surrounding regions and, in our hands, it did not improve the resolution or provide further insights.

      Reviewer #3 (Recommendations For The Authors):

      Here are a few specific minor corrections that should be addressed

      (1) In lines 117-135, in the discussion of Figure 2, the data shows an apparent increase in the poking threshold to gate W75K/L80E. The substantial increase in the depth required to gate the channel suggests that these channels are less sensitive to poking. Would it be possible to compare the depth at which these two patches show activity and the depth at which the other 22 cells ruptured? Line 161 mentions that the rupture threshold of HEK cells is close to the gating of OSCA3.1 at 13.8 µm.

      The distance just before the cell ruptured in 22 cells with no response was 12.5 ± 2.5 µm. The distance at which the cells ruptured was 0.5 µm more (13 ± 2.5 µm, n = 22). We have added this last value in line #137.

      (2) Would it be possible in Figures 2 panels b and c, 3, and figure 4 to label the WT as WT OSCA1.2?

      We thank the reviewer for pointing this out. We agree this modification will improve the clarity of the figures and have changed the figures to follow the reviewer’s suggestion.

      (3) Can you provide a western blot of the mutations described in Figure 2? This would provide insight into the amount of protein at the cell surface and available to respond to poking, the stretch data shows that these channels are in the membrane but does not show if they are in the membrane in similar quantities.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      (4) The functional differences between the two channels are projected to be tied to several distinct point mutations, however, the data could be strengthened by additional point mutations at all sites to show that the phenotypes are due to the mutations specifically not just any mutation in the region.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This manuscript from Mukherjee et al examines potential connections between telomere length and tumor immune responses. This examination is based on the premise that telomeres and tumor immunity have each been shown to play separate, but important, roles in cancer progression and prognosis as well as prior correlative findings between telomere length and immunity. In keeping with a potential connection between telomere length and tumor immunity, the authors find that long telomere length is associated with reduced expression of the cytokine receptor IL1R1. Long telomere length is also associated with reduced TRF2 occupancy at the putative IL1R1 promoter. These observations lead the authors towards a model in which reduced telomere occupancy of TRF2 - due to telomere shortening - promotes IL1R1 transcription via recruitment of the p300 histone acetyltransferase. This model is based on earlier studies from this group (i.e. Mukherjee et al., 2019) which first proposed that telomere length can influence gene expression by enabling TRF2 binding and gene transactivation at telomere-distal sites. Further mechanistic work suggests that G-quadruplexes are important for TRF2 binding to IL1R1 promoter and that TRF2 acetylation is necessary for p300 recruitment. Complementary studies in human triple-negative breast cancer cells add potential clinical relevance but do not possess a direct connection to the proposed model. Overall, the article presents several interesting observations, but disconnection across central elements of the model and the marginal degree of the data leave open significant uncertainty regarding the conclusions.

      Strengths:

      Many of the key results are examined across multiple cell models.

      The authors propose a highly innovative model to explain their results.

      Weaknesses:

      Although the authors attempt to replicate most key results across multiple models, the results are often marginal or appear to lack statistical significance. For example, the reduction in IL1R1 protein levels observed in HT1080 cells that possess long telomeres relative to HT1080 short telomere cells appears to be modest (Supplementary Figure 1I). Associated changes in IL1R1 mRNA levels are similarly modest.

      Related to the point above, a lack of strong functional studies leaves an open question as to whether observed changes in IL1R1 expression across telomere short/long cancer cells are biologically meaningful.

      Statistical significance is described sporadically throughout the paper. Most major trends hold, but the statistical significance of the results is often unclear. For example, Figure 1A uses a statistical test to show statistically significant increases in TRF2 occupancy at the IL1R1 promoter in short telomere HT1080 relative to long telomere HT1080. However, similar experiments (i.e. Figure 2B, Figure 4A - D) lack statistical tests.

      TRF2 overexpression resulted in ~ 5-fold or more change in IL1R1 expression. Compared to this, telomere length-dependent alterations in IL1R1 expression, although about 2-fold, appear modest (~ 50% reduction in cells with long telomeres across different model systems used). Notably, this was consistent and significant across cell-based model systems and xenograft tumors (see Figure 1). Unlike TRF2 induction, telomere elongation or shortening vary within the permissible physiological limits of cells. This is likely to result in the observed variation in IL1R1 levels. For biological relevance, we further demonstrated that IL1 signalling in TNBC tissue and tumor organoids, and M2 macrophage infiltration, was significantly dependent on telomere length. Details of tests of significance were included in the individual figure legends. Based on the comment here we will expand on it in a dedicated paragraph in the methods section to make the information clearer for readers. We noticed that the stars (*) denoting statistical significance were omitted in some ChIP-experiment figures. This was likely an error during figure assembly for PDF conversion. We thank the reviewer for bringing this up; necessary changes will be made in the revised manuscript.

      Reviewer #2 (Public Review):

      This study highlights the role of telomeres in modulating IL-1 signaling and tumor immunity. The authors demonstrate a strong correlation between telomere length and IL-1 signaling by analyzing TNBC patient samples and tumor-derived organoids. Mechanistic insights revealed non-telomeric TRF2 binding at the IL-1R1. The observed effects on NF-kB signaling and subsequent alterations in cytokine expression contribute significantly to our understanding of the complex interplay between telomeres and the tumor microenvironment. Furthermore, the study reports that the length of telomeres and IL-1R1 expression is associated with TAM enrichment. However, the manuscript lacks in-depth mechanistic insights into how telomere length affects IL-1R1 expression. Overall, this work broadens our understanding of telomere biology.

      The mechanism of how telomere length affects IL1R1 expression involves sequestration and reallocation of TRF2 between telomeres and gene promoters (in this case, the IL1R1 promoter). We have previously shown this across multiple genomic sites (Mukherjee et al, 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). We have described this in the manuscript along with references citing the previous works. A scheme explaining the model was provided as Additional Supplementary Figure 1, along with a description of the mechanistic model.

      Figures 1-4 of the main figures describe the molecular mechanism of telomere-dependent IL1R1 activation. This includes ChIP data for TRF2 on the IL1R1 promoter in long/short telomeres, as well as TRF2-mediated histone/p300 recruitment and IL1R1 gene expression. We further show how specific acetylation on TRF2 is crucial for TRF2-mediated IL1R1 regulation (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, entitled "Telomere length sensitive regulation of Interleukin Receptor 1 type 1 (IL1R1) by the shelterin protein TRF2 modulates immune signalling in the tumour microenvironment", Dr. Mukherjee and colleagues aimed to clarify the extra-telomeric role of TRF2 in regulating IL1R1 expression, with consequent impact on TAM tumor infiltration.

      Strengths:

      Upon careful manuscript evaluation, I feel that the presented story is undoubtedly well conceived. At the technical level, experiments have been properly performed and the obtained results support the authors' conclusions.

      Weaknesses:

      Unfortunately, the covered topic is not particularly novel. In detail, the TRF2 capability of binding extratelomeric foci in cells with short telomeres has been well demonstrated in a previous work published by the same research group. The capability of TRF2 to regulate gene expression is well-known, the capability of TRF2 to interact with p300 has been already demonstrated and, finally, the capability of TRF2 to regulate TAMs infiltration (that is the effective novelty of the manuscript) appears as an obvious consequence of IL1R1 modulation (this is probably due to the current manuscript organization).

      Here we studied the TRF2-IL1R1 regulatory axis (not reported earlier by us or others) as a case of the telomere sequestration model that we described earlier (Mukherjee et al., 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). This manuscript demonstrates the effect of the TRF2-IL1R1 regulation on telomere-sensitive tumor macrophage recruitment. To the best of our knowledge, no previous study connects telomeres of tumor cells mechanistically to the tumor immune microenvironment. Here we focused on the IL1R1 promoter and provided mechanistic evidence for acetylated TRF2 engaging the HAT p300 to epigenetically alter the promoter. This mechanism of TRF2-mediated activation has not been previously reported. Further, the function of a specific post-translational modification (acetylation of the lysine residue 293K) of TRF2 in IL1R1 regulation is described for the first time. Additional experiments showed that TRF2-acetylation mutants, when targeted to the IL1R1 promoter, significantly alter the transcriptional state of the IL1R1 promoter. To our knowledge, the function of any TRF2 residue in transcriptional activation had not been previously described. Taken together, these results provide novel insights into a mechanism of TRF2-mediated gene regulation that is telomere-sensitive and affects the tumor-immune microenvironment. We are considering the suggestion to reorganize the manuscript to highlight the novel aspects of our work more convincingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      (1) Please expand methods with additional details related to cell co-culture, such as cell numbers and duration.

      We thank the reviewer for the careful reading and constructive suggestions, and we apologize for the confusion. We have added the experimental details related to co-culture in the revised manuscript (manuscript lines 551-553).

      (2) Please unify the writing of the abbreviation of small extracellular vesicles in the text, figure, and caption.

      Thank you for your comments. We have unified the abbreviation of extracellular vesicles to sEVs in the revised manuscript.

      (3) The effects of components other than sEVs in mechanically stimulated osteocyte CM on the proliferation of NSCLC cells should be evaluated.

      We evaluated the effects of SF, lEVs and sEVs in osteocyte CM on NSCLC cell proliferation under mechanical stimulation, and found that sEVs had the most obvious inhibition on NSCLC cell proliferation, as shown in the revised Supplemental Figure 4c, d.

      (4) In addition to osteocytes and osteoblasts, the effects of other types of cells on the proliferation of NSCLC cells should be detected. It is recommended to add at least one type of cell from an infrequent metastatic site of NSCLC as a negative control.

      We thank the reviewer for the suggestion. We added the NCM460 cell line (derived from intestinal epithelium) as a negative control and found that NCM460 cells had no significant effect on NSCLC cell proliferation, as shown in Figure 1d. These experiments were conducted before our last submission.

      (5) The bone microenvironment is complex. It is recommended to evaluate the effect of bone marrow-derived sEVs on NSCLC to validate whether the tumor suppressive effect of osteocyte sEVs is unique.

      We thank the reviewer for the suggestion. We agree with the reviewer’s comments that the bone microenvironment is complex. We explored the effect of bone marrow-derived sEVs on NSCLC cell proliferation and found that bone marrow-derived sEVs promoted NSCLC cell proliferation, as shown in Supplemental Figure 2g, h in the revised manuscript.

      (6) The description of exercise preconditioning is not clear enough. It is recommended to supplement the pattern diagram to improve readability. Exercise preconditioning should be further discussed by the Authors.

      Thank you for your comments, and we apologize for the confusion. We have added the pattern diagram of the exercise preconditioning in Supplemental Figure 6a.

      Reviewer #2 (Recommendations For The Authors):

      (1) The histological images are analyzed in a qualitative manner, with no description of the methodology used. A quantitative assessment of the distance and level of Ki-67+ NSCLC cells needs to be performed in human and murine tissues. Because in bone metastases cancer cells are frequently mixed with bone marrow cells, the inclusion of a cell marker to identify NSCLC cells is needed for proper interpretation of the imaging data.

      We thank the reviewer for the careful reading and constructive suggestions. We conducted the suggested quantitative assessment and described the methodology in the revised manuscript. The results showed that Ki-67 was lower in tumor cells adjacent to bone tissue than in the surrounding tumor cells (Figure 1a, b).

      In order to effectively identify NSCLC cells in bone metastases, GFP-expressing NSCLC cells were used in the animal model. We have added the immunofluorescence analysis of GFP and CCND3 in Supplemental Figure 4e, 4g, 5 and 6b.

      (2) The authors rely on KI-67 as a marker of proliferation. Yet, it is intriguing that some osteocytes, non-proliferating cells by definition, are often positive for this marker, which questions the specificity of the staining. The authors should provide the proper immunostaining controls to check for specificity and use additional markers of proliferation to confirm these results.

      We thank the reviewer for the suggestions. Ki-67 staining has been widely used to determine the dormancy of tumor cells in previous studies [1-4]. To confirm the results of Ki-67 staining, we used cyclin D3 (CCND3) as an additional marker of proliferation, as suggested by the reviewer. We added the immunofluorescence analysis of CCND3 in Supplemental Figure 4e, 4g, 5 and 6b, which is consistent with the result of the quantitative immunofluorescence analysis of Ki-67.

      (3) The lack of proper controls in the in vivo experiments makes the interpretation of the data difficult. For instance, in the preconditioning experiment, it is likely that the bone mass increases; thus, these mice start with higher bone mass than the control mice. The lack of a proper control (naive mice exposed to moderate exercise) does not allow testing if the presence of cancer cells still promotes bone loss in this group. The authors need to include naive mice or analyze the bones from the non-injected contralateral legs.

      We thank the reviewer for the thoughtful comments and apologize for the confusion. We absolutely agree with the reviewer that the bone mass increases after exercise preconditioning. Multiple tissues and organ systems are affected by exercise, initiating diverse homeostatic responses. Although exercise preconditioning effectively suppressed bone metastasis progression of NSCLC as mentioned in the previous manuscript, we cannot immediately conclude that this effect depends entirely on osteocytes. The mechanism by which exercise preconditioning suppresses bone metastasis progression is complex and still needs further exploration. The revised manuscript has expanded the discussion of this area (manuscript lines 326-328).

      (4) Further, validating the in vivo work with other osteocyte-like cells or primary osteocytes would have strengthened the results.

      We thank the reviewer for the suggestion. We have conducted the experiments of co-culture of MLO-A5 (another type of osteogenic cell line) and NSCLC cells as shown in Supplemental Figure 1g. Not surprisingly, MLO-A5 cells also had an inhibitory effect on proliferation of NSCLC cells.

      (5) The data on miRNA99b-3p on NSCLC in Supplementary Figure 3 is not convincing. The positive cells are difficult to see and most of the osteocytes lack nuclei. Better data, in humans and the mouse model, is needed to confirm that osteocytes produce miRNA99b-3p.

      We thank the reviewer for the comments and apologize for the confusion. In this study, we used miRCURY LNA miRNA detection probes for ISH without staining the nuclei in the tissues, a method that has been used in previous studies by our group and others [5-7]. Detailed experimental procedures for ISH of miRNA have been added in the revised manuscript (manuscript lines 461-474).

      (6) The authors do not provide a piece of data supporting that osteocytes are responsible for any of the effects seen by the interventions done in the in vivo models. Osteocytes, as well as other bone cells, can respond to mechanical stimulation and thus could virtually be responsible for the protective effects of mechanical loading or moderate exercise. In vivo experiments demonstrating a direct role of osteocytes-produced miRNA99b-3p are needed to support the notion that osteocytes maintain tumor dormancy in NSCLC bone metastasis.

      We thank the reviewer for the thoughtful comments and suggestion. We constructed an in vivo model by injecting antagomir-NC or antagomir-99b-3p under mechanical loading [8]. The results showed that the injection of antagomir-99b-3p could partially rescue the inhibitory effect on NSCLC cell proliferation (Figure 4i-k).

      (7) Further, the authors solely rely on Ki-67 as a marker of dormancy. Completing this analysis with an assessment of a dormant gene expression signature or in vivo studies assessing tumor dormancy directly would be needed to confirm this notion.

      We thank the reviewer for the suggestion. We conducted the suggested experiment by using CCND3 as an additional dormancy marker. We added the immunofluorescence analysis of CCND3 in Supplemental Figure 4e, 4g, 5 and 6b, which is consistent with the result of the quantitative immunofluorescence analysis of Ki-67.

      References

      [1] Guba M, Cernaianu G, Koehl G et al. A primary tumor promotes dormancy of solitary tumor cells before inhibiting angiogenesis. Cancer Res, 2001, 61: 5575-9.

      [2] Bliss Sarah A, Sinha Garima, Sandiford Oleta A et al. Mesenchymal Stem Cell-Derived Exosomes Stimulate Cycling Quiescence and Early Breast Cancer Dormancy in Bone Marrow. Cancer Res, 2016, 76: 5832-5844.

      [3] Correia Ana Luísa, Guimaraes Joao C, Auf der Maur Priska et al. Hepatic stellate cells suppress NK cell-sustained breast cancer dormancy. Nature, 2021, 594: 566-571.

      [4] Hu Jing, Sánchez-Rivera Francisco J, Wang Zhenghan et al. STING inhibits the reactivation of dormant metastasis in lung adenocarcinoma. Nature, 2023, 616: 806-813.

      [5] Song Qiancheng, Xu Yuanfei, Yang Cuilan et al. miR-483-5p promotes invasion and metastasis of lung adenocarcinoma by targeting RhoGDI1 and ALCAM. Cancer Res, 2014, 74: 3031-42.

      [6] Carotenuto Pietro, Hedayat Somaieh, Fassan Matteo et al. Modulation of Biliary Cancer Chemo-Resistance Through MicroRNA-Mediated Rewiring of the Expansion of CD133+ Cells. Hepatology, 2020, 72: 982-996.

      [7] Lv Yan, Wang Yin, Song Yu et al. LncRNA PINK1-AS promotes Gαi1-driven gastric cancer tumorigenesis by sponging microRNA-200a. Oncogene, 2021, 40: 3826-3844.

      [8] Zhang Yun, Li Shuaijun, Jin Peisheng et al. Dual functions of microRNA-17 in maintaining cartilage homeostasis and protection against osteoarthritis. Nat Commun, 2022, 13: 2447.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans. In this manuscript, the authors generated TRIP13 null mice and Flag-tagged TRIP13 knock-in mice to study its role in meiosis. They demonstrate that TRIP13 regulates HORMA domain proteins and is essential for meiotic completion and fertility. The main impact of this manuscript is its clarification of the in vivo function of TRIP13 during mouse meiosis and its previously unrecognized role as a dose-sensitive regulator of meiosis.

      Strengths:

      Two previously reported Trip13 mutations in mice are both hypomorphic alleles with distinct phenotypes, precluding a conclusion on its function. This study for the first time generated the TRIP13 null mice, definitively revealing the function of TRIP13 in meiosis. The authors also show the novel localization of TRIP13 at SC and its independence from the axial element components. The finding of dose-sensitive regulation of meiosis by TRIP13 has implications in understanding human meiosis and disease phenotypes.

      Weaknesses:

      This manuscript would be more impactful if more mechanistic advancements could be made. For example, the authors could follow up with one of the new interactors identified by MS to offer new insight into the molecular function of TRIP13.

      We agree that it would be interesting to follow up on new candidate interactors but think that it would be more feasible to follow up on them in future studies.

      Reviewer #2 (Public Review):

      Summary and Strengths:

      In this manuscript, Chotiner and colleagues demonstrated the localization of TRIP13 and clarified the phenotypes of Trip13-null mice in mouse meiosis. The meiotic phenotypes of Trip13 have been well characterized using the hypomorph alleles in the literature. However, the null phenotypes have not been examined, and the localization of TRIP13 was not clearly demonstrated. The study fills these important knowledge gaps in the field. The demonstration of TRIP13 localization to SC in mice provides an explanation of how HORMA domain proteins are evicted from SC in diverse organisms. This conclusion was confirmed in both IF and TRIP13-tagged Tg mice. Further, the phenotypes of Trip13-null mice are very clear. The manuscript is well crafted, and the discussion section is well organized and comprehends the topic in the field. All in all, the manuscript will provide important knowledge in the field of meiosis.

      Weaknesses:

      The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. However, the authors did not examine meiotic recombination in the Trip13-null mice.

      Meiotic recombination was extensively characterized in Trip13 severe hypomorph mutants in two previous studies: gamma-H2AX, BLM, BRCA1, ATR, RPA, RAD51, DMC1, MLH1 (Li and Schimenti, 2007; Roig et al., 2010). All the meiotic defects in our Trip13-null mice were also present in Trip13 severe hypomorph mutants: meiotic arrest, defects in chromosomal synapsis, asynapsis at chromosomal ends, and accumulation of HORMAD1/2 on the SC axis. Therefore, the defects in meiotic recombination in Trip13-null mice are expected to be similar to those in Trip13 severe hypomorph mutants, and thus we did not examine the proteins involved in meiotic recombination in the Trip13-null mutant.

      Reviewer #3 (Public Review):

      Summary:

      The authors perform a thorough examination of the phenotypes of a newly generated Trip13 null allele in mice, noting defects in chromosome synapsis and impact on localization of other key proteins (namely HORMADs) on meiotic chromosomes. The vast majority of data confirms observations of several prior studies of Trip13 alleles (moderate and severe hypomorphs). The original or primary aims of the study aren't clear, but it can be assumed that the authors wanted to better study the role of this protein in evicting HORMADs upon synapsis by studying phenotypes of mutants and better characterizing TRIP13 localization data (which they find localizes to the central element of synapsed chromosomes using a new epitope-tagged allele). Their data confirm prior reports and are consistent with localization data of the orthologous Pch2 protein in many other organisms.

      Strengths:

      The quality of data is high. Probably the most important data the authors find is that TRIP13 is localized along the CE of synapsed chromosomes. However, this was not unexpected because PCH2 is also similarly localized. Also, the authors use a clear null (deletion allele), whereas prior studies used hypomorphs.

      Weaknesses:

      There is limited new data; most are confirmatory or expected (i.e., SC localization), and thus the impact of this report is not high. The claim that TRIP13 "functions as a dosage-sensitive regulator of meiosis" is exaggerated in my opinion. Indeed, the authors make the observation that hets have a phenotype, but numerous genes have haploinsufficient phenotypes. In my opinion, it is a leap to extrapolate this to infer that TRIP13 is a "regulator" of meiosis. What is the definition of a meiosis regulator? Is it at the apex of the meiosis process, or is it a crucial cog of any aspect of meiosis?

      TRIP13 is not haploinsufficient, as Trip13 heterozygotes were still viable and fertile (albeit with defects in meiosis). TRIP13 is an ATPase and changes the conformation of meiosis-specific proteins such as HORMAD proteins. TRIP13 is essential for meiosis and its mutations cause defects in both meiotic recombination and chromosomal synapsis. Reviewer 1 stated that “TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans”. Therefore, we feel that TRIP13 can be called a regulator of meiosis.

      Reviewer #1 (Recommendations For The Authors):

      A schematic illustration of SC structure, the components involved, and the main finding, would be helpful for readers to better understand the advancement made by this study.

      We have now added a schematic illustration in a new panel - Figure 7C.

      Fig. 1B, the stage with diplotene cells should be XII.

      The pachytene cells (Pac) were mis-labelled as diplotene cells. Corrected.

      Fig. 1C, color mislabeled.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript will provide important knowledge in the field of meiosis. I support the publication of this study. I have some suggestions to improve and polish the manuscript.

      Major points:

      (1) The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. Given the function of HORMAD1 in meiotic recombination, it would be informative if the authors could examine how major markers of meiotic recombination behave in Trip13-null meiosis.

      Please see our response to Weaknesses from Reviewer #2.

      (2) Relating to the above point, the complete lack of synapsis on the sex chromosomes in the Trip13-null meiosis is impressive. This result raises a question as to whether the pathway to designate XY-obligatory crossover (which can be detected with large foci of ANKRD31 and MEI4/REC114 at PAR) is affected or not. It would be interesting to examine whether the ANKRD31 and MEI4/REC114 foci are present on PAR in Trip13-null meiosis.

      We have performed immunofluorescent analysis of REC114 in spermatocytes. In Trip13-null pachytene-like spermatocytes, X and Y chromosomes are not synapsed. REC114 still formed one focus each on the unsynapsed X and Y chromosomes. We have added this new data in the Results as a new supplementary figure (Figure 4 -supplement 1).

      (3) Figure 4 can be improved if there are quantified data for each phenotype. These phenotypes look nearly complete, but it would be informative to show the penetrance of these phenotypes.

      Because some chromosomes have unsynapsed ends, resulting in two centromere or telomere foci, the total number of centromere or telomere foci is always higher in Trip13-null pachytene-like spermatocytes than wild type pachytene spermatocytes. Therefore, we did not count the foci of centromeres and telomeres. Consistently, the centromere and telomere markers localized as expected in both wild type and Trip13-null spermatocytes.

      (4) I am not fully convinced by these photos: "synapsed sister chromatids (Figure 6B)" and "Sycp2-/- spermatocytes formed short stretches of synapsis (Figure 6C)". The authors may try confocal microscopy with super-resolution deconvolution as they did for other data.

      These have been previously demonstrated. The “synapsed sister chromatids (Figure 6B)” were previously demonstrated by confocal microscopy with super-resolution deconvolution (Guan et al., 2020). The short stretches of synapsis in Sycp2-/- spermatocytes were previously demonstrated by electron microscopy (Tripartite SC structure) and SYCP1 immunofluorescence (Yang et al., 2006). We have revised the text by citing the previous evidence and the publications.

      Minor points:

      (1) Line 19-21: "Loss of TRIP13 leads to meiotic arrest and thus sterility in both sexes. Trip13-null meiocytes exhibit abnormal persistence of HORMAD1 and HORMAD2 on synapsed SC". These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles. This information can be added to the abstract. Otherwise, it sounds like these are totally new findings, as written.

      This information is now added to the abstract: “These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles.”

      (2) The introduction section seems too long and contains unnecessary information. Some molecular details that are not touched in the result section can be deleted (e.g., Line 65-73).

      We would like to keep the molecular details on the two conformation states, as it provides biochemical background on TRIP13-HORMAD interactions.

      (3) Introduction, Line 92. A rationale can be added as to why the authors characterized the Trip13-null allele.

      A rationale has been added as follows: “To determine the effect of complete loss of TRIP13, we characterized Trip13-null mice.”

      (4) Line 205: Typo "TRRIP13".

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      Just a few recommendations:

      (1) In my opinion, the title is an overreach. "Regulator" invokes other concepts such as transcription factors.

      Please see our explanation in response to weaknesses from Reviewer #3.

      (2) The first sentence of the results deals with TRIP13 expression in only 3 tissues. The authors might look at more comprehensive RNA-seq data from mice and humans.

      We examined TRIP13 protein expression in 8 mouse tissues by WB and found that TRIP13 protein was abundant in testis but present at a very low level in ovary and liver (Figure 1A). We feel that readers can easily look up the relative transcript levels of Trip13 in more tissues from mice and humans from NCBI database under “Gene”.

      (3) The null allele is semi-lethal. Is body size affected? Were the mice abnormal in any other ways, given that TRIP13 has been implicated in other diseases and processes, and is expressed in other tissues (TRIP13 stands for Thyroid receptor interacting protein).

      The body weight of 2-3-month-old males was not significantly different between wild type (24.3±2.8 g, n=5) and Trip13 KO mice (22.8±1.7 g, n=5, p=0.3, Student’s t-test). We have included the body weight information in the revised manuscript. We did not observe somatic defects in the viable Trip13-null mice, nor did the authors report any in the Trip13 hypomorph mutants in two previous studies (Li and Schimenti, 2007; Roig et al., 2010).
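
      As an illustrative check only, the reported comparison can be approximately reproduced from the summary statistics given above. The sketch below assumes that the ± values are standard deviations (not SEM) and that the test was a two-sided, equal-variance Student's t-test; neither assumption is stated in the response. The function names are hypothetical helpers, and the p-value is obtained by numerically integrating the t-distribution density rather than with a statistics library.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics.

    Assumes sd1 and sd2 are standard deviations (an assumption here).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * area


# Wild type: 24.3 +/- 2.8 g (n=5); Trip13 KO: 22.8 +/- 1.7 g (n=5)
t_stat, dof = pooled_t_test(24.3, 2.8, 5, 22.8, 1.7, 5)
p_value = two_sided_p(t_stat, dof)
print(f"t = {t_stat:.3f}, df = {dof}, two-sided p = {p_value:.2f}")
```

      Under these assumptions the p-value falls between 0.3 and 0.4, in line with the reported p=0.3; if the ± values were SEM instead, the result would differ.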

      (4) Line 276 : It would be nice to elaborate on the "spatial explanation."

      We meant that TRIP13 localizes to SC while HORMAD proteins are removed from SC upon chromosomal synapsis, thus providing a spatial explanation. However, we have now deleted “spatial”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      However, there are several concerns to be explained more in this study. In addition, some results should be revised and updated.

      Thank you for your comments. We have addressed these concerns with additional description and experiments.

      Some results were revised and updated accordingly.

      Reviewer #2 (Public Review):

      The minor weakness of the study is inconsistent use of terminology throughout the manuscript, occasional logic-jump in their flow, and missing detailed description in methodologies used either in the text or Materials and Methods section, which can be easily rectified.

      Thank you for your review. We have revised the manuscript and corrected errors according to your comments.

      Reviewer #3 (Public Review):

      Importantly, besides the Miwi ubiquitination experiment, which is performed in a heterologous system and therefore may not be ideal for extracting conclusions, the possible involvement of ubiquitination was not shown for any other proteins that the authors found that interact with FBXO24. Could histones and transition proteins be targets of the proposed ubiquitin ligase activity of FBXO24, and in its absence, histone replacement is abrogated?

      Thank you for your comments. Histones and transition proteins were not found in the immunoprecipitates of FBXO24 (Figure S3G), suggesting that they are not direct targets of FBXO24.

      Miwi should be immunoprecipitated and Miwi ubiquitination should be detected (with WB or mass spec) in WT testis.

      We agree with this suggestion. In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      Therefore, the claim that FBXO24 is essential for piRNA biogenesis/production (lines 308, 314) is not appropriately supported.

      We appreciate the comment. We have revised the description and modified the claim on page 11.

      Reviewing Editor's note for revision

      (1) As noted by all three reviewers, as currently written the rationale to focus on MIWI is not entirely clear. A transitional narrative to focus on MIWI needs to be provided as well as an explanation for how the absence of FBXO24 as an E3 ubiquitin ligase is responsible for the observed mRNA and protein differential expression.

      We appreciate your comments. We have supplemented the transitional narrative by focusing on MIWI and explained mRNA and protein differential expression upon FBXO24 deletion, shown on Page 7 and Page 13, respectively.

      (2) As it can be indirect, mass spec detection of MIWI in testis co-IP and MIWI ubiquitination should be detected (with WB or mass spec) in WT testis.

      In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      (3) Please tone down the claim that FBXO24 is essential for piRNA biogenesis/production as it requires further evidence.

      We have revised the description and modified the claim on page 11.

      (4) Ontology analysis of the genes with abnormally spliced mRNAs to provide an explanation for developmental defects.

      In the revision, we have performed the ontology analysis and provided new data regarding the abnormally spliced genes, as shown in Figure S4D.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      (1) The authors performed mainly with the WT (or knock-in) and Fbxo24-knockout mouse model. Do the heterozygous males and their sperm have any physiological defects like FBXO24-deficient mice?

      This is a good question. We did the phenotype analysis and found that heterozygous males are all fertile, and their sperm do not have any physiological defects.

      (2) Fbxo24-KO sperm carries swollen mitochondria. How do the mitochondria affect sperm function?

      Thank you for raising this interesting question. Based on our data and published literature, the defective mitochondria were associated with energetic disturbances and reduced sperm motility, as shown on Page 12.

      (3) TEM images show that Fbxo24-KO spermatids carry swollen mitochondria and enlarged chromatoid bodies. How the swollen mitochondria and enlarged chromatoid bodies are defective for sperm motility and flagellar development requires more explanation. In addition, it is unclear how the enlarged diameter of the chromatoid body is critical for normal sperm development.

      Thank you for your comments. The chromatoid bodies are considered to be engaged in mitochondrial sheath morphogenesis. Analysis of the chromatoid bodies' RNA content reveals enrichment of PIWI-interacting RNAs (piRNAs), further emphasizing the role of the chromatoid bodies in post-transcriptional regulation of spermatogenetic genes. We added this explanation on Page 12-13.

      (4) The authors only show band images to compare the protein amounts between WT and KO sperm and round spermatids. As the blots for loading controls are not clear, the authors should quantify the protein levels and perform a statistical comparison.

      We quantified the protein levels and performed a statistical comparison, as shown in Figure S3B.

      (5) The authors show the defective sperm head structure from Fbxo24-KO sperm in Figure 5. However, the Fbxo24-KO sperm heads seem quite normal in Figure 3. How many sperm show defective sperm head structure? In addition, the authors observed altered histone-to-protamine conversion in sperm, but it is unclear whether the altered nuclear protein conversion causes morphological defects in the sperm head.

      We appreciate the comments. In our study, we found over 80% of Fbxo24 KO sperm showed defective structure in the sperm head. Altered histone-to-protamine conversion caused the decondensed nucleus of Fbxo24 KO sperm. Notably, in many knockout mice studies, impaired chromatin condensation is frequently associated with abnormal sperm head morphology, as shown in reference 15 of Page 8.

      (6) The authors compare the protein levels of RNF8, PHF7, TSSK6, which participate in nuclear protein replacement in sperm. However, considering the sperm is the endpoint for the nuclear protein conversion, it is unclear to compare the protein levels in mature sperm. The authors might want to compare the protein levels in developing germ cells.

      Thank you for your comment. We in fact detected the protein levels of RNF8, PHF7, and TSSK6 in the testes, not in sperm. We have corrected this in Figure 5E. We apologize for our carelessness.

      (7) This reviewer suggests describing more rationales for how the authors focus on the MIWI protein. Also, was MIWI detected by testis co-IP mass spectrometry?

      We agree with this suggestion. Since MIWI is a core component of the CB and was also identified as an FBXO24-interacting partner in our immunoprecipitation-mass spectrometry (IP-MS) analysis (Table S1), we focused on examining MIWI expression in WT and Fbxo24 KO testes. We have added this description in the revision (see lines 191-193 on page 7).

      (8) The authors need to provide a more detailed explanation for how the altered piRNA production affects physiological defects in germ cell development. In addition, it will be good to describe more how the piRNAs affect a broad range of mRNA levels.

      Thank you for your comments. The previously published studies have demonstrated that piRNAs could act as siRNAs to degrade specific mRNAs during male germ cell development and maturation. We have cited these studies on lines 369-372 of Page 13.

      (9) The authors observed an altered splicing process in the absence of FBXO24. However, it is a little bit confusing how the altered splicing events affect developmental defects. Therefore, the authors should state which mRNAs have undergone abnormal splicing processes and provide ontology analysis for the genes.

      We have performed the ontology analysis and showed the new data in Figure S4D.

      Minor comments

      (1) Figure 1A-C - Statistical comparison is missed. Numbers for biological replication should be described in corresponding legends.

      Thank you for your careful review. We have provided the statistical comparison and the numbers for biological replication in the legends of Figure 1A-C.

      (2) Figure 1E, F - Current images can't clearly resolve the nuclear localization of the FBXO24 testicular germ cells. To clarify the intracellular localization, the authors should provide images with higher resolution.

      The resolution of Figure 1E, F was improved, as suggested. Thank you!

      (3) Figure 1E, F - Scale bar information is missing.

      The scale bars of Figure 1E, F were provided.

      (4) It will be much better to show the predicted frameshift and early termination of the protein translation in Fbxo24-knockout mice.

      The predicted frameshift of Fbxo24-knockout mice was added and shown in Figure S1B.

      (5) It is required to provide primer information for qPCR.

      The primer information for qPCR was provided, as shown in Table S7.

      (6) The authors describe that Fbxo24-KO sperm show abrupt bending of the tail. However, the description is unclear and the sperm shown in Figure 3C seems quite normal. The authors should clarify the abnormal bending pattern of the tail and show quantified results.

      Thank you for pointing out this issue. In Fbxo24 KO sperm, abnormal bending of the sperm tails mainly included neck bending and midpiece bending. We have shown them in Figure S3A.

      (7) The authors mention that Fbxo24-KO sperm have swollen mitochondria at the midpiece, but this is also unclear. How many mitochondria are swollen in Fbxo24-KO sperm?

      This is a good question. However, since it is very difficult to observe all of the mitochondria in each sperm using the electron microscope, we could not quantify the swollen mitochondria in Fbxo24 KO sperm.

      (8) Scale bar information is missed - Fig 3C insets, Fig 3D, Fig 3F insets, 4A insets, Figure 4C insets.

      All the scale bars have been added.

      (9) How many sperm have annulus defects? In Figure 3F, WT sperm does not have an annulus, which could be damaged during sample preparation. Are the annulus defects in Fbxo24-KO sperm consistent?

      Thank you for asking these questions. Based on our results, about 30% of Fbxo24 KO sperm showed defective annulus structure. Since both TEM (Figure 3F) and SEM (Figure 3G) results clearly showed the defective annulus structure of Fbxo24 KO sperm, we believe the annulus defects are consistent and highly unlikely caused by sample preparation.

      (10) A Cross-section image for the endpiece of Fbxo24-KO sperm is not suitable. There is a longitudinal column structure of the principal piece.

      Thank you for your comments. It is difficult to observe a completely longitudinal structure of the sperm tail under TEM. The cross-sections of the endpiece and principal piece allowed us to examine the structure of the axoneme, ODFs and fibrous sheath (FS).

      (11) The endpiece of Fbxo24-KO sperm seems to have a normal axoneme. Do all endpieces of Fbxo24-KO sperm have normal axoneme? Also, the authors need to describe whether an axonemal structure is damaged and disrupted in all Fbxo24-KO sperm.

      Our TEM data showed that the axonemal structure was impaired in the endpiece of Fbxo24 KO sperm (see right panels of Figure 3H). Moreover, based on the TEM ultrastructure analysis, we found that over 90% of Fbxo24 KO sperm had a damaged axonemal structure.

      (12) Reference blots in Fig 3I, 3J, 4E (left), 5C and 5E are quite faint. The authors should replace the blot images.

      Thank you for pointing this out. We reran the Western blots multiple times but could not obtain better images due to antibody sensitivity. However, we quantified the protein levels and performed a statistical comparison, as shown in Figure S3B, to give readers a reliable readout from these images.

      (13) Loading controls are required - 7D-H.

      Done as suggested. Thanks!

      (14) How do the authors measure the midpiece length? From where to where? This should be clarified.

      Good question. We measured the midpiece length from the sperm neck to the sperm annulus by MitoTracker staining. We have clarified this on Page 16.

      (15) How are the bands for Fbxo24 shifted during IP in Fig 7A?

      The protein modification in the interaction may cause the band shift.

      (16) There are several typos throughout the manuscript. Please check carefully and fix them.

      Thank you for your careful review. We have corrected and fixed all the typos as far as we can.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      (1) Please provide a schematic of HA-Fbxo24 knock-in construct and strategy together with knockout (Figure S2) or even separately early in Figure S1. The description of using the transgenic mouse is mentioned even earlier than the knockout but there are no citations or methods provided in the text other than that listed in Materials and Methods.

      Thank you for your suggestion. The schematic of the HA-Fbxo24 knock-in strategy has been added in Figure S2A. The description of the transgenic mouse has been added to the Results (lines 102-103 on page 4).

      Also, it is not clear to what extent the phenotypic and molecular characterization of HA-transgenic mice is performed. For example, Lines 134-139: The use of Fbxo24-HA labeled transgenic mice results in the rescue of spermatogenesis and fertility as shown in Figure 2F by measuring the litter size. It is not clear how this observation leads the author to state that this rescues defects in spermiogenesis. Please clarify how and what other measures are taken to support this conclusion. Is the observed infertility due to defects in spermatogenesis or spermiogenesis?

      Thank you for your question. We crossed FBXO24-HATag males with FBXO24−/− females to obtain FBXO24−/−; FBXO24-HATag males. We examined the testes volume and histological morphology of FBXO24−/−; FBXO24-HATag males and found that they were similar to FBXO24+/−; FBXO24-HATag littermates, indicating that spermatogenesis was restored, as shown in Figure S2H.

      (2) Line 107 vs Line 114: Please use the terminology spermatogenesis and spermiogenesis consistently throughout the text. Earlier in the introduction, the authors clearly defined that spermatogenesis involves three phases, with the third phase referred to as spermiogenesis. However, the author concludes in the first line that "FBXO24 plays a role during spermatogenesis" while summarizing at the end of the paragraph that this protein is "expressed in haploid spermatids specifically during spermiogenesis". Therefore, it is not clear whether the authors conclude that FBXO24 is important for all of spermatogenesis (line 107) or only for part of spermiogenesis (line 114). Another example is line 219 vs. 238: At this point in the manuscript, it is again unclear whether the authors want to study molecular changes during spermatogenesis or spermiogenesis upon FBXO24 depletion. There are many such cases throughout the text, and it is recommended to be consistent in using more restrictive terminology whenever applicable for a clear interpretation.

      We thank you for your careful review. We have double-checked the terminology of spermatogenesis and spermiogenesis and made it consistent throughout the text of the revised manuscript.

      (3) It is not clear how rampant/frequent the Fbxo24-knockout sperm show defects in head morphology based on Figures 3C, 3F, and 5A since it seems that there are some sperm showing relatively normal-looking sperm heads. Please provide quantification.

      We have performed the quantification and found that over 80% of Fbxo24 KO sperm showed defective structures in the sperm head.

      (4) Figure 3B: The authors describe in the figure legend that 3 mice were analyzed in each group. The standard deviation for the WT analysis is missing, or if the author wanted to set the WT value to 100%, the bar and scale shown on the y-axis do not fit. The value for WT looks more like 95%.

      We have indeed analyzed sperm motility based on the WT value set at 100% and have revised Figure 3B in the revision. We apologize for this oversight.

      (5) Figure 3 B and C: It is not clear how the motility is measured. Is CASA used (not described in Methods). The conclusion about abnormal flagellar bending in KO spermatozoa cannot be drawn from the static microscopic images alone. Please provide more details of motility analysis together with videos of live cell imaging.

      The sperm motility was measured manually using a hemocytometer, according to the reference.

      We provided the details of sperm motility analysis in the Materials and Methods section on Page 16.

      (6) Figure 3 I and J: These are one of a few figures that are not supported by statistical analysis. In particular, for 3I, GAPDH controls of WT and KO protein do not show equal loading, which could explain the lower expression of the KO protein. Please show normalized bar graphs with multiple biological replicates or at least show a representative technical replicate that shows equal loading of GAPDH to better support the conclusion.

      Thank you for your suggestion. Statistical comparison of relative protein expression was supplemented, as shown in new Figure S3B.

      (7) Line 184: It is not clear how the authors define a swollen mitochondrion? Are there any size criteria (roundness) that can be measured to distinguish between a swollen and a non-swollen mitochondrion? It is recommended to use another terminology as often 'swollen' implies there is a difference in osmolarity but there is no experiment to support this implication.

      Thank you for your comment. We have changed the “swollen” to “vacuolar” in the revision, as shown on Page 7.

      (8) Figure S4, without a bright field image, it is hard to see the purity and morphology of the isolated prep. Please provide the bright field images together or as overlaid images.

      We agree with your comment. We have provided the overlaid images in new Figure S4A.

      (9) There is a big logic jump in what prompts the authors to look at MIWI protein levels and link the observation to the MIWI/piRNA pathway in both the Introduction and Results, even though it is one of the main findings. It is recommended to provide a better rationale and logical flow in the text.

      Thank you for your suggestion. We have added a sentence explaining why we wanted to focus on studying MIWI expression (see lines 190-193 on page 7).

      Minor comments

      (1) Please keep all the conventions of gene vs. protein nomenclature. For example, write the genes mentioned in the figures in italics with the first letter in Capital, as it is done in the main part. Proteins should be in ALL CAPITAL like FBXO24.

      The names of gene and protein have been revised in the revision, as suggested.

      (2) In the MM section, the name of the manufacturer and the location of the materials used are missing in several sections. Please go back through the MM section and add this information in the appropriate places.

      Done as suggested. Thank you!

      (3) On page 4, the authors mentioned that "Further qPCR analysis of developmental testes and purified testicular cells showed that FBXO24 mRNA was highly expressed in the round spermatids and elongating spermatids (Fig 1B-C)". Please include statistical analyses for Fig 1B-C as well as for Fig 1A to support the written statements.

      Statistical comparison was supplemented, as shown in Figure 1. P-values are denoted in figures by *p < 0.05.

      (4) Figure 3E: Please describe in more detail how the length of the midpiece was measured. Was it based on TEM images or based on fluorescent images using MitoTracker?

      As we responded to Reviewer #1, we measured the midpiece length from the sperm neck to the sperm annulus by MitoTracker staining. We have clarified this in the Materials and Methods section on Page 16.

      (5) Line 431: In the "Electron Microscopy" section of the MM part, the author should indicate the ascending ethanol series (%) used.

      Done as suggested. Thank you!

      (6) Line 432: The thickness of the sections prepared is missing, as well as an indication of the microtome used.

      We have added the section thickness and the microtome used in the Materials and Methods section on Page 16.

      (7) Line 433: If the generated tiff files have been processed with Adobe Photoshop, this information is missing.

      We have provided information on the usage of Adobe Photoshop for the generation of tiff files on Page 17.

      (8) Lines 445, 452, 467: In some places in the paper, the temperature is written with a space between the number and °C, and sometimes it is not. Please go through the paper and make it consistent. The usual spelling is 4°C.

      We have gone through the manuscript and checked how every temperature is written to make them consistent. Thank you for your careful review.

      (9) Line 469: The gel documentation system used is not mentioned.

      Done as suggested. Thank you!

      (10) Line 469: The 'TM' should be superscripted.

      Done as suggested.

      (11) Line 489: A space is missing between the changes and the parenthesis.

      Done as suggested.

      (12) Line 495-496: The authors write that the fractions enriched with round spermatids after sedimentation were collected manually. Was a determination of cell concentration - e.g., 2 × 10⁶ cells/ml - performed after collection of the cells? How were the cells stored until use? Please add the sedimentation time and used temperature.

      The cells were stored in 1× Krebs buffer on ice. The cells were sedimented through a BSA density gradient for 1.5 h at 4°C. The cell concentration was determined after collection, as shown on Page 18.

      (13) Line 505: spelling error. Instead of " manufacturer's procedure" it is written manufactures' instructions.

      The spelling error was corrected.

      (14) Line 520: Please write a short sentence on how the purification of the 16-40 nt long RNA was performed.

      RNA of 16–40 nt in length was enriched by polyacrylamide gel electrophoresis. We added this information at line 531 on Page 19.

      (15) Line 528: The version of the used GraphPad software is missing.

      The version of GraphPad software was supplemented, as shown on Page 19.

      (16) Line 677: For qPCR analyses, the number of mice analyzed (N) and a statistical evaluation are missing.

      The statistical comparison and the numbers for biological replication were added, as shown on Page 26.

      (17) Figure 3D: Please add a scale bar.

      Done as suggested. Thanks!

      (18) Line 371 and Line 377: Two times "in summary" is written. Please make one summary for the whole paper.

      This sentence was revised, as shown on Page 13.

      (19) Line 382: To be consistent in the whole paper, please write Figure 10 in bold letters.

      Done as suggested.

      (20) Please make the size and font of the references consistent with the main text.

      Done as suggested. Thanks again for your careful review.

      Reviewer #3 (Recommendations For The Authors):

      I would like to see the description of the FBXO24 immunoprecipitation experiment performed in HEK293T cells. This somatic cell line does not normally express Miwi, so how was Miwi detected in FBXO24 mCherry IP beads? It is not mentioned if Miwi is expressed from a recombinant vector in this experiment. Similarly, I would like to see a better description of the experiment with the ubiquitin peptides described towards the end of the same paragraph; it is not clear.

      Thank you for your comments. FBXO24-mCherry was expressed in HEK293T cells and the immunoprecipitate was incubated with the protein lysate of the testes (see lines 268-272 on Page 10). The description of the ubiquitin experiment was added as well, as shown in lines 283-286 on Page 10.

      Line 263: I think the term ectopic here is not appropriate, a correction is needed.

      We have changed “ectopic” to “increased” in the revision (see line 268 on Page 10).

      I would like the authors to provide a tentative explanation or evidence of why FBXO24 KO males are completely sterile, even though there are still mature sperm produced with some motility. Since there are defects in nuclear condensation it will be very relevant to check DNA damage/fragmentation, which could contribute to the sterility phenotype.

      This is a good suggestion. We reanalyzed the sperm DNA damage by TUNEL staining and show the new data in Figure S3E-F.

      Line 213: There have been some conflicting reports about the role of RNF8 in spermiogenesis, but a recent report has shown that RNF8 is not involved in histone PTMs that mediate histone to protamine transition (Abe et al Biol Reprod 2021 https://doi.org/10.1093/biolre/ioab132).

      Thank you for your comment. We have cited this critical reference and discussed it in the Discussion section on Page 12.

      Figure 7: I would like to see zoomed-out views of the affected exons, so that flanking unaffected exons can be used as a reference for unaffected splicing. Most of the genome browser views in this image only show affected exons and it is impossible to see if these alone are affected or if the reduced RNAseq coverage in those exons is a result of overall reduced mapped reads in these genes. Also, a fixed Y axis with the same max value should be shown for these genome browser snapshots so that the expression level is comparable between the two genotypes.

      Thank you for your comments. Loading control of RT-PCR and scale range of Y axis were added in new Figure 7.

      Minor corrections:

      Line 70: correct "..functions as protein-protein interaction..".

      Thank you for your careful review. We have corrected this sentence (see line 69 on Page 3).

      Line 101: correct "..qPCR analysis of developmental testis..".

      We have corrected this sentence (see line 100 on Page 4). Thanks again.

      Line 116: correct "..results in detective..".

      Corrected.

      Line 186: correct ".. explored..".

      Corrected.

      Line 218: correct ".. gene expressions..".

      Corrected.

      Line 221: correct "..genes significantly differentiated expressed".

      Corrected.

      Line 241: FBXO24 was shown earlier in both cytoplasm and nucleus.

      We have changed “FBXO24 is mainly confined to the nucleus” to “FBXO24 expressed in the nucleus”, as shown in line 247 on Page 9.

      Line 501-502: correct "..reverse transcriptional".

      “reverse transcriptional” was changed to “reverse transcription”, as shown on Page 18.

      Line 686: correct ".. deficiency male..".

      Corrected.

      Line 769: correct "..Western blots were adopted..".

      Corrected.

      Line 784: correct "..WT tesis..".

      Corrected.

      I cannot understand exactly what is shown in Figure 9B. Some elements marked on the X-axis are single base locations (-2K, TSS, +2K) and others are stretches of sequences so they cannot be equivalent. Why is only an intron shown? There should be a measure of normalized expression on the Y-axis.

      Thank you for your questions. On the X-axis, genomic segments were scaled to the same size and the signal abundance across them was calculated using computeMatrix. To determine the source of the piRNAs, they were mapped to gene bodies, including introns, CDS, and UTRs. The Y-axis shows the normalized count.
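
      As an illustration only, scaling segments of different lengths to a common size before averaging their signal can be sketched as follows. This is not the deepTools implementation; the function names, the bin count, and the toy signal values are hypothetical, and the per-base signal is assumed to be already normalized.

```python
def scale_region(signal, n_bins):
    """Average a per-base signal into n_bins bins of (nearly) equal width.

    Hypothetical helper: mimics the idea of scaling a region to a fixed size.
    """
    length = len(signal)
    if length < n_bins:
        raise ValueError("region is shorter than the number of bins")
    profile = []
    for i in range(n_bins):
        lo = i * length // n_bins          # first base of bin i
        hi = (i + 1) * length // n_bins    # one past the last base of bin i
        chunk = signal[lo:hi]
        profile.append(sum(chunk) / len(chunk))
    return profile


def mean_profile(regions, n_bins):
    """Scale every region to n_bins, then average bin by bin across regions."""
    scaled = [scale_region(region, n_bins) for region in regions]
    return [sum(column) / len(scaled) for column in zip(*scaled)]


# Two toy regions of different lengths (4 bp and 8 bp), each reduced to 2 bins.
toy_regions = [[1, 1, 3, 3], [2, 2, 2, 2, 4, 4, 4, 4]]
print(mean_profile(toy_regions, n_bins=2))
```

      Because every segment is reduced to the same number of bins, positions on the X-axis correspond to relative positions within a segment rather than to single bases.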

      Figure 6F is not needed.

      Figure 6F was used to illustrate the number of different types of mRNA splicing events upon FBXO24 deletion in the round spermatids. To help the reader better understand the splicing changes, we decided to keep it.

      The last two paragraphs of the discussion seem to be redundant.

      Thank you for pointing this out. We have revised the last two paragraphs of the discussion.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one that is currently used for routinely delimiting operational taxonomic units in RNA viruses and reconstructing their evolutionary history (Edgar et al. 2022, see also the Serratus project; https://serratus.io/); therefore, we took the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand years ago to 293 million years ago depending on modeling assumptions). Only future genomic work across Coronaviridae that will characterize multiple genetic regions with different evolutionary rates will allow us to precisely elucidate the timescale of the evolutionary history of coronaviruses alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, it strongly suggests, when used along with cophylogenetic approaches, a recent evolutionary origin in bats.

      R. C. Edgar, et al., Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022).

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV-2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.

      Our intuition is that ALE in its “dated” version did not necessarily fail on our dataset due to its size (ALE ran, but provided unrealistic parameter estimates and was not able to output possible reconciliations, as mentioned in our Material and Methods section). We think it most likely did not run because there is no pattern of codiversification: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between these trees with time-consistent transfers is very difficult and ALE fails at estimating an amalgamated likelihood for such an unlikely scenario. Following a suggestion from reviewer #3, we are going to try running the dated version of ALE independently on the alpha and beta-coronaviruses, resulting in smaller datasets. This will help us elucidate whether the dated version of ALE fails due to data size or the absence of a codiversification pattern.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that is a fairly prominent issue, that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We totally agree that sampling biases in the virome of mammals are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still reported low support for a scenario of codiversification, the origin in bats in East Asia, the preferential host switches within mammalian orders, and the rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host transfers involve unsampled intermediate hosts. To address the reviewer's comment, we will better underline the importance of sampling biases in our main text and include the suggested references. We will also better highlight our sensitivity analyses by moving them from the Supporting Information Text to the main text.
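
      The general idea of subsampling to even out sampling effort can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure reported in the Supporting Information Text: the cap per host, the data layout (a mapping from host species to the virus records found in it), the host names, and the function name are all hypothetical.

```python
import random


def subsample_per_host(associations, max_per_host, seed=0):
    """Cap the number of virus records kept for each host species.

    Hypothetical helper: heavily screened hosts are randomly thinned down to
    max_per_host records, while lightly screened hosts are kept unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    thinned = {}
    for host, viruses in associations.items():
        if len(viruses) <= max_per_host:
            thinned[host] = list(viruses)
        else:
            thinned[host] = rng.sample(viruses, max_per_host)
    return thinned


# Toy data: one intensively screened host and one rarely screened host.
toy = {
    "host_A": ["v1", "v2", "v3", "v4", "v5", "v6"],
    "host_B": ["v7"],
}
print(subsample_per_host(toy, max_per_host=2))
```

      Repeating such a draw with different seeds and rerunning the downstream analysis on each subsample is one way to check whether a conclusion depends on the most intensively screened hosts.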

      We agree that distinguishing between alpha and beta coronaviruses will provide useful additional insights; we are going to run separate cophylogenetic analyses for these two sub-clades. We will report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (2017, PLOS Pathogens). Thank you for providing this reference, which we will now discuss.

    1. Author Response

      Reviewer #1:

      This manuscript presents an extremely exciting and very timely analysis of the role that the nucleosome acidic patch plays in SWR1-catalyzed histone exchange. Intriguingly, SWR1 loses activity almost completely if any of the acidic patches are absent. To my knowledge, this makes SWR1 the first remodeler with such a unique and pronounced requirement for the acidic patch. The authors demonstrate that SWR1 affinity is dramatically reduced if at least one of the acidic patches is absent, pointing to a key role of the acidic patch in SWR1 binding to the nucleosome. The authors also pinpoint a specific subunit - Swc5 - that can bind nucleosomes, engage the acidic patch, and obtain a cryo-EM structure of Swc5 bound to a nucleosome. They also identify a conserved arginine-rich motif in this subunit that is critical for nucleosome binding and histone exchange in vitro and for SWR1 function in vivo. The authors provide evidence that suggests a direct interaction between this motif and the acidic patch.

      Strengths:

      The manuscript is well-written and the experimental data are of outstanding quality and importance for the field. This manuscript significantly expands our understanding of the fundamentally important and complex process of H2A.Z deposition by SWR1 and would be of great interest to a broad readership.

      We thank the reviewer for their enthusiastic and positive comments on our work.

      Reviewer #2:

      Summary:

      In this study, Baier et al. investigated the mechanism by which SWR1C recognizes nucleosomal substrates for the deposition of H2A.Z. Their data convincingly demonstrate that the nucleosome's acidic patch plays a crucial role in the substrate recognition by SWR1C. The authors presented clear evidence showing that Swc5 is a pivotal subunit involved in the interaction between SWR1C and the acidic patch. They pared down the specific region within Swc5 responsible for this interaction. However, two central assertions of the paper are less convincing. First, the data supporting the claim that the insertion of one Z-B dimer into the canonical nucleosome can stimulate SWR1C to insert the second Z-B dimer is somewhat questionable (see below). Given that this claim contradicts previous observations made by other groups, this hypothesis needs further testing to eliminate potential artifacts. Secondly, the claim that SWR1C simultaneously recognizes the acidic patch on both sides of the nucleosome also needs further investigation, as the assay used to establish this claim lacks the sensitivity necessary to distinguish any difference between nucleosomal substrates containing one or two intact acidic patches.

      Strengths:

      As mentioned in the summary, the authors presented clear evidence demonstrating the role of Swc5 in recognition of the nucleosome acidic patch. The identification of the specific region in Swc5 responsible for this interaction is important.

      We thank the reviewer for their careful critique of our work. Below we address each major concern.

      Major comments:

      (1) Figure 1B: It is unclear how much of the decrease in FRET is caused by the bleaching of fluorophores. The authors should include a negative control in which Z-B dimers are omitted from the reaction. In the absence of ZB dimers, SWR1C will not exchange histones. Therefore, any decrease in FRET should represent the bleaching of fluorophores on the nucleosomal substrate, allowing normalization of the FRET signal related to A-B eviction.

      In this manuscript, as well as in our two previous publications (Singh et al., 2019; Fan et al., 2022), we have presented the results of no enzyme controls, +/- ZB dimers, no ATP controls, or AMP-PNP controls for our FRET-based, H2A.Z deposition assay (see also Figure S3). We do not observe significant levels of photobleaching in this assay, either during ensemble measurements or in an smFRET experiment. To aid the reader, we have added the AMP-PNP data for the experiment shown in Figure 1B. The results show there is less than a 10% decrease in FRET over 30 min, and the signal from the double acidic patch disrupted nucleosome is identical to this negative control.

      (2) Figure S3: The authors use the decrease in FRET signal as a metric of histone eviction. However, Figure S3 suggests that the FRET signal decrease could be due to DNA unwrapping. Histone exchange should not occur when SWR1C is incubated with AMP-PNP, as histone exchange requires ATP hydrolysis (10.7554/eLife.77352). And since the insertion of Z-B dimer and the eviction of A-B dimer are coupled, the decrease of FRET in the presence of AMP-PNP is unlikely due to histone eviction or exchange. Instead, the FRET decrease is likely due to DNA unwrapping (10.7554/eLife.77352). The authors should explicitly state what the loss of FRET means.

      We agree with the reviewer that loss of FRET can be due to DNA unwrapping from the nucleosome. We have previously demonstrated this activity by SWR1C in our smFRET study (Fan et al., 2022). However, DNA unwrapping is highly reversible and lasts only 1-3 seconds. We and others have not observed stable unwrapping of nucleosomes by SWR1C; rather, the stable loss of FRET reports on dimer eviction. We assume the reviewer is concerned about the rather large decrease in FRET signal shown in the AMP-PNP controls for Figure S3, panels A and D. For the other 7 panels, the decrease in FRET with AMP-PNP is minimal. In fact, if we average all of the AMP-PNP data points, the rate of FRET loss is not statistically different from that of the no enzyme control reactions (nucleosome plus ZB dimers).

      Data for panels A and D used a 77N0 nucleosomal substrate, with Cy3 labeling the linker distal dimer. This is our standard DNA fragment, and it was used in Figure 1B. The only difference between data sets is that the data shown in Fig 1B used nucleosomes reconstituted with a Cy5-labelled histone octamer, rather than the hexasome assembly method used for Fig S3. Three points are important. First, for all of these substrates, we assembled 3 independent nucleosomes, and the results are highly reproducible. Second, we performed a total of 6 experiments for the 77N0-Cy5 substrates to ensure that the rates were accurate (+/-ATP). Third, and most important, we do not see this decrease in FRET signal in the absence of SWR1C (no enzyme control). These data were included in the data source file. Thus, it appears that there is significant SWR1C-induced nucleosome instability for these two hexasome-assembled substrates. We now note this in the legend to Figure S3. Key for this work, however, is that there is a large increase in the rate of FRET loss in the presence of ATP, and this rate is faster when a ZB dimer was present at the linker proximal location. In response to the last point, we state in the first paragraph of the results: “The dimer exchange activity of SWR1C is monitored by following the decrease in the 670 nm FRET signal due to eviction of the Cy5-labeled AB-Cy5 dimer (Figure 1A).”

      (3) Related to point 2. One way to distinguish nucleosomal DNA unwrapping from histone dimer eviction is that unwrapping is reversible, whereas A-B eviction is not. Therefore, if the authors remove AMP-PNP from the reaction chamber and a FRET signal reappears, then the initial loss of FRET was due to reversible DNA unwrapping. However, if the removal of AMP-PNP did not regain FRET, it means that the loss of FRET was likely due to A-B eviction. The authors should perform an AMP-PNP and/or ATP removal experiment to make sure the interpretation of the data is correct.

      See response to item 2 above.

      (4) The nature of the error bars in Figure 1C is undefined; therefore, the statistical significance of the data is not interpretable.

      We apologize for not making this more explicit for each figure. The error bars report on 95% confidence intervals from at least 3 sets of experiments. This statement has been added to the legend.
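
      For readers unfamiliar with how such error bars are typically obtained, a minimal sketch is given below. It assumes a Student's t-based interval around the mean of replicate measurements; the response does not state which method was used, and the function name and the example values are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def ci95(values):
    """Return (mean, half_width) of a t-based 95% confidence interval.

    Hypothetical helper: supports 3 to 6 replicate measurements.
    """
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only covers 3 to 6 replicates")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, T_CRIT_95[df] * sem


# Made-up rates from three independent experiments (arbitrary units).
mean, half_width = ci95([1.0, 2.0, 3.0])
print(f"{mean:.2f} +/- {half_width:.2f}")
```

      With only three replicates the critical value is large (4.303), so the interval is wide relative to the standard error; this is worth keeping in mind when comparing error bars across conditions.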

      (5) The authors claim that the SWR1C requires intact acidic patches on both sides of the nucleosomes to exchange histone. This claim was based on the experiment in Figure 1C where they showed mutation of one of two acidic patches in the nucleosomal substrate is sufficient to inhibit SWR1C-mediated histone exchange activity. However, one could argue that the sensitivity of this assay is too low to distinguish any difference between nucleosomes with one (i.e., AB/AB-apm) versus two mutated acidic patches (i.e., AB-apm/AB-apm). The lack of sensitivity of the eviction assay can be seen when Figure 1B is taken into consideration. In the gel-shift assay, the AB-apm/AB-apm nucleosome exhibited a 10% SWR1C-mediated histone exchange activity compared to WT. However, in the eviction assay, the single AB/AB-apm mutant has no detectable activity. Therefore, to test their hypothesis, the authors should use the more sensitive in-gel histone exchange assay to see if the single AB/AB-apm mutant is more or equally active compared to the double AB-apm/AB-apm mutant.

      Our pincher model is based on three, independent sets of data, not just Figure 1C. First, as noted by the reviewer, we find that disruption of either acidic patch cripples the dimer exchange activity of SWR1C in the FRET-based assay. Whether the defect is identical to that of the double APM mutant nucleosome does not seem pertinent to the model. In a second set of assays, we used fluorescence polarization to quantify the binding affinity of SWR1C for wildtype nucleosomes, a double APM nucleosome, or each single APM nucleosome. Consistent with the pincher model, each single APM disruption decreases binding affinity at least 10-fold (below the sensitivity of the assay). Finally, we monitored the ability of different nucleosomes to stimulate the ATPase activity of SWR1C. Consistent with the pincher model, a single APM disruption was sufficient to eliminate nucleosome stimulation.

      (6) The authors claim that the AZ nucleosome is a better substrate than the AA nucleosome. This is a surprising result as previous studies showed that the two insertion steps of the two Z-B dimers are not cooperative (10.7554/eLife.77352 and 10.1016/J.CELREP.2019.12.006). The authors' claim was based on the eviction assay shown in Fig 1C. However, I am not sure how much variation in the eviction assay is contributed by different preparations of nucleosomes. The authors should use the in-gel assay to independently test this hypothesis.

      For all data shown in our manuscript, at least three different nucleosome preparations were used. The impact of a ZB dimer on the rates of dimer exchange was highly reproducible among different nucleosome preparations and experiments. We also see reproducible ZB stimulation for three different substrates – with ZB on the linker proximal side, the linker distal side, and on one side of a core particle. We do not believe that our data are inconsistent with previous studies. First, the previous work referenced by the reviewer performed dimer exchange reactions with a large excess of nucleosomes to SWR1C (catalytic conditions), whereas we used single turnover reactions. Second, our study is the first to use a homogeneous, ZA heterotypic nucleosome as a substrate for SWR1C. All previous studies used a standard AA nucleosome, following the first and second rounds of dimer exchange that occur sequentially. Finally, we observe only a 20-30% increase in rate by a ZB dimer (e.g. 77N0 substrates), and such an increase was unlikely to have been detected by previous gel-based assays.

      Minor comments:

      (1) Abstract line 4: To say 'Numerous' studies have shown acidic patch impact chromatin remodeling enzymes activity may be too strong.

      Removed

      (2) Page 15, line 15: The authors claim that swc5∆ was inviable on formamide media. However, the data in Figure 8 shows cell growth in column 1 of swc5∆.

      The term ‘inviable’ has been replaced with ‘poor’ or ‘slow growth’.

      (3) The authors should use standard yeast nomenclature when describing yeast genes and proteins. For example, for Figure 8 and legend, Swc5∆ was used to describe the yeast strain BY4741; MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YBR231c::kanMX4. Instead, the authors should describe the swc5∆ mutant strain as BY4741 MAT a his3∆1 leu2∆0 met15∆0 ura3∆0 swc5∆::kanMX4. Exogenous plasmid should also be indicated in italics and inside brackets, such as [SWC5-URA3] or [swc5(R219A)-URA3].

      We apologize for missing this mistake in the Figure 8 legend. We had inadvertently copied this from the EUROSCARF entry and forgot to edit it. We decided not to add all the plasmid names to the figure, as it was too cluttered. We state in the figure legend that the panels show growth of swc5 deletion strains harboring the indicated swc5 alleles on CEN/ARS plasmids.

      (4) According to Lin et al. 2017 NAR (doi: 10.1093/nar/gkx414), there is only one Swc5 subunit per SWR1C. Therefore, the pincher model proposed by the authors would suggest that there is a missing subunit that recognizes the second acidic patch. The authors should point out this fact in the discussion. However, as mentioned in Major comment 6, I am not sure if the pincer model is substantiated.

      In our discussion, we had noted that the published cryoEM structure suggested that the Swc2 subunit likely interacts with the acidic patch on the dimer that is not targeted for replacement, and we proposed that Swc5 interacts with the acidic patch on the exchanging H2A/H2B dimer. We have now made this clearer in the text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We wish to thank the reviewers for their helpful and insightful comments. Their concerns mainly related to the interpretation of the data, and their comments helped us clarify our statements and improve our discussion.

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting study. It involves the utilization of hippocampal neuronal cultures from syntaxin 1 knock-out mice. These cultures serve as a platform for monitoring changes in synaptic transmission through electrophysiological recording of postsynaptic currents, upon lentiviral infection with various isoforms, chimeras, and point mutations of syntaxins.

      The authors observe the following:

      (1) Syntaxin2 restores neuronal viability and can partially rescue Ca2+-evoked release in syntaxin1 knock-out neurons, although release is much slower (cumulative charge transfer differences) and the RRP is clearly smaller than when rescued with syntaxin1. In contrast, syntaxin2-mediated rescue leads to a large increase in spontaneous release (Figure 1). Convincingly, the authors conclude that syntaxin 1 is optimized for fast phasic release and for clamping of spontaneous release, in comparison with syntaxin2.

      (2) The replacement of the SNARE domain (or its C-terminal part) of syntaxin1 by the SNARE domain of syntaxin2 (or its C-terminal part) rescues the fast kinetics, but not the amplitude, of Ca2+-evoked release. This is associated with a decrease in the size of the RRP and an increase in spontaneous release. The probability of vesicular release (PVR) is slightly increased, which is intriguing because a slight decrease would be expected instead according to the reduced RRP, indicating that an enhancement of Ca2+-dependent fusion is occurring at the same time by unknown mechanisms, as the authors properly point out. Analogous experiments in which the SNARE domain of syntaxin1 is placed into syntaxin2 reveal the existence of differential regulatory elements outside the SNARE domain.

      (3) Different constructs of syntaxin 1 and syntaxin 2 display different expression levels. On the other hand, the expression levels of Munc-18 are associated with the characteristics of the transfected specific syntaxin construct. In any case, the electrophysiological phenotypes cannot be consistently explained by changes in Munc-18.

      (4) Mutations in several residues of the outer surface of the C-terminal half of the syntaxin1 SNARE domain lead to alterations in the RRP and the frequency of spontaneous release, but the changes cannot be attributed to a change in the net surface charge, because the alterations occur even in paired mutations in which electrical neutrality is conserved.

      Comments:

      (1) This is a comment regarding the interpretation of the results. In general, the decrease in the RRP size is associated with the increased frequency of spontaneous release due to unclamping. The authors claim that both phenomena seem to be independent of each other. In any case, how can the authors discard the possibility that the unclamping of spontaneous release leads to a decrease in the RRP size?

      The main argument against the reduction of the RRP being caused by the observed increase in the mEPSC frequency is based on the kinetics of refilling and depletion. The average time before a primed vesicle fuses spontaneously is 500 – 1000 seconds (spontaneous vesicle release rate – STX1 Figure 1, Figure 2 and Figure 3). The time it takes to refill the RRP after depletion is on the order of 3 seconds (Rosenmund and Stevens, 1996). Therefore, the refilling of the RRP is more than 100 times faster. Even if spontaneous release increased 5-fold, this would lead to less than 5% steady-state depletion of the RRP.
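
      The kinetic argument above can be checked with a short calculation. The following is a minimal sketch that assumes a simple two-state model in which vesicles are primed with a first-order rate of 1/(refill time) and fuse spontaneously with a first-order rate of 1/(mean time to spontaneous fusion). The model and the function name are illustrative assumptions, not an analysis taken from the manuscript; only the 3 s and 500 – 1000 s values come from the response.

```python
def steady_state_depletion(refill_time_s, spontaneous_fusion_time_s, fold_increase=1.0):
    """Fraction of the RRP that is empty at steady state.

    Illustrative assumption: dP/dt = k_refill * (1 - P) - k_spont * P,
    where P is the filled fraction of the pool. At steady state the
    empty fraction is k_spont / (k_refill + k_spont).
    """
    k_refill = 1.0 / refill_time_s
    k_spont = fold_increase / spontaneous_fusion_time_s
    return k_spont / (k_refill + k_spont)


# Values quoted in the response: ~3 s refilling, 500-1000 s to spontaneous fusion.
for fusion_time in (500.0, 1000.0):
    for fold in (1.0, 5.0):
        depletion = steady_state_depletion(3.0, fusion_time, fold)
        print(f"fusion time {fusion_time:.0f} s, {fold:.0f}x spontaneous rate: "
              f"{100 * depletion:.2f}% of the RRP depleted")
```

      Under these assumptions, even a 5-fold increase in spontaneous release with the faster 500 s fusion time leaves the pool less than 5% depleted, in line with the estimate given in the response.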

      (2) The authors have analyzed the kinetics of mEPSCs and found differences (Fig2-Supp. Fig1; Fig2-Supp. Fig1). It would be interesting and pertinent to discuss these data in the context of potential phenotypes in the fusion pore kinetics involving syntaxin1 and syntaxin2 and their SNARE domains. Indeed, the figure will improve by including averaged traces of mEPSCs.

      We thank the reviewer for the idea. Upon closer examination of the changes in mEPSC rise time and mEPSC decay time, we noticed a minor slowing in the mEPSC rise time from 0.443 ms (SEM 0.0067) for STX1A to 0.535 ms (SEM 0.0151) for STX1A-2(SNARE) or 0.507 ms (SEM 0.01251) for STX1A-2(Cter), while the mEPSC half-widths did not change significantly. It is possible that the measured change is related to the detection algorithm, as mEPSC detection at elevated frequencies becomes more difficult due to increased overlap of events, and we therefore prefer to refrain from making any mechanistic claims.

      Minor comments:

      (1) Fig2 J; Fig 3 J. It is difficult to distinguish between different colors and implementing a legend within the graph will be very helpful.

      (2) Fig3 H. Please change the color of the box plot for Stx1 A to improve the contrast with the individual data points.

      (3) Page 6. Line 225. "Figure 2D and E" should be corrected to "Figure 2C and D"

      (1) Colors were changed for clearer visualization. (2) Unfortunately, changing the color did not improve the contrast with the individual plots. However, the numerical data is all included in the data sheets of the corresponding figure. (3) The mistake was corrected.

      Reviewer #2 (Recommendations For The Authors):

      Line 135-136: Are the numbers cited in the text mean and SEM? Please indicate.

      Line 139 and Figure 1G: The difference between purple and blue was very hard to see on my hard copy.

      Line 152: Reference to Figure 1L should probably be 1K.

      Line 183: Reference to Figure 2C should probably be Figure 2F.

      Line 225: Reference to Figure 2D and 2E should probably be 2C and 2D.

      Line 239: Reference to Figure 3I should probably be 3H.

      All typos were addressed and colors were changed for better visualization.

      Line 210-211: Sentence ("One of the benefits..") is hard to understand.

      Thank you for noticing this mistake. We agree that the sentence did not add any important or new information, so it was deleted. Additionally, the message of the mentioned sentence was already clearly stated in lines 209-211.

      Figure 4E-H misses data for STX2, for the figure to be arranged like Figure 5.

      Given that STX1 is the endogenous syntaxin in hippocampal neurons, we use it as a control for all the analyses done in the STX2 and STX2-chimera experimental groups; thus it is included in Figures 3 and 5.

      It appears that the authors do not present or discuss the Western Blot in Fig. 4D. Are the quantitative results of the Western Blot consistent with or different from the quantification of the immunostainings (Fig. 4B-C)? A similar question for Figure 5D, which also seems not to be presented.

      In terms of quantification, we have relied mainly on the ICC experiments because they test also for putative impairments in transport to the presynaptic compartment. Our WB data are overall consistent with the results, but were not used to quantitate expression of our syntaxin chimeras and mutations in the STX1-null hippocampal neuron model.

      Figure 6F-G: The normalization of spontaneous vesicular release rates is not clear, because the vesicular release rates already contain a normalization (mEPSC rate divided by RRP size). Is a further normalization of the STX1A condition informative? The authors should consider presenting the release rates themselves. In any case, the normalization should be presented/explained, at least in the legends.

      The reviewer is in principle correct. Due to the large number of experimental groups, we had to perform recordings from multiple cultures in which not all experimental groups were present, while the WT STX1 was present as a consistent control. To reduce culture-to-culture variability, an additional normalization to the WT control group was performed. However, we also included the raw numerical values in the data-source sheets (normalized and absolute), which produce a similar overall outcome.
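
      The two normalization steps discussed here can be sketched as follows. This is a minimal illustration, assuming that the spontaneous vesicular release rate is the mEPSC rate divided by the RRP size (as described in the reviewer's comment) and that each culture is normalized to the mean of its own WT control group. The function names, group labels, and numbers are hypothetical placeholders in arbitrary units, not data from the study.

```python
from statistics import mean


def vesicular_release_rate(mepsc_rate, rrp_size):
    """First normalization: spontaneous release rate per primed vesicle."""
    return mepsc_rate / rrp_size


def normalize_to_control(rates_by_group, control="STX1A"):
    """Second normalization: divide every value recorded in one culture
    by the mean of the control group recorded in that same culture."""
    control_mean = mean(rates_by_group[control])
    return {
        group: [rate / control_mean for rate in rates]
        for group, rates in rates_by_group.items()
    }


# Hypothetical single culture (arbitrary units); "mutant" is a placeholder label.
culture = {"STX1A": [0.5, 1.5], "mutant": [2.0, 4.0]}
normalized = normalize_to_control(culture)
print(normalized["STX1A"])   # control values scatter around 1
print(normalized["mutant"])  # values expressed relative to the control mean
```

      Because each culture is scaled by its own control, groups recorded in different cultures can be pooled on a common relative scale, while the absolute rates remain available for comparison.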

      References to Figure 7 subpanels (A, B, and C) are missing.

      Thank you for the comment. We have integrated all panels into one for better representation and understanding since they are representative of one another.

      Lines 330-339 and Figure 7 in Discussion: the authors discuss that adding the non-cognate STX2 SNARE-domain to syntaxin-1 might destabilize the primed state and decrease the fusion energy barrier (as indicated in Figure 7C). What is the evidence that the decrease in RRP size is not caused solely by the depletion of the pool due to the increased spontaneous fusion?

      Please see the comments to major point 2 of reviewer 1.

      Statistics: Missing is the number of observations (n) for all data. Even if all data points are displayed, this should be stated.

      N numbers are included in the data sheets attached to each figure.

      The statement (start of Discussion,) that the SNARE-domain of STX1 'plays a minimal role in the regulation for Ca2+-evoked release' is somewhat puzzling, since without the SNARE-domain in STX1 there would be no Ca2+-evoked release. I guess these statements (similar statements are found elsewhere) are due to the interesting finding that STX2 leads to a decrease in release kinetics, compared to STX1, and this is not (entirely) due to differences in the SNARE-domain. I would suggest rephrasing the finding in terms of release kinetics. Also, the statement in the last sentence of the Abstract is not clear.

      Thank you for pointing this out. We agree that our experiments showed a strong impact of the syntaxin isoform exchange on release kinetics and overall release output. A similar comment came from reviewer #3, so we have addressed both comments as one.

      Our confusing statement resulted from the order of the presented results and our summarizing remarks for each section. Our statement reflected our finding that mutating residues in the C-terminal part of the STX1 SNARE motif affected only spontaneous release and RRP size but not release efficacy. We now state (pg. 6 lines 231-233) that the data observed from the comparison of “the results obtained from the Ca2+-evoked release between STX1 and STX2 support major regulatory differences of the domains outside of the SNARE domain between isoforms”.

      We have changed the abstract pg. 2 lines 55-56

      We have changed the introduction pg. 3 lines 102-105 for a better contextualization.

      We have changed the start of the discussion pg. 9 lines 250-252 for better contextualization.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, Salazar-Lázaro et al. presented interesting data that C-terminal half of the Syx1 SNARE domain is responsible for clamping of spontaneous release, stabilizing RRP, and also Ca2+-evoked release. The authors routinely utilized the chimeric approach to replace the SNARE domain of Syx1 with its paralogue Syx2 and analyzed the neuronal activity through electrophysiology. The data are straightforward and fruitful. The conclusions are partly reasonable. One obvious drawback is that they did not explore the underlying mechanism. I think it is easy for the authors to carry out some simple assays to verify their hypothesis for the mechanism, instead of just talking about it in the discussion section. In all, I appreciate the data presented in the manuscript. If the authors could supply more data on the mechanisms, this would be important research in the field. Some critical comments are listed below:

      We thank the reviewer for his/her comments and suggestions.

      Major comments:

      (1) In pg.3, lines 102-104, the authors stated that 'We found that the C-terminal half of the SNARE domain of STX1.. ..while it is minimally involved in the regulation of Ca2+-evoked release.' But in pg.5, lines 174-176, they wrote that 'Replacement of the full-SNARE domain (STX1A-2(SNARE)) or the C-terminal half (STX1A-2(Cter)) of the SNARE domain of STX1A with the same domain from STX2 resulted in a reduction in the EPSC amplitude (Figure 2B).' and in pg.5-6, lines 197-199, they wrote that 'Taken together our results suggest that the C-terminal half of the SNARE domain of STX1A is involved in the regulation of the efficacy of Ca2+-evoked release, the formation of the RRP and in the clamping of spontaneous release.' It puzzles me a lot as to what the authors are really trying to express for the relationship between C-half of the SNARE complex and Ca2+-evoked release (i.e., minimally involved or significantly participate in the process?). Please clarify and reorganize the contexts.

      Please see our reply to the last comment of reviewer 2.

      (2) Figure 1-figure supplement 1, the authors should analyze Syx1/VGlut1 level additionally. And, if possible, compare the difference between Syx1/VGlut1 and Syx2/VGlut1.

      The levels of STX1/VGlut1 and STX2/VGlut1 were analyzed in detail in Figures 4 and 5.

      The direct comparison between the expression levels of these two proteins is not possible since the affinities of the antibodies for the target proteins are different and can introduce potential biases. While this could be overcome by adding a FLAG-tag to the syntaxin proteins, we have not utilized this approach in this publication. In addition, we inferred sufficient and comparable expression of both syntaxins from their ability to rescue some of the syntaxin1 loss-of-function phenotypes.

      (3) Figure 2D only analyzed the EPSC half-width; could the authors alternatively analyze the rise/decay time? Also, in Figure 3-figure supplement 1, does it refer to the kinetic parameters of Syx2-1A in Figure 3? It is very confusing.

      We have changed the text accordingly and each parameter is referenced to its corresponding figure for clarity. As for the decay and rise time of STX1 and STX1-chimeras, they are in Figure 2-figure supplement 1A and B.

      (4) On pg.4, lines 151-152, 'Finally, no change was observed in the paired-pulse ratio (PPR) between STX1A and STX2 groups (Figure 1L).' does not contain any explanations and comments for this observation in the texts.

      The small EPSC amplitudes and altered kinetics of the STX2 constructs (Figure 1 and Figure 3) have made it more difficult to quantitate paired-pulse experiments. Therefore, we preferred not to overinterpret these measurements. The finding that the paired-pulse data were not significantly different fits with the vesicular release probability measurements, which showed no major changes. We have made our statement on this basis.

      (5) On pg.6, lines 235-236, the authors wrote that 'Additionally, we found that only STX2-1A(SNARE) and STX2-1A(Cter) could rescue the RRP to around double of what we measured from STX2 and STX2-1A(Nter) (figure 3F)'. However, in Figure 3F, the authors indicated 'n.s.' (p>0.05) for the differences between STX2 and STX2-1A(SNARE)/STX2-1A(Cter). It is perplexing how the authors interpret their data. Definitely, the p-value could not be arbitrarily used as a criterion of difference. An easier way is that indicating the exact p-values for each comparison (indicate in figure legends or list in tables).

      We apologize for any confusion and hope the modification makes our interpretation clearer. The calculated p-values are included in the attached data source tables, which we hope will clarify our comparative analysis. We have changed the text on pg. 7, lines 238-241; we are cautious not to overinterpret these results and rely more on the data observed in STX1A-chimeras, which show significant changes in the RRP.

      (6) I noticed that the authors preferred using 'xx% increase/decrease' or 'xx-fold increase/decrease' to interpret their inter-group data. I would doubt whether the interpretations are appropriate. First, it seems that most of the individual scatters from one set were not subject to Gaussian distribution; also, the authors utilized non-parameter tests to compare the differences. Second, the authors did not explicitly indicate the method to calculate the % or fold, e.g., by comparing mean value or median. I think it is a bad choice to use the median to calculate fold changes; meanwhile, the mean value would also be biased, given the fact that the data were not Gaussian-distributed. The authors should be cautious in interpreting their data.

      We thank the reviewer for pointing out the inaccuracy of our descriptions and have included the parameter used to calculate the percentage and fold increase/decrease, namely the mean, in the materials and methods section. Our intention is to plainly state the amount of change seen in a parameter based on the observed changes in the mean value. We agree with the reviewer that interpreting this could be problematic if we were speculating on possible mechanisms. Further tests should be conducted to determine whether similar increases/decreases in a parameter are due to the disturbance of the same or different mechanisms. For example, we discussed whether or not the regulation of SYT1 might be the mechanism affected in some of the chimeras that show an increase in the spontaneous release rate, because the release rate observed in some is massively higher than that seen in SYT1-KO (Bouazza-Arostegui et al., 2022). It is tempting to speculate that it could be due to other mechanisms based on the differences in the changes. For this reason, we have given an array of possible mechanisms affected when we manipulate the SNARE domain of STX1.
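
      For clarity, a mean-based fold and percentage change of the kind described above can be computed as in the following minimal sketch. The function name and the example values are hypothetical placeholders, not data from the study, and the sketch makes no claim about which summary statistic is most appropriate for non-Gaussian data.

```python
from statistics import mean


def mean_based_change(control_values, test_values):
    """Return (fold change, percent change) of the test group mean
    relative to the control group mean."""
    control_mean = mean(control_values)
    test_mean = mean(test_values)
    fold = test_mean / control_mean
    percent = 100.0 * (test_mean - control_mean) / control_mean
    return fold, percent


# Hypothetical values in arbitrary units.
fold, percent = mean_based_change([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(f"{fold:.1f}-fold, {percent:+.0f}% relative to control")
```

      Stating the statistic explicitly in this way makes it clear that a reported "2-fold increase" refers to the ratio of group means rather than of medians.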

      (7) The authors routinely analyzed the levels of Munc18-1 in neuronal lysates by WB and Munc18-1/VGlut1 by immunofluorescence in various Syx1 mutants. However, in my view, these assays were slightly indirect. It is evident that the SNARE domain of Syx1 participates in the binding to Munc18-1 according to the atomic structures (pdb entries: 3C98 and 7UDB). Meanwhile, Han et al. reported that K46E mutation (located in domain 1 of Munc18-1) strongly impairs Syx1 expression, Syx1-interaction, vesicle docking and secretion (Han et al., 2011, PMID: 21900502). Intriguingly, the residue K46 of Munc18-1, which is close to D231/R232 of Syx1, may have potential electrostatic contacts to D231 and R232 of Syx1. This is reminiscent of the possibility that Syx1D231/R232 and some Syx1-2 chimeras lost their normal function through their defective binding to Munc18-1. To better understand the underlying mechanism, the authors may need to carry out in vivo and/or in vitro binding analysis between syntaxin mutants/chimeras and Munc18-1. They also need to conduct more discussions about the issue.

      We express our gratitude for the identification of a previously overlooked aspect in our investigation of the interplay between Munc18-1 and STX1. In response, we have incorporated additional discourse on this matter in pg11 lines 419-431.

      Additionally, we appreciate the thoughtful suggestion regarding additional experiments to further explore the molecular relationship between Munc18-1 and STX1. We agree that co-immunoprecipitation experiments (either by using an antibody against Munc18-1 or STX1 and STX2) would offer greater insight into whether the binding of these proteins is affected in the isoform or the mutants. Notably, we performed immunoprecipitation experiments by using neuronal lysates of the corresponding groups and using STX1A and STX2 antibodies for the pull-downs. However, we were unable to co-IP Munc18-1 when doing so. Changing the conditions of the experiment did not yield better results, so these experiments remain inconclusive for the moment. For this reason, we included it as an open question and a potential concluding hypothesis of the molecular mechanism. However, Shi et al., 2021, have performed co-IP assays using Munc18-1-wt and a mutant form which affects the binding to the C-terminal half of the SNARE domain of STX, together with STX1-wt and STX mutants targeting some of our residues of interest, and showed a decrease in the pulled-down levels of Munc18-1 using HeLa cells. We have made sure to mention the conclusion of this important publication in our discussion.

      (8) The third possible mechanism (i.e., interaction with Syt1) proposed by the authors seems more reasonable. However, the discussions raised by the authors were not enough. For instance, plenty of literature has indicated that Syt1 may participate in synaptic vesicle priming through stabilizing partially or fully assembled SNARE complex (Li et al., 2017, PMID: 28860966; Bacaj et al., 2015, PMID: 26437117; Mohrmann et al., 2013, PMID: 24005294; Wang et al., 2011; PMID: 22184197; Liu et al., 2009, PMID: 19515907); complexins are also SNARE binding modules that regulate synaptic exocytosis. Lack of complexins could lead to unclasping of spontaneous fusion of synaptic vesicles, though it causes severe Ca2+-triggered release at the same time (Maximov et al., 2009, PMID: 19164751). Meanwhile, different domains of complexin may accomplish different steps of SV fusion, early research had indicated that the C-terminal sequence of complexin is selectively required for clamping of spontaneous fusion and priming but not for Ca2+-triggered release (Kaeser-Woo et al., 2012, PMID: 22357870). Likewise, if possible, the authors may need to carry out in vivo and/or in vitro binding analysis to confirm their hypothesis.

      The exploration of complexin's involvement was limited in our study primarily due to our methodological focus on comprehending molecular mechanisms concerning the sequence disparities between STX1 and STX2. Our laboratory has studied the role of Complexin extensively, and we certainly have had a possible involvement in mind. However, since the sites identified on syntaxin are either conserved between STX1 and STX2 or not close to the central or accessory helical domains of complexin, we did not perform experiments to test putative interactions, and we refrained from discussing complexin in this paper.

      (9) Lastly, I would suspect that whether the defects of Syx2 and Syx1 chimeras were caused by the SNARE complex itself, from another point of view that is different from the hypothesis raised by the authors. Changing the outward residues (or we say the solvent-accessible residues) of the SNARE complex may affect the stability, assembly kinetics, and energetics (Wang and Ma, 2022, PMID: 35810329; Zorman et al., 2014, PMID: 25180101), especially for the C-terminal halves. Is this another possible mechanism through which the C-terminus of Syx1 might contribute to SV priming and clamping of spontaneous release? The authors should at least conduct some discussions about the point.

      Thank you for this suggestion. We indeed assumed that, since the hydrophobic layers of the SNARE domains that form the hydrophobic pocket of STX2 and STX1 are mainly conserved, the intrinsic stability of the SNARE complex is largely unchanged. Additionally, Li et al. (2022) PMID: 35810329 examined the stability of the alpha-helix structure of the SNARE domain of SNAP25. While they found no changes in the stability and formation of the alpha-helix when mutating outward-facing residues for methodological purposes (bimane-tryptophan quenching), their study did not selectively explore the effect of mutations of outer-surface residues on the stability of the alpha-helix.

      Zorman et al. (2014) PMID: 25180101, as noted by the reviewer, observed that changes in the sequence of the SNARE domain (by using SNARE proteins from different trafficking systems (neuron, GLUT4, yeast…)) correlated with changes in the step-wise SNARE complex assembly. However, they also did not selectively mutate the outer solvent-accessible residues, hindering conclusive speculation on the contribution of said residues to the kinetics and energetics of assembly and the intrinsic stability of the SNARE complex.

      At the reviewer's request, we have added this paragraph to discuss an additional mechanism:

      “As a final remark, it is possible that the changes in the spontaneous release rate and the priming stability may stem from a reduced stability of the SNARE complex itself through putative interactions between outer surface residues. Studies of the kinetics of assembly of the SNARE complex which mutate solvent-accessible residues in the C-terminal half of the SNARE domain of SYB2 have shown reduction in the stability of the SNARE complex assembly and are correlated with impaired fusion (Jiao et al., 2018). However, STX1 mutations of outward residues were inconclusive and were always accompanied by hydrophobic layer mutations (Jiao et al., 2018), which affect the assembly kinetics and energetics of the SNARE complex (Ma et al., 2015). Single molecule optical-tweezer studies have focused on the impact of regulatory molecules on the stability of assembly such as Munc18-1 (Ma et al., 2015; Jiao et al., 2018) and complexin (Hao et al., 2023), or on the intrinsic stability of the hydrophobic layers in the step-wise assembly of the SNARE complex (Gao et al., 2012; Ma et al., 2015; Zhang et al., 2017). Although the conserved hydrophobic layers in the SNARE domains of STX1A and STX2 (Figure 1) suggest unchanged zippering and intrinsic stability of the complex, further studies addressing the contribution of surface residues on the stability of the alfa-helix structure of the SNARE domain of STX1 (Li et al., 2022) or the stability of the SNARE complex should be conducted.”

      Minor comments:

      (1) In pg.6, line 236, 'figure 3F', the initial 'f' should be uppercased.

      (3) On pg.11, line 396, the section title 'The interaction of the C-terminus of de SNARE domain of STX1A with Munc18-1 in the stabilization of the primed pool of vesicles.' The word 'de' is confusing, please check.

      (4) In pg.12, line 446, the section title, should 'though' be 'through'?

      These points have been acknowledged and corrected. Thank you.

      (2) In pg.7, line 239, '..had an increased PVR (Figure 3G), no change in the release rate (Figure 3I)', should Figure 3I be Figure 3H? and line 240, 'and an increase in short-term depression during 10Hz train stimulation (Figure 3I)', should Figure 3I be Figure 3J? If so, Figure 3I will not be cited in the texts and lack adequate interpretations. Please check.

      We apologize for the oversight in not referencing this specific subpanel of the figure and have incorporated the reference in the text. Additionally, our interpretation of these data is connected to the mechanisms that govern the efficacy of the Ca2+-evoked response, and its dependence on the integrity of the entire SNARE domain. We wish to highlight the modifications made to the discussion on the regulation of the Ca2+-evoked response based on previous reviewer comment #1, and a similar comment from reviewer #2 (as stated previously).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Visual Perceptual Learning (VPL) results in varying degrees of generalization to tasks or stimuli not seen during training. The question of which stimulus or task features predict whether learning will transfer to a different perceptual task has long been central in the field of perceptual learning, with numerous theories proposed to address it. This paper introduces a novel framework for understanding generalization in VPL, focusing on the form invariants of the training stimulus. Contrary to a previously proposed theory that task difficulty predicts the extent of generalization - suggesting that more challenging tasks yield less transfer to other tasks or stimuli - this paper offers an alternative perspective. It introduces the concept of task invariants and investigates how the structural stability of these invariants affects VPL and its generalization. The study finds that tasks with high-stability invariants are learned more quickly. However, training with low-stability invariants leads to greater generalization to tasks with higher stability, but not the reverse. This indicates that, at least based on the experiments in this paper, an easier training task results in less generalization, challenging previous theories that focus on task difficulty (or precision). Instead, this paper posits that the structural stability of stimulus or task invariants is the key factor in explaining VPL generalization across different tasks.

      Strengths:

      • The paper effectively demonstrates that the difficulty of a perceptual task does not necessarily correlate with its learning generalization to other tasks, challenging previous theories in the field of Visual Perceptual Learning. Instead, it proposes a significant and novel approach, suggesting that the form invariants of training stimuli are more reliable predictors of learning generalization. The results consistently bolster this theory, underlining the role of invariant stability in forecasting the extent of VPL generalization across different tasks.

      • The experiments conducted in the study are thoughtfully designed and provide robust support for the central claim about the significance of form invariants in VPL generalization.

      Weaknesses:

      • The paper assumes a considerable familiarity with the Erlangen program and the definitions of invariants and their structural stability, potentially alienating readers who are not versed in these concepts. This assumption may hinder the understanding of the paper's theoretical rationale and the selection of stimuli for the experiments, particularly for those unfamiliar with the Erlangen program's application in psychophysics. A brief introduction to these key concepts would greatly enhance the paper's accessibility. The justification for the chosen stimuli and the design of the three experiments could be more thoroughly articulated.

      Response: We appreciate the reviewer's feedback regarding the accessibility of our paper. In response, we plan to enhance the introduction of our paper to provide a concise yet comprehensive overview of the key concepts of the Erlangen program. Additionally, we will provide a more thorough justification for the selection of stimuli and the experimental design in our revised version, ensuring that readers understand the rationale behind our choices.

      • The paper does not clearly articulate how its proposed theory can be integrated with existing observations in the field of VPL. While it acknowledges previous theories on VPL generalization, the paper falls short in explaining how its framework might apply to classical tasks and stimuli that have been widely used in the VPL literature, such as orientation or motion discrimination with Gabors, vernier acuity, etc. It also does not provide insight into the application of this framework to more naturalistic tasks or stimuli. If the stability of invariants is a key factor in predicting a task's generalization potential, the paper should elucidate how to define the stability of new stimuli or tasks. This issue ties back to the earlier mentioned weakness: namely, the absence of a clear explanation of the Erlangen program and its relevant concepts.

      Response: Thanks for highlighting the need for better integration of our proposed theory with existing observations in the field of VPL. Unfortunately, the theoretical framework proposed in our study is based on Klein’s Erlangen program and is only applicable to geometric shape stimuli. For VPL studies using stimuli and paradigms that are completely unrelated to geometric transformations (such as motion discrimination with Gabors or random dots, vernier acuity, spatial frequency discrimination, contrast detection or discrimination, etc.), our proposed theory does not apply. Some stimuli employed by VPL studies can be classified into certain geometric invariants. For instance, orientation discrimination with Gabors (Dosher & Lu, 2005) and the texture discrimination task (F. Wang et al., 2016) both belong to tasks involving Euclidean invariants, and circle versus square discrimination (Kraft et al., 2010) belongs to tasks involving affine invariance. However, these studies do not simultaneously involve multiple geometric invariants of varying levels of stability, and thus cannot be directly compared with our research. It is worth noting that while Klein’s hierarchy of geometries, which our study focuses on, is rarely mentioned in the field of VPL, it does have connections with concepts such as 'global/local', 'coarse/fine', 'easy/difficult', 'complex/simple': more stable invariants are closer to 'global', 'coarse', 'easy', 'complex', while less stable invariants are closer to 'local', 'fine', 'difficult', 'simple'. Importantly, several VPL studies have found ‘fine-to-coarse’ or ‘local-to-global’ asymmetric transfer (Chang et al., 2014; N. Chen et al., 2016; Dosher & Lu, 2005), which seems consistent with the results of our study.

      In the introduction section of our revised version and subsequent full author response, we will provide a clear explanation of the Erlangen program and elucidate how to define the stability of new stimuli or tasks. In the discussion section of our revised version, we will compare our results to other studies concerned with the generalization of perceptual learning and speculate on how our proposed theory fits with existing observations in the field of VPL.

      • The paper does not convincingly establish the necessity of its introduced concept of invariant stability for interpreting the presented data. For instance, consider an alternative explanation: performing in the collinearity task requires orientation invariance. Therefore, it's straightforward that learning the collinearity task doesn't aid in performing the other two tasks (parallelism and orientation), which do require orientation estimation. Interestingly, orientation invariance is more characteristic of higher visual areas, which, consistent with the Reverse Hierarchy Theory, are engaged more rapidly in learning compared to lower visual areas. This simpler explanation, grounded in established concepts of VPL and the tuning properties of neurons across the visual cortex, can account for the observed effects, at least in one scenario. This approach has previously been used/proposed to explain VPL generalization, as seen in (Chowdhury and DeAngelis, Neuron, 2008), (Liu and Pack, Neuron, 2017), and (Bakhtiari et al., JoV, 2020). The question then is: how does the concept of invariant stability provide additional insights beyond this simpler explanation?

      Response: We appreciate the alternative explanation proposed by the reviewer and agree that it presents a valid perspective grounded in established concepts of VPL and neural tuning properties. However, the collinearity and parallelism tasks both require orientation invariance. While orientation invariance, as proposed by the reviewer, can explain the lack of transfer from the collinearity or parallelism task to the orientation task, it cannot explain why collinearity does not transfer to parallelism.

      As stated in the response to the previous review, in the revised discussion section, we will compare our study with other studies (including the three papers mentioned by the reviewer), aiming to clarify the necessity of the concept of invariant stability for interpreting the observed data and understanding the mechanisms underlying VPL generalization.

      • While the paper discusses the transfer of learning between tasks with varying levels of invariant stability, the mechanism of this transfer within each invariant condition remains unclear. A more detailed analysis would involve keeping the invariant's stability constant while altering a feature of the stimulus in the test condition. For example, in the VPL literature, one of the primary methods for testing generalization is examining transfer to a new stimulus location. The paper does not address the expected outcomes of location transfer in relation to the stability of the invariant. Moreover, in the affine and Euclidean conditions one could maintain consistent orientations for the distractors and targets during training, then switch them in the testing phase to assess transfer within the same level of invariant structural stability.

      Response: Thanks for raising the issue regarding the mechanism of transfer within each invariant condition. We plan to design an additional experiment that is similar in paradigm to Experiment 2, aiming to examine how VPL generalizes to a new test location within a single invariant stability level.

      • In the section detailing the modeling experiment using deep neural networks (DNN), the takeaway was unclear. While it was interesting to observe that the DNN exhibited a generalization pattern across conditions similar to that seen in the human experiments, the claim made in the abstract and introduction that the model provides a 'mechanistic' explanation for the phenomenon seems overstated. The pattern of weight changes across layers, as depicted in Figure 7, does not conclusively explain the observed variability in generalizations. Furthermore, the substantial weight change observed in the first two layers during the orientation discrimination task is somewhat counterintuitive. Given that neurons in early layers typically have smaller receptive fields and narrower tunings, one would expect this to result in less transfer, not more.

      Response: We appreciate the reviewer's feedback regarding the clarity of our DNN modeling experiment. We acknowledge that while DNNs have been demonstrated to serve as models for visual systems as well as VPL, the claim that the model provides a ‘mechanistic’ explanation for the phenomenon is still overstated. In our revised version, we will attempt a more detailed analysis of the DNN model while providing a more explicit explanation of the findings from the DNN modeling experiment, emphasizing its implications for understanding the observed variability in generalizations.

      Additionally, the substantial weight change observed in the first two layers during the orientation discrimination task does not contradict the theoretical framework we proposed. Instead, it aligns with our speculation regarding the neural mechanisms of VPL for geometric invariants. Specifically, it suggests that invariants with lower stability rely more on the plasticity of lower-level brain areas, thus exhibiting poorer generalization performance to new locations or stimulus features within each invariant condition. However, it does not imply that their learning effects cannot transfer to invariants with higher stability.

      Reviewer #2 (Public Review):

      The strengths of this paper are clear: The authors are asking a novel question about geometric representation that would be relevant to a broad audience. Their question has a clear grounding in pre-existing mathematical concepts, that, to my knowledge, have been only minimally explored in cognitive science. Moreover, the data themselves are quite striking, such that my only concern would be that the data seem almost too clean. It is hard to know what to make of that, however. From one perspective, this is even more reason the results should be publicly available. Yet I am of the (perhaps unorthodox) opinion that reviewers should voice these gut reactions, even if it does not influence the evaluation otherwise. Below I offer some more concrete comments:

      (1) The justification for the designs is not well explained. The authors simply tell the audience in a single sentence that they test projective, affine, and Euclidean geometry. But despite my familiarity with these terms -- familiarity that many readers may not have -- I still had to pause for a very long time to make sense of how these considerations led to the stimuli that were created. I think the authors must, for a point that is so central to the paper, thoroughly explain exactly why the stimuli were designed the way that they were and how these designs map onto the theoretical constructs being tested.

      (2) I wondered if the design in Experiment 1 was flawed in one small but critical way. The goal of the parallelism stimuli, I gathered, was to have a set of items that is not parallel to the other set of items. But in doing that, isn't the manipulation effectively the same as the manipulation in the orientation stimuli? Both functionally involve just rotating one set by a fixed amount. (Note: This does not seem to be a problem in Experiment 2, in which the conditions are more clearly delineated.)

      (3) I wondered if the results would hold up for stimuli that were more diverse. It seems that a determined experimenter could easily design an "adversarial" version of these experiments for which the results would be unlikely to replicate. For instance: In the orientation group in Experiment 1, what if the odd-one-out was rotated 90 degrees instead of 180 degrees? Intuitively, it seems like this trial type would now be much easier, and the pattern observed here would not hold up. If it did hold up, that would provide stronger support for the authors' theory.

      It is not enough, in my opinion, to simply have some confirmatory evidence of this theory. One would have to have thoroughly tested many possible ways that theory could fail. I'm unsure that enough has been done here to convince me that these ideas would hold up across a more diverse set of stimuli.

      Response: (1) We appreciate the reviewer’s feedback regarding the justification for our experimental designs. We recognize the importance of thoroughly explaining how our stimuli were designed and how these designs correspond to the theoretical constructs being tested. In our revised version, we will expand the introduction to the Erlangen program and provide a more detailed explanation of the rationale behind our stimulus designs, aiming to enhance the clarity and transparency of our experimental approach for readers who may not be familiar with these concepts.

      (2) We appreciate the reviewer’s insight into the design of Experiment 1 and the concern regarding the potential similarity between the parallelism and orientation stimuli manipulations.

      The parallelism and orientation stimuli in Experiment 1 were first used by Olson & Attneave (1970) to support line-based models of shape coding and then adapted to measure the relative salience of different geometric properties (Chen, 1986). In the parallelism stimuli, the odd quadrant differs from the rest in line slope, while in the orientation stimuli, in contrast, the odd quadrant contains exactly the same line segments as the rest but differs in the direction in which the angles point. The result, that the odd quadrant was detected much faster in the parallelism stimuli than in the orientation stimuli, can serve as evidence for line-based models of shape coding. However, according to Chen (1986, 2005), the idea of invariants over transformations suggests a new analysis of the data: in the parallelism stimuli, the fact that line segments share the same slope essentially implies that they are parallel, and the discrimination may actually be based on parallelism. Thus, the faster discrimination of the parallelism stimuli than that of the orientation stimuli may be explained in terms of the relative superiority of parallelism over orientation of angles, a Euclidean property.

      The group of stimuli in Experiment 1 has been employed by several studies to investigate scientific questions related to Klein’s hierarchy of geometries (L. Chen, 2005; Meng et al., 2019; B. Wang et al., n.d.). For continuity with this earlier work, we adopted this set of stimuli and the corresponding paradigm, despite their imperfect design.

      (3) Thanks for raising the important issue of stimulus diversity and the potential for "adversarial" versions of the experiments to challenge our findings. We acknowledge the validity of your concern and recognize the need to demonstrate the robustness of our results across a range of stimuli. We plan to design additional experiments to investigate the potential implications of varying stimulus characteristics, such as different rotation angles proposed by the reviewer, on the observed patterns of performance.

    1. Author Response

      We would like to thank the editors and reviewers who took their valuable time to evaluate the manuscript from various perspectives. We are delighted that our technique was found appealing to biologists and imaging technologists. However, we received several comments that the principles and effectiveness of our techniques are often vague and difficult to understand. The reviewers also pointed out that the explanations and representations for several figures were not appropriate. We will revise the manuscript to address these issues and make it clearer and more rigorous.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Comment 1.1: “Did the UKB or HCHS datasets have information on accurate markers of insulin resistance, such as HbA1c or HOMA-IR (if fasting glucose was not available)? Looking at that data would allow us to determine the contribution of insulin resistance to the observed cortical phenotype.”

      Reply 1.1: We appreciate the insightful suggestion from the reviewer. In response, we incorporated HbA1c into our analysis, enhancing its sensitivity to potential effects of insulin resistance. We then repeated the analysis, integrating HbA1c alongside non-fasting blood glucose in the PLS. This addition did not alter our main results, i.e., those of the PLS, virtual histology, and network contextualization analyses. Notably, as a result of the inclusion of HbA1c, the second latent variable now accounted for a greater shared variance (22.13%), with HbA1c showing the highest loading among MetS component variables. The manuscript has been thoroughly revised to incorporate these results.

      Comments 1.2: “(Results, p.13, 291-292) "A correlation matrix relating all considered MetS component measures is displayed in supplementary figure S12." Please clarify in this figure labels whether this was non-fasting glucose. If this is non-fasting glucose, it is not a MetS-related risk factor. The reader might be misled into thinking that fasting-glucose has a weak correlation, while its contribution (and the effect of insulin resistance) was not studied here.”

      “Table S8 and Table S9: Is the glucose metric here measured following fasting? If not, this should not be listed as a metabolic syndrome criterion. Or it should be specified that it isn't fasted glucose, otherwise, it sounds misleading.”

      Reply 1.2: We thank the reviewer for bringing this ambiguity to our attention. The initial analysis included only non-fasting plasma glucose in the PLS, as fasting plasma glucose data was unavailable for UKB and HCHS participants. Following your suggestion in comment 1.1, we have now incorporated HbA1c, a more indicative marker of insulin resistance. We retained non-fasting blood glucose in our analysis, recognizing its relevance as a diagnostic variable for type 2 diabetes mellitus, although it is less informative than fasting plasma glucose, HbA1c, or HOMA-IR. This decision is substantiated by the significant correlation found between non-fasting plasma glucose and HbA1c in our sample (r=.49).

      To enhance clarity, we have revised the methods section to explicitly mention that the study investigates non-fasting blood glucose. The revised sentence reads: “Here, we related regional cortical thickness and subcortical volumes to clinical measurements of MetS components, i.e., obesity (waist circumference, hip circumference, waist-hip ratio, body mass index), arterial hypertension (systolic blood pressure, diastolic blood pressure), dyslipidemia (high density lipoprotein, low density lipoprotein, total cholesterol, triglycerides) and insulin resistance (HbA1c, non-fasting blood glucose).”

      Additionally, we have updated the caption of supplementary figure S13 (formerly supplementary figure S12) to clearly indicate the investigation of non-fasting plasma glucose. The table detailing diagnostic MetS criteria (supplementary table S2) has also been amended to clarify the absence of fasting plasma glucose data in our study and to indicate that only data on antidiabetic therapy and diagnosis of type 2 diabetes mellitus were used as criteria for insulin resistance in the case-control analysis.

      Comment 1.3: “I do not understand how the authors can claim there is a deterministic relationship there if all the results are only correlational or comparative. Can the differences in functional connectivity and white matter fiber tracts observed not be caused by the changes in cortices they relate to? How can the authors be sure the network organisation is shaping the cortical effects and not the opposite (the cortical changes influence the network organisation)? This should be further discussed or explained.”

      Reply 1.3: We agree with the reviewer's comment on the non-causative nature of our data and have accordingly revised the discussion section to reflect a more cautious interpretation of our findings. We have carefully reframed our language to avoid any implications of causality, ensuring the narrative aligns with the correlational nature of our data. Nevertheless, we believe that exploring causal interpretations can offer valuable clinical insights. Therefore, while moderating our language, we have maintained certain speculative discussions regarding potential causative pathomechanistic pathways.

      Comment 1.4: “The hippocampus is also an area where changes have consistently been observed. Why did the authors limit their analysis to the cortex?”

      Reply 1.4: We appreciate this reviewer comment. In response, we have added volumes of Melbourne Subcortical Atlas parcels (including the hippocampus) to the analysis. Corresponding results are now shown in figure 2. The subcortical bootstrap ratios indicated that higher MetS severity was related to lower volumes across all investigated subcortical structures.

      Comment 1.5: “Which field ID of the UK biobank are the measures referring to? If possible, please specify the Field ID for each of the UKB metrics used in the study.”

      Reply 1.5: We thank the reviewer for the recommendation. The Field IDs used in our study are now listed in supplementary table S1.

      Comment 1.6: “Several Figures were wrongly annotated, making it hard to follow the text.”

      Reply 1.6: Thank you for bringing the annotation issues to our awareness. We have thoroughly edited all annotations which should now correctly reference the figure content.

      Reviewer 2

      Comment 2.1: “Do the authors have the chance to see how the pattern relates to changes in cognitive function in the UKBB and possibly HCHS? This could help to provide some evidence about the directionality of the effect.”

      Reply 2.1: Thank you for your suggestion. We acknowledge the potential value of investigating gray matter morphometric data alongside longitudinal information on cognitive function. Although we concur with the significance of this approach, we are constrained by the ongoing processing of the UKB's imaging follow-up data and the pending release of the HCHS follow-up data. Consequently, our current analysis cannot incorporate this aspect for now. We plan to explore the relationship between MetS, cognition and brain morphology using longitudinal data as soon as it becomes available.

      Comment 2.2: “Also, you could project new data onto the component and establish a link with cognition in a third sample which would be even more convincing. I can offer LIFE-Adult study for this aim.”

      Reply 2.2: We are grateful for your recommendation to enhance our study's robustness by including a third sample to establish a cognitive link. While we recognize the merit of such a sensitivity analysis, we believe that our current dataset, derived from two large, independent cohorts, is sufficiently comprehensive for the scope of our current analysis. However, we are open to considering this approach in future studies and appreciate your offer of the LIFE-Adult study. We would welcome further conversation with you regarding future joint projects.

      Comment 2.3: “The sentences (p.17, ll.435 ff) seem to repeat: "Interestingly, we also observed a positive relationship between cortical thickness and MetS in the superior frontal, parietal and occipital lobe. Interpretation of this result is, however, less intuitive. We also noted a positive MetS-cortical thickness association in superior frontal, parietal and occipital lobes, a less intuitive finding that has been previously reported [60,61].”

      Reply 2.3: Thank you for making us aware of this duplication. We have deleted the first part of the section. It now reads “We also noted a positive MetS-cortical thickness association in superior frontal, parietal and occipital lobes, a less intuitive finding that has been previously reported.”

      Comment 2.4: “I would highly appreciate empirical evidence for the claim in ll. 442 "In support of this hypothesis, the determined cortical thickness abnormality pattern is consistent with the atrophy pattern found in vascular mild cognitive impairment and vascular dementia" Considering the previous reports about the co-localization of obesity-associated atrophy and AD neurodegeneration (Morys et al. 2023, DOI: 10.3233/JAD-220535), that most dementias are mixed and that MetS probably increases dementia risk through both AD and vascular mechanisms, I feel such "binary" claims on VaD/AD-related atrophy patterns should be backed up empirically.”

      Reply 2.4: Thank you for highlighting the need for clarity in differentiating between vascular and Alzheimer's dementia. We recognize the intricate overlap in dementia pathologies. Acknowledging the prevalence of mixed dementia and the influence of MetS on both AD and vascular mechanisms, we realize our original statement might have implied a specificity to vascular dementia, which was not intended.

      To address your concern, we have revised our statement to avoid an exclusive focus on vascular pathology, ensuring a more balanced representation of dementia types. Additionally, we have included Morys et al. 2023 as a reference. The section now reads: “In support of this hypothesis, the determined brain morphological abnormality pattern is consistent with the atrophy pattern found in vascular mild cognitive impairment, vascular dementia and Alzheimer’s dementia.”

      Comment 2.5: “I wonder how specific the cell-type results are to this covariance pattern. Maybe patterns of CT (independent of MetS) show similar associations with one or more of the reported celltypes? Would it be possible to additionally show the association of the first three components of general cortical thickness variation with the cell type densities?”

      Reply 2.5: Thank you for your query regarding the specificity of the cell-type results to the observed covariance pattern. To address this, we have conducted a virtual histology analysis of the first three latent variables of the main analysis PLS. The findings of this extended analysis are detailed in supplementary figure S21. The imaging covariance profile of latent variable 2 was significantly associated with the density of excitatory neurons of subtype 3. The imaging covariance profile linked to latent variable 3 showed no significant association with cell type densities. Possibly, latent variable 3 represents only a noise component, as it explained only 2.12% of shared variance. We hope this addition provides a clearer understanding of the specificity of our main results.

      Comment 2.6: “I agree that this multivariate approach can contribute to a more holistic understanding, yet I would like to see the discussion expanded on how to move on from here. Should we target the MetS more comprehensively or would it be best to focus on obesity (being the strongest contributor and risk factor for other "downstream" conditions such as T2DM)? A holistic approach is somewhat at odds with the in-depth investigation of specific mechanisms.”

      Reply 2.6: We value your suggestion to elaborate on the implications of our findings. Our study indicates that obesity may have the most pronounced impact on brain morphology among MetS components, suggesting it as a key contributor to the clinical-anatomical covariance pattern observed in our analysis. This highlights obesity as a primary target for future research and preventive strategies. However, we believe that our results warrant further validation, ideally through longitudinal studies, before drawing definitive clinical conclusions.

      Additionally, our study endorses a comprehensive approach to MetS, highlighting the importance of considering the syndrome as a whole to gain broader insights. We want to clarify, however, that such an approach is meant to complement, rather than replace, the study of individual cardiometabolic risk factors. The broad perspective our study adopts is facilitated by its epidemiological nature, which may not be as applicable in experimental settings that are vital for deriving mechanistic disease insights.

      To reflect these points, we have expanded the discussion in our manuscript to include a more detailed consideration of these implications and future research directions.

      Comment 2.7: “Please report the number of missing variables.”

      Reply 2.7: Thank you for your request to report the number of missing variables. We would like to direct your attention to table 1, where we have listed the number of available values for each variable in parentheses. To determine the number of missing variables, one can subtract these numbers from the total sample size.
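
      The subtraction described above can be sketched as follows. This is only an illustration: the function name `missing_counts`, the variable names and the numbers are hypothetical placeholders and are not values from table 1.

```python
from typing import Dict


def missing_counts(total_n: int, available: Dict[str, int]) -> Dict[str, int]:
    """Return the number of missing values per variable.

    total_n   -- total sample size
    available -- number of available (non-missing) values per variable,
                 i.e. the numbers reported in parentheses in a summary table
    """
    missing = {}
    for name, n_available in available.items():
        if not 0 <= n_available <= total_n:
            raise ValueError(f"implausible count for {name}: {n_available}")
        missing[name] = total_n - n_available
    return missing


# Hypothetical example values, for illustration only.
example = missing_counts(1000, {"hba1c": 950, "waist_circumference": 990})
```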

      Comment 2.8: “Was the pattern similar in pre-clinical (pre-diabetes, pre-hypertension) vs. clinical conditions?”

      Reply 2.8: Thank you for your interest in the applicability of our findings across different MetS severity levels. Our analysis employs a continuous framework to encompass the entire range of vascular and cardiometabolic risks, including those only mildly affected by MetS. The linear relationship we observed between MetS severity and gray matter morphology patterns, as illustrated in Figure 2d, supports the interpretation that our findings apply to the entire spectrum of MetS severities.

      Comment 2.9: “How did you deal with medication (anti-hypertensive, anti-diabetic, statins..)?”

      Reply 2.9: Information on medication was considered for defining MetS for the case-control sensitivity analysis but was not included in the PLS. Detailed information can be found in table 1.

      Comment 2.10: “It would be really interesting to determine the genetic variations associated with the latent component. Have you considered doing a GWAS on this, potentially in the CHARGE consortium or with UKBB as discovery and HCHS as replication sample?”

      Reply 2.10: Thank you for your valuable suggestion regarding the implementation of a GWAS. We agree that incorporating a GWAS would provide significant insights, but we also recognize that it extends beyond the scope of our current analysis. However, we are actively planning a follow-up analysis. This subsequent analysis will encompass a comprehensive examination of both genetic variation and imaging findings in the context of MetS.

      Comment 2.11: “Please provide more information on which data fields from UKBB were used exactly (e.g. in github repository).”

      Reply 2.11: We appreciate your recommendation. The details regarding the Field IDs used in our study have been included as supplementary table S1.

      Reviewer 3

      Comments 3.1: “After a thorough review of the methods and results sections, I found no direct or strong evidence supporting the authors' claim that the identified latent variables were related to more severe MetS to worse cognitive performance. While a sub-group comparison was conducted, it did not adequately account for confounding factors such as educational level.”

      “Page 18-19 lines 431-446: the fifth paragraph in the discussion section. - As previously mentioned in the "Weaknesses" section, this study did not conduct a direct association analysis between MetS and cognitive levels without considering subgroup comparisons. Hence, I recommend the content of this paragraph warrants careful reconsideration.”

      Reply 3.1: We acknowledge the reviewer's constructive feedback regarding our analysis of cognitive data. We have performed a mediation analysis relating the subject-specific clinical PLS score of latent variable 1 representing MetS severity and cognitive test performances and testing for mediating effects of the imaging PLS score capturing the MetS-related brain morphological abnormalities. The imaging score was found to statistically mediate the relationship between the clinical PLS score and executive function and processing speed, memory, and reasoning test performance. These findings highlight brain structural differences as a relevant pathomechanistic correlate in the relationship of MetS and cognition. Corresponding information can now be found in figure 3, methods section 2.6.2, result section 3.3 and discussion section 4.2.
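
      A minimal sketch of the logic of such a mediation analysis, assuming a simple single-mediator linear model estimated by ordinary least squares (product-of-coefficients approach). The exact estimation procedure, covariates and significance testing used in the manuscript are not specified in this reply, so the function `simple_mediation` and its argument names are hypothetical. Here `x` stands for the clinical PLS score, `m` for the imaging PLS score and `y` for a cognitive test score.

```python
from typing import Dict, Sequence


def _centered_cross(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of products of mean-centered u and v."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def simple_mediation(x: Sequence[float], m: Sequence[float],
                     y: Sequence[float]) -> Dict[str, float]:
    """Decompose the total effect of x on y into direct and indirect parts.

    a        : slope of m regressed on x
    b        : coefficient of m in y ~ x + m
    direct   : coefficient of x in y ~ x + m (often written c')
    total    : slope of y regressed on x (often written c)
    indirect : a * b, the part of the effect carried by the mediator
    """
    sxx = _centered_cross(x, x)
    smm = _centered_cross(m, m)
    sxm = _centered_cross(x, m)
    sxy = _centered_cross(x, y)
    smy = _centered_cross(m, y)
    det = sxx * smm - sxm ** 2
    if sxx == 0 or abs(det) < 1e-12:
        raise ValueError("x is constant or x and m are perfectly collinear")
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / det
    direct = (smm * sxy - sxm * smy) / det
    return {"a": a, "b": b, "direct": direct,
            "indirect": a * b, "total": sxy / sxx}
```

      For ordinary least squares the total effect equals the sum of the direct and indirect effects. In practice, the uncertainty of the indirect effect is usually assessed by resampling, which this sketch omits.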

      Moreover, we would like to apologize for any confusion caused by our previously unclear presentation. Our study further incorporates association analyses between MetS, brain structure, and cognition using MetS components, regional brain morphological measures, and cognitive performance data in a PLS to investigate whether cognitive measures contribute to the latent variable. These analyses were performed separately on the UK Biobank and HCHS datasets, due to their distinct cognitive assessments. We adjusted for age, sex, and education in the subgroup analyses by removing their effects from the input variables. These relationships are detailed in supplementary figures S16b and S17b, with loadings close to zero for age, sex, and education, confirming effective deconfounding.
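
      One common way of removing the effects of confounds from an input variable is to regress the variable on the confounds and keep the residuals. The sketch below assumes this linear residualization approach; the reply does not state the exact procedure used, and the function names `solve_linear_system` and `residualize` are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]],
                        b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("confound matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        partial = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - partial) / aug[row][row]
    return x


def residualize(values: Sequence[float],
                confounds: Sequence[Sequence[float]]) -> List[float]:
    """Remove the linear effects of the confounds (plus an intercept).

    values    -- one input variable, one entry per participant
    confounds -- one row per participant, e.g. [age, sex, education]
    """
    design = [[1.0] + list(row) for row in confounds]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(design, values)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return [v - sum(bk * xk for bk, xk in zip(beta, r))
            for v, r in zip(values, design)]
```

      After residualization, the adjusted variable is uncorrelated with each confound in the sample, which is consistent with near-zero loadings for the confounds in a subsequent PLS.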

      In sum, we greatly appreciate the suggestion to conduct a mediation analysis, which has substantially enhanced the strength and relevance of our analysis.

      Comment 3.2: “I would suggest the authors provide a more comprehensive description of the metrics used to assess each MetS component, such as obesity (incorporating parameters like waist circumference, hip circumference, waist-hip ratio, and body mass index) and arterial hypertension (detailing metrics like systolic and diastolic blood pressure), etc.”

      Reply 3.2: Thank you for your suggestion regarding a more detailed description of the metrics for assessing each component of MetS. We would like to point out that the specific metrics used, including those for obesity (such as waist circumference, hip circumference, waist-hip ratio, and body mass index) and arterial hypertension (including systolic and diastolic blood pressure), are comprehensively detailed in table 1 of our manuscript. We hope this table provides the clarity and specificity you are seeking regarding the MetS assessment metrics in our study.

      Comment 3.3: “I recommend the inclusion of an additional, detailed flowchart to further illustrate the procedure of virtual histology analysis. This would enhance the clarity of the methodological approach and assist readers in better comprehending the analysis method.”

      Reply 3.3: Thank you for your suggestion. Recognizing the challenges in visually representing many of our analysis steps, we have instead supplemented our manuscript with additional references. These references provide a clearer understanding of our virtual histology approach, particularly focusing on the processing of regional microarray expression data.

      The corresponding sentence reads: “Further details on the processing steps covered by ABAnnotate can be found elsewhere (https://osf.io/gcxun) [42]”

      Comment 3.4: “Why were both brain hemispheres used instead of solely utilizing the left hemisphere as the atlas, especially considering that the Allen Human Brain Atlas (AHBA) only includes gene data for the right hemisphere for two subjects?”

      Reply 3.4: Thank you for your query regarding our decision to use both brain hemispheres instead of solely the left hemisphere, especially given that the Allen Human Brain Atlas (AHBA) predominantly features gene data from the left hemisphere. Given the AHBA's limited spatial coverage of expression data in the right hemisphere, our approach involved mirroring the existing tissue samples across the left-right hemisphere boundary using the abagen toolbox (1), a practice supported by findings that suggest minimal lateralization of microarray expression (2,3). Further details are provided in previous work employing ABAnnotate (4). These studies are now referenced in our methods section.
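
      The idea of mirroring can be illustrated with a short sketch. It does not call abagen and does not reproduce its implementation; it only assumes that mirroring amounts to reflecting each sample's MNI x-coordinate across the midline and reusing its expression values. The function name `mirror_samples` and the example values are hypothetical.

```python
import numpy as np

def mirror_samples(mni_xyz, expression):
    """Duplicate tissue samples by reflecting their x-coordinate across the midline (x -> -x)."""
    mirrored = mni_xyz * np.array([-1.0, 1.0, 1.0])
    # Mirrored samples inherit the expression values of their originals.
    return np.vstack([mni_xyz, mirrored]), np.vstack([expression, expression])

# Two hypothetical left-hemisphere samples (negative x) with two genes each.
coords = np.array([[-30.0, 10.0, 5.0], [-12.0, -40.0, 20.0]])
expr = np.array([[1.0, 2.0], [3.0, 4.0]])
all_coords, all_expr = mirror_samples(coords, expr)
```

      The mirrored copies can then be assigned to right-hemisphere atlas regions that would otherwise have few or no samples.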

      Comment 3.5: “The second latent variable was not further discussed. If this result is deemed significant, it warrants a more detailed discussion.”

      Reply 3.5: Thank you for the suggestion. We have added a paragraph to the discussion that discusses the second latent variable in greater detail. It reads: “The second latent variable accounted for 22.33% of shared variance and linked higher insulin resistance and lower dyslipidemia to lower thickness and volume in lateral frontal, posterior temporal, parietal and occipital regions. The distinct covariance profile of this latent variable, compared to the first, likely indicates a separate pathomechanistic connection between MetS components and brain morphology. Given that HbA1c and blood glucose were the most significant contributors to this variable, insulin resistance might drive the observed clinical-anatomical relationship.”

      Comment 3.6: “I suggest appending positive MetS effects after "..., insular, cingulate and temporal cortices;" for two reasons: a). The "positive MetS effects" might represent crucial findings that should not be omitted. b). Including both negative and positive effects ensures that subsequent references to "this pattern" are more precise.”

      Reply 3.6: We concur with the notion that the positive MetS effects should be highlighted as well. We modified the first discussion paragraph now mentioning them.

      Comment 3.7: “I would appreciate further clarification on this sentence and the use of the term "uniform" in this context. Does this suggest that despite the heterogeneity in the physiological and pathological characteristics of the various MetS components (e.g., obesity, hypertension), their impacts on cortical thickness manifest similarly? How is it that these diverse components lead to "uniform" effects on cortical thickness? Does this observation align with or deviate from previous findings in the literature?”

      Reply 3.7: Thank you for highlighting the ambiguity in our previous explanation. We agree that the complexity of the relationship between MetS components and brain morphology requires clearer articulation. To address this, we have revised the relevant sentence for better clarity. It now reads: “This finding indicates a relatively uniform connection between MetS and brain morphology, implying that the associative effects of various MetS components on brain structure are comparatively similar, despite the distinct pathomechanisms each component entails.”

      Comment 3.8: “Figure 1 does not have the labels "c)" and "d)".”

      Reply 3.8: Thank you. We have modified figure 1 and made sure that the caption correctly references its content.

      Comment 3.10: “Incorrect figure/table citation:

      • Page 18 line 418: "(figure 2b and 1c)" → (figure 2b and 2c).

      • Page 18 line 419: "(supplementary figures S8 and S12-13)" → (supplementary figures S11 and S15-16).

      • In the supplementary material, "Text S5 - Case-control analysis" section contains several figure or table citation errors. Please take a moment to review and correct them.”

      Reply 3.10: Thank you for bringing this to our attention. We have corrected the figure and table citation errors.

      Comment 3.11: “Page 8 line 184: The more commonly used term is "insulin resistance" rather than "insuline resistance".”

      Reply 3.11: We now use “insulin resistance” throughout the manuscript.

      Comment 3.12: “Nevertheless, variations in gene sets may introduce a degree of heterogeneity in the results (Seidlitz, et al., 2020; Martins et al., 2021). Consequently, further validation or exploratory analyses utilizing different gene sets can yield more compelling results and conclusions.”

      Reply 3.12: Thank you for your insightful comment regarding the potential heterogeneity introduced by variations in gene sets. We agree that exploring different gene sets could indeed enhance the robustness and generalizability of our findings. However, we think that a comprehensive methodological analysis of the available cell-type specific gene sets is a substantial effort that warrants its own investigation in order to implement it thoroughly and assess its implications. We would also like to highlight that our analysis setup adheres to previous practice (4,5).

      References

      (1) Markello RD, Arnatkeviciute A, Poline JB, Fulcher BD, Fornito A, Misic B. Standardizing workflows in imaging transcriptomics with the abagen toolbox. Jbabdi S, Makin TR, Jbabdi S, Burt J, Hawrylycz MJ, eds. eLife. 2021;10:e72129. doi:10.7554/eLife.72129

      (2) Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012;489(7416):391-399. doi:10.1038/nature11405

      (3) Hawrylycz M, Miller JA, Menon V, et al. Canonical genetic signatures of the adult human brain. Nat Neurosci. 2015;18(12):1832-1844. doi:10.1038/nn.4171

      (4) Lotter LD, Saberi A, Hansen JY, et al. Human cortex development is shaped by molecular and cellular brain systems. Published online May 5, 2023:2023.05.05.539537. doi:10.1101/2023.05.05.539537

      (5) Lotter LD, Kohl SH, Gerloff C, et al. Revealing the neurobiology underlying interpersonal neural synchronization with multimodal data fusion. Neuroscience & Biobehavioral Reviews. 2023;146:105042. doi:10.1016/j.neubiorev.2023.105042

    1. Author Response

      Reviewer #2 (Public Review):

      This study aims to test the role of awake replay in short-term memory, a type of memory that operates on the timescale of seconds and minutes. Replay refers to a time-compressed burst of neuronal population activity during a particular oscillatory local field potential event in the hippocampus, called the sharp-wave ripple (SWR). SWRs are found during sleep and in the awake state and are always associated with the animal being quiescent. The paper compares results from three different behavioral tasks ranging in memory requirements and memory timescales. First, rats were trained on either a spatial match-to-sample task (MTS), a non-match-to-sample task (NMTS), or a task requiring the memorization of sequences (maze arms to be visited in a specific temporal order). In this initial training phase, the animals were allowed to learn the maze structure and the rules governing these tasks for all these behavioral paradigms. Then, awake SWRs were disrupted as the animal performed these tasks (both during instruction and test phases) via an online detection system combined with closed-loop electrical stimulation of the ventral hippocampal commissure. Notably, this manipulation appeared not to affect performance in all three tasks, as determined using various behavioral parameters. Trials with no stimulation or delayed stimulation serve as controls. Thus, the authors conclude that awake SWRs are not involved in these short-term memory-guided behaviors. I do have a few comments that the authors should discuss or address:

      (1) This study adds to a large number of studies investigating the role of awake SWRs in spatial learning and memory tasks. The results of these previous studies are quite contradictory and range from awake SWRs are not crucial in guiding decisions at all to SWRs are only essential during task rule learning to SWRs do guide behavior. Could the authors comment on these seemingly contradictory results? Why are these experiments now the right ones?

      The reviewer is correct that there is a large body of literature investigating awake SWRs. Most commonly, interpretations about the role of SWRs and associated replay are made based on correlations of their occurrence with behavior. These correlations do, however, not necessarily indicate that SWRs contribute to a particular cognitive process. That is why interventional studies like ours are important to clarify the contribution of SWRs.

      The acquisition of a novel task involves a number of cognitive processes, including short- and long-term memory, building a map of the environment, exploration of the solution space and incorporating (non-)rewarding feedback. Based on available evidence, SWRs could contribute to many of these processes. Our experiments were designed to exclude the long-term memory aspect and focus on the memorization of locations on a short time scale, which, as we now demonstrate, is not dependent on SWRs. Since the use of short-term spatial memory is one of the possible explanations for the learning deficit seen by Jadhav et al. (2012) following SWR disruption in an alternation task, our results may also narrow down the exact contribution of SWRs in these studies.

      (2) None of the experiments presented here test the role of replay. I suggest making this distinction in the paper and the title clear. As the results are presented now, is it possible that the SWR content is not affected sufficiently to have a behavioral effect or that there is a bias towards detecting specific SWRs, e.g., longer SWRs?

      The reviewer is right that our experiments do not say anything about replay directly. We adapted the text to make this distinction clear.

      We address the possibility that SWR content may not be disrupted sufficiently to cause a behavioral effect in response to recommendation 1.

      Reviewer #3 (Public Review):

      In this manuscript, the authors seek to shed light on the role of awake hippocampal replay during memory tasks that are claimed to be short-term memory. For this, they make use of a real-time detection and disruption system of awake hippocampal ripples, which are used as a proxy for awake neuronal replay. The manuscript describes extensively the tasks as well as the disruption system and controls used during the experiments. The authors present numerous and solid analyses of the behavioral data acquired during the tasks. Nonetheless, the current version of the manuscript is lacking a more complete discussion in which the results are contrasted to previous similar findings, as well as mentioning the role of the awake ripple in the stabilization of hippocampal maps. Some extra analyses are also suggested below. The manuscript would also be enriched if the authors suggested alternative mechanisms for memory rehearsal. Finally, some claims of "we are first" seem inappropriate when compared to the previous literature.

      Major comments:

      How does one define short-term memory (STM) in rodents? The examples and papers cited in the first paragraphs refer mostly to human working memory tasks, from which it is known that a non-rehearsed STM lasts typically 20-30 seconds. Could the authors mention how this concept is translated to rodents? Could you clarify until what point memory is considered STM and what the criteria are to consider that it has turned into long-term memory, or when it is simply working memory or habit/skill?

      We agree with the reviewer that the definition of short-term memory is fluid and may differ between researchers and model systems. To avoid confusion, we reframed our study in a different context and hope that this makes the timeframes we are talking about clearer.

      Further, why should these tasks be classified as testing STM while Jadhav et al. tasks are working memory or as they now mention in this article rule learning?

      Note that short-term memory and working memory are closely related, but not identical, concepts. Whereas short-term memory refers to the retaining of information for a short period of time, working memory is generally considered to also include some manipulation of that information. Unfortunately, in the rodent literature, (spatial) working memory and short-term memory are often used interchangeably.

      Many (animal) spatial memory tasks do not test a single cognitive faculty, but likely involve a combination of short-term memory, working memory, and rule learning (among other abilities) to acquire or solve the task. As such, an unequivocal classification of behavioral tasks is not generally possible. For example, in the continuous version of the spatial alternation task used in Jadhav et al., animals may learn the rule “if I am in the center arm and I came from the left goal arm, then I will next find reward in the right goal arm”. The execution of this rule would require maintaining in (short-term) memory the most recently visited goal arm. Alternatively, animals may learn the rule to turn left twice and right twice to successfully perform the task.

      One of our goals in our study was to attempt to isolate rule learning components and short-term memory components in our tasks (to be clear: we are not claiming that our tasks are pure short-term memory tasks).

      We have rewritten the introduction to reframe our study, which hopefully clarifies the points above.

      In humans, the retention of memory after a certain time is achieved by retrieving a long-term memory. How do we know if the considerable training the rats received has not allowed the use of a long-term memory strategy which allows the rats to perform well even in the absence of rehearsal (replay)? These are conceptual explanations that would help understand the key concept of STM in greater detail.

      Our experiments aimed to distinguish between the process of learning general task rules through training and the need to retain information specific to each trial or session. For example, in the NMTS task, the animals may have a long-term memory of the overall task design, but they cannot anticipate or recall in advance which specific arms will be baited in the instruction phase since they vary from one trial to another. Therefore, to complete a trial successfully, the animals must have formed some type of (short-term) memory of the instruction arms and/or of the arms that still need to be visited in the test phase. Although extended training may have resulted in a more optimized and less demanding strategy to memorize the necessary information, evidence in the literature indicates that even then (for this particular task), a functional hippocampus is required (Sasaki 2021). The question we address in our experiments is whether hippocampal SWRs (and by association, replay) are instrumental in the formation or maintenance of this memory, whether through rehearsal or other mechanisms. The rewritten introduction explains these concepts more clearly.

      Further, claims of "first" should be adjusted, since I do not see a large difference between the w (m) maze of Jadhav and these tasks. The main difference between the two projects would rather be that Jadhav tests when animals are still newer to the task while here overtrained animals are used. In Jadhav, it's unlikely that just rule learning is affected since the inbound component is not affected by disruption, which also tests rule learning. Therefore, it is still likely that the effect seen in Jadhav et al is a deficit in working memory/short-term memory. And here it is more likely, that no effect was seen since with overtrained animals other strategies (cortical, striatal, etc) were used. The authors should compare in more detail how overtrained animals were in these different projects as well as in the articles they cite for replay analysis.

      The training of the animals on the general task rules prior to SWR disruption manipulations is by design, as it better isolates the short-term memory demands required to solve the task in each trial/session. In our tasks, the rats are required to memorize a randomly chosen combination of goal arms on each day (MTS & SEQ task) or in every trial (NMTS task). Unlike the continuous alternation paradigm used by Jadhav et al. (2012), our tasks cannot be solved using a stereotypical or habitual (striatal) strategy that is acquired through extended training. We cannot exclude that the rats acquired an optimized and less cognitively demanding strategy that is mainly dependent on cortical structures outside the hippocampus; however, evidence in the literature still indicates the requirement for a functional hippocampus (Sasaki, 2021; Okaichi and Oshima 1990; Blokland, Honig, and Raaijmakers, 1992).

      The reviewer is correct that the inbound component of the continuous alternation task in Jadhav et al. (2012) can be considered rule learning and was not affected by SWR disruption. However, we do not believe that this should be generalized to all rule learning and it is very well conceivable that SWRs contribute to the learning of more complex rules that also feature ambiguity (such as the outbound component in the continuous alternation task). We elaborate on these points in the discussion (lines 425-455).

      The main conclusion of the authors is that hippocampal replay is not the rehearsal mechanism expected in STM given that its disruption doesn't lead to behavioral changes. Could the authors hypothesize in their discussion what other neural mechanisms different from hippocampal replay may be involved in this rehearsal?

      Thank you for this suggestion. We added an extra paragraph speculating on this aspect (lines 499-518).

      The discussion also lacks closure with respect to how the findings fit in the study of STM in human memory. This would make the article more interesting to a larger audience and highlight its translational aspect.

      We agree with the reviewer and added our insight to the discussion.

      The results describe deeply the behavioral performance of the rats and the validation of the ripple detection/disruption system. However, one important aspect missing is how the hippocampal activity and its encoding of space may be affected by the awake ripple disruption. The authors don't cite the work by Roux et al., Nature Neuroscience. 2017 where optogenetic stimulation of hippocampal neurons provided evidence that neuronal activity associated with awake hippocampal ripples during goal-directed behavior is required for both stabilizing and refining hippocampal place fields, while memory performance was not affected during ripple-locked stimulations compared to a ripple-delayed stimulation control (See supplementary Figure 7 of the mentioned article). I would like the authors to comment on their own findings and contrast them with those of Roux et al.

      We agree that it is interesting to include the results of Roux et al. in our discussion (lines 470 and 463-466).

      Line 64: Could the authors clarify what they mean by "indirect" causal evidence when discussing the contribution of papers by Jadhav, Igata, and Fernandez? Is it the fact that rodents' learning speed changed instead of showing a complete absence of learning? Or is it the fact that the disruption/prolongation is done on the hippocampal ripple and not strictly in the replay sequence?

      We apologize for the confusion and rewrote large parts of the introduction to clarify the contributions of the papers by Jadhav, Igata, and Fernandez and the difference with what our manipulations contribute. In the process, we removed the phrase ‘indirect causal evidence’.

      I would also highlight this latter difference, given that the above-mentioned authors describe their methodological approaches in terms of ripples and not in terms of replay content. For example, the use of "replay" instead of "ripple" in Line 61 results in methodologically inaccurate terms such as replay disruption and replay prolongation.

      Thank you for pointing this out. We adapted the manuscript to always use ‘ripple’ or ‘sharp-wave ripple’ (SWR) when describing our results.

      Despite its apparent lack of statistical significance, the reported mean ripple detection rate during the trial and non-trial periods tend to be always higher in the disruption condition of all tasks by observing the median of the boxplots in Figure 1J, Figure 2H, and Figure 3J. It is worth investigating this further using the same linear regression method as Girardeau et al. Journal of Neuroscience, 2014 which may reduce the variability and allow comparing slopes of a cumulative number of ripples over time. This may reveal a compensatory homeostatic-like increase in the rate of ripples during the disrupted sessions, which may suggest a need for the ripple/replay occurrence in spite of it not having an effect on the rats' performance during the task.

      The reviewer makes an interesting observation and we appreciate the suggestion for further investigation. However, note that a clear trend for higher ripple rates in disruption trials/sessions is not present when comparing to non-stimulated control trials/sessions. Part of the variability in the observed ripple rates is likely due to the variability in the animals’ behavioral state (e.g., moving, pausing but alert, grooming, consuming reward) and the corresponding varying propensity for SWRs to occur. This behavioral variability makes application of the linear regression approach of Girardeau et al. (2014) less than straightforward (note that Girardeau et al. looked at SWRs during sleep). For these reasons, we have decided not to look further into the potential disruption-induced increase of the SWR rate.

      In line 425, the authors report a median relative delay of 52.9% of their disruption system. Such a value would indicate that only around 47% of the ripple is being blocked. Is there any data from the authors or others that could reassure the reader that the 52.9% of the ripple that "leaks" is not enough for the replay phenomenon to occur? Considering the findings of Fernandez-Ruiz et al. 2019 on large-duration ripples, could the authors report the relative delay for both short and long ripples (>100 ms) separately?

      The reviewer is correct that the initial part (~35 ms) of SWRs remains intact, which is inherent to the online detection and disruption approach. In relative terms, a larger fraction of long SWRs is disrupted. As requested, we have adapted figure 4c to separately show the distribution of relative detection delays for long (duration >100ms) and short SWRs.
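
      The relation between a roughly fixed detection latency and the relative delay can be made concrete with a small worked example. It assumes that the relative delay is the detection latency from SWR onset expressed as a percentage of the SWR duration; the 35 ms latency is the approximate value quoted above, whereas the three event durations are invented for illustration.

```python
import numpy as np

def relative_delay(onset, offset, detected):
    """Detection latency from event onset, as a percentage of event duration."""
    onset, offset, detected = map(np.asarray, (onset, offset, detected))
    return 100.0 * (detected - onset) / (offset - onset)

# Three hypothetical SWRs lasting 60, 80 and 150 ms, each detected 35 ms after onset.
onsets = np.array([0.0, 0.0, 0.0])
offsets = np.array([0.060, 0.080, 0.150])
detected = onsets + 0.035

rel = relative_delay(onsets, offsets, detected)   # percent of each event that stays intact
long_mask = (offsets - onsets) > 0.100            # "long" events: duration > 100 ms
```

      With a fixed latency, the intact fraction shrinks as the event gets longer, so a larger share of each long SWR falls after the stimulation.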

      As we and others have shown, the electrical stimulation temporarily suppresses spiking activity in CA1 and thus abruptly interferes with any ongoing replay, but any beginning of replay sequences before the stimulation will not be affected. Previous studies that use the same methodology to disrupt SWRs reported a behavioral performance deficit despite the detection delays (Michon et al. 2019; Girardeau et al. 2009; Jadhav et al. 2012). This suggests that the initial part of SWRs (and replay) is not sufficient to support the behavior. The delays in the current study are quantitatively similar to what we have reported before in Michon et al. (2019) and thus we are confident that we should have been able to observe a behavioral effect if present. We now elaborate on this topic in the Discussion (lines 489-498).

      Line 494: The authors define long ripples as (>120 ms) but this doesn't coincide with the 100ms threshold from Fernandez Ruiz et al. 2019.

      Thank you for pointing this out, it is corrected in the text both in the Results (line 389) and Discussion (line 486).

      The online ripple detector used filtered the traces in the 135-255 Hz range. This is a narrower frequency range compared to online detectors used by Jadhav et al. 2012 (100-400 Hz) and Fernandez-Ruiz et al. 2019 (80-300 Hz). What motivated the use of this narrow range? Would the omission of ripples below 135 Hz have implications in the results? Could the authors add to the supplement a figure similar to Figure 4B (FDR vs TPR) using a wider frequency range similar to the authors above in the offline detection of ripples?

      The frequency of hippocampal ripple oscillations in rats generally lies in the range of 160-225 Hz (Buzsaki, 1992). We have added a power spectrum in Figure 1d that confirms this frequency range in our experiments. Filters that include frequencies below this range (as in the studies referenced by the reviewer) likely also pass high-frequency gamma oscillations, and filters that include frequencies above this range likely also pass multi-unit spiking activity. The challenge for a real-time ripple detection system is to design a filter that offers an acceptable trade-off between selectivity for a specific (narrow) frequency range and the delay it introduces. In our study, we designed a filter that is specific to the ripple frequency band and still has an acceptably low delay.
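
      The trade-off between frequency selectivity and delay can be illustrated with a generic example. The sketch below is not the filter used in the study, whose design is not described in this response; it assumes a linear-phase Hamming-windowed-sinc FIR band-pass for 135-255 Hz and a hypothetical sampling rate of 3000 Hz. For such a filter with N taps, the delay is (N - 1) / 2 samples.

```python
import numpy as np

def bandpass_fir(f_low, f_high, fs, taps):
    """Hamming-windowed-sinc band-pass, built as the difference of two low-pass filters."""
    n = np.arange(taps) - (taps - 1) / 2.0   # sample index centred on the filter midpoint

    def lowpass(fc):
        # Ideal low-pass impulse response with cutoff fc (np.sinc is the normalized sinc).
        return (2.0 * fc / fs) * np.sinc(2.0 * fc * n / fs)

    return (lowpass(f_high) - lowpass(f_low)) * np.hamming(taps)

def gain(h, f, fs):
    """Magnitude of the filter's frequency response at frequency f (in Hz)."""
    k = np.arange(len(h))
    return float(abs(np.sum(h * np.exp(-2j * np.pi * f * k / fs))))

def delay_ms(taps, fs):
    """Group delay of a linear-phase FIR filter, in milliseconds."""
    return 1000.0 * (taps - 1) / (2.0 * fs)

fs = 3000.0                                             # assumed sampling rate
long_fir = bandpass_fir(135.0, 255.0, fs, taps=151)     # selective, 25 ms delay
short_fir = bandpass_fir(135.0, 255.0, fs, taps=31)     # fast (5 ms), but leaks more below the band
```

      Comparing `gain(long_fir, 80.0, fs)` with `gain(short_fir, 80.0, fs)` shows that the shorter, faster filter rejects an 80 Hz gamma-band component less well, which is the kind of compromise a real-time detector has to settle.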

      It is unclear what criterion was used to train the rats in the NMTS task. Line 216 specifies a learning criterion of 80% fully correct trials in one session for three days in a row, while the methods in line 852 mention an average performance below 50% for at least three days in a row.

      Thank you for pointing this out. We corrected the learning criterion description in the results section (lines 108-110) to match the description in the Methods section.

      In the methods section, it is not mentioned if there was a specific region in the cortex where the tetrode was placed (Line 908).

      The detections in this tetrode were used to mark events as "false positives". The authors should be careful in line 933 when they make the statement "ripples are not present in the cortex". There have been recent publications that challenge this affirmation. See Khodagholy, Science. 2017, Nitzan, Nature Comm. 2020.

      Thank you for pointing this out. We have added the cortical region in the methods (line 882) and clarified that, as far as we know, no ripples synchronous with hippocampal ripples have been described in that part of the cortex (parietal association cortex).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a useful characterization of the biochemical consequences of a disease-associated point mutation in a nonmuscle actin. The study uses solid and well-characterized in vitro assays to explore function. In some cases the statistical analyses are inadequate and several important in vitro assays are not employed.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      The authors first perform several important controls to show that the expressed mutant actin is properly folded, and then show that the Arp2/3 complex behaves similarly with WT and mutant actin via a TIRF microscopy assay as well as a bulk pyrene-actin assay. A TIRF assay showed a small but significant reduction in the rate of elongation of the mutant actin suggesting only a mild polymerization defect.

      Based on in silico analysis of the close location of the actin point mutation and bound cofilin, cofilin was chosen for further investigation. Faster de novo nucleation by cofilin was observed with mutant actin. In contrast, the mutant actin was more slowly severed. Both effects favor the retention of filamentous mutant actin. In solution, the effect of cofilin concentration and pH was assessed for both WT and mutant actin filaments, with a more limited repertoire of conditions in a TIRF assay that directly showed slower severing of mutant actin.

      Lastly, the mutated residue in actin is predicted to interact with the cardiomyopathy loop in myosin and thus a standard in vitro motility assay with immobilized motors was used to show that non-muscle myosin 2A moved mutant actin more slowly, explained in part by a reduced affinity for the filament deduced from transient kinetic assays. By the same motility assay, myosin 5A also showed impaired interaction with the mutant filaments.

      The Discussion is interesting and concludes that the mutant actin will co-exist with WT actin in filaments, and will contribute to altered actin dynamics and poor interaction with relevant myosin motors in the cellular context. While not an exhaustive list of possible defects, this is a solid start to understanding how this mutation might trigger a disease phenotype.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      • Potential assembly defects of the mutant actin could be more thoroughly investigated if the same experiment shown in Fig. 2 was repeated as a function of actin concentration, which would allow the rate of disassembly and the critical concentration to also be determined.

      The polymerization rate of individual filaments observed in TIRFM experiments showed only minor changes, as did the bulk-polymerization rate of 2 µM actin in pyrene-actin based experiments. Therefore, we decided not to perform additional pyrene-actin based experiments in which we titrate the actin concentration, as we expect only very small changes to the critical concentration. Instead, we focused on the disturbed interaction with ABPs, as we assume these defects to be more relevant in an in vivo context. Using pyrene-based bulk experiments, we did determine the rate of dilution-induced depolymerization of mutant filaments and compared it with the values determined for WT (Figure 5A, Table 1).

      • The more direct TIRF assay for cofilin severing was only performed at high cofilin concentration (100 nM). Lower concentrations of cofilin would also be informative, as well as directly examining by the TIRF assay the effect of cofilin on filaments composed of a 50:50 mixture of WT:mutant actin, the more relevant case for the cell.

      The TIRF assay for cofilin severing was performed initially over the cofilin concentration range from 20 to 250 nM. The results obtained in the presence of 100 nM cofilin allow a particularly informative depiction of the differences observed with mutant and WT actin. This applies to the image series showing the changes in filament length, cofilin clusters, and filament number as well as to the graphs showing time-dependent changes in the number of filaments and total actin fluorescence. We have not included the results for a 50:50 mixture of WT:mutant actin because its attenuating effect is documented in several other experiments in the manuscript.

      • The more appropriate assay to determine the effect of the actin point mutation on class 5 myosin would be the inverted assay where myosin walks along single actin filaments adhered to a coverslip. This would allow an evaluation of class 5 myosin processivity on WT versus mutant actin that more closely reflects how Myo5 acts in cells, instead of the ensemble assay used appropriately for myosin 2.

      Our results with Myo5A show a less productive interaction with mutant actin filaments as indicated by a 1.7-fold reduction in the average sliding velocity and an increase in the optimal Myo5A-HMM surface density from 770 to 3100 molecules per µm². These results indicate a reduction in binding affinity and coupling efficiency, with a likely impact on processivity. We expect only a small incremental gain in knowledge about the extent of changes by performing additional experiments with an inverted assay geometry, given that under physiological conditions the motor properties of Myo5A and other cytoskeletal myosins are modulated by other factors such as the presence of tropomyosin isoforms and other actin binding proteins.

      Reviewer #2 (Public Review):

      Greve et al. investigated the effects of a disease-associated gamma-actin mutation (E334Q) on actin filament polymerization, association of selected actin-binding proteins, and myosin activity. Recombinant wildtype and mutant proteins expressed in sf9 cells were found to be folded and stable, and the presence of the mutation altered a number of activities. Given the location of the mutation, it is not surprising that there are changes in polymerization and interactions with actin binding proteins. Nevertheless, it is important to quantify the effects of the mutation to better understand disease etiology.

      We thank the reviewer for the positive evaluation of our work.

      Some weaknesses were identified in the paper as discussed below.

      • Throughout the paper, the authors report average values and the standard-error-of-the-mean (SEM) for groups of three experiments. Reporting the SEM is not appropriate or useful for so few points, as it does not reflect the distribution of the data points. When only three points are available, it would be better to just show the three different points. Otherwise, plot the average and the range of the three points.

      We have gone through the manuscript carefully to correct any errors in the statistics, as explained below.

      Figure 1B, 5B, 5C, 5D, 8D, 9B, and 8 – figure supplement 2 all show the mean ± SD, as also correctly reported for Figure 8E and 8F in the figure legend. The statement that these figures show the mean ± SEM was inaccurate. We corrected this mistake for all the listed figures. Furthermore, we now give the exact N for every experiment in the figure legend.

      Figure 2C, 2E, 2F, 4B, 5A, 6B-E showed the mean ± SEM. As suggested by the reviewer, we corrected the figures to show the mean ± SD.

      We still refer to the mean ± SEM in Figure 2B, where elongation rates for more than 100 filaments were recorded, and in Figure 8B, where sliding velocities for several thousand actin filaments were measured.
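
      The distinction between SD and SEM discussed above can be illustrated with a minimal sketch. The triplicate values below are hypothetical and are not data from the manuscript; the function name `summarize` is likewise only illustrative. The SD describes the spread of the replicates, whereas the SEM is the SD divided by the square root of N and therefore mainly reflects sample size.

```python
import math
import statistics


def summarize(values):
    """Return n, mean, sample SD, and SEM for a list of replicate measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(n)         # SEM = SD / sqrt(n)
    return {"n": n, "mean": mean, "sd": sd, "sem": sem}


# Hypothetical triplicate (assumption for illustration only)
triplicate = [9.0, 10.0, 11.0]
summary = summarize(triplicate)
print(summary)
```

      For N = 3 the SEM is smaller than the SD by a factor of about 1.7, which is why it can understate the scatter of so few points.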

      • The description and characterization of the recombinant actin is incomplete. Please show gels of purified proteins. This is especially important with this preparation since the chymotrypsin step could result in internally cleaved proteins and altered properties, as shown by Ceron et al (2022). The authors should also comment on N-terminal acetylation of actin.

      We added an additional figure showing the purification strategy for the recombinant cytoskeletal γ-actin WT and p.E334Q protein with exemplary SDS-gels from different stages of purification (Figure 1 – figure supplement 1).

      In a previous paper, we reported the mass spectrometric analysis of the post-translational modifications of recombinant human β- and γ-cytoskeletal actin produced in Sf-9 cells (Müller et al., 2013, PLoS One). Recombinant actin showing complete N-terminal processing, resulting in cleavage of the initial methionine and acetylation of the following aspartate (β-actin) or glutamate (γ-actin), is the predominant species in the analyzed preparations (> 95 %). While the recombinant actin in the 2013 study was produced tag-free and purified by affinity chromatography using the column-immobilized actin-binding domain of gelsolin (G4-G6), we have no reason to assume that the purification strategy using the actin-thymosin-β4 fusion construct changes the efficiency of the N-terminal processing in Sf-9 cells. This is supported by our as yet unpublished mass-spectrometric studies on recombinant human α-cardiac actin purified using the actin-thymosin-β4 fusion construct, which revealed actin species with an acetylated aspartate-3. This N-terminal modification of α-cardiac actin is catalyzed by the same actin-specific acetyltransferase (NAA80) as the acetylation of aspartate-2 or glutamate-2 in cytoskeletal actin isoforms (Varland et al., 2019, Trends in Biochemical Sciences). Furthermore, additional studies that used the actin-thymosin-β4 fusion construct for the production of recombinant human cytoskeletal actin isoforms in Pichia pastoris reported robust N-terminal acetylation when the actin was co-produced with NAA80 (in contrast to Sf-9 cells, NAA80 is not endogenously expressed in Pichia pastoris) (Hatano et al., 2020, Journal of Cell Science).

      We therefore added the following statement to the manuscript:

      “Purification of the fusion protein by immobilized metal affinity chromatography, followed by chymotrypsin–mediated cleavage of C–terminal linker and tag sequences, results in homogeneous protein without non–native residues and native N-terminal processing, which includes cleavage of the initial methionine and acetylation of the following glutamate.”

      • The authors do not use the best technique to assess actin polymerization parameters. Although the TIRF assay is excellent for some measurements, it is not as good as the standard pyrene-actin assays that provide critical concentration, nucleation, and polymerization parameters. The authors use pyrene-actin in other parts of the paper, so it is not clear why they don't do the assays that are the standard in the actin field.

      The polymerization rate of individual filaments observed in TIRFM experiments showed only minor changes, as did the bulk-polymerization rate of 2 µM actin in pyrene-actin based experiments. Therefore, we decided not to perform additional pyrene-actin based experiments, in which we titrate the actin concentration, as we expect only very small changes to the critical concentration. Instead, we focused on the disturbed interaction with ABPs, as we assume these defects to be more relevant in an in vivo context. Using pyrene-based bulk experiments, we did determine the rate of dilution-induced depolymerization of mutant filaments and compared it with the values determined for WT (Figure 5A, Table 1).

      • The authors' data suggest that, while the binding of cofilin-1 to both the WT and mutant actins remains similar, the major defect of the E334Q actin is that it is not as readily severed/disassembled by cofilin. What is missing is a direct measurement of the severing rate (number of breaks per second) as measured in TIRF.

      The severing rate as measured in TIRF is dependent on a number of parameters in a nonlinear manner. Therefore, we opted to show the combination of images directly showing the progress of the reaction and graphs summarizing the concomitant changes in cofilin clusters, actin filaments, actin-related fluorescence intensity and cofilin-related fluorescence intensity.

      • Figure 4 shows that the E334Q mutation increases rather than decreases the number of filaments that spontaneously assemble in the TIRF assay, but it is unclear how reduced severing would lead to increased filament numbers, rather, the opposite would be expected. A more straightforward approach would be to perform experiments where severing leads to more nuclei and therefore enhances the net bulk assembly rate.

      Figure 4 shows polymerization experiments that were started from ATP-G-actin in the presence of cofilin-1. These experiments show clearly that, especially at the higher cofilin-1 concentration (100 nM), the filament number is strongly increased in experiments performed with mutant actin. Inspection of the corresponding videos of these TIRFM experiments suggests that the increased number of filaments must result from an increased number of de novo nucleation events and not primarily from a mutation-induced change in severing susceptibility. The observation of a cofilin-stimulated increase in the de novo nucleation efficiency of actin was initially described by Andrianantoandro & Pollard (2006, Molecular Cell) using TIRFM-based experiments and is thought to arise from the stabilization of thermodynamically unfavorable actin dimers and trimers by cofilin. While the exact role of this cofilin-mediated effect in vivo is not completely clear, it is thought to contribute to cofilin-mediated actin dynamics synergistically with cofilin-mediated severing. It is therefore necessary to clearly distinguish between the two effects of cofilin in vitro: stimulation of de novo nucleation and stimulation of filament disassembly. Our data indicate that the E334Q mutation affects these two effects differentially, as we state in the abstract and in the discussion.

      Abstract: “E334Q differentially affects cofilin-mediated actin dynamics by increasing the rate of cofilin-mediated de novo nucleation of actin filaments and decreasing the efficiency of cofilin-mediated filament severing.”

      Discussion: “Cofilin-mediated severing and nucleation were previously proposed to synergistically contribute to global actin turnover in cells (Andrianantoandro & Pollard, 2006; Du & Frieden, 1998). Our results show that the mutation affects these different cofilin functions in actin dynamics in opposite ways. Cofilin-mediated filament nucleation is more efficient for p.E334Q monomers, while cofilin-mediated severing of filaments containing p.E334Q is significantly reduced. The interaction of both actin monomers and actin filaments with ADF/cofilin proteins involves several distinct overlapping reactions. In the case of actin filaments, cofilin binding is followed by structural modification of the filament, severing and depolymerizing the filament (De La Cruz & Sept, 2010). Cofilin binding to monomeric actin is followed by the closure of the nucleotide cleft and the formation of stabilized “long-pitch” actin dimers, which stimulate nucleation (Andrianantoandro & Pollard, 2006)”.

      We interpret the reviewer's suggestion to mean that additional pyrene-actin-based bulk polymerization experiments should be performed to investigate the bulk-polymerization rate of ATP-G-actin in the presence of cofilin-1. In our understanding, these experiments would not provide additional value as 1) an observed increase of the bulk-polymerization rate cannot be directly correlated to a change of the efficiency of de novo nucleation or severing and 2) the effect of the mutation on cofilin-mediated filament disassembly was extensively analyzed in other experiments starting from preformed actin filaments. Moreover, our results are consistent with in silico modelling and normal mode analysis of the WT and mutant actin-cofilin complex.

      • Figure 5 A: in the pyrene disassembly assay, where actin is diluted below its critical concentration, cofilin enhances the rate of depolymerization by generating more free ends. The E334Q mutation leads to decreased cofilin-induced severing and therefore lower depolymerization. While these data seem convincing, it would be better to present them as an XY plot and fit the data to lines for comparison of the slopes.

      We now present the data as suggested by the reviewer. Furthermore, we determined the apparent second-order rate constant for cofilin-induced F-actin depolymerization (kC) to quantify the observed differences between WT, mutant and heterofilaments.

      The paragraph describing these results was changed accordingly:

      “The observed rate constant values are linearly dependent on the concentration of cofilin–1 in the range 0–40 nM, with the slope corresponding to the apparent second–order rate constant (kC) for the cofilin–1 induced depolymerization of F–actin. In experiments performed with p.E334Q filaments, the value obtained for kC was 4.2-fold lower (0.81 × 10⁻⁴ ± 0.08 × 10⁻⁴ nM⁻¹ s⁻¹) compared to experiments with WT filaments (3.42 × 10⁻⁴ ± 0.22 × 10⁻⁴ nM⁻¹ s⁻¹). When heterofilaments were used, the effect of the mutation was reduced to a 2.2-fold difference compared to WT filaments (1.54 × 10⁻⁴ ± 0.11 × 10⁻⁴ nM⁻¹ s⁻¹).”
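
      As a minimal sketch of the analysis described in the paragraph above, the apparent second-order rate constant can be read off as the slope of a straight-line fit of the observed rate constant against cofilin-1 concentration. The data points below are synthetic: they are generated from the reported WT slope and a hypothetical basal rate (`k_basal`, an assumption), so the fit simply recovers the input. The helper `fit_line` is illustrative and is not the authors' fitting routine.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Reported apparent second-order rate constants (nM^-1 s^-1)
K_C_WT = 3.42e-4
K_C_MUTANT = 0.81e-4
K_C_HETERO = 1.54e-4

# Synthetic, noise-free data over the stated 0-40 nM range.
# k_basal is a hypothetical cofilin-free depolymerization rate (assumption).
cofilin_nM = [0.0, 10.0, 20.0, 30.0, 40.0]
k_basal = 1.0e-3
k_obs = [k_basal + K_C_WT * c for c in cofilin_nM]

slope, intercept = fit_line(cofilin_nM, k_obs)

# Fold differences implied by the reported values
fold_mutant = K_C_WT / K_C_MUTANT
fold_hetero = K_C_WT / K_C_HETERO
print(f"slope = {slope:.3e} nM^-1 s^-1")
print(f"WT/mutant = {fold_mutant:.1f}, WT/heterofilament = {fold_hetero:.1f}")
```

      The two ratios computed at the end reproduce the 4.2-fold and 2.2-fold differences quoted in the revised text.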

      • Figure 5 B and C: the cosedimentation data do not seem to help elucidate the underlying mechanism. While the authors report statistical significance, differences are small, especially for gel densitometry measurements where the error is high, which suggests that there may be little biological significance. Importantly, example gels from these experiments should be shown, if not the complete set included in the supplement. In B, the higher cofilin concentrations would be expected to stabilize the filaments and thus the curve should be U-shaped.

      We do not completely agree with the reviewer on this point. We think the co-sedimentation experiments are useful, as they show that cofilin-1 efficiently binds to mutant filaments, but is less efficient in stimulating disassembly in these endpoint-experiments. This information is not provided by the analysis of the effect of cofilin-1 on the bulk-depolymerization rate and adds to our understanding of the defect of the actin-cofilin interaction for the mutant.

      While we agree with the reviewer on the point that co-sedimentation experiments must be repeated several times to produce reliable data, we cannot fully grasp the reasoning behind the statement “While the authors report statistical significance, differences are small, especially for gel densitometry measurements where the error is high, which suggests that there may be little biological significance.” We interpret this statement as advice to be cautious when extrapolating the observed perturbations of cofilin-mediated actin dynamics in vitro to the in vivo context. We think we are cautious about this throughout the manuscript.

      The reviewer expects a U-shaped curve, as high cofilin concentrations are reported to stabilize actin filaments by completely decorating the filament before severing-prone boundaries between cofilin-decorated and undecorated regions are generated. We have also performed these experiments with cytoskeletal β-actin and human cofilin-1 and never observed this U shape. This indicates that significant filament disassembly also happens at high cofilin concentrations, most likely directly after mixing of F-actin and cofilin. We cannot rule out that the incubation time plays an important role and that the U-shape only appears after longer incubation times. We also want to direct the reviewer to the publication “A Mechanism for Actin Filament Severing by Malaria Parasite Actin Depolymerizing Factor 1 via a Low Affinity Binding Interface” (Wong et al. 2013, JBC) in which comparable co-sedimentation experiments were performed (Figure 5E-G) with rabbit skeletal α-actin and human cofilin-1 and also no U-shaped curves were observed, even at higher molar excess of cofilin-1 compared to our experiments and with longer incubation times (1 hour vs. 10 minutes).

      We have now included an exemplary gel showing co-sedimentation experiments performed with WT, mutant actin and different concentrations of cofilin at pH 7.8 in the manuscript (Figure 5 – figure supplement 2).

      • Figure 5 D: these data show that the binding of cofilin to WT and E334Q actin is approximately the same, with the mutant binding slightly more weakly. It would be clearer if the two plots were normalized to their respective plateaus since the difference in arbitrary units distracts from the conclusion of the figure. If the difference in the plateaus is meaningful, please explain.

      As suggested by the reviewer, we normalized the data for a better understanding of the message conveyed.

      • Figure 6: It is assumed that the authors are trying to show in this figure that cofilin binds both actins approximately the same but does not sever as readily for E334Q actin. The numerous parameters measured do not directly address what the authors are actually trying to show, which presumably is that the rate of severing is lower for E334Q than WT. It is therefore puzzling why no measurement of severing events per second per micron of actin in TIRF is made, which would give a more precise account of the underlying mechanism.

      The severing rate as measured in TIRF is dependent on a number of parameters in a nonlinear manner. Therefore, we opted to show the combination of images directly showing the progress of the reaction and graphs summarizing the concomitant changes in cofilin clusters, actin filaments, actin-related fluorescence intensity and cofilin-related fluorescence intensity.

      • Actin-activated steady-state ATPase data of the NM2A with mutant and WT actin would have been extremely useful and informative. The authors show the ability to make these types of measurements in the paper (NADH assay), and it is surprising that they are not included for assessing the myosin activity. It may be because of limited actin quantities. If this is the case, it should be indicated.

      Indeed, the measurement of the steady-state actin-activated ATPase with recombinant cytoskeletal actin is very material-intensive and therefore costly, as a complete titration of actin is required for the generation of meaningful data. Since the vast majority of our assays involving a myosin family member were performed with NM2A-HMM, we decided to perform a full actin titration of the steady-state actin-activated ATPase of NM2A-HMM with WT and mutant filaments. The results of these experiments are now shown in Figure 8C. The panel showing the results used for determining the dissociation rate constants (k-A) for the interaction of NM2C-2R with p.E334Q or WT γ-actin in the absence of nucleotide was moved to the supplement (Figure 8 – figure supplement 2).

      We added the following paragraph to the Material and Methods section concerning the Steady-State ATPase assay:

      “For measurements of the basal and actin–activated NM2A–HMM ATPase, 0.5 µM MLCK-treated HMM was used. Phalloidin–stabilized WT or mutant F-actin was added over the range of 0–25 µM. The change in absorbance at 340 nm due to oxidation of NADH was recorded in a Multiskan FC Microplate Photometer (Thermo Fisher Scientific, Waltham, MA, USA). The data were fitted to the Michaelis-Menten equation to obtain values for the actin concentration at half-maximal activation of ATP-turnover (Kapp) and for the maximum ATP-turnover at saturated actin concentration (kcat).”
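
      To illustrate how Kapp and kcat follow from an actin titration, the sketch below fits synthetic data using the Hanes–Woolf linearization of the Michaelis-Menten equation ([A]/v = [A]/kcat + Kapp/kcat). This is a swapped-in, dependency-free stand-in and not the authors' fitting procedure, which is not specified beyond a fit to the Michaelis-Menten equation; a direct nonlinear least-squares fit is generally preferable for noisy data. The titration points are hypothetical and are generated from the WT values reported in the revised Results, so the fit only recovers its own inputs.

```python
def michaelis_menten(actin_uM, k_cat, k_app):
    """Actin-activated turnover rate v = k_cat * [A] / (K_app + [A])."""
    return k_cat * actin_uM / (k_app + actin_uM)


def hanes_woolf_fit(actin_uM, rates):
    """Estimate (k_cat, K_app) from a straight-line fit of [A]/v against [A]."""
    # Points with [A] = 0 or v = 0 cannot be transformed and are skipped.
    pairs = [(a, a / v) for a, v in zip(actin_uM, rates) if a > 0 and v > 0]
    x = [p[0] for p in pairs]
    y = [p[1] for p in pairs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # equals 1 / k_cat
    intercept = mean_y - slope * mean_x  # equals K_app / k_cat
    k_cat = 1.0 / slope
    k_app = intercept * k_cat
    return k_cat, k_app


# Synthetic titration over 0-25 uM F-actin (hypothetical points), generated
# from the reported WT parameters: k_cat = 0.097 s^-1, K_app = 3.20 uM.
actin_uM = [0.0, 1.0, 2.5, 5.0, 10.0, 15.0, 25.0]
rates = [michaelis_menten(a, 0.097, 3.20) for a in actin_uM]

k_cat_fit, k_app_fit = hanes_woolf_fit(actin_uM, rates)
print(f"k_cat = {k_cat_fit:.3f} s^-1, K_app = {k_app_fit:.2f} uM")
```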

      Furthermore, we added a description of the results of the experiments to the Results section of the manuscript:

      “Using a NADH-coupled enzymatic assay, we determined the ability of p.E334Q and WT filaments to activate the ATPase of NM2A-HMM over the range of 0-25 µM F-actin (Figure 8C). While we observed no significant difference in Kapp, indicated by the actin concentration at half-maximal activation, in experiments with p.E334Q filaments (2.89 ± 0.49 µM) and WT filaments (3.20 ± 0.74 µM), we observed a 28% slower maximal ATP turnover at saturating actin concentration (kcat) with p.E334Q filaments (0.076 ± 0.005 s⁻¹ vs. 0.097 ± 0.002 s⁻¹).”

      • (line 310) The authors state that they "noticed increased rapid dissociation and association events for E334Q filaments" in the motility assay. This observation motivates the authors to assess actin affinities of NM2A-HMM. Although differences in rigor and AM.ADP affinities are found between mutant and WT actins, the actin attachment lifetimes (many minutes) are unlikely to be related to the rapid association and dissociation event seen in the motility assay. Rather, this jiggling is more likely to be related to a lower duty ratio of the myosins, which appears to be the conclusion reached for the myosin-V data. These points should be clarified in the text.

      We changed the text in accordance with the reviewer's suggestion. It now reads: Cytoskeletal γ–actin filaments move with an average sliding velocity of 195.3 ± 5.0 nm s⁻¹ on lawns of surface-immobilized NM2A–HMM molecules (Figure 8A, B). For NM2A-HMM densities below about 10,000 molecules per µm², the average sliding speed for cytoskeletal actin filaments drops steeply (Hundt et al, 2016). Filaments formed by p.E334Q actin move 5-fold slower, resulting in an observed average sliding velocity of 39.1 ± 3.2 nm s⁻¹. Filaments copolymerized from a 1:1 mixture of WT and p.E334Q actin move with an average sliding velocity of 131.2 ± 10 nm s⁻¹ (Figure 8A, B). When equal densities of surface-attached WT and mutant filaments were used, we observed that the number of rapid dissociation and association events increased markedly for p.E334Q filaments (Figure 8 – video supplement 7–9).

      Using a NADH-coupled enzymatic assay, we determined the ability of p.E334Q and WT filaments to activate the ATPase of NM2A-HMM over the range of 0-25 µM F-actin (Figure 8C). While we observed no significant difference in Kapp, indicated by the actin concentration at half-maximal activation, in experiments with p.E334Q filaments (2.89 ± 0.49 µM) and WT filaments (3.20 ± 0.74 µM), we observed a 28% slower maximal ATP turnover at saturating actin concentration (kcat) with p.E334Q filaments (0.076 ± 0.005 s⁻¹ vs. 0.097 ± 0.002 s⁻¹). To investigate the impact of the mutation on actomyosin–affinity using transient–kinetic approaches, we determined the dissociation rate constants using a single–headed NM2A–2R construct (Figure 8D). …..

      • (line 327) The authors report that the 1/K1 value is unchanged. There are no descriptions of this experiment in the paper. I am assuming the authors measured the ATP-induced dissociation of actomyosin and determined ATP affinity (K1) from this experiment. If this is the case, they should describe the experiment and show the data, provide a second-order rate constant for ATP binding, and report the max rate of dissociation (k2). This is a kinetic experiment done frequently by this group, so the absence of these details is surprising.

      In the previous version of the manuscript, the method used to determine 1/K1 (ATP-induced dissociation of the actomyosin complex) was described in the Material and Methods paragraph “Transient kinetic analysis of the actomyosin complex” and the values obtained for 1/K1 were given in Table 1. We now included the experimental data as an additional figure in the manuscript (Figure 8 – figure supplement 3). Furthermore, we also give the maximal dissociation rate k+2 and the apparent second-order rate constant for ATP-binding (K1k+2) for the WT and mutant actomyosin complex in Table 1. Therefore, we changed the paragraph in the Results section concerning this experiment to:

      “The apparent ATP–affinity (1/K1), the maximal dissociation rate of NM2A from F-actin in the presence of ATP (k+2), and the apparent second-order rate constant of ATP binding (K1k+2) showed no significant differences for complexes formed between NM2A and WT or p.E334Q filaments (Table 1, Figure 8 – figure supplement 3).”

      and the section in the Material and Methods to:

      “The apparent ATP–affinity of the actomyosin complex was determined by mixing the apyrase–treated, pyrene–labeled, phalloidin–stabilized actomyosin complex with increasing concentrations of ATP at the stopped–flow system. Fitting an exponential function to the individual transients yields the ATP–dependent dissociation rate of NM2A–2R from F–actin (kobs). The kobs–values were plotted against the corresponding ATP concentrations and a hyperbola was fitted to the data. The fit yields the apparent ATP–affinity (1/K1) of the actomyosin complex and the maximal dissociation rate k+2.

      The apparent second–order rate constant for ATP binding (K1k+2) was determined by applying a linear fit to the data obtained at low ATP concentrations (0 – 25 µM).”
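
      The relationship between the hyperbolic fit and the linear fit at low ATP described above can be written out explicitly. The expressions below are a sketch that assumes the standard two-step scheme for ATP-induced actomyosin dissociation (rapid-equilibrium ATP binding with association constant K1, followed by dissociation at rate k+2); the manuscript's own reaction scheme (Figure 8 – figure supplement 1) should be taken as authoritative.

```latex
% Hyperbolic dependence of the observed dissociation rate on ATP concentration
k_{\mathrm{obs}} = \frac{k_{+2}\,[\mathrm{ATP}]}{1/K_1 + [\mathrm{ATP}]}

% Limit of low ATP, [ATP] << 1/K_1: the dependence becomes linear,
% with the slope giving the apparent second-order rate constant K_1 k_{+2}
k_{\mathrm{obs}} \approx K_1 k_{+2}\,[\mathrm{ATP}]
```

      In this form, 1/K1 is the ATP concentration at which kobs reaches half of k+2, which is why it is reported as the apparent ATP affinity.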

      For a better understanding of the numerous rate and equilibrium constants, we have now included a figure showing the kinetic reaction scheme of the myosin ATPase cycle (Figure 8 – figure supplement 1).

      Recommendations for the authors:

      Reviewer #1:

      • The subdomains of actin are mislabeled in Fig. 1A.

      The labeling of the subdomains has been corrected.

      • Additional experimental data addressing the 3 weaknesses noted in the public review would be informative but are not essential in my opinion. Examining the effect of cofilin on severing by the TIRF assay in more detail and using a processivity assay for myosin V (immobilized actin) would be the two aspects I would most value.

      The TIRF assay for cofilin severing was performed initially over the cofilin concentration range from 20 to 250 nM. The results obtained in the presence of 100 nM cofilin allow a particularly informative depiction of the differences observed with mutant and WT actin. This applies to the image series showing the changes in filament length, cofilin clusters, and filament number as well as to the graphs showing time-dependent changes in the number of filaments and total actin fluorescence. We have not included the results for a 50:50 mixture of WT:mutant actin because its attenuating effect is documented in several other experiments in the manuscript.

      Our results with Myo5A show a less productive interaction with mutant actin filaments as indicated by a 1.7-fold reduction in the average sliding velocity and an increase in the optimal Myo5A-HMM surface density from 770 to 3100 molecules per µm². These results indicate a reduction in binding affinity and coupling efficiency, with a likely impact on processivity. Given that Myo5A is only one of many cytoskeletal myosin motors and that the motor properties of all myosins are modulated by the presence of tropomyosin isoforms and other actin binding proteins, we expect only a small incremental gain in knowledge by performing additional experiments with an inverted assay geometry.

      Reviewer #2:

      • The authors should address the concerns regarding the statistical methodologies.

      We have gone through the manuscript carefully to correct any errors in the statistics, as explained below.

      Figure 1B, 5B, 5C, 5D, 8D, 9B, and 8 – figure supplement 2 all show the mean ± SD, as also correctly reported for Figure 8E and 8F in the figure legend. The statement that these figures show the mean ± SEM was wrong, and we corrected this mistake for all the listed figures. Furthermore, we now give the exact N for every experiment in the figure legend.

      Figure 2C, 2E, 2F, 4B, 5A, 6B-E indeed showed the mean ± SEM. As the reviewer rightly points out, this is not the appropriate way to deal with such sample sizes. We therefore corrected the figures to show the mean ± SD.

      We still refer to the mean ± SEM in Figure 2B, where elongation rates for more than 100 filaments were recorded, and in Figure 8B, where sliding velocities for several thousand actin filaments were measured.

      • The authors should present the actin titration of the steady state ATPase activity for at least one of the myosins, or preferably all of them.

      An actin titration of the steady state ATPase activity of NM-2A has been included in the revised version of the manuscript (Fig 8C).

      • The authors should consider the use of pyrene-actin in measuring the assembly/disassembly of actin.

      Values for the rate of actin assembly/disassembly measured with pyrene-actin are given in Table 1. Based on the small changes observed, we did not determine the critical actin concentration for the mutant construct.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We thank reviewer #1 for identifying the major caveats of the paper, and have split them out into separate comments below to address them.

      Comment 1) The caveats are that ecosystem processes beyond water availability are not investigated although they are brought into play in the title and in the paper

      Author response: We disagree that water availability is the only ecosystem process investigated in this study, as herbivory, plant mortality, and the maintenance of diversity in higher trophic levels are important processes within ecosystems. We have added text to the abstract and introduction clarifying that we consider these response measures to be ecosystem processes. Further language to this effect already exists in the abstract, methods, and discussion.

      Comment 2) That herbivory beyond leaf damage was not reported (there might be none, the reader needs to be shown the evidence for this)

      Author response: This is typically how herbivory is assessed in ecological studies, and our focus is on folivores. There may be additional herbivory in the form of fluid-sucking insects, shoot/root herbivory, etc., but these were not assessed. It would be interesting to assess these other forms of herbivory to see if they respond similarly with additional studies.

      Comment 3) That herbivore diversity is defined by leaf damage (authors need to give evidence that this is a valid inference)

      Author response: We thank reviewer #1 for pointing out the lack of written support for this claim. We have modified the methods (lines 138-139; 214-217) to clarify that this is a useful proxy for insect richness in the Piper system, and have added citations demonstrating it has been found to correlate well with insect richness in tropical forests.

      Comment 4) That the plots were isolated from herbivores beyond their borders

      Author response: This was not an assumption of the study. We have modified the methods (line 200) to make this clearer to the reader.

      Comment 5) That the effects of extreme climate events were isolated to Peru

      Author response: This was not an assumption of the study, rather it is an observation. While we consider it important to include observed climate differences between sites in the interpretation of our results, it was not necessary for there to be extreme climate events at other sites as we consider manipulated water availability to represent changes in precipitation that are expected to occur at these sites with climate change.

      Comment 6) That intraspecific variation in the host plants needs to be explained and interpreted in more detail

      Author response: We thank reviewer #1 for identifying that our current explanations needed development. We have modified the introduction to explore potential mechanisms relating intraspecific diversity to ecosystem function based on recent studies, and have modified the discussion to bring focus to why the effects of intraspecific diversity differ from those of interspecific diversity.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1) Pare this material down to simpler results. The most significant to me is the intraspecific variation in damage. Were this broken out and reported in some detail it could be quite interesting. I find the results to be a confusing blizzard of multiple factors that differ among sites; after reading the paper twice I could not recall the takeaway lesson beyond that drought wrecks the diversity of herbivores and sometimes even kills the host plant.

      Author response: We agree that the results are complicated given the variation in effects among sites, but this variation and complexity is important, and is in itself one of the takeaway points. Unfortunately, nature is not simple. We have made several large edits to the results section, including the removal of methodological and otherwise redundant information, to hopefully bring the major takeaways into focus.

      Reviewer #2 (Public Review):

      Comment 1) This is an important and large experimental study examining the effects of plant species richness, plant genotypic richness, and soil water availability on herbivory patterns on Piper species in tropical forests.

      A major strength is the size of the study and the fact that it tackled so many potentially important factors simultaneously. The authors examined both interspecific plant diversity and intraspecific plant diversity. They crossed that with a water availability treatment. And they repeated the experiment across five geographically separated sites.

      The authors find that both water availability and plant diversity, intraspecific and interspecific, influence herbivore diversity and herbivory, but that the effects differ in important ways across sites. I found the study to be solid and the results to be very convincing. The results will help the field grapple with the importance of environmental change and biodiversity loss and how they structure communities and alter species interactions.

      Author response: We thank reviewer #2 for their kind words.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1) I was confused about why the authors measured species diversity/richness as a proportion of the species pool. This means that the metric of richness decreases if species are added to the species pool but not the plot/experiment. I think I understand it, but I suggest the authors explain this choice.

      Author response: We thank reviewer #2 for pointing out that this was confusing. We have clarified the methods (lines 228-232) to explain that this choice was made to allow easier comparison between intra- and interspecific richness.
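
      The point raised in this comment can be illustrated with a minimal sketch, assuming richness is expressed as the number of species (or genotypes) in a plot divided by the size of the pool they were drawn from. The function name and the pool sizes below are hypothetical and are not taken from the manuscript.

```python
def proportional_richness(n_in_plot: int, pool_size: int) -> float:
    """Richness as a fraction of the available pool (hypothetical helper)."""
    if pool_size <= 0 or not 0 <= n_in_plot <= pool_size:
        raise ValueError("need 0 <= n_in_plot <= pool_size and pool_size > 0")
    return n_in_plot / pool_size


# Illustrative numbers only: the same plot scored against two pool sizes.
plot_species = 3
print(proportional_richness(plot_species, 12))  # 0.25
print(proportional_richness(plot_species, 15))  # 0.2
```

      Under this assumed definition, the proportional value drops when the pool grows while the plot stays the same, which is the behaviour the reviewer describes. Scaling by the pool places intra- and interspecific richness on a common 0 to 1 range.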

      Comment 2) One of the stronger estimated relationships was a positive effect of plant species richness on insect richness. I found it a little hard to interpret this relationship. Is this just because there are host species specialists? So, with more host species there are more herbivore species? Or does insect richness increase multiplicatively with increasing plant species richness? One way to look for this would be for the authors to examine the relationship between plant species richness and the average number of herbivore damage types per plant species.

      Author response: We agree that this is important for the reader to understand and have added text to the introduction and discussion sections explaining that this is the expectation based on theory and other empirical studies. We have additionally added text to the discussion (lines 386-388) pointing out that this pattern was not observed at all sites. While we agree that it would be interesting to explore if this effect was additive or multiplicative, we do not believe this is in the scope of the paper due to the methods used to measure insect richness.

      Comment 3) Unless I missed it, some important information about the models was missing. E.g., what distributions were assumed for each of the variables? Any transformations?

      Author response: We thank reviewer #2 for pointing this out. This information has been added to the methods (lines 272-274).

      Comment 4) Why is there no model with water addition affecting insect richness directly but not percent herbivory directly?

      Author response: While we originally decided to not include this model due to lack of theoretical support and low statistical performance, we have added references to this model (now model II) in the methods and results for consistency and to make model performance clearer to the reader. We have additionally moved supplemental table S1 to the main text to make the models and hypotheses tested by each model more accessible.

      Comment 5) Fig. 2. What are the percentages above the figures? Maybe PD values?

      Author response: These values are now clarified in the figure caption.

      Comment 6) L364 "can differ dramatically" This is vague and confusing. Differ in what way? From each other? Did the authors really expect plant richness to have the same effect on herbivory and plant survival? What would it mean anyway for plant richness to have the same effect on herbivory and plant survival?

      Author response: We agree that the language here is confusing and thank reviewer #2 for drawing our attention to it. We have modified the discussion (lines 363-365) to clarify that the direction of effect of intraspecific richness can vary from the direction of effect of interspecific richness, rather than the effects on different response variables varying from each other.

      Comment 7) L 375 "only meaningful differences" This statement feels a little overly strong. It seems like there is a good argument for this, but there could be other things going on.

      Author response: We agree that the language here was unnecessarily strong, and have modified the discussion (lines 398-403) to focus on the lack of difference between methodologies at these two sites, and the observed differences in climate and community structure at each site.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors aimed to investigate how cells respond to dynamic combinations of two stresses compared to dynamic inputs of a single stress. They applied the two stresses - carbon stress and hyperosmotic stress - either in or out of phase, adding and removing glucose and sorbitol.

      Both a strength and a weakness, as well as the main discovery, is that the cells' hyperosmotic response strongly requires glucose. For in-phase stress, cells are exposed to hyperosmotic shock without glucose, limiting their ability to respond with the well-studied HOG pathway; for anti-phase stress, cells do have glucose when hyperosmotically shocked, but experience a hypo-osmotic shock when both glucose and sorbitol are simultaneously removed. Responding with the HOG pathway and so amassing intracellular glycerol amplifies the impact of this hypo-osmotic shock. Counterintuitively then, it is the presence of glucose rather than the stress of its absence that is deleterious for the cells.

      The bulk of the paper supports these conclusions with clean, compelling time-lapse microscopy, including extensive analysis of gene deletions in the HOG network and measurements of both division and death rates. The methodology the authors develop is powerful and widely applicable.

      Some discussion of the value of applying periodic inputs would be helpful. Cells are unlikely to have previously seen such inputs, and periodic stimuli may reveal behaviours that are rarely relevant to selection.

      We thank the referee for his review. To answer the reviewer’s last comment, our main objective was not to study conditions that are ecologically relevant, but rather to perturb the system in an original way to reveal new mechanisms and properties of the system. The main advantage of periodic inputs over more complex or unpredictable types of temporal fluctuations is that they can be defined with few parameters that are easy to interpret and to integrate in biophysical models. For instance, by using periodic inputs we were able to investigate how changing the phasing of two stresses impacted fitness while keeping other parameters constant (the duration of each stress was kept constant). We added two sentences at the beginning of the discussion to highlight the value of using periodic inputs.

      We do not fully agree with the reviewer’s statement that periodic stimuli may reveal behaviours that are rarely relevant to selection. Indeed, many parameters of natural environments are known to vary periodically, such as light, temperature, predation, tides. Even if the periodic stimuli we use are artificial, they can still be a valuable tool to reveal new molecular processes. For instance, null mutants have been invaluable to understand biological systems despite being unlikely to reveal behaviours relevant to selection.

      The authors' findings demonstrate the tight links that can exist between metabolism and the ability to respond to stress. Their study appears to have parted somewhat from their original aim because of the HOG pathway's reliance on glucose. It would be interesting to see if the cells' behaviour is simpler in periodically varying sorbitol and a stress where there is little known connection to the HOG network, such as nitrogen stress.

      The use of periodic nitrogen stress is a very interesting suggestion from both reviewers. However, we think it represents a large amount of work that deserves its own study. In particular, it would require first identifying a relevant period at which nitrogen fluctuations have an impact on division rate similar to what we observed for glucose fluctuations before performing experiments in AS and IPS conditions.

      Nitrogen starvation is known to induce filamentous growth via activation of components of the HOG pathway (Cullen and Sprague, 2012), with potential cross-talk between filamentous growth and hyperosmotic stress response. Therefore, periodic osmotic stress and periodic nitrogen starvation may interact in a complex way.

      Reviewer #2 (Public Review):

      The authors have used microfluidic channels to study the response of budding yeast to variable environments. Namely, they tested the ability of the cells to divide when the medium was repeatedly switched between two different conditions at various frequencies. They first characterized the response to changes in glucose availability or in the presence of hyper-osmotic stress via the addition of sorbitol to the medium. Subsequently, the two stresses were combined by applying them alternately or simultaneously (in-phase). Interestingly, they observed that the in-phase stress pattern allowed more divisions and low levels of cell mortality compared to the alternating stresses, where cells were dividing slowly and many cells died. A number of mutants in the HOG pathway were tested in these conditions to evaluate their responses. Moreover, the activation of the MAPK Hog1 and the transcriptional induction of the hyper-osmotic stress promoter STL1 were quantified by fluorescence microscopy.

      Overall, the manuscript is well structured and data are presented in a clear way. The time-lapse experiments were analyzed with high precision. The experiments confirm the importance of performing dynamic analysis of signal transduction pathways. While the experiments reveal some unexpected behavior, I find that the biological insights gained on this system remain relatively modest.

      In the discussion section, the authors mention two important behaviors that their data unveil: resource allocation (between glycolysis and HOG-driven adaptation) and regulation of the HOG-pathway based on the presence of glucose. These behaviors had been already observed in other reports (Sharifian et al. 2015 or Shen et al. 2023, for instance). I find that this manuscript does not provide a lot of additional insights into these processes.

      We thank the referee for his review. We agree with the reviewer that the interaction between glucose availability and osmotic stress response has been investigated in previous studies. However, this interaction was investigated using experimental procedures that differed from our approach in critical ways, and therefore the behaviors observed were not the same. In Sharifian et al. (2015), the authors identified a new negative feedback loop regulating Hog1 basal activity and described underlying molecular mechanisms. This feedback loop is unlikely to explain the differences in cell fitness we observed in IPS and AS conditions, because 1) differences in division rate were still observed in hog1 mutant cells and 2) differences in death rate involve glycerol synthesis, which is independent of the feedback loop described in Sharifian et al. (2015). In Shen et al. (2023), the authors observed a stronger expression of Hog-responsive genes at lower glucose concentrations, which seems contradictory with our observation of very low pSTL1-GFP expression in absence of glucose. However, they did not use fluctuating conditions and they did not report expression of stress-response genes when glucose was totally depleted (the lowest glucose concentration they used was 0.02%) as we did, which may explain the different outcomes. We added three sentences in the discussion to compare our findings to those of Shen et al. (2023).

      One clear piece of evidence that is presented, however, is the link between glycerol accumulation during the sorbitol treatment and the cell death phenotype upon starvation in alternating stress condition. However, no explanation or hypothesis is formulated to explain the mechanism of resource allocation between glycolysis and HOG response that could explain the poor growth in alternating stresses or the lack of adaptation of Hog1 activity in absence of glucose.

      In the revised version of the manuscript, we included a new result section and a supplementary figure (Figure 4 – figure supplement 2) where we tested three hypotheses to explain the lower division rate observed in AS condition relative to IPS condition. We found no evidence supporting these hypotheses, and the mechanisms responsible for the reduced growth in AS condition therefore remain elusive.

      Another key question is to what extent the findings presented here can be extended to other types of perturbations. Would the use of alternative C-source or nitrogen starvation change the observed behaviors in dynamic stresses? If other types of stresses are used, can we expect a similar growth pattern between alternating versus in-phase stresses?

      As mentioned above in our response to the other reviewer, these are very interesting questions that we think go beyond the scope of our study due to the amount of work involved.

      Recommendations for the authors:

      Reviewer #1

      My comments are only minor.

      • More paragraphs would improve legibility.

      To improve legibility, we split the longer section of the Results into three paragraphs (page 12, section entitled “Osmoregulation is impaired under in-phase stresses but not under alternating stresses.”). However, we kept it as one section with a single title for global coherency: each section of the results corresponds to one main figure and has one main conclusion.

      • I found AS and IPS confusing because what becomes important is whether sorbitol appears with glucose or not. For me, an acronym that makes that co-occurrence clear would be better or even better still no acronyms at all.

      We tried several alternative names for the two conditions in previous drafts of the manuscript. Based on colleagues' feedback, AS and IPS acronyms appeared as a good compromise between concision and clarity. To avoid confusion, the two acronyms are precisely defined when they are first used in the Results section. We think it is more important to emphasize the co-occurrence (or not) of the two stresses, rather than the co-occurrence of glucose and sorbitol. Indeed, standard yeast medium contains glucose but no sorbitol, and therefore we defined the two periodic conditions based on differences from standard medium. Even though we avoided using acronyms as much as possible in the manuscript, the use of these two acronyms to refer to the dual fluctuations of the environment seemed essential for concision. Indeed, IPS and AS acronyms are used many times in the results (16 occurrences on page 12 alone), figures and figure legends.

      • I would consider moving some of Fig S2 to the main text: it helps clarify where Fig 2 is coming from and is referenced multiple times.

      We fully agree with the reviewer and we moved panels A-D from Figure S2 to the main Figure 2.

      • On page 10, "constantly facing a single stress that changes over time" is confusing. Perhaps "repetitively facing a single stress" instead?

      We agree this sentence could be wrongly interpreted the way it was written. We changed it to: “cells grow more slowly when facing periodic alternation of the two stresses (AS) than when facing periodic co-occurrence of these stresses (IPS)”.

      • Is there any knowledge on how cells resist hyperosmotic stress in the absence of glucose? That would help explain the IPS results.

      Based on comments from both reviewers, we surveyed the literature to flesh out the discussion of hypotheses that would help explain observed differences between AS and IPS conditions. We found few studies that investigated cell responses in the absence of glucose, and because of significant differences in the experimental approaches it remains difficult to explain our results from conclusions of these previous studies. For instance, Shen et al., 2023 described and modeled the hyperosmotic stress response at various glucose concentrations. They found that Hog1p relocation to the nucleus after hyperosmotic shock lasted longer at lower glucose concentration, which is consistent with our finding in absence of glucose. However, they did not include the absence of glucose in their experiments or periodic fluctuations of glucose concentration. In addition, their model ignores the impact of cell signaling processes involved in growth arrest in response to hyperosmotic stress or glucose depletion. It is therefore difficult to relate their conclusions to our results. We have developed the discussion of our study to include these hypotheses and to clarify what is explained or not in our IPS and AS results.

      There is knowledge on activation of the hyperosmotic stress pathway in response to glucose fluctuations, but not about the response to hyperosmotic stress in absence of glucose.

      • On page 11, Figure 5a should be Figure 4a.

      Correct.

      • I would explain the components of the HOG pathway in the caption of Fig 1 or in the text when you cite Fig 1a. They are described later, but an early overview would be useful.

      To give more context, we added the following sentences to the caption of Figure 1: “Yeast cells maintain osmotic equilibrium by regulating the intracellular concentration of glycerol. Glycerol synthesis is regulated by the activity of the HOG MAP kinase cascade that acts both in the cytoplasm (fast response) and on the transcription of target genes in the nucleus (long-term response). For simplicity, we only represented on the figure genes and proteins involved in this study.”

      • On page 16, I wasn't sure what "redirect metabolic fluxes against glycerol synthesis" meant.

      For more clarity, we modified this sentence to: “Since glucose is a metabolic precursor of glycerol, the absence of glucose may prevent glycerol synthesis and thereby fast osmoregulation."

      • For Fig 2, having a dot-dash and dash-dash lines rather than both dash-dash would be better.

      We made the proposed change, assuming the reviewer was referring to the gray dashed lines and not the colored ones.

      • In the caption of Fig 3, 2% glucose is 20 g/L.

      We thank the reviewer for catching this typo.

      • In the Materials and Methods Summary, adding how you estimated death rates would be helpful: they are not often reported.

      The calculation of death rates was explained in the Methods section. For more clarity, we modified the names of the parameters in the equation to make more explicit which ones refer to cell death.

      Reviewer #2 (Recommendations For The Authors):

      In Figure 2, it would be interesting to show individual growth rates of the perturbations at various frequencies as shown in Figures 3 c and d.

      We thank the reviewer for this suggestion. We added a new supplementary figure (Figure 2 – figure supplement 2) showing the temporal dynamics of division rates at three different frequencies of osmostress and glucose depletion. We did not include high frequencies (periods below 48 minutes) because the temporal resolution of image acquisition in our experiments (1 image every 6 minutes) was too low. Very interestingly, this new analysis suggests that the positive relationship between the frequency of glucose depletion and division rate is explained by a delay between glucose removal and growth arrest rather than a delay between glucose addition and growth recovery. We therefore added the following conclusion:

      “Under periodic fluctuations of 2% glucose, the division rate was lower during half-periods without glucose than during half-periods with glucose (Figure 2 – figure supplement 2d-f), as expected. However, this difference depended on the frequency of glucose fluctuations: the average division rate during half-periods without glucose was higher at high frequency (small period) than at low frequency (large period) of fluctuations (Figure 2 – figure supplement 2d-f). Therefore, the effect of the frequency of glucose availability on the division rate in 2% glucose is likely due to a delay between glucose removal and growth arrest: cell proliferation never stops when the frequency of glucose depletion is too fast.”

      According to Sharifian et al. 2015, I would have expected that Hog1 would not relocate to the nucleus in 0% glucose. I wonder if this is due to the use of sorbitol as a stressor or the presence of low levels of glucose in the medium. I would suggest performing some control experiments with NaCl as hyperosmotic agent and test the addition of 2-deoxy-glucose to completely block glycolysis.

      After careful reading of Sharifian et al. 2015, we fail to understand why the reviewer thinks Hog1 would be expected to not relocate to the nucleus after hyperosmotic stress in 0% glucose. In this previous study, the authors never combined glucose depletion with a strong hyperosmotic stress as we did in our study. They report the results of independent experiments where cells were exposed either to a single pulse of hyperosmotic stress (0.4 M NaCl) or to transient glucose starvation, but they did not combine these two stimuli. In this context, it is difficult to compare their results with ours. The fact that Sharifian et al. 2015 did not observe Hog1 nuclear relocation in 0% glucose (consistent with our result in Figure 6 – figure supplement 1a, yellow curve) is not inconsistent with our observation of Hog1 nuclear enrichment in 0% glucose + 1M sorbitol. One potential discrepancy between the two studies is the fact that they observed a small transient peak of Hog1 nuclear localization just after glucose is added back to the medium, while we failed to observe this peak in similar conditions (yellow curve in Figure 6 – figure supplement 1a). However, this could be simply explained by the temporal resolution of our experimental system: we image cells once every 6 minutes and the peak lasts less than 2 minutes in Sharifian et al. 2015. We added a sentence to discuss this minor point in the Results: “Although previous studies observed small transient (less than two minutes) peaks of Hog1-GFP nuclear localization after glucose was added back to the medium following glucose depletion (Sharifian et al., 2015, Piao et al., 2013), the temporal resolution in our experiments (one image every 6 minutes) may have been too low to detect these peaks.”.
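
      The temporal-resolution argument can be made concrete with a rough sketch. It assumes, purely for illustration, that the onset of a transient peak falls at a uniformly random time relative to the imaging frames; real acquisitions may be synchronized with medium switches, in which case this simple estimate would not apply. The function name is hypothetical.

```python
def chance_frame_hits_peak(peak_duration_min: float, frame_interval_min: float) -> float:
    """Probability that at least one frame falls inside a transient peak.

    Assumes the peak onset is uniformly distributed relative to the
    regularly spaced frames (an illustrative assumption only).
    """
    if peak_duration_min < 0 or frame_interval_min <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    return min(1.0, peak_duration_min / frame_interval_min)


# One image every 6 minutes; a peak lasting at most 2 minutes.
print(round(chance_frame_hits_peak(2.0, 6.0), 2))  # 0.33
```

      Under this assumption, a peak shorter than two minutes would be sampled by a frame in fewer than one in three occurrences, and even then by a single time point.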

      While we agree many additional experiments would be interesting, such as testing the effects of different stress factors or the non-metabolizable glucose analog 2-deoxy-D-glucose, we think this is beyond the scope of this study because such experiments are likely to open broad perspectives and to not be conclusive in a reasonable amount of time.

      When discussing Figure 7, the authors write that the HOG pathway is "overactivated" or "hyperactivated". I would refrain from using these terms because as seen in Figure 6, the Hog1 activity pattern, if anything, decreases as the number of alternative pulses increases. The high level of pSTL1-mCitrine measured is mostly due to the long half-life of the fluorescent protein.

      We used the formulation “hyper-activation” of the HOG pathway because Mitchell et al. 2015 used it to refer to the same phenomenon in their seminal study. This "hyper-activation" refers to the fact that both the integral activation of Hog1p (sum of areas under Hog1 nuclear peaks) and the global activation of transcriptional targets is much higher during fast periodic hyperosmotic stress than during constant hyperosmotic stress. That being said, we understand the point made by the reviewer about the decreasing size of Hog1 peaks over time during repeated pulses of osmotic stress. Therefore, we slightly modified the text to refer to hyper-activation of pSTL1-mCitrine transcription or expression instead of hyper-activation of the HOG pathway. For coherency, we replaced all instances of “overactivation” by “hyper-activation”.

      Last but not least, the high level of pSTL1-mCitrine is both due to the long half-life of the protein and to the fact that pSTL1 transcription is never turned off due to high Hog1p activity under fast periodic osmostress.

      Minor comments:

      In the main text, I think it might be more intuitive to refer to doubling time in hours instead of division rates in 1/min which are harder to interpret.

      In an early draft of the manuscript, we made figures with either division rates or with doubling times (ln(2)/division rate) and we received mixed opinions from colleagues on what measure was more intuitive to interpret. Both measures are widely used in the literature, and we decided to use division rates in the final version of the figures because it was more directly related to population growth rate and to fitness. For instance, the population growth rate shown in Figure 5 is simply calculated by subtracting the death rate from the division rate. For coherency, we therefore reported division rates instead of doubling times in figures and results. However, to address the reviewer’s comment we included the doubling times (in addition to the division rates) when mentioning the most important results. For instance, page 12: “Strikingly, cells divided about twice as fast under IPS condition (1.67 × 10⁻³ division/min, corresponding to an average doubling time of 415 minutes) than under AS condition (9.4 × 10⁻⁴ division/min, corresponding to an average doubling time of 737 minutes)”.
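
      The conversion between the two measures can be checked with a short sketch. This is an illustration of the stated formulas (doubling time = ln(2)/division rate; growth rate = division rate minus death rate), not the authors' R scripts, and the function names are hypothetical.

```python
import math


def doubling_time(division_rate_per_min: float) -> float:
    """Doubling time in minutes for a division rate given in divisions/min."""
    if division_rate_per_min <= 0:
        raise ValueError("division rate must be positive")
    return math.log(2) / division_rate_per_min


def net_growth_rate(division_rate: float, death_rate: float) -> float:
    """Population growth rate as division rate minus death rate (same units)."""
    return division_rate - death_rate


# Division rates quoted in the response (divisions/min).
print(round(doubling_time(1.67e-3)))  # 415 (IPS condition)
print(round(doubling_time(9.4e-4)))   # 737 (AS condition)
```

      Both doubling times quoted in the response are reproduced by the formula, and the ratio of the two rates (about 1.8) is consistent with "about twice as fast".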

      I found various capitalized versions of "HOG /Hog pathway"

      We corrected this incoherency and used “HOG pathway” everywhere.

      Page 11. Figure 5a should refer to Figure 4a I believe.

      Correct.

      The methods are generally very thorough and precise. The explanation about the calculation of the division rate seems incomplete. For completeness, it would be good to mention the brand and model of valves used. In addition, it would be interesting to have an idea of the number of cells and microcolonies tracked in the various growth experiments.

      We are not sure why the reviewer found the explanation of the calculation of division rate incomplete. For more clarity, we modified the names of parameters in the equations to make them more explicit. We also added a reference to Supplementary File 1 that contains all R scripts used to calculate division rates and death rates. We included the brand and model of valves used, as requested. As for the number of cells tracked in the various experiments, we mentioned in the Methods: “we selected 25 positions (25 fields of view) of the motorized stage (Prior Scientific ProScan III) that captured 10 to 50 cells in each of the 25 growth chambers of the chip and were focused slightly below the median cell plane based on cell wall contrast.” To address the reviewer’s comment, we also included the range of number of tracked cells for each experiment in corresponding figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First, we would like to thank you and all the reviewers for acknowledging the meaningful contribution of our manuscript to the field. Your useful comments helped us improve the manuscript's quality. We understood the key issues of the manuscript were the quantification of inference accuracy and applicability to methylome data. We here therefore present a revised version of the manuscript addressing all major comments.

      For each demographic inference we have added the root mean square error as demanded by the reviewers. These results confirm the previous interpretation of the graphs especially in recent times. We also added TMRCA inference analysis as requested by one reviewer as a proof of principle that integrating multiple markers can improve ARG inference.
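
      For readers unfamiliar with the metric, a root mean square error can be computed as in the minimal sketch below. Whether the authors computed it on raw or log-transformed population sizes, and over which time points, is not stated here, so the inputs are left generic and the function name is hypothetical.

```python
import math


def rmse(estimated, true):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(estimated) != len(true) or len(true) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(e - t) ** 2 for e, t in zip(estimated, true)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values only: a perfect estimate gives 0, larger deviations give larger errors.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

      A lower value indicates that the inferred trajectory lies closer to the simulated one across the evaluated time points.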

      The discussion was rewritten to further discuss the challenges of application to empirical methylation data. We clarify that when epimutations are well understood and modelled, they can be integrated into an SMC framework to improve the approach's accuracy. When epimutations are not well understood, our approach can help understand the epimutation process through generations at the evolutionary time scale along the genome. Hence, in both cases our approach can be used to unveil marker evolution processes through generations, and/or deepen our understanding of the population's past history. We hope our discussion better underlines how our approach is designed and can be used.

      eLife assessment

      This important study advances existing approaches for demographic inference by incorporating rapidly mutating markers such as switches in methylation state. The authors provide a solid comparison of their approach to existing methods, although the work would benefit from some additional consideration of the challenges in the empirical use of methylation data. The work will be of broad interest to population geneticists, both in terms of the novel approach and the statistical inference proposed.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows simultaneous analysis of multiple types of polymorphism data. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

      Answer: We thank reviewer 1 for the positive comments and for acknowledging the future promise of our method as better and more reliable data become available in different species. We appreciate the reviewer noticing the complete set of work undertaken here to integrate local and regional effects of methylation into a model containing as much knowledge of the epigenetic mutational processes as possible. Note that in Figure 2 of the manuscript we observed a gain of accuracy even when the rates are unknown. Our results thus suggest that an accuracy gain from an additional marker with unknown rates is also possible, although it is most likely scenario- and rate-dependent.

      Finally, as noticed and highlighted by the very recent work of the Johannes lab (Yao et al. Science 2023) using phylogenetic methods, knowing the epimutation rate is essential at short time scales to avoid the confounding effects of homoplasy. The same applies to our estimation of the coalescent trees, though our model considers finite-site markers. We now provide additional evidence for the potential gain of power to infer the TMRCA (Supplementary Table S7) when the epimutation rates are known or unknown, and we revised the discussion to clarify the potential shortcomings/caveats for the analysis of real data.

      Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al. develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not at the level of evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why the concept of this work is not conceptually sound. Especially in the future as new sequencing technologies provide both base calling and methylation calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

      Answer: We thank reviewer 2 for the positive comments. Indeed, surveys of CG methylation in other plant species show that its distribution is clearly bimodal (i.e. binary). This is not the case for non-CG methylation, such as CHG and CHH (where H=C,T,A). However, these latter types of methylation contexts are also not heritable across generations and therefore cannot be used as heritable molecular markers.

      Reviewer #3 (Public Review):

      I very much like this approach and the idea of incorporating hypervariable markers. The method is intriguing, and the ability to e.g. estimate recombination rates, the size of DMRs, etc. is a really nice plus. I am not able to comment on the details of the statistical inference, but from what I can evaluate it seems sound and reasonable. This is an exciting new avenue for thinking about inference from genomic data. I have a few concerns about the presentation and then also questions about the use of empirical methylation data sets.

      I think a more detailed description of demographic accuracy is warranted. For example, in L245 MSMC2 identifies the bottleneck (albeit smoothed) and only slightly overestimates recent size. In the same analysis the authors' approach with unknown mu infers a nonexistent population increase by an order of magnitude that is not mentioned.

      Answer: We thank reviewer 3 for the positive comments and refer to our answer to reviewer 1 above. We added RMSE (Root Mean Square Error) analyses to quantify the inference accuracy. We apologize for not mentioning this last point. Thank you for pointing this out; we have now fixed it (lines 245-253).

      Similarly, it seems problematic that (L556) the approach requiring estimation of site and region parameters (as would presumably be needed in most empirical systems like endangered nonmodel species mentioned in the introduction) does no better than using only SNPs. Overall, I think a more objective and perhaps quantitative comparison of approaches is warranted.

      Answer: See our answer to reviewer 1 above, and the more elaborate answers below. We now provide new RMSE analyses to quantify the accuracy of our demographic inference (Supplementary Tables 1,6,7,8,9,10). We also discuss the validity and usefulness of our approach when the epimutation rates are unknown. In short, the discussion was rewritten to further discuss the challenges of application to empirical methylation data. We clarify that when epimutations are well known and modelled (as is largely the case in A. thaliana, for example), they can be integrated into an SMC framework to improve the accuracy of the approach. When epimutations are not well understood and rates are unknown, our approach can help to understand the epimutational process through generations at the evolutionary time scale. Hence, whether markers are understood or not, our approach can be used to study the marker evolutionary processes through generations and/or to deepen our understanding of the population's past history. We hope our discussion now better conveys how our approach is designed and can be used.

      The authors simulate methylated markers at 2% (and in some places up to 20%). In many plant genomes a large proportion of cytosines are methylated (e.g. 70% in maize: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8496265/). I don't know what % of these may be polymorphic, but this leads to an order of magnitude more methylated cytosines than there are SNPs. Couldn't this mean that any appreciable error in estimating methylation threatens to be of a similar order of magnitude to the SNP data? I would welcome the authors' thoughts here.

      Answer: The reviewer is correct and this is an interesting question. First, studies show that heritable epimutations in plants are restricted to CG dinucleotides that are located well outside of the target regions of de novo methylation pathways in plants. Most of these CGs tend to fall within so-called gene body methylated regions. While it is true that plant species can differ substantially in their proportion of methylation at the genome-wide scale, the number of gene body methylated genes (i.e. genic CG methylation) is relatively similar, and at least well within the same order of magnitude (Takuno et al. Nature Plants 2016, review in Muyle et al. Genome Biol Evol 2022). Moreover, spontaneous CG epimutations in gene body methylated regions have been shown to be neutral (van Der Graaf et al. 2015, Vidali et al. 2016, Yao et al. 2023), which is an ideal property for phylogenetic and demographic inference.

      Second, CG methylation calls are sometimes affected by coverage or uncertainty. Stringent filtering for reliable SMP calls typically reduces the total proportion of CG sites that can be used as input for demographic inference. Here we only kept CG sites where the methylation information could be fully trusted after SMP calling (i.e. >99.9% posterior certainty). Overall, this explains why the percentage of sites with methylation information is so small, and why we decided to work on simulations with 2% of reliable methylated markers.

      Nevertheless, for the sake of generality, it may be that in some species such as maize a higher percentage of polymorphic methylated sites can be used, and the number of SMPs could be higher than that of SNPs when the effective population size is very small (due to past demographic history and/or life history traits). In this case, any error in the epimutation rate and variance due to the finite site model estimation (and homoplasy) are not corrected by the lack of SNPs and can lead to mis-inference.

      A few points of discussion about the biology of methylation might be worth including. For example, methylation can differ among cell types or cells within a tissue, yet sequencing approaches evaluate a pool of cells. This results in a reasonable fraction of sites having methylation rates not clearly 0 or 1. How does this variation affect the method? Similarly, while the authors cite literature about the stable inheritance of methylation, a sentence or so more about the time scale over which this occurs would be helpful.

      Answer: We thank reviewer 3 for asking those very interesting questions, which we further developed below and mention in the discussion (lines 716-722).

      For Arabidopsis thaliana:

      Following up on our previous comment above, the majority of the CG sites that serve as input to our approach are located in body methylated genes. Previous work has shown that CG methylation in these regions shows essentially no tissue and cellular heterogeneity (e.g. Horvath et al. 2019). This means that bulk methylation measurements only show limited susceptibility to measurement error. That said, to guard against any spurious SMP calls that could arise from residual measurement variation, we applied stringent filtering of CG methylation. We have kept sites where the methylation percentage is close to either 0% or 100% (the rest being removed from the analysis). We have used similar filtering strategies in previous studies of epimutational processes in mutation accumulation lines and long-lived perennials (work of the Johannes lab). In these latter studies we found that the SMP calls were sufficiently accurate for inferences of phylogenetic parameters in experimental settings (Sharyhary et al. Genome Biology 2021, Yao et al. Science, 2023).

      For other species:

      It is true that currently, evaluating the methylation state of a site from a pool of cells may be problematic for some species for two main reasons: 1) it will add noise to the signal and SMP calling could be erroneous, and 2) the methylation state used in the analysis might originate from different tissues at different locations of the genome/methylome. Overall, this will lead to spurious SMPs and can render the inference inaccurate (see Sellinger et al 2021 for the effect of spurious SNPs). Hence, caution is advised when calling SMPs in other species and for different tissues.
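      The filtering step described above for Arabidopsis thaliana (keeping only sites whose methylation percentage is close to 0% or 100%) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names and the `low`/`high` thresholds are hypothetical, and the actual study relies on posterior certainty from SMP calling rather than fixed cut-offs.

```python
def call_methylation_state(level, low=0.1, high=0.9):
    """Return 0 (unmethylated), 1 (methylated) or None (ambiguous).

    `level` is the fraction of reads supporting methylation at one CG site.
    The `low` and `high` thresholds are illustrative assumptions only.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("methylation level must be a fraction between 0 and 1")
    if level <= low:
        return 0
    if level >= high:
        return 1
    return None


def filter_sites(levels_by_position, low=0.1, high=0.9):
    """Keep only sites with a near-binary methylation level."""
    calls = {}
    for position, level in levels_by_position.items():
        state = call_methylation_state(level, low, high)
        if state is not None:  # ambiguous sites are removed from the analysis
            calls[position] = state
    return calls
```

      With these assumed thresholds, a site measured at 55% methylation would be discarded, while sites at 2% and 97% would be kept as unmethylated and methylated respectively.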

      Finally, in some species methylated cytosines have mutation rates an order of magnitude higher than other nucleotides. The authors mention they assume independence, but how would violation of this assumption affect their inference?

      Answer: Indeed, we assume the mutation and epimutation processes to be independent, so the probability for a SNP to occur does not depend on the local methylation state. If this assumption were violated, the mutation rate used would indeed be wrong, to a degree that depends on the strength of the dependency between the processes. We suggest that by ignoring this dependence, we are in the same situation as when ignoring the variation of mutation rate along the genome. We have previously documented the effect of ignoring this biological feature of genomes in Strüt et al 2023 and Sellinger et al 2021. The variation in mutation rate along the genome, if too extreme and not accounted for, can lead to erroneous inference results. However, this problem could be easily solved (modelled) by adapting the emission matrix. To correctly model this dependency, additional knowledge is needed: either the mutation and epimutation rates must be known to quantify the dependency, or the dependency must be known to quantify the resulting rates. As far as we know, these data are at the moment not available, but could maybe be obtained using the MA lines of A. thaliana (used in Yao et al. 2023).

      Recommendations for the authors:

      All three reviewers liked this approach and found it a valuable contribution. I think it is important to address reviewer 1/3 concerns about quantifying the accuracy of inference (the TMRCA approach from reviewer 1 sounds pretty reasonable), and reviewer 1 also highlights an intriguing point about model accuracy being worse when the mutation rate is known. Additionally, I think some discussion is warranted about challenges dealing with empirical methylation data (points from Rev 2 and 3 as well as Rev 1's question about inferred vs published rates of epigenetic mutation).

      Answer: We have added tables containing the root mean square error (RMSE) of every demographic inference in the manuscript to better quantify accuracy. We explain below why accuracy in the presence of site and region epimutations can in some cases decrease when the real rates are known (because the methylation state at the region level first needs to be inferred). We added evidence that accounting for methylation can improve the accuracy when recovering the TMRCA along the genome when the rates are known. We have also enhanced the discussion on the challenges of dealing with epimutation data for inference. As suggested, we hope this study will generate interest in tackling these challenges by applying the methods to various methylome datasets from different species.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • For all of the simulated demographic inference results, only plots are presented. This allows for qualitative but not quantitative comparisons to be made across different methods. It is not easy to tell which result is actually better. For example, in Supp. Fig. 5, eSMC2 seems slightly better in the ancient past, and times the trough more effectively, while SMCm seems a bit better in the very recent past. For a more rigorous approach, it would be useful to have accompanying tables that measure e.g. mean-squared error (along with confidence intervals) for each of the different scenarios, similar to what is already done in Tables 1 and 2 for estimating $r$.

      Answer: We understand the concern of reviewer #1 for a more quantitative approach to compare the inference results. We agree that plots are not sufficient to fully grasp a method's performance. To better quantify the performance of each approach, we added Sup tables 1,6,8,9 and 10 containing the RMSE (in log10 for visibility) for all Figures. The root mean-squared error is calculated as in Sellinger 2021, and a description of how it is calculated is now found in the method section (lines 886-893).
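      As a rough illustration of such a metric, the sketch below computes a root mean square error between a true and an inferred population-size curve evaluated at the same time points. It assumes the error is taken between log10-transformed sizes; the exact definition used in the manuscript (following Sellinger 2021) is not reproduced here and may differ, and the function name is hypothetical.

```python
import math


def rmse_log10(true_sizes, inferred_sizes):
    """RMSE between two population-size curves on a log10 scale.

    Both inputs are sequences of positive population sizes evaluated at
    the same time points (an assumption of this sketch).
    """
    if len(true_sizes) != len(inferred_sizes) or len(true_sizes) == 0:
        raise ValueError("curves must be non-empty and of equal length")
    squared_errors = [
        (math.log10(inferred) - math.log10(true)) ** 2
        for true, inferred in zip(true_sizes, inferred_sizes)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

      Under this definition, an inferred curve that is off by exactly one order of magnitude at every time point has an RMSE of 1, regardless of the direction of the error.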

      • 434: The discussion downplays the really odd result that inputting the true value of the mutation rate, in some cases, produces much worse estimates than when they are learned from data (SFig. 6)! I can't think of any reason why this should happen other than some sort of mathematical error or software bug. I strongly encourage the authors to pin down the cause of this puzzling behaviour.

      Answer : There are unfortunately no errors in this plot and those results are perfectly normal and coherent, but we understand they can be confusing at first.

      As described in the method section and in the appendix, when accounting for region-level epimutations, our algorithm requires the regional methylation status, which needs to be inferred as a first step from the data (real or simulated). Because region and single site epimutation events are occurring at similar rates in our simulated scenario, the methylation state of the region is very hard to correctly recover (e.g. there will be unmethylated sites in methylated regions and methylated sites in unmethylated regions). In other words, the accuracy of the region estimation HMM procedure is decreased by the joint action of site and region epimutation processes.

      When subsequently applying the HMM for inference, as described in the appendix, the probabilities of two CG sites being in the same or different methylation state depend on the methylation state of the "region". Hence the mislabelling of the region methylation state is (to some extent) equivalent to spurious SMPs (or inaccurate SMP calling).

      If the true rates for site and region epimutations are given as input, the model forces the demography (and other inferred parameters) to fit the observed distribution of SMPs (given the inputted rates), resulting in the poor accuracy observed in the Figure (Now Supplementary Figure 7).

      Note: The estimated rates from real data in A. thaliana suffer from the same issue, as the region and site epimutation rates are independently estimated, and the existence of regions is first quantified using an independent HMM method (Denkena et al. 2022).

      However, when rates are freely inferred, they are inferred according to the estimated methylation status of regions and SNPs. Therefore, even if the inferred rates are wrong, they are used by the SMC in a more consistent way.

      Note: When methylation rates violate the infinite site assumption, such as here, we first estimate the tree sequence along the genome using SNPs (i.e. DNA mutations). The algorithm then infers the epimutations rates given the inferred coalescent times and the observed methylation diversity.

      To summarise: when inputting rates to the model, if the model fails to correctly recover the region methylation status there will be conflicting information between SNPs and SMPs, leading to accuracy loss. However, if the rates are inferred, this is done with the help of SNPs, leading to less conflicting information and potentially smaller loss of accuracy. We apologize that the explanations were missing from the manuscript and have added them at lines 449-460 and 702-716.

      A further argument is that if region and site epimutations occur at rates of at least two orders of magnitude difference, the inference results are better (and accurate) when the true rates are given. The reason is that one epimutational process overrides the other (see Supplementary Table 2). In that case one epimutation process is almost negligible and we fall back to results from Figure 5 or Supplementary Figure 6.

      • As noted at 580, all of the added power from integrating SMPs/DMRs should come from improved estimation of recent TMRCAs. So, another way to study how much improvement there is would be to look at the true vs. estimated/posterior TMRCAs. Although I agree that demographic inference is ultimately the most relevant task, comparing TMRCA inference would eliminate other sources of differences between the methods (different optimization schemes, algorithmic/numerical quirks, and so forth). This could be a useful addition, and may also give you more insight into why the augmented SMC methods do worse in some cases.

      Answer: We fully agree with reviewer 1. We have added a comparison of TMRCA inference, with and without methylation sites, as a proof of principle. The results are given in Supplementary Table 7, and the methodology is inspired by Schiffels 2014 and described at the end of the method section (lines 894-907). Those results demonstrate the potential gain in accuracy when using polymorphic methylation sites. However, TMRCA (or ARG) inference is a very vast and complex subject in its own right. Therefore, we are developing a complete TMRCA/ARG inference investigation and an improved methodology compared with the one presented in this manuscript, and we are currently working on a manuscript focusing specifically on this topic. We hence consider further investigation of TMRCA/ARG inference beyond the scope of the current study.

      • A general remark on the derivations in Section 2 of the supplement: I checked these formulas as best I could. But a cleaner, less tedious way of calculating these probabilities would be to express the mutation processes as continuous time Markov chains. Then all that is needed is to specify the rate matrices; computing the emission probabilities needed for the SMC methods reduces to manipulating the results of some matrix exponentials. In fact, because the processes are noninteracting, the rate matrix decomposes into a Kronecker sum of the individual rate matrices for each process, which is very easy to code up. And this structure can be exploited when computing the matrix exponential, if speed is an issue.

      Answer: We thank the reviewer for this very interesting suggestion! Unfortunately, it is a bit late to re-implement the algorithm and reshape the manuscript according to this suggestion. Speed is not yet an issue but will most likely become one in the future when integrating many different rates or when using a more complex SMC model. Hence, we added reviewer #1's suggestion to the discussion (line 648) and hope to use it in our future projects.
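      The reviewer's suggestion can be illustrated with a small sketch. It assumes two generic, noninteracting two-state marker processes (labelled A and B here) with arbitrary illustrative rates; it is not the derivation used in the manuscript. The joint rate matrix is the Kronecker sum of the two individual rate matrices, and its matrix exponential equals the Kronecker product of the individual transition matrices, which is the structure that can be exploited for speed. The matrix exponential is a plain scaling-and-squaring Taylor approximation written out to keep the example dependency-free.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, factor):
    return [[x * factor for x in row] for row in a]


def kron(a, b):
    """Kronecker product; joint state (i, k) maps to index i * len(b) + k."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def kron_sum(a, b):
    """Kronecker sum A (+) B = A (x) I + I (x) B for independent processes."""
    return add(kron(a, identity(len(b))), kron(identity(len(a)), b))


def expm(a, squarings=10, terms=20):
    """Matrix exponential by scaling and squaring with a Taylor series."""
    scaled = scale(a, 1.0 / 2 ** squarings)
    result = identity(len(a))
    term = identity(len(a))
    for k in range(1, terms + 1):
        term = scale(matmul(term, scaled), 1.0 / k)
        result = add(result, term)
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def two_state_rates(gain, loss):
    """Rate matrix of a two-state chain (state 0 -> 1 at `gain`, 1 -> 0 at `loss`)."""
    return [[-gain, gain], [loss, -loss]]


def two_state_transition(gain, loss, t):
    """Closed-form transition matrix of the two-state chain after time t."""
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)
    p10 = loss / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical per-generation rates, chosen only for illustration.
process_a = two_state_rates(2e-4, 1e-3)
process_b = two_state_rates(1e-4, 5e-4)
t = 1000.0  # branch length in generations

joint_transition = expm(scale(kron_sum(process_a, process_b), t))
product_of_marginals = kron(expm(scale(process_a, t)), expm(scale(process_b, t)))
```

      Up to numerical error, `joint_transition` and `product_of_marginals` agree, so the 4x4 joint exponential never needs to be computed directly when the processes are independent.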

      • Most (all?) of the SNP-only SMC methods allow for binning together consecutive observations to cut down on computation time. I did not see binning mentioned anywhere, did you consider it? If the method really processes every site, how long does it take to run?

      Answer: This is a very good question. We do the binning exactly as described in Mailund 2013 & Terhorst 2017, and added this information in the method section (lines 801-809). However, as described in Terhorst 2017, one can only bin observations of the same "type" (to compute the Baum-Welch algorithm). Therefore, the computation time gained by binning is reduced when different markers are spread along the genome in high proportion. This is the approach we used throughout the study when facing multiple markers, as it had the best speed performance. For example, when the proportion of sites with methylation information is 1% or less, computation time is only slightly affected (i.e. same order of magnitude).

      However, the binning method presented in Mailund 2013 can be extended to observations of different types, but parameters then need to be estimated through a full likelihood approach (as presented in Figure 2). In our study this approach did not have the best speed performance. However, as our study is the first of its kind, it remains sub-optimal for now. Hence, we did not further investigate the performance of our approach in the presence of many different genomic markers (e.g. 5 different markers each representing ~20% of the genome). Currently, with SMC approaches a high proportion of sites contain the information "No SNPs", making the Baum-Welch algorithm described in Terhorst 2017 very efficient. But when further developing our theoretical approach, we expect that most of the sites in a genome analysis will contain some "information", which could render the full likelihood approach computationally more tractable.
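      Why binning gains shrink when several marker types are interleaved can be illustrated with a simple run-length encoding of per-site observation labels. This is only an analogy under stated assumptions: the labels and the function name are hypothetical, and the actual binning schemes of Mailund 2013 and Terhorst 2017 are not reproduced here.

```python
from itertools import groupby


def bin_observations(observations):
    """Collapse consecutive identical observation labels into (label, count) bins."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(observations)]


# SNP-only data: long stretches of "none" collapse into very few bins.
snp_only = ["none"] * 8 + ["snp"] + ["none"] * 6

# Same number of sites, but with methylation observations interleaved.
mixed = [
    "none", "meth", "none", "none", "unmeth", "none", "none", "none",
    "snp", "none", "meth", "none", "none", "unmeth", "none",
]

snp_only_bins = bin_observations(snp_only)
mixed_bins = bin_observations(mixed)
```

      In this toy example both sequences cover 15 sites, but the SNP-only sequence collapses into 3 bins while the mixed sequence needs 11, so far fewer sites can be processed together.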

      • 486: The assumed site and region (de)methylation rates listed here are several OOM different from what your method estimated (Supp. Tables 5-6). Yet, on simulated data your method is usually correct to within an order of magnitude (Supp. Table 4). How are we to interpret this much larger difference between the published estimates and yours? If the published estimates are not reliable, doesn't that call into question your interpretation of the blue line in Fig. 7 at 533?

      Answer: We thank the reviewer for asking this question. We believe answering this question is indeed the most interesting aspect of our study. Beyond demographic inference, our study has indeed unveiled a discrepancy between rates inferred through biological experiments and those inferred in our study through the use of SNPs and branch lengths. There are several reasons which could explain the discrepancy between both approaches:

      • Firstly, our underlying HMM hypotheses are certainly violated. We ignored population structure, variation in mutation and recombination rates along the genome, as well as the effect of selection. Hence, the branch lengths used for methylation rate estimations are to some extent inaccurate. We note that this is especially likely for the short branches of coalescent trees originating from background selection events in the coding regions, which are especially observable when using the methylation sites with a higher mutation rate than SNPs (Yao et al. 2023) at body methylated genes.

      • Secondly, calling single methylation site polymorphism is not 100% reliable. If the error rate is 0.1%, as the study was conducted on ~10 generations, a minimum epimutation rate of 10^-4 is to be expected. However, because our approach works at the evolutionary time scale, we expect that it suffers less from this bias, as the proportion of diversity originating from actual epimutations, and not SMP calling error, should be greater.

      • Thirdly, as mentioned above, recovering the methylation status of a region is very hard. Hence false region status inference could affect our inference accuracy as shown in Supplementary Figure 4.

      • Lastly and most importantly, the reason behind this discrepancy is the modelling of epimutation and methylation between sites and regions. As we discuss, the current combination of rates and models is still limited to describe the observed diversity along the genome (as we intend in SMC methods). This is in contrast to the recent study by Yao et al. where very few regions of polymorphic SMPs are chosen, which implicitly avoids the influence of the methylation region effect. A study just published by Biffra et al. (Cell reports 2023) also uses a functional model of methylation modelling using a mix of region and site epimutation, albeit not tuned for evolutionary analyses. Thus we suggest, in line with functional studies, that epimutations are not independent from the local methylation context and may tend to stabilize the methylation state of a region. Therefore, the estimated methylation rates show a discrepancy to the previously measured ones. Indeed, the biological experiment would reveal a fast epimutation rate because epimutations can actually be tracked at sites which can mutate, while region mutation rate is much slower. However, because the methylation state of a region is rather stable through time it would reduce the methylation diversity over long time scale, and these rates would differ between methylated or unmethylated regions (i.e. the methylation rate is higher in methylated regions). Our results are thus in agreement with the observation by Biffra et al. that region methylation modelling is needed to explain patterns of methylation across the genome.

      To solve the discrepancy, one would need to develop a theoretical region + site epimutation model capable of describing the observed diversity at the evolutionary time scale (possibly based on the Biffra et al. model within an underlying population evolution model), and then use this model to reanalyse the sequence data from the biological experiment (i.e. in de Graaf et al. 2015 & Denkena et al. 2022) to re-estimate the methylation region sizes and epimutation rates.

      Minor comments:

      • 189: "SMCtheo" first occurs here, but it's not mentioned until 247 that this is the new method being presented.

      Answer : Fixed

      • 199: Are the estimates in this section from a single diploid sequence? Or is it n=5 (diploid) as mentioned in the earlier section?

      Answer : Yes, those results were obtained with 5 diploid individuals. We added it in the Table 1 description.

      • 336: I'm confused by the wording: it sounds like the test rejects the null if there is positive correlation in the methylation status across sites. But then, shouldn't 339 read "if the test is significant" (not non-significant)?

      Answer: We apologize for the confusion and rewrote the sentence at lines 339-348; the choice of words was indeed misleading.

      • Fig. 6: for some reason fewer simulations were run for 10Mb (panels C and D) than for 100Mb (A and B). Since it's very difficult to tell what's happening on average in the 10Mb case, I suggest running the same number of simulations.

      Answer : Yes we understand your concern. Actually, the same number of simulations were run but we plotted only the first 3 runs as it was less visually confusing. We now have added the missing lines to the plot C and D.

      Typos:

      • 104: "or or"

      • 292: build => built

      • 388: fulfil

      • 683: sample => samples

      Answer : Many thanks to reviewer 1 for pointing out the typos. They are all now fixed.

      Reviewer #2 (Recommendations For The Authors):

      The authors may find some valuable information in Pisupati et al (2023) "On the causes of gene-body methylation variation in Arabidopsis thaliana" on interpreting epimutation rates.

      Answer: Many thanks for the recommended manuscript. We added it to the cited literature as it strongly supports our use of the heritability of methylation. We also added the recent Biffra et al. paper.

      Reviewer #3 (Recommendations For The Authors):

      There are many places throughout the manuscript with minor grammatical errors. Please review these. A few noted below as I read:

      L104: extra "or"

      L123: built not build

      L 160 "relies" instead of "do rely"

      L161 "events"

      L 336 "from methylation data"

      L 378 "exists"

      L 379 "regions are on average shorter" instead of "there are shorter"

      L 338 "a regional-level"

      L 349 "," instead of "but"

      L 394 DMRs

      Table 1 legend: parentheses not brackets?

      Answer : Many thanks to reviewer #3 for finding those mistakes. They are all now fixed.

      I think a paragraph in the discussion of considerations of when to use this approach might be helpful to readers. Comparison to e.g. increased sample size in MSMC2, while not necessary, might be helpful here. It may often be the case that doubling the number of haplotypes with SNP data is easier and cheaper than estimating methylation accurately.

      Answer: We discuss (lines 691-698) that our approach is always useful by design, but cannot always be used for the same purpose. If the evolutionary properties of the marker used are not understood, we suggest that our approach can be used to investigate the marker heritability process through generations. This could help to correctly design experiments aiming to study the marker heritability through lineages. And if the properties of the marker are well understood and modelled, it can be integrated into the SMC framework to improve inference accuracy.

      Other minor notes:

      L 486 "known" is a stretch. empirically estimated seems appropriate.

      Answer : Fixed

      L 573 ARG? You are not estimating the full ARG here.

      Answer : We apologize for the wrong choice of word and have rephrased the sentence.

      Fig. 2 is not super useful and could be supplemental.

      Answer : We moved Figure 2 to the appendix (now Supplementary Figure 1).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      We agree with the reviewer's observation about the evidence for a seasonal shift in the host use pattern of Cx. quinquefasciatus populations from southern latitudes. We have included a paragraph in the Introduction section addressing this. Unfortunately, studies conducted in South America to understand host use by Culex mosquitoes are very limited, and there are virtually no studies on the seasonal feeding pattern. In Argentina, there is some evidence (Stein et al., 2013; Beranek, 2019) of a seasonal change in host use by Culex species, including Cx. quinquefasciatus, in which the inclusion of mammals during the autumn has been observed. As part of a comprehensive study on characterising bridge vectors for SLE and WN viruses, our research group is currently working on the molecular identification of blood meals from engorged females to gain deeper insights into the seasonal feeding pattern of Culex mosquitoes. While the seasonal change in host use by Culex quinquefasciatus has not been reported in Argentina so far, there has been an observed increase in reported cases of SLE virus in humans between summer and fall (Spinsanti et al., 2008). It is on the basis of this evidence that we hypothesise a seasonal change in host use by Cx. quinquefasciatus, similar to what occurs in the United States. This also takes into account that both countries (Argentina and the United States) have regions with similar climatic conditions (temperate climates with thermal and hydrological seasonality). Since we work on the same species and in a similar temperate climate regime, we assumed that there is a seasonal shift in host use by this mosquito species.

      Reviewer #1 (Recommendations for the authors):

      Abstract

      Line 23: fed on two different hosts.

      Accepted as suggested.

      I think the concluding statement should be rewritten to say that immediate reproductive outcomes do not explain the shift in host use pattern of Cx. quinquefasciatus mosquitoes from birds to mammals towards autumn.

      Accepted as suggested.

      Introduction

      No comments.

      Materials and Methods

      Please mention sample sizes in the text as well (n = ?) for each treatment.

      Accepted as suggested.

      Page 99: ......C. quinquefasciatus, since C. pipiens and its hybrids are present as well in Cordoba.

      Accepted as suggested.

      Results

      Line 146: subsequently instead of posteriorly

      Accepted all changes as suggested.

      Line 148: were counted instead of was counted.

      Accepted all changes as suggested.

      Line 160: Subsequently instead of posteriorly

      Accepted all changes as suggested.

      Line 171: on fertility

      Accepted all changes as suggested.

      Line 174: there was an interaction effect on…

      Accepted all changes as suggested.

      Line 175: there were no differences in the number of eggs

      Accepted all changes as suggested.

      Discussion

      I think the first paragraph in the discussion section is redundant and should be deleted.

      The whole discussion was rewritten to be focused on our aims and results.

      Line 282: this sentence needs to be rewritten.

      Accepted as suggested.

      Line 299: at 28°C

      Line 300: at 30°C

      Sorry, but we are not sure about your comment here. We checked. Temperatures are written as stated, 28°C and 30°C.

      Line 363: I think the authors need to discuss more about the bigger question they were addressing. I think that the discussion section can be strengthened greatly by elaborating on whether there is evidence for a seasonal shift in host use pattern in Cx. quinquefasciatus in the southern latitudes. If yes, what alternate mechanisms they believe could be driving the seasonal change in host use in this species in the southern latitudes now that they show the 'deriving reproductive advantages' hypothesis to be not true for those populations.

      Thanks for this observation. We agree, and the Discussion section was restructured to align with our results, as suggested.

      Reviewer #2 (Public Review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness in birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used a generalized linear mixed model to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity and fertility and a null model analysis via data randomization for hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite of that hypothesized. While this finding is interesting and worth reporting, there are significant issues with the experimental design and the conclusions that are drawn from the results, which are described below. These issues should be addressed to make the findings trustworthy.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) There is no replication built into this study. Egg lay is a highly variable trait, even within treatments, so it is important to see replication of the effects of treatment across multiple discrete replicates. It is standard practice to replicate mosquito fitness experiments for this reason. Furthermore, the sample size was particularly small for some groups (e.g. 15 egg rafts for the second gonotrophic cycle of mice in the autumn, which was the only group for which a decrease in fecundity and fertility was detected between 1st and 2nd gonotrophic cycles). Replicates also allow investigators to change around other variables that might impact the results for unknown reasons; for example, the incubators used for fall/summer conditions can be swapped, ensuring that the observed effects are not artefacts of other differences between treatments. While most groups had robust sample sizes, I do not trust the replicability of the results without experimental replication within the study.

      We agree that egg lay is a variable trait, which is why we used high numbers of mosquitoes and egg rafts in our experiments compared to studies on the same topic. Evaluating variables such as fecundity, fertility, or other variables of this type (collectively referred to as "life tables") is challenging and depends on several intrinsic and extrinsic factors. Because of all this, sample sizes in some experiments might not be very large, and lower sample sizes can be found in several articles. For instance, in Richards et al. (2012), for Culex quinquefasciatus, some experiments had 13 or even 6 egg rafts during the second gonotrophic cycle. For species like Aedes aegypti, the sample size for life table analysis is also usually small. As an example, Muttis et al. (2018) reported between 1 and 4 engorged females (without replicates). In addition, a small sample size would be a problem if we had not detected any effect, which is not the case, since we were interested in detecting an effect regardless of its size. For this reason, we consider our sample sizes quite robust for our results.

      We also agree on the need to repeat the experiments in order to make the study more robust. However, after a review of the literature (articles cited in the original manuscript), it is apparent that similar experiments are not frequently repeated as such. Examples of this are the studies of Richards et al. (2012), Demirci et al. (2014) or Telang & Skinner (2019), which, even though they handle several cages at a time as "replicates", do not use true replicates, because they pool and analyse all data together and do not repeat the experiment several times. We see these "replicates" as a way of obtaining a greater N.

      As stated by the reviewer, repetition is a resource- and time-consuming activity that we are not able to undertake. Replicating the experiment poses a significant challenge in terms of time and resources. The original experiment took over three months to complete, and a similar timeframe would be necessary for each replication (6 months in total, considering two more replicates). Given our existing commitments and obligations, dedicating such an extensive period solely to this would impede progress on other crucial projects and responsibilities.

      Given the limitations of resources and time and the infrequent use of experimental replication in this type of study, we performed a simulation-based analysis via a Monte Carlo approach. This approach involved generating synthetic data that mimic the expected characteristics of the original experiment and subsequently subjecting them to the same analysis routine. The main goal of this simulation was to evaluate the potential spuriousness and randomness of the results that might arise due to the experimental conditions, and thus to assess the robustness of, and confidence in, our results and data.
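
      The response does not describe the simulation itself, so the following is only a minimal sketch of what a Monte Carlo check of this kind could look like, not the authors' actual routine. It assumes a Gaussian approximation for eggs per raft, a null scenario in which all four host × season groups share one mean, and an interaction contrast as the test statistic. The function names, the mean, the standard deviation, the observed contrast, and three of the four group sizes are hypothetical; only the group of 15 rafts echoes the reviewer's comment.

```python
import random
import statistics

# The four host x season groups of the experimental design.
GROUPS = [
    ("summer", "bird"),
    ("summer", "mammal"),
    ("autumn", "bird"),
    ("autumn", "mammal"),
]


def interaction_contrast(data):
    """Difference of differences: (bird - mammal) in summer minus (bird - mammal) in autumn."""
    means = {g: statistics.fmean(data[g]) for g in GROUPS}
    summer_diff = means[("summer", "bird")] - means[("summer", "mammal")]
    autumn_diff = means[("autumn", "bird")] - means[("autumn", "mammal")]
    return summer_diff - autumn_diff


def simulate_null(rng, mean, sd, sizes):
    """Synthetic eggs-per-raft data with no host or season effect (Gaussian assumption)."""
    return {g: [rng.gauss(mean, sd) for _ in range(sizes[g])] for g in GROUPS}


def monte_carlo_p(observed, mean, sd, sizes, n_sims=2000, seed=1):
    """Share of null simulations whose contrast is at least as extreme as the observed one."""
    rng = random.Random(seed)
    hits = sum(
        abs(interaction_contrast(simulate_null(rng, mean, sd, sizes))) >= abs(observed)
        for _ in range(n_sims)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sims + 1)


# Hypothetical inputs for illustration only.
sizes = {
    ("summer", "bird"): 30,
    ("summer", "mammal"): 30,
    ("autumn", "bird"): 30,
    ("autumn", "mammal"): 15,
}
print(monte_carlo_p(observed=60.0, mean=150.0, sd=40.0, sizes=sizes))
```

      A small returned proportion would suggest that a contrast of the observed size rarely arises from sampling noise alone under these assumptions. A count-based distribution or the fitted mixed model itself could replace the Gaussian draw and the contrast without changing the overall structure.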

      (2) Considering the hypothesis is driven by the host switching observed in the field, this phenomenon is discussed very little. I do not believe Cx. quinquefasciatus host switching has been observed in Argentina, only in the northern hemisphere, so it is possible that the species could have an entirely different ecology in Argentina. It would have been helpful to conduct a blood meal analysis prior to this experiment to determine whether using an Argentinian population was appropriate to assess this question. If the Argentinian populations don't experience host switching, then an Argentinian colony would not be the appropriate colony to use to assess this question. Given that this experiment has already been conducted with this population, this possibility should at least be acknowledged in the discussion. Or if a study showing host switching in Argentina has been conducted, it would be helpful to highlight this in the introduction and discussion.

      Thanks for this observation. We agree. However, we conducted the experiment despite the limited host use data from Argentina, since we used the same mosquito species, and the central region of Argentina (Córdoba) has a temperate weather regime similar to that observed on the east coast of the US.

      We are aware that few studies regarding host shifting in South America are available; some, such as those conducted by Stein et al. (2013) and Beranek (2019), reported a moderate host switch for Culex quinquefasciatus in Argentina. We have already performed a study about seasonal host feeding patterns for this species. However, even though there are few studies regarding host shifting, our hypothesis is based mainly on the seasonality of human cases of WNV and SLEV, a pattern that has been demonstrated for our region (see, for example, the study of Spinsanti et al., 2008).

      We have included a new paragraph in the Introduction and Discussion sections. Please see our answers to Reviewer #1.

      (3) The impacts of certain experimental design decisions are not acknowledged in the manuscript and warrant discussion. For example, the larvae were reared under the same conditions to ensure adults of similar sizes and development timing, but this also prevents mechanisms of action that could occur as a result of seasonality experienced by mothers, eggs, and larvae.

      We understand the confusion that may have arisen due to a lack of further details in the methodology. If we are not mistaken, you are referring to our oversight regarding the consideration of carry-over effects of larval rearing that could potentially impact reproductive traits. When investigating the effects of temperature or other environmental factors on reproductive traits, it is possible to acclimate either larvae or adults. This is due to the significant phenotypic plasticity that mosquitoes exhibit throughout their entire ontogenetic cycle. In our study, we followed an approach similar to that of other authors, in which the adults are exposed to the experimental conditions (temperature and photoperiod). For a similar approach, you can refer to the studies conducted by Ferguson et al. (2018) for Cx. pipiens, Garcia Garcia & Londoño Benavides (2007) for Cx. quinquefasciatus or Christiansen-Jucht et al. (2014, 2015) for Anopheles gambiae.

      (4) There are aspects of the data analysis that are not fully explained and should be further clarified. For example, there is no explanation of how the levels of categorical variables were compared.

      The methodology and statistical analysis were expanded for a better understanding.

      (5) The results show the opposite trend as was predicted by the authors based on observed feeding switches from birds to mammals in the autumn. However, they only state this once at the end of the discussion and never address why they might have observed the opposite trend as was hypothesized.

      The discussion was restructured to focus on our results and our model.

      (6) Generally speaking, the discussion has information that isn't directly related to the results and/or is too detailed in certain parts. Meanwhile, it doesn't dig into the meaning of the results or the ways in which the experimental design could have influenced results.

      As mentioned above, the discussion was restructured to reflect our findings. We also discuss how our design might have influenced our results. However, as stated above, we do not fully agree that the design is inadequate for our analysis; we followed standard protocols used by other researchers and studies in this research field.

      (7) Beyond the issue of lack of replication limiting trust in the conclusions in general, there is one conclusion reached at the end of the discussion that would not be supported, even if additional replicates are conducted. The results do not show that physiological changes in mosquitoes trigger the selection of new hosts. Host selection is never measured, so this claim cannot be made. The results don't even suggest that fitness might trigger selection because the results show that physiological changes are in the opposite direction as what would be hypothesized to produce observed host switches. Similarly, the last sentence of the abstract is not supported by the results.

      We agree with this observation. However, we did not evaluate the impact of fitness on host selection in this study. Instead, we aimed to investigate the potential influence of seasonality on mosquito fitness as a potential trigger for a shift in host selection. We agree that we have incorrectly used the term “host selection” when we should actually be discussing “host use change”. Our results indicate a seasonal alteration in mosquito fitness in response to temperature and photoperiod changes. Building upon this observation, we re-discussed our hypothesis and theoretical model to explain this seasonal shift in host use.

      (8) Throughout the manuscript, there are grammatical errors that make it difficult to understand certain sentences, especially for the results.

      The English grammar and writing throughout the manuscript were revised and corrected for clarity.

      This study is driven by an interesting question and has the potential to be a valuable contribution to the literature.

      Reviewer #2 (Recommendations for The Authors):

      I hope that the authors will consider the suggested revisions and experimental replication to improve the quality of the study and paper.

      This study tests a very interesting hypothesis. I understand that additional replicates are difficult to conduct, but I do believe that fitness studies absolutely require experimental replicates. Unless you are able to replicate the observed effects, I personally would not trust the results of this study. I hope that you will consider conducting replicates so that this important question can be answered in a more robust manner. Below, I expand upon some additional points in the public review and also provide more specific suggestions. I provided some copy-editing feedback, but was not able to point out all grammatical mistakes. I suggest that you use ChatGPT to help you edit the English. For example, you can feed ChatGPT your MS and ask it to bold the grammatical errors or you can ask it to edit grammatical errors and bold the sections that were edited. I understand that writing in a second language is very difficult (from personal experience!), so I view ChatGPT as a great tool to help even the playing field for publishing. Below are line item suggestions. Apologies that wording is curt, I was trying to be efficient in writing.

      20-21: I suggest that you emphasize that you are investigating the interactive effect.

      Accepted as suggested.

      22: they weren't "reared" (from larvae) in different conditions, they were "maintained" as adults

      Accepted as suggested.

      26-27: increased/decreased is a bit misleading since you did not evaluate these groups sequentially in time. It might be more accurate to describe it as less than/greater than. Also, if you say increased/decreased or less than/greater than, you should always say what you are comparing to. The same applies throughout the MS.

      Accepted as suggested.

      29-30: "finding the" is not correct here; could be "with the lowest..."

      Accepted as suggested.

      34-36: I do not think that your results suggest this, even if you were to replicate the results of this experiment. You haven't shown metabolic changes.

      We understand the point. Accepted as suggested.

      42-44: "one of the main responsible" should be "one of the main species responsible..."

      Accepted as suggested.

      48: I think that "host preference" is better than selection here; -philic denotes preference

      Accepted as suggested.

      50: "Moreover" isn't the correct transition word here

      Accepted as suggested.

      57: "could" isn't correct here; consider saying "... species sometimes feed primarily on mammal hosts, including humans, in certain situations."

      Accepted as suggested.

      58: Different isn't correct word here

      Accepted as suggested.

      60: delete "feeding"

      Accepted as suggested.

      66-68: I am not familiar with any blood meal analysis studies in the southern hemisphere that show host switching for Culex species between summer and autumn. If this hasn't been shown, then this critique of the host migration hypothesis doesn't make sense.

      There are some studies pointing this out (Stein et al., 2013; Beranek, 2019; and our own unpublished data). However, our hypothesis is supported by epidemiological data observed in the human population, which indicate a seasonal activity pattern. This is explained in depth in the Introduction section.

      68: ensures is not the right word; I suggest "suggests"

      Accepted as suggested.

      68-70: this explanation isn't clear to me; please revise

      It will be revised. Accepted as suggested.

      70: change cares to care

      Accepted as suggested.

      76-77: can you explain how they were not supported by the data for the benefit of those who are not familiar with these papers please?

      Accepted as suggested.

      87-89: I suggest the following wording: "In the autumn, we expect a greater number of eggs (fecundity) and larvae (fertility) in mosquitoes after feeding on a mammal host compared to an avian host, and the opposite relationship in the summer."

      Accepted as suggested.

      99: edit for grammar

      Accepted as suggested.

      102: suggest: "...offered a blood meal from a restrained chicken twice a month"

      Accepted as suggested.

      107: powder

      Accepted as suggested.

      108: inbred? Is this the term you meant to use?

      Changed as suggested.

      109: "several" cannot be used to describe 20 generations; suggest using "over twenty generations"; also, it would be good to acknowledge in your discussion that lab adaptation could force evolution, especially since mosquitoes are kept at constant temperatures and fed with certain hosts (with easy access) in the lab. Also, it would be good to know when the experiments were conducted to know the lapse of time between the creation of the colony and the experiments.

      Accepted as suggested.

      110-111: Does humidity vary between summer and fall in Córdoba? If so, I suggest acknowledging in the discussion that if humidity differences are involved in a potential interaction between host species and seasonality, then this would not have been captured by your experimental design.

      Several variables change across seasons. We were interested in capturing the effects of temperature and photoperiod, since humidity is a difficult variable to control.

      113-116: I suggest combining into one sentence to make more concise.

      Accepted as suggested.

      135: You might be obscuring the true impact of seasonality by rearing the larvae under the same conditions. There may be signals that mothers/eggs/larvae receive that influence their behavior (e.g. I believe this is the case for diapause), so this limitation should also be acknowledged. I understand why you decided to do this to control for development time and size, but it is something that should be considered in the discussion.

      As explained above, Cx. quinquefasciatus does not undergo diapause in our country. Exposing mosquitoes to the experimental conditions from the adult stage was an approach we selected based on other studies.

      138: edit: "with cotton pads soaked in... on plastic..."; what is plastic glass? Do you mean plastic dishes?

      Accepted as suggested.

      141: here and throughout paragraph, full should be "fully"

      Accepted as suggested.

      144: located should be "placed"

      Accepted as suggested.

      147: suggest editing to "at which point, they were fixed with 1 mL of 96% ethanol and the number of L1 larvae per raft was counted."

      Accepted as suggested.

      154-155: edit for grammar

      Accepted as suggested.

      157: Your GLM explanation doesn't say anything about how you made pairwise comparisons between your levels; did you use emmeans?

      This revised version includes a more detailed methodology and statistical analysis. Accepted as suggested.
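
      The response does not state which procedure was used for the pairwise comparisons. For orientation only: with two hosts and two seasonal conditions there are six pairwise contrasts between groups, and their p-values are usually adjusted for multiple testing. The sketch below uses Holm's step-down adjustment as one common choice; this is an assumption and not necessarily the authors' method. The group labels and raw p-values are hypothetical placeholders, not results from the study.

```python
from itertools import combinations


def holm_adjust(pvalues):
    """Holm step-down adjustment for a dict mapping comparison name -> raw p-value."""
    ordered = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ordered)
    adjusted = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p)
        # Adjusted p-values must not decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[name] = running_max
    return adjusted


# Four treatment combinations give six pairwise contrasts.
groups = ["chicken/summer", "chicken/autumn", "mouse/summer", "mouse/autumn"]
pairs = list(combinations(groups, 2))

# Hypothetical raw p-values, one per contrast, for illustration only.
raw = dict(zip(pairs, [0.001, 0.20, 0.03, 0.04, 0.50, 0.012]))

for pair, p_adj in holm_adjust(raw).items():
    print(pair, round(p_adj, 3))
```

      In practice the raw p-values would come from contrasts estimated on the fitted mixed model rather than being typed in by hand.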

      158-160: I don't understand why you took this approach - it seems strange to me to use this analysis, but I am not familiar with it, so it might be that I lack the knowledge to be able to adequately evaluate. Please provide more explanation so that readers can better understand this analysis. A citation for this kind of application of the analysis would be helpful.

      It was changed to be in accordance with the remaining analyses.

      173: replace neither with either

      Accepted as suggested.

      174: this applies throughout; edit to : "An interaction effect was observed..."

      Accepted as suggested.

      175: "it was not found" is grammatically incorrect; instead : "We did not find ..." or "no differences in... were detected", etc

      Accepted as suggested.

      183: "it was detected" is grammatically incorrect

      Accepted as suggested.

      185-186: "being this treatment... in terms of fitness": I do not understand what this means. Please rephrase

      Accepted as suggested.

      170-199: you should provide the effect sizes and p values in text and/or in the figure for the pairwise comparisons

      Accepted as suggested.

      193-196. These two sentences are confusing and I am not sure what you mean, especially in the first sentence.

      It was rewritten. Accepted as suggested.

      Figure 1: This figure is great and easy to read and interpret!

      Thank you for the comment!

      218-219: it is important to state which mosquito species you are referring to here.

      Accepted as suggested.

      226-227: you definitely should acknowledge the small sample size here.

      Considered.

      227: "it was observed" should be "We observed" or "A greater hatching rate.... was observed."

      Accepted as suggested.

      228-229: is the result really comparable even though you took very different approaches to the analysis for these outcomes?

      Changed to be comparable.

      230-278: the discussion of these hypotheses is too long and detailed, especially since the comparison of mouse vs chicken wasn't your main question; you really wanted to understand this in the context of seasonality. I suggest cutting this down a lot and making room to dig into your results more, and also to discuss the potential impacts of your experimental design/limitations on the results.

      Discussion was changed to focus on our results and model. Accepted as suggested.

      281: Hoffman is an old citation; I suggest you cite a modern review.

      Accepted as suggested. We deleted it due to the re-writing of the manuscript.

      282: "It can be recognise".. I am not sure what you are trying to say here

      Accepted as suggested.

      1. After the first time you write a species name, you can abbreviate the genus in all future mentions unless it is at the beginning of a sentence.

      Accepted as suggested.

      303-305: Revise this sentence. E.g "Fewer studies are available regarding photoperiod and show mixed results; Mogi (1992) found that mid and long day lengths induced greater fecundity while Costanzo et al. (2015) did not find differences in fecundity by day length."

      Accepted as suggested.

      315-316: typically, unpublished data shouldn't be referenced; I'm not sure if eLife has a policy on this.

      We will check this against the eLife guidelines. However, given the lack of evidence on this pattern, we consider it important to include these unpublished data.

      316: Aegypti should be lowercase

      Accepted as suggested.

      328-330: This sentence is redundant with the first sentence of the paragraph

      Accepted as suggested.

      321-336: You never reintroduced your hypothesis in your discussion. I suggest that you center your whole discussion more directly around the hypothesis that motivated the study. If you decide not to restructure your discussion, you should at least reintroduce your hypothesis here and discuss how your results do not support the hypothesis.

      Accepted as suggested.

      337-348: This paragraph is a bit confusing as you jump between fertility and hatchability

      Accepted as suggested.

      353: is viral transmission the right word to use here? I think you might mean bridge vector transmission to humans specifically?

      Accepted as suggested.

      357: you say "neither" but never define which traits you are referring to

      Accepted as suggested.

      361: I suggest "two variables previously analyzed separately..."

      Accepted as suggested.

      General: There is no statement about the availability of data; it is eLife policy to require all data to be publicly available. Also, it would be helpful to share your code to help understand how you conducted pairwise comparisons, etc.

      Nothing about data availability was mentioned in the submission. However, all data and scripts will be uploaded with the VOR if required.

      Recommendations for the authors:

      I found your study interesting and potentially promising. However, there are some fundamental problems with the study design and the hypothesis, including:

      (1) Seasonality simulation - Seasonality is strongly associated with time, so it is unusual to simulate seasonal factors without accounting for time. The actual factors associated with seasonal change in reproductive output may be neither a difference in host blood meal nor temperature and photoperiod. It is, therefore, odd to reduce seasonality to a difference in photoperiod and temperature in summer and autumn without even mentioning the time of year when the experiment was carried out (except for the mention of February as the time the stock samples were collected from the wild).

      The temperature and photoperiod settings are established according to a representative day in both autumn and summer. To determine these settings, we utilized climate data spanning a 3-year period (2020-2022), encompassing the most frequently occurring temperatures and day lengths. The weather conditions remained notably consistent throughout this time frame, which is why the specific year was not mentioned. Moreover, including the year in laboratory experiment details is uncommon, as evident in various papers. This practice can be corroborated by referring to multiple sources (cited in the original manuscript). We mention this in the new version.
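
      As a rough illustration of how a "representative day" could be derived from multi-year climate records, the sketch below takes the most frequently occurring value after binning. The binning step, the tie-breaking rule, the function name, and all values are hypothetical assumptions; the response does not describe the exact procedure, and the numbers are not the climate data used in the study.

```python
from collections import Counter


def representative_setting(values, step):
    """Most frequent value after rounding each record to the nearest multiple of `step`."""
    binned = [round(v / step) * step for v in values]
    counts = Counter(binned)
    top = max(counts.values())
    # If several bins tie for the highest count, return the smallest (arbitrary choice).
    return min(value for value, count in counts.items() if count == top)


# Hypothetical daily records pooled over several years for one season.
summer_max_temp_c = [27.6, 28.2, 28.4, 30.1, 27.9, 29.6, 28.1]
summer_day_length_h = [13.4, 13.6, 13.5, 14.1, 13.5, 12.9]

print(representative_setting(summer_max_temp_c, step=1.0))
print(representative_setting(summer_day_length_h, step=0.5))
```

      The same function would be applied separately to the autumn records, and to daily minima as well as maxima, to obtain each incubator setting.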

      (2) Hypothesis - While the hypothesis alludes to the 'reason' for seasonal host shift, the prediction is on the outcome of the interaction between blood meal type and season.

      It might be nicer to frame your hypothesis to be consistent with the aim, which is, testing the partial contributions of blood meal type, versus photoperiod and temperature to seasonal change in the reproductive output of Culex quinquefasciatus. A hypothesis like that can be accompanied by alternative predictions according to the expected individual and interactive effects of both factors.

      It was rewritten in the revised version to be consistent with our predictions and findings.

      Blood meal type, temperature, and photoperiod are all components of seasonality, so the strength of the study is its potential to decouple the effect of blood meal type from that of temperature and photoperiod on the seasonal reproductive output of Culex quinquefasciatus by comparing the two blood meal types under simulated summer and winter conditions. Ideally, this should have been over a natural summer and winter because a natural time difference captures the effect of other seasonal factors other than temperature and photoperiod.

      Furthermore, the hypothesis stemmed from field observations, while the study itself was conducted under laboratory conditions using a local population of Culex quinquefasciatus from Argentina. It remains uncertain whether there is supporting evidence for a seasonal shift in host usage in Culex quinquefasciatus from the stock population. Discussing the field observations within the stock population would provide valuable insights.

      It was considered in the new version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study seeks to disentangle the different selective forces shaping the evolutionary dynamics of transposable elements (TEs) in the wild grass Brachypodium distachyon. Using haplotype-length metrics, and genetic and environmental differentiation tests, the authors present in large parts convincing evidence that positive selection on TE polymorphisms is rare, and that the distribution of TE ages points to purifying selection being the main force acting on TE evolution in this species. A caveat of this study, as of other studies that seek to assess TE insertion polymorphisms with short reads, is that the rates of false negatives and false positives are difficult to estimate, which may have major effects on the interpretation. This study will be relevant for anyone interested in the role of TEs in evolution and adaptation.

      Thank you for considering our manuscript for publication in eLife. We appreciate the constructive comments and suggestions of the reviewers, and we have addressed the issues they raised. Below, we provide a more detailed response to each of the reviewer comments.

      Public Reviews:

      Reviewer #1:

      The study presented in this manuscript presents very convincing evidence that purifying selection is the main force shaping the landscape of TE polymorphisms in B. distachyon, with only a few putatively adaptive variants detected, even though most conclusions are based on the 10% of polymorphisms contributed by retrotransposons. That first conclusion is not novel, however, as it had already been clearly established in natural A. thaliana strains (Baduel et al. Genome Biol 2021) and in experimental D. simulans lines (Langmüller et al. NAR 2023), two studies that the authors do not mention, or improperly mention. In contrast to the conclusions reached in A. thaliana, however, Horvath et al. report here a seemingly deleterious effect of TE insertions even very far away from genes (>5kb), a striking observation for a genome of relatively similar size. If confirmed, as a caveat of this study is the lack of benchmarking of the TE polymorphisms calls by a pipeline known for a high rate of false positives (see detailed Private Recommendations #1), this set of observations would make an important addition to the knowledge of TE dynamics in the wild and questioning our understanding of the main molecular mechanisms through which TEs can impact fitness.

      Thank you for your positive evaluation of our paper. We have now adjusted the manuscript to include the mentioned studies (Line 330-333) and to address the issue of false positive and false negative calls. The detailed responses to all the raised points are below.

      Reviewer #2:

      Summary:

      Transposable elements are known to have a strong potential to generate diversity and impact gene regulation, and they are thought to play an important role in plant adaptation to changing environments. Nevertheless, very few studies have performed genome-wide analyses to understand the global effect of selection on TEs in natural populations. Horvath et al. used available whole-genome re-sequencing data from a representative panel of B. distachyon accessions to detect TE insertion polymorphisms (TIPs) and estimate their time of origin. Using a thorough combination of population genomics approaches, the authors demonstrate that only a small amount of the TE polymorphisms are targeted by positive selection or potentially involved in adaptation. By comparing the age-adjusted population frequencies of TE polymorphisms and neutral SNPs, the authors found that retrotransposons are affected by purifying selection independently of their distance to genes. Finally, using forward simulations they were able to quantify the strength of selection acting on TE polymorphisms, finding that retrotransposons are mainly under moderate purifying selection, with only a minority of the insertions evolving neutrally.

      Strengths:

      Horvath et al., use a convincing set of strategies, and their conclusions are well supported by the data. I think that incorporating polymorphism's age into the analysis of purifying selection is an interesting way to reduce the possible bias introduced by the fact that SNPs and TEs polymorphisms do not occur at the same pace. The fact that TE polymorphisms far from genes are also under purifying selection is an interesting result that reinforces the idea that the trans-regulatory effect of TE insertions might not be a rare phenomenon, a matter that may be demonstrated in future studies.

      Weaknesses:

      TEs from different classes and orders strongly differ in multiple features such as size, the potential impact of close genes upon insertion, insertion/elimination ratio (ie, MITE/TIR excision, solo-LTR formation), or insertion preference. Given such diversity, it is expected that their survival rates on the genome and the strength of selection acting on them could be different. Although the authors differentiate DNA transposons and retrotransposons in some of the analyses, the specificities of the most abundant plant TE types (ie, LTR/Gypsy, LTR/Copia, MITE DNA transposons) are not considered.

      The authors used a short-read-based approach to detect TIPs and TAPs. It is known that detecting TE polymorphisms is challenging and can lead to false negatives, depending on the method used and the sequencing coverage. The methodology used here (TEPID) has been previously applied to other species, but it is unclear if the sensitivity of the TIP/TAP caller is equivalent to that of the SNP caller and how these potential differences may affect the results.

      Thank you for your positive evaluation of our paper. We have now adjusted the manuscript and the discussion to include the mentioned points on the different TE superfamilies and the reliability of the TE calls. The detailed responses to all the raised points are below.

      Private Recommendations:

      Reviewer #1:

      (1) TE polymorphisms (presence and absence variants) were called from short-read sequencing data using a pipeline (TEPID, Stuart et al. eLife 2016) that is known to have a low specificity as well as a low sensitivity in its detection of presence variants (Baduel et al. MIMB 2021). An assessment of the rate of false positives and false negatives in the data presented in this study and how it varies across TE superfamilies is therefore of crucial importance as it may bias all downstream analyses, especially if it impacts the identification of polymorphisms contributed by retrotransposons, as these are the basis of most conclusions of the manuscript. Nonetheless, the fact that the PCA of the polymorphisms contributed by DNA transposons is less able to distinguish genetic clades than with those contributed by retrotransposons, suggests the issue of false positives is most preeminent for DNA transposons. However, high rates of false positives may explain why no significant increase in TE frequency is detected within selective sweep regions, a result that runs against the expectation of hitch-hiking of neutral or weakly deleterious polymorphisms which the authors claim is the category of many TE polymorphisms. Furthermore, given that the reference genome belongs to the B_east clade, and the TEPID is better at calling absence than presence it may bias analyses in this clade (where clade-specific insertions will take the form of absence in other clades which are well detected) compared to other clades (where clade-specific insertions will be presence polymorphisms and may be missed). A benchmark of TE polymorphism calls could be done by de novo assembling one genome from each clade or by cross-checking at least the presence variant calls from TEPID with those made with another of the many TE calling pipelines available.

      We agree with the issue raised by both reviewers regarding the effects of false negative and false positive TE calls. We also think that some reasonable follow-ups should be done to check the potential impact of the false negative and false positive TE calls on the presented results, without turning the manuscript into a method-comparison paper, as this is not the main goal of this study. Therefore, we generated a subsample of our dataset that included only accessions with an average genome-wide mapping coverage of at least 20x, as the false negative TE call rate is correlated with the mapping coverage and a high mapping coverage is expected to lead to a reduction in the false negative TE call rate. We then used this subsample to check if our results would change if our dataset had a lower false negative TE call rate. However, reducing the rate of false negative calls through the use of only higher-coverage samples did not change our results and interpretations.

      Re-running the ANCOVA analyses revealed similar results regarding the accumulation of TEs in selective sweep regions. This was added to the main text Line 143-148: “Similar results were obtained when investigating the number of fixed TE polymorphisms (Additional file 2: Table S1) and the allele frequency of TE polymorphisms (Additional file 2: Table S2) in high iHS regions using a subset of our dataset with an expected lower false negative TE call rate, that only included samples with a genome-wide mapping coverage of at least 20x (see Discussion and Materials and Methods for more details).” and in Additional file 2: Table S1 and S2.

      Further, we re-ran the age-adjusted SFS based on this subset of our dataset and found that the results and conclusions from the age-adjusted SFS were not only driven by false negative TE calls. This was also included in the text Line 338-349: “One caveat of the approach used in this study is that TE calling pipelines based on short-reads tend to have higher false positive and false negative call rates than SNP calling pipelines, which is also the case for the TEPID TE calling pipeline used here [57, 59]. A high false negative TE calling rate however might bias our TE frequency estimates toward lower frequencies, which could drive the observed patterns in the age-adjusted SFS. To assess if the false negative TE calling rate in our study substantially affected our results, we re-run the age-adjusted SFS on a subset of our dataset only including samples with a genome-wide mapping coverage of at least 20x, as higher mapping coverages are expected to reduce the false negative call rate [27, 59]. Using the TE allele frequencies estimated based on this subset of our data to estimate Δ frequency revealed similar results of the age-adjusted SFS based on the whole dataset (Additional file 1: Fig. S9), indicating that our observation of retrotransposons evolving under purifying selection is not solely driven by a high false negative TE calling rate.” and in Additional file 1: Fig. S9.

      The details of this analysis have been added to the materials and methods Line 493-498: “Mapping coverage is known to influence false discovery rate [27, 59]. To investigate the impact of false positive and false negative TE calls on our results, we down sampled the TE dataset to only include TEs that have been called in samples that had at least an average mapping coverage of 20x. The allele frequencies of TEs present in our high coverage dataset was recalculated only considering samples with at least an average mapping coverage of 20x. This second TE dataset was then used to check if using a dataset with a higher mapping coverage and presumably a lower false TE calling rate impacted our results.”
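
      As an illustration only, the coverage filter and frequency recalculation described above could be sketched as follows. This is not the script used in the study: the function names, the dictionary layout of the TE calls, and the treatment of each accession as a single homozygous individual (so that allele frequency is the fraction of retained samples carrying the TE) are assumptions made for this sketch.

```python
from typing import Dict, Set


def high_coverage_samples(coverage: Dict[str, float], min_cov: float = 20.0) -> Set[str]:
    """Keep samples whose average mapping coverage is at least `min_cov`."""
    return {sample for sample, cov in coverage.items() if cov >= min_cov}


def recalc_frequencies(calls: Dict[str, Set[str]], keep: Set[str]) -> Dict[str, float]:
    """Recalculate TE allele frequencies using only the retained samples.

    `calls` maps a TE identifier to the set of samples in which it was called
    (hypothetical layout). TEs not called in any retained sample are dropped.
    """
    if not keep:
        return {}
    freqs = {}
    for te, carriers in calls.items():
        n_carriers = len(carriers & keep)  # carriers among high-coverage samples
        if n_carriers > 0:
            freqs[te] = n_carriers / len(keep)
    return freqs
```

      Downstream analyses would then be re-run on the returned frequencies and compared with the results from the full dataset.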

      (2) If confirmed, the observation that retrotransposons located more than 5kb away from genes appear to be also affected by purifying selection (L209) is indeed surprising. The authors should add a comparison with SNPs at the same distance from genes to strengthen the claim and make sure it is not the result of mapping artifacts, such as alignment quality dropping far away from genes.

      We added a comparison of the age-adjusted SFS of SNPs and retrotransposons more than 5 kb away from genes to evaluate if the observed shape of the age-adjusted SFS of retrotransposons more than 5 kb away from genes were due to artefacts. The results are included on line 383-389: “Finally, we tested whether TE polymorphisms located more than 5 kb away from genes are evolving under purifying selection could be due to mapping or other artefacts by comparing the shape of the age-adjusted SFS of retrotransposons and SNPs more than 5 kb away from genes. However, the age-adjusted SFS of SNPs 5 kb away from genes differs from the one of retrotransposons (Additional file 1: Fig. S10), indicating that the shape of the age-adjusted SFS of retrotransposons more than 5 kb away from genes is not likely to be the result of artefacts in regions of the genome far away from genes.” and Additional file 1: Fig. S10.

      (3) The authors' claim that most TE polymorphisms are under weak to moderate purifying selection (L273) relies on the comparison of the age of polymorphisms in the oldest age bin with forward simulations. However, the conclusions from these comparisons cannot be extrapolated to the fitness effects of all TE polymorphisms as variants in the oldest age bin are de facto a biased sample of the variants of a category, a point the authors highlight.

      We adjusted the mentioned paragraph to better highlight this point. Line 390-397: “To further ascertain the strength of purifying selection, we used forward simulation and showed that simulations assuming a moderately weak selection pressure (S = -5 or S = -8) against TE polymorphisms best fitted our observed data. In theory, no TE polymorphisms under strong purifying selection should be present in a natural population, as such mutations are expected to be quickly lost, especially in a predominantly selfing species where most loci are expected to be homozygous. Therefore, it is not surprising that TE polymorphisms which persist in B. distachyon are under weak to moderate selection, as also shown, for example, for the L1 retrotransposons in humans [27] or the BS retrotransposon family in Drosophila melanogaster [62].”

      L220-228 for high-effect SNPs. Indeed, the most deleterious TE polymorphisms would be purged very quickly and never contribute to variants in the oldest age bin. Unless new arguments can be made to support this claim, this conclusion should be rephrased to claim instead that even the oldest TE polymorphisms are still mostly non-neutral and under weak to moderate purifying selection.

      This has been adjusted. Line 231-232: “Hence, even the oldest retrotransposon polymorphisms seem to be mostly non-neutral and are affected by purifying selection.”

      L214: replace smaller with more negative for clarity.

      Done.

      L233: Given the discussion L220-228, the oldest age bin seems to be biased in its composition and thus not useful for comparisons. The sentence should therefore be rephrased to reflect that DNA transposon polymorphisms appear to be actually less deleterious than high-effect SNPs in S9A and B based on the penultimate age bin.

      This has been fixed.

      Reviewer #2:

      • I wonder if false negative detection could artificially increase the evidence for purifying selection by increasing the amount of low-frequency variants. This could be easily checked if long-read data or genome assembly is available for any of the samples in the collection, by comparing the TIP/TAP prediction with the actual sequence.

      We agree with the reviewers that false negative calls can lead to misinterpretation of the observed low frequencies of the TEs (but see our response to the first comment of Reviewer #1). Unfortunately, long-read data for the samples used here are not available to estimate false negative call rates. However, to check whether the observed results are mainly driven by high false negative rates, we re-ran the age-adjusted SFS based on samples with at least 20x mapping coverage, which should reduce the false negative TE calling rate. The results and conclusions from this second analysis were included in the text Line 338-349: “One caveat of the approach used in this study is that TE calling pipelines based on short-reads tend to have higher false positive and false negative call rates than SNP calling pipelines, which is also the case for the TEPID TE calling pipeline used here [57, 59]. A high false negative TE calling rate however might bias our TE frequency estimates toward lower frequencies, which could drive the observed patterns in the age-adjusted SFS. To assess if the false negative TE calling rate in our study substantially affected our results, we re-run the age-adjusted SFS on a subset of our dataset only including samples with a genome-wide mapping coverage of at least 20x, as higher mapping coverages are expected to reduce the false negative call rate [27, 59]. Using the TE allele frequencies estimated based on this subset of our data to estimate Δ frequency revealed similar results of the age-adjusted SFS based on the whole dataset (Additional file 1: Fig. S9), indicating that our observation of retrotransposons evolving under purifying selection is not solely driven by a high false negative TE calling rate.” and in Additional file 1: Fig. S9.

      • Supplementary Figure S1. DNA transposons are much worse at separating the samples in comparison to LTR-retrotransposons. Doesn't this suggest that these two classes have very different dynamics in the population and maybe different intensities of the selection forces acting on them? Could this profile be explained as DNA transposons being older and likely more fixed in all the clades, whereas retrotransposons are more recent and more specific to some populations? Another possibility might be that some B. distachyon DNA transposons had an unusually high excision rate. In any case, in my opinion, this reinforces the need to study the different TE orders in more detail.

      Indeed, different TE orders and superfamilies can have different excision rates, age distributions and be under different selective regimes. To investigate the possibility that different TE orders are affected by very different selective regimes, we split our TE dataset into the four different TE types: Copia, Ty3, Helitron and MITE. We then re-ran the age-adjusted SFS analyses and added our results to the text Line 422-430: “To further examine our conclusion on purifying selection, we investigated the selective regime affecting different retrotransposons and DNA-transposons superfamilies. Thereby, we generated age-adjusted SFS for the four most common TE superfamilies Copia, Ty3 (also known under the name Gypsy, but we will avoid using this name because of its problematic nature see [71]), Helitron and MITE and found similar deviations of the Δ frequency from 0 in the four investigated TE superfamilies (Additional file 1: Fig. S12–S15). These results indicate that our conclusion on the broad effect of purifying selection is not driven by a single TE superfamily but is at least common among the four most numerous TE superfamilies.” and in Additional file 1: Fig. S12-S15.

      • Line 112: "most TE polymorphisms in our dataset were young and only a few were very old". Does this change substantially among TE orders/superfamilies?

      Indeed, there are some differences in the age distribution of the TEs depending on the superfamily. However, the differences are not substantial, as the age bins in the age-adjusted SFS of the different TE superfamilies are fairly similar. See Additional file 1: Fig. S12-S15.

      • Figure 2 is difficult to read, especially the lower panels. I think the grey border of the boxplots makes visualization difficult.

      The gray borders have been removed.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      Many of my specific issues have been addressed in the revision. However, the data shown in Reviewer Fig. 1 and 2 are not sufficiently described to assess their reliability, and these new data do not appear to have been integrated into the paper. A response that more clearly states how the manuscript has been revised to address the comments is necessary.

      We appreciate the opportunity to respond to your updated comments on our manuscript. We carefully considered the feedback and made changes to address the specific issues raised.

      In response to your concern about the insufficient description of the data shown in Reviewer Fig. 1 and 2, we would like to confirm that we have taken this feedback seriously. Supplementary data, including the information provided in Reviewer Figures 1 and 2, have been fully described and integrated into the body of the manuscript as you requested. We ensured that the reliability and significance of the new data were clearly presented to enhance the overall synthesis of the manuscript.

      We are grateful for your valuable feedback, which undoubtedly contributed to the refinement of our manuscript. We hope that the revised version meets the standards of the journal and look forward to the opportunity for further deliberation.

      Reviewer #2 (Recommendations For The Authors):

      Additional feedback from the reviewer:

      "I think the authors have been responsive to my previous comments. However, I cannot find this new data in the main text but rather only in the response to reviewers. New data should be incorporated into the main text not the supplement as the controls are important to consider alongside the treatment groups. Lastly, while the authors include BODIPY in their approaches, their results are not quantitative. My suggestion was to include this data in a quantitative manner not just the images. Lastly, I am still somewhat puzzled about the connection with GABA. The rationale for its selection other than it was significantly changed is not strong."

      Thank you for providing us with the latest feedback. We appreciate the opportunity to address the specific concerns raised and provide a detailed response to each point.

      (1) Incorporation of New Data into the Main Text:

      We acknowledge the reviewer's comment regarding the incorporation of new data into the main text rather than solely in the response to reviewers. In response to this feedback, we have diligently revised the manuscript to ensure that the new data, including controls, is now seamlessly integrated into the main body of the text. This modification allows for a more comprehensive and contextual presentation of the data, as recommended by the reviewer.

      (2) Quantitative Presentation of BODIPY Results:

      We understand the importance of presenting quantitative data for the BODIPY results, and we appreciate the reviewer's suggestion to include this information in a quantitative manner, not just as images. In line with this valuable feedback, we have revised the relevant sections to incorporate quantitative data alongside the images, providing a more robust and comprehensive presentation of the results.

      (3) Rationale for the Selection of GABA:

      In the present study, to elucidate the molecular pathways through which metformin acts on IR injury, we analysed the gene expression profiles of the mice in each group. Similar mRNA changes were mainly concentrated in the top three pathways: lipid metabolism, carbohydrate metabolism, and amino acid metabolism. Given the close link between lipid metabolism and ferroptosis, and the fact that carbohydrate metabolism is a primary route for metabolizing amino acids, we measured 22 amino acids in liver tissue using HPLC-MS/MS to identify the key metabolites involved in the protective effect of metformin against HIRI-induced ferroptosis. Only the GABA level was significantly increased by both metformin treatment and FMT treatment, which was further verified by ELISA. Consequently, we identified GABA as the main metabolite through which metformin protects against HIRI and focused on the source of GABA generation.

      We would like to express our gratitude for your thorough evaluation and constructive feedback, which have undoubtedly contributed to the improvement of our manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study that provides new insights into the development and function of medullary thymus epithelial cells (mTEC). The authors provide compelling evidence to support their claims as to the differentiation and lineage outcomes of CCL21+ mTEC progenitors, which further our understanding of how central tolerance of T cells is enforced within the thymus.

      Public Reviews:

      Reviewer #1 (Public Review):

      The work by Ohigashi and colleagues addresses the developmental and lineage relationship of a newly characterized thymus epithelial cell (TEC) progenitor subset. The authors take advantage of an elegant and powerful set of experimental approaches to demonstrate that CCL21-expressing TECs appear early in thymus organogenesis and that these cells, which are centrally located, go on to give rise to medullary (m)TECs. What makes the findings intriguing is that these CCL21-expressing mTECs are a distinct subset, which do not express RANK or AIRE, and transcriptomic and lineage tracing approaches point to these cells as potential mTEC progenitor-like cells. Of note, using in vitro and in vivo precursor-product cell transfer experiments, the authors show that this subset has a developmental potential to give rise to AIRE+ self-antigen-displaying mTECs, revealing that CCL21-expressing mTECs can give rise to distinct mTEC subsets. This functional duality provides an attractive rationale for the necessary function of mTECs, which is to attract CCR7+ thymocytes that have just undergone positive selection in the thymus cortex to enter the medulla to undergo tolerance-induction against self-antigen-displaying mTECs. Overall, the work is well supported and offers new insights into the diverse functions of the medullary compartment, and how two distinct subsets of mTECs can achieve it.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to discover a developmental pathway leading to functionally diverse mTEC subsets. They show that Ccl21 is expressed early during thymus ontogeny in the medullary area. Fate-mapping gives evidence for the Ccl21 positive history of Aire positive mTECs as well as of thymic tuft cells and postnatally of a certain percentage of cTECs. Therefore, the differentiation potential of Ccl21+ TECs is tested in reaggregate thymus experiments - using embryonic or postnatal Ccl21+ TECs. From these experiments, the authors conclude that at least embryonic mTECs in large part pass through a Ccl21 positive stage prior to differentiation towards an Aire expressing or tuft cell stage.

      The authors are using Ccl21a as a marker for a bipotent progenitor that is detectable in the embryonic thymus and is still present at the adult stage mainly giving rise to mTECs. The choice of this marker gene is very interesting since Ccl21 expression can directly be linked to an important aspect in thymus biology: the expression of Ccl21 by cells in the thymic medulla allows trafficking of T cells into the medulla in order to undergo T cell selection.

      Making use of the Ccl21 detection, the authors can nicely show that cells actively expressing Ccl21 are localized throughout the medulla at an embryonic stage but also in adult thymus tissue. This suggests that this progenitor is not accumulating at a specific area inside the medulla. This is a new finding.

      Moreover, the finding that a Ccl21+ progenitor population plays a functional role in thymocyte trafficking towards the medulla has not been described. Thus, Ccl21 expression may be used to localize a late bipotent progenitor in the thymic lobes.

      In addition, in Fig.8, the authors provide evidence that these progenitor cells have the potential to self-maintain as well as to differentiate in reaggregate experiments at E17 (not at 4 weeks of age). The first point is of great interest and importance since these cells in theory can be of therapeutic use.

      Overall assessment:

      The authors highlight a developmental pathway starting from a Ccl21-expressing TEC progenitor that contributes to a functionally diverse mTEC repertoire. This is a welcome addition to current knowledge of TEC differentiation.

      Reviewer #3 (Public Review):

      In this manuscript, the authors define the developmental trajectory resulting in a diverse mTEC compartment. Using a variety of approaches, including a novel CCL21-fate mapping model, data is presented to argue that embryonic CCL21-expressing thymocyte-attracting mTECs naturally convert into self-antigen-displaying mTEC subsets, including Aire+ mTECs and thymic tuft cells. Perhaps somewhat surprisingly, a large fraction of cTECs were also marked for having expressed CCL21, suggesting that there exists some conversion of mTEC (progenitors) into cTEC, a developmentally interesting observation that could be followed up later. Overall, the experimental setup, writing, and conclusions are all outstanding.

      Provisional author response

      We thank the editors and the reviewers for their supportive comments on our manuscript. We will revise the manuscript according to their helpful recommendations.

      Author response to recommendations

      We thank the editors and the reviewers for their supportive comments on our manuscript. We also thank the three reviewers for their helpful recommendations. We have revised the manuscript accordingly, as detailed below.

      Reviewer #1 (Recommendations For The Authors):

      There are several unanswered questions, which the authors themselves acknowledge, a principal one being whether CCL21+ mTECs represent a progenitor for yet another distinct subset of cortical (c)TECs, or whether they represent an intermediary or unique population of mTECs derived from a bipotent (cTEC/mTEC) progenitor. These questions will need to be addressed in future work as they go beyond the initial characterization of this intriguing mTEC subset.

      Indeed, our findings reported in this manuscript have stimulated many interesting questions, including those pointed out by the reviewer. We would like to address them one by one in our future work.

      The presence of GFP+ cTECs, which are lineage-traced as having expressed CCL21, begs the question as to whether these cells are generated as a consequence of later steps in mTEC differentiation or derived from earlier bipotent cells, which again the authors point out. The authors could discuss this further or perhaps experimentally address this by using a model system whereby mTEC differentiation is absent or halted (e.g., Relb ko, or TCRa/TCRd ko) and test whether GFP+ cTECs are still present.

      According to the suggestion, we have revised the manuscript by adding a statement that it is interesting to examine whether GFP+ cTEC development in Ccl21a-Cre x CAG-loxP-EGFP mice is mediated through RelB-dependent mTEC developmental progression or developing thymocyte-dependent mTEC-nurturing ‘crosstalk’ signals.

      Reviewer #2 (Recommendations For The Authors):

      Even though the manuscript highlights the functional aspect of a postnatal bipotent progenitor, there are several aspects that need further discussion.

      (1) The title is somewhat misleading since the identified TEC subset can not only be detected in embryonic, but also in postnatal thymus. Only the RTOC experiments indicate a higher developmental potential of TECs isolated from embryos, but this might as well be due to experimental difficulties as discussed in the text. Furthermore, Ccl21+ TECs are shown to differentiate postnatally into mTECs and cTECs, therefore this subset presumably belongs to a bipotent progenitor population described earlier (their ref. 22, 39).

      We are fully aware of previous studies showing that mTEC progenitors include cells that transcribe Ccl21a, and have cited them in the manuscript. The manuscript title describes our finding that thymocyte-attracting CCL21-expressing functional mTECs isolated from embryonic thymus show the capability to give rise to self-antigen-displaying mTECs. We thank the reviewer for further pointing out the possibility that postnatal CCL21+ TECs include cells that retain the capability to differentiate into mTECs and cTECs.

      (2) In the introduction the authors claim that the "developmental progression of the self-antigen-displaying mTEC subset occurs in a single stream as mTEClow progenitors -> mTEChigh Aire-expressing cells -> mTEClow mimetic cells." line 79. So far it only could be shown that some mimetic cell types undergo an Aire+ stage; whether this is true for all mimetic cells remains to be shown. Therefore, this statement should be toned down.

      Following the suggestion, this sentence has been toned down in the revised manuscript.

      (3) In line 86, the reference to another paper, describing Ccl21a expression in a postnatal mTEC biased progenitor should be added: Nusser et al. Nature. 2022 PMID: 35614226, in which the developmental potential of the Ccl21 positive so-called postnatal progenitor is analysed by barcoding and results give evidence for differentiation into mature mTECs (see lines 94-96).

      As suggested, the Introduction of the revised manuscript now cites the Nusser et al. study showing that postnatal mTEC-biased progenitors include cells that transcribe Ccl21a.

      (4) Have a look at Extended Data Figure 2b of PMID: 35614226, wherein the population-specific gene expression pattern of the progenitor population at different time points is depicted. Ccl21a belongs to a group of genes, which identifies the postnatal progenitor, and indicates that its functionality and/or developmental potential is age-dependent. Therefore, it would be important to specify the age of the analysed mice throughout the text of the results part instead of describing them as "postnatal" only.

      As recommended, mouse age has been added to the revised manuscript and figures.

      (5) Line 113: "embryonic" needs to be replaced since the results of Fig. 1 are referring to 5-week-old mice.

      The manuscript has been revised per the reviewer’s suggestion.

      (6) Referring to Fig. 3g, line 173: It is interesting to see that, at 3 weeks of age, 95% of mTECs have a Ccl21-history but only approx. 70% of cTECs. Therefore, the earliest progenitor giving rise to the first cTECs might still be productive and feed into the cTEC lineage. This reporter would allow for the analysis of progenitor activity over time. The same could be done for mTECs since at E15 the tdTomato signal is still low compared to the assigned medullary area in Fig. 2c in order to detect when the Ccl21-expressing progenitor becomes the main source of mTECs. The finding in Fig. 4e (line 196) also argues for the timed replacement of cTECs by a progenitor which locates to the medulla, thus leading to a decline in Ccl21-history signal towards the subcapsular region at 2 weeks of age. This should be better explained/discussed.

      We appreciate the work of Nusser, et al. showing that postnatal mTEC-biased, but not embryonic cTEC-biased, TEC progenitors include cells that transcribe a detectable amount of Ccl21a (cited in the Introduction as ref. 23). It is important to clarify whether and how those postnatal TEC progenitors (23) overlap with the embryonic and postnatal CCL21-protein-expressing mTECs reported in this study. It is also interesting to shed light on how Ccl21a+ progenitors contribute to cTECs and mTECs over the ontogeny and whether the enrichment of Ccl21a+ progenitor-derived cTECs in the perimedullary area reflects a temporal replacement of cTECs derived from Ccl21a+ progenitors localized in the medulla. We would like to clarify these issues in our future work. The revised manuscript includes a discussion of these issues.

      (7) Line 304 and 355: Note that the "unstable" age-dependent gene expression profiles were already reported in Nusser et al. Nature. 2022. Not only Ccl21 expression, but other progenitor-specific genes also change their expression levels with age. The entirety of changes in gene expression during aging likely impacts the developmental potential of progenitor populations. These changes might be reflected in the negative results of the RTOC experiment using TECs of 4-week-old mice. The manuscript would benefit from a discussion in light of this "unstable" age-dependent gene expression.

      It is interesting to point out that the age-dependent difference in gene expression profiles, which was reported in TEC progenitors by Nusser, et al. (23), is also detected in CCL21-expressing mTECs in this study. Similarly to the recommendation no. 6 by reviewer 2, and as described in the revised manuscript, it is interesting to clarify whether and how embryonic and postnatal CCL21-expressing mTECs overlap with the previously reported TEC progenitors.

      (8) Line 321: as discussed above, the exact time point should be added to the text since the proportion of cTECs derived from a Ccl21+ progenitor is associated with a certain time point, "2/3 of cTECs" refers to 3 weeks of age.

      The manuscript has been revised following the reviewer’s suggestion.

      Reviewer #3 (Recommendations For The Authors):

      The one question I have, which may be more of a curiosity of this reviewer than a requirement for the manuscript, is whether thymocytes themselves are required for the conversion/maturation of attracting TECs to mTECs? For example, in CD3e-/- (or Rag-/-) mice, are mTECs arrested at the thymocyte attracting stage, or is the conversion process 'pre-programed'? In the same vein, do cTECs (or the immature cTECs) maintain CCL21 expression in the absence of mature thymocytes? These are not critical studies but are fairly straightforward (effort- and time-wise) that would aid in placing this process in the overall scope of thymus development.

      We previously showed that Aire+ mTECs are detectable in the thymus of RAG2-deficient mice, in which thymocyte development is arrested at the CD4/CD8 double-negative 3 stage (Hikosaka, et al. 2006; PMID: 18799150). In another work, we also showed that Aire+ mTECs and CCL21+ mTECs are detectable in the thymus of TCR-alpha-KO mice, which lack mature CD4/CD8 single-positive TCR-alpha/beta-expressing thymocytes (Lkhagvasuren, et al. 2013; PMID: 23585674). These results indicate that thymocyte maturation beyond the Rag-dependent stage is not essential for the development of Aire+ mTECs. Nonetheless, we agree with the reviewer that it is important to clarify how developing thymocytes contribute to the growth and differentiation of diverse TEC subpopulations, including GFP+ cTEC development in Ccl21a-Cre x CAG-loxP-EGFP mice. The revised manuscript includes a discussion of these issues.

    1. Author Response

      We thank the eLife Senior Editor and reviewers for the comprehensive evaluation and constructive comments on our manuscript. We are grateful that all 3 reviewers recognize the value of the large pharmacological and proteomics screen of 51 cancer cell lines in relation to vitamin C IC50 values. As reviewer 1 points out, our findings are of interest as high-dose vitamin C is in clinical trials. Most importantly, we show that all 51 cell lines tested can be killed at a dose range that is achievable by intravenous administration in the clinic. These pharmacological findings underscore high-dose vitamin C as a potent anti-cancer agent. Moreover, we provide an elaborate description of functional terms associated with the vitamin C IC50 values in the different cell panels (Figs 1-5) and the common denominators across panels (Figs 6, 7 and 8), thereby enhancing our biological insight into sensitivity to vitamin C treatment. This study is indeed descriptive in nature, and our large-scale pharmacological and proteomics dataset should be seen as a resource for further research. The raw and processed data will be available in the ProteomeXchange repository (accession number and reviewer password were provided before) and the resubmission will include all processed proteome and phosphoproteome data as a supplementary file.

      It is beyond the scope of our study to do mechanistic studies with knock-downs to see if we can further sensitize cancer cell lines that are less sensitive. We do not call these cell lines resistant as cell growth can be inhibited at a clinically achievable dose.

      In our detailed rebuttal we will follow up on the suggestion of reviewer 1 to put our data also in the context of NCI-60 growth inhibition data for other cytotoxic agents. This will expand our comparative analysis to cisplatin in the lung cancer panel (Fig 5A), where we show that vitamin C IC50 values and cisplatin IC50 values are not correlated one-to-one, as one of the most cisplatin-resistant NSCLC cell lines in our panel was very sensitive to high-dose vitamin C. Furthermore, we will clarify method details and annotate mutational status in our panels and explore potential genomic associations to high-dose vitamin C sensitivity as presented in previous studies (e.g. mutant BRAF and/or KRAS tumors, https://doi.org/10.1126/science.aaa5004).

      Finally, we will critically read the manuscript and add references where needed.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Heer and Sheffield used 2 photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to the dorsal hippocampus CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water rewards at the end of the track and on test days, calcium activity was recorded from dopamine (DA) axons originating in the ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and velocity to some extent, while LC input activity remained constant across the track, but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the ramping to reward activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90 s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty vs. behavioral changes induced by novelty by removing periods in which animals were immobile and established that the activity observed in the first 2 laps reflected novelty-induced signal in LC axons.

      Strengths:

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to the dorsal hippocampus CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis at the resolution of single axons. The data analysis is thorough and possible confounding variables and data interpretation are carefully considered.

      Weaknesses:

      Aspects of the methodology, data analysis, and interpretation diminish the overall significance of the findings, as detailed below.

      The LC axonal recordings are well-powered, but the DA axonal recordings are severely underpowered, with recordings taken from a mere 7 axons (compared to 87 LC axons). Additionally, 2 different calcium indicators with differential kinetics and sensitivity to calcium changes (GCaMP6S and GCaMP7b) were used (n=3, n=4 respectively) and the data pooled. This makes it very challenging to draw any valid conclusions from the data, particularly in the novelty experiment. The surprising lack of novelty-induced DA axon activity may be a false negative. Indeed, at least 1 axon (axon 2) appears to be showing a novelty-induced rise in activity in Figure 3C. Changes in activity in 4/7 axons are also referred to as a 'majority' occurrence in the manuscript, which again is not an accurate representation of the observed data.

      The reviewer points out a weakness in the analysis of VTA axons in our dataset. The relatively low n (currently 7) comes from the fact that VTA axons in the CA1 region of the hippocampus are very sparse and very difficult to record from (due to their sparsity and the low level of baseline fluorescence inherent in long-range axon segments). This is the reason they have not been recorded from in any lab other than ours. LC axons, on the other hand, are more abundant in CA1. In the paper, when comparing VTA versus LC axons, we deal with the mismatch in n by down-sampling the LC axons to match the VTA axons, repeating this 1000 times to create a distribution. However, because the VTA axon n is relatively low, it is possible that we have not sampled the VTA axon population sufficiently and therefore have a biased population in our dataset. The issue is that it takes months for the baseline expression of GCaMP to reach sufficient levels to be able to record from VTA axons, and it is typical to find only a single axon in a FOV per animal. There are additional reasons why mice and/or axon recordings do not reach criteria and cannot be included in the dataset (these exclusion criteria are reported in the Methods section). For instance, out of the 54 DAT-Cre mice injected, imaging was never conducted in 36 for lack of expression or because mice failed to reach behavioral criteria. Another 11 mice were excluded for heat bubbles that developed during imaging, z-drift of the FOV, or bleaching of the GCaMP signal.
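
      The down-sampling comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names, the choice of summary statistic (a mean), the seed, and the synthetic per-axon values are all hypothetical, and the one-axon-per-session constraint mentioned elsewhere in this response is left out for brevity.

```python
import random
import statistics


def downsample_distribution(lc_values, n_vta, n_iter=1000, seed=0):
    """Draw n_iter random LC subsamples of size n_vta (without replacement)
    and return the mean of each subsample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(lc_values, n_vta)) for _ in range(n_iter)]


def fraction_at_least(distribution, observed):
    """Fraction of subsample means that are >= the observed value."""
    return sum(m >= observed for m in distribution) / len(distribution)


# Synthetic per-axon values, made up purely for illustration
# (e.g. some per-axon activity metric); sizes follow the 87 LC / 7 VTA axons.
_rng = random.Random(1)
lc_values = [_rng.gauss(0.3, 0.1) for _ in range(87)]
vta_values = [_rng.gauss(0.2, 0.1) for _ in range(7)]

# Distribution of LC means at the VTA sample size.
dist = downsample_distribution(lc_values, n_vta=len(vta_values))

# Where the observed VTA mean falls relative to that distribution.
vta_mean = statistics.fmean(vta_values)
print(len(dist), fraction_at_least(dist, vta_mean))
```

      Comparing the VTA statistic against the spread of down-sampled LC statistics, rather than against the full LC sample, keeps the two groups at the same n, which is the point of the procedure described above.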

      However, we do have n=2 additional VTA axon recordings that we will add to the dataset to bring the n up from 7 to 9. We plan on re-analyzing the data with n=9 VTA axons and making comparisons to down-sampled LC axons as described above. This boost in n will increase the power of our VTA axon analysis. To more formally test whether this is sufficient for statistical tests, we plan to utilize the G*Power power-analysis tool to compute statistical power for each of the different tests we use. We will report this in the next version of the paper. However, the n=2 additional axons were not recorded in the novel environment, so the next version will remain at n=7 for the novel environment analysis. We agree with the reviewer that the lack of novelty-induced DA axon activity may be a false negative, and so we will adjust the description of our results and discussion accordingly.

      During the data collection of VTA axon activity we tried two variants of GCaMP: 6s and 7b, to see if one would increase the success rate of finding and recording from VTA axons. Given the long time-course of these experiments and the low yield in success, we pooled the GCaMP variants together to increase statistical power. Because the 2 additional VTA DA axons that were recorded from expressed GCaMP6s, the next version of the paper will have n=5 GCaMP6s, and n=4 GCaMP7b VTA DA axons, which will allow us to compare the activity of the two sensors in the familiar environment. The reviewer correctly pointed out that the sensors themselves could confound our results, and so they should not be pooled unless we can show they do not produce different signals in the axons. We will make this comparison and report the findings in the next version of the paper. If we find no significant differences, we will pool the data. If differences are detected, we will keep these axons separate for subsequent analysis and comparisons to LC axons.

      The authors conducted analysis on recording data exclusively from periods of running in the novelty experiment to isolate the effects of novelty from novelty-induced changes in behavior. However, if the goal is to distinguish between changes in locus coeruleus (LC) axon activity induced by novelty and those induced by motion, analyzing LC axon activity during periods of immobility would enhance the robustness of the results.

      This is indeed true, and this suggested analysis could further support our conclusions regarding the LC novelty signal. For the next version of the paper, we will use the periods of immobility to analyze and isolate any novelty induced activity in LC axons. However, following exposure to the novel environment, mice spend much less time immobile, therefore there may not be sufficient periods of immobility close in time to the exposure to the novel environment (which is when the novelty signal occurs). We plan to analyze mouse behavior during the early exposure to the novel environment for immobility and check whether we have enough of this behavior to perform the suggested analysis.

      The authors attribute the ramping activity of the DA axons to the encoding of the animals' position relative to reward. However, given the extensive data implicating the dorsal CA1 in timing, and the remarkable periodicity of the behavior, the fact that DA axons could be signalling temporal information should be considered.

      This is a very good point. We agree that the VTA DA axons could be signaling temporal information, as we have previously shown that these axons also exhibit ramping activity when you average their activity by time to reward (Krishnan et al., 2022). We will conduct this analysis on this dataset. We have not, however, conducted any experiments designed to separate out time from distance, such as the experiments conducted in Kim et al., 2020. Therefore, we cannot determine whether this is due to proximity in space to reward or time to reward. We will clarify in our text that by proximity, we mean either place or time, and cannot conclude which feature of the experience drives the VTA axon signal.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Kim, HyungGoo R., Athar N. Malik, John G. Mikhael, Pol Bech, Iku Tsutsui-Kimura, Fangmiao Sun, Yajun Zhang, et al. A Unified Framework for Dopamine Signals across Timescales. Cell 183, no. 6 (2020).

      The authors should explain and justify the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments.

      LC axon activity was recorded on a 3m track to match the track length from an experiment we recently published (Dong et al., 2021) in which mice were exposed to a novel 3m track while populations of CA1 pyramidal cells were recorded. In that paper we described the time course of place field formation on the novel track. We wanted to test if LC axons signaled novelty (as we hypothesized) and whether the time course of LC axon activity matched the time course of place field formation. We briefly discuss this in the Discussion section of this paper and hypothesize that LC axons in CA1 could open a window of plasticity in which new place fields can form.

      VTA axons were recorded on a 2m track (same VR tracks as LC axons were recorded on) to match another recent paper from our lab in which reward expectation was manipulated (Krishnan et al, 2022). In that study CA1 populations of pyramidal cells were recorded during the reward expectation experiment. To match the experience during recordings of VTA axons in CA1 to test how reward expectation may influence axon signaling along the track, we also used a 2m track. The idea was to check how VTA dopaminergic inputs to CA1 may influence CA1 population dynamics along the track.

      Although the tracks were identical for LC and VTA recordings for both the familiar and novel tracks in terms of visual cues and design, the track lengths are different (simply modulated by gain control of the rotary encoder). To account for this we normalized the lengths for our comparison analysis. This normalization allows for a direct comparison of the patterns of activity across the two types of axons, controlling for the potential confound introduced by the different track lengths. By adjusting the data to a common scale, we could assess the relative changes in activity levels at matched spatial bins, ensuring that any observed differences or similarities are due to the intrinsic properties of the axons rather than differences in track lengths. However, the different lengths do make the animal’s experience slightly different. This is somewhat offset by the observations in our study that none of the LC or VTA axon signals would be expected to be majorly influenced by variations in track length. For instance, LC axons are associated with velocity and a pre-motion initiation signal, neither of which would be influenced by track length. VTA axons are also associated with velocity, which would not influence a direct comparison to LC axon velocity signals as mice reach maximal velocity very rapidly along the track. VTA axons do ramp up in activity as they approach the reward zone, and this signal could be modulated by track length (or maybe not if the signal is encoding time to reward rather than distance). However, LC axons show no ramping to reward signals, so a comparison across axons recorded on different track lengths for this analysis is justified.
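
      The length normalization described above can be illustrated with a short sketch. It assumes that position is expressed as a fraction of track length and averaged into a fixed number of equal-width bins; the function name, the bin count, and the example values are hypothetical and are not taken from the manuscript.

```python
def binned_activity(positions_m, activity, track_length_m, n_bins=50):
    """Average activity in n_bins equal-width bins of normalized position (0 to 1).

    positions_m and activity are parallel sequences (one entry per sample).
    Bins that receive no samples are returned as NaN.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, act in zip(positions_m, activity):
        # Express position as a fraction of the track, clipped to [0, 1].
        frac = min(max(pos / track_length_m, 0.0), 1.0)
        # The last bin includes the end of the track (frac == 1.0).
        idx = min(int(frac * n_bins), n_bins - 1)
        sums[idx] += act
        counts[idx] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Made-up samples at matching fractional positions on a 2 m and a 3 m track.
short_track = binned_activity([0.25, 0.75, 1.25, 1.75], [1, 2, 3, 4], 2.0, n_bins=4)
long_track = binned_activity([0.375, 1.125, 1.875, 2.625], [1, 2, 3, 4], 3.0, n_bins=4)
print(short_track, long_track)
```

      After this step, bin i refers to the same relative location on both tracks, so activity profiles can be compared bin by bin even though the absolute distances differ.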

      However, to add rigor to comparisons of axon dynamics recorded along 2m and 3m tracks, we plan to plot axon activity of both sets of axons by time to reward, and actual (un-normalized) distance from reward.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Dong, C., Madar, A. D. & Sheffield, M.E. Distinct place cell dynamics in CA1 and CA3 encode experience in new environments. Nat Commun 12, 2977 (2021).

      Reviewer #2 (Public Review):

      Summary:

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.

      The main findings were as follows:

      • In a familiar environment, the activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.
      • VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.
      • In contrast, the activity of LC axons ramped up before the initiation of movement on the Styrofoam wheel.
      • In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral, and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC axon activity reflected the approach of a learned reward, and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.

      Strengths:

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that the activity of VTA and LC axons projecting to dorsal CA1 reflect partly distinct environmental, behavioral and cognitive factors.

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.

      Weaknesses:

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.

      (2) Some aspects of the methodology would benefit from clarification. First, to help others to better scrutinize, evaluate, and potentially to reproduce the research, the authors may wish to check if their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data.

      We thank the reviewer for helping us formalize the scientific rigor of our study. There are ten ARRIVE Guidelines and we have addressed most of them in our study already. However, there is an opportunity to add detail. We have listed below all ten points and how we have or will address each one.

      (1) Experimental design - we go into great depth explaining the experimental set-up, how we used the autofluorescent blebs as imaging controls, how we controlled for different sample sizes between the two populations, and the statistical tests used for comparisons. We also carefully accounted for animal behavior when quantifying and describing axon dynamics both in the familiar and novel environments.

      (2) Sample size - We state both the number of ROIs and mice for each analysis. Wherever we state how many axons had a certain kind of activity, we will also state the number of mice we saw this activity in. For the next version of the paper, we plan to conduct a power analysis using G*Power to assess the power of our sample sizes for statistical analysis.

      (3) Inclusion/exclusion criteria - Out of the 36 NET-Cre mice injected, 15 were never recorded for either failing to reach behavioral criteria, or a lack of visible expression in axons. Out of the 54 DAT-Cre mice injected, imaging was never conducted in 36 for lack of expression or failing to reach behavioral criteria. Out of the remaining 21 NET-Cre mice, 5 were excluded for heat bubbles, z-drift, or bleaching, while 11 DAT-Cre mice were excluded for the same reasons. This was determined by visually assessing imaging sessions, followed by using the registration metrics output by suite2p. This registration metric conducted a PCA on the motion-corrected ROIs and plotted the first PC. If the PC drifted substantially, to the point where no activity was apparent, the video was excluded from analysis.

      (4) Randomization - Already included in the paper is a description of random down sampling of LC axons to make statistical comparisons with VTA axons. LC axons were selected pseudo-randomly (only one axon per imaging session) to match VTA sampling statistics. This randomization was repeated 1000 times and comparisons were made against this random distribution.

      (5) Blinding-masking - no blinding/masking was conducted as no treatments were given that would require this. We will include this statement in the next version.

      (6) Outcomes - We defined all outcomes measured, such as those related to animal behavior and related axon signaling.

      (7) Statistical methods - None of the reviewers had any issues regarding our description of statistical methods, which we described in detail in this version of the paper.

      (8) Experimental animals - We described that DAT-Cre mice were obtained through JAX labs, and NET-Cre mice were obtained from the Tonegawa lab (Wagatsuma et al. 2017).

      (9) Experimental procedure - Already listed in detail in Methods section.

      (10) Results - Rigorously described in detail for behaviors and related axon dynamics.
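
      As a quick arithmetic check on the counts given under point (3), the number of mice remaining after each exclusion step can be tallied as below. The numbers are those stated in this response; the function and dictionary names are hypothetical, and the tally counts mice only and says nothing about how many axons each mouse contributed.

```python
def remaining(injected, never_imaged, excluded_after_imaging):
    """Return (mice imaged, mice included) after the two exclusion steps."""
    imaged = injected - never_imaged
    included = imaged - excluded_after_imaging
    return imaged, included


# Counts as stated under point (3) of this response.
cohorts = {
    "NET-Cre": dict(injected=36, never_imaged=15, excluded_after_imaging=5),
    "DAT-Cre": dict(injected=54, never_imaged=36, excluded_after_imaging=11),
}

for name, counts in cohorts.items():
    imaged, included = remaining(**counts)
    print(f"{name}: imaged={imaged}, included={included}")
```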

      Wagatsuma, Akiko, Teruhiro Okuyama, Chen Sun, Lillian M. Smith, Kuniya Abe, and Susumu Tonegawa. “Locus Coeruleus Input to Hippocampal CA3 Drives Single-Trial Learning of a Novel Context.” Proceedings of the National Academy of Sciences 115, no. 2 (January 9, 2018): E310–16. https://doi.org/10.1073/pnas.1714082115.

      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?

      A detailed response to this is written above for a similar comment from reviewer 1.

      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Figure 3a, but as <0.2 cm/s for the imaging data analysis in Figure 4 (see legends to these figures and also see Methods, from line 447, line 469, line 498)? I do not understand why, and it would be good if the authors explained this.

      This is an error left over from before we converted velocity from rotational units of the treadmill to cm/s. This will be corrected in the next version of the paper.

      (3) In the Results section (from line 182) the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Figure 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Figure 3aIII, top panel). Given that LC and VTA axon activity were both increasing with velocity (Figure 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.

      This is a great point. The decreased velocity in the novel environment could lead to a diminished novelty response in LC axons. We will add a discussion point on this in the next version. This could also be the case for VTA axons, so will add a discussion point that the lack of novelty signaling seen in VTA axons could be due to reduced velocity masking this signal.

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.

      Mice receive their water reward through a waterspout that is immobile and positioned directly in front of their mouth (which is also immobile as they are head fixed) and water delivery is triggered by a solenoid when the mice reach the end of the virtual track. Therefore, because the waterspout remains in the same place relative to the mouse, and the water reward is not delivered until they reach the end of the virtual track, there is nothing for the mice to detect. We will update the paper to make this clearer.

      Additionally, on the initial laps with no reward, the ramping activity is still present (Krishnan et al, 2022) indicating this activity is not directly related to the presence/absence of water but is instead caused by reward expectation.

      Reviewer #3 (Public Review):

      Summary:

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of cre transgenic mice, the authors examine the activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during the approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength of their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of learning and memory. The conclusions of this manuscript are mostly well supported by the data, but some additional analysis and/or experiments may be required to fully support the authors' conclusions.

      Weaknesses:

      (1) During teleportation from familiar to novel environments the authors report a decrease in the freezing ratio when combining the mice in the two experimental groups (Figure 3aiii). A major conclusion from the manuscript is the difference in VTA and LC activity following environment change. Given that VTA and LC activity were recorded in separate groups of mice, did the authors observe a similar significant reduction in freezing ratio when analyzing the behavior in LC and VTA groups separately?

      In response to this comment, we will analyze the freezing ratios in DAT-Cre and NET-Cre mice separately. However, other members of the lab have seen the same result in other mouse strains (See Dong et al. 2021), so we do not expect to see a difference (but it is certainly worth checking).

      (2) The authors satisfactorily apply control analyses to account for the unequal axon numbers recorded in the LC and VTA groups (e.g. Figure 1). However, given the heterogeneity of responses observed in Figures 3c, 4b and the relatively low number of VTA axons recorded (compared to LC), there are some possible limitations to the authors' conclusions. A conclusion that LC-CA1 axons, as a general principle, heighten their activity during novel environment presentation, would require this activity profile to be observed in some of the axons recorded in most all LC-CA1 mice.

      We agree with the reviewer’s point here. To help avoid this problem, when downsampling LC axons to compare to VTA axons, we matched the sampling statistics of the VTA axons/mice (i.e. only one LC axon was taken from each mouse to match the VTA dataset).
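
The matching step described above can be illustrated with a minimal sketch. It assumes axons are grouped by mouse and that the number of mice to draw equals the size of the VTA dataset; the function name, mouse labels, axon labels, and `n_mice` value are all hypothetical and are not taken from the manuscript.

```python
import random


def downsample_one_axon_per_mouse(axons_by_mouse, n_mice, rng):
    """Draw n_mice mice at random, then one axon at random from each.

    axons_by_mouse: dict mapping a mouse label to a list of axon labels.
    Returns a dict mapping each selected mouse to its single selected axon.
    """
    selected_mice = rng.sample(sorted(axons_by_mouse), n_mice)
    return {mouse: rng.choice(axons_by_mouse[mouse]) for mouse in selected_mice}


# Hypothetical LC dataset: several axons per mouse.
lc_axons = {
    "lc_mouse_1": ["axon_a", "axon_b", "axon_c"],
    "lc_mouse_2": ["axon_d", "axon_e"],
    "lc_mouse_3": ["axon_f"],
    "lc_mouse_4": ["axon_g", "axon_h"],
}

rng = random.Random(0)  # fixed seed so one draw is reproducible
subsample = downsample_one_axon_per_mouse(lc_axons, n_mice=3, rng=rng)
```

Repeating the draw with different seeds would give a distribution of downsampled results to compare against the smaller dataset.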

      However, in the next version of the paper we will also report the number of mice that we see a significant novel response in. We will also add the number of mice with significant activity for each of the measures in the familiar environment (e.g. how many mice had axons positively correlated with velocity).

      Additionally, if the general conclusion is that VTA-CA1 axons ramp activity during the approach to reward, it would be expected that this activity profile was recorded in the axons of most all VTA-CA1 mice. Can the authors include an analysis to demonstrate that each LC-CA1 mouse contained axons that were activated during novel environments and that each VTA-CA1 mouse contained axons that ramped during the approach to reward?

      As stated above, we will add the number of mice that had each activity type we reported here.

      (3) A primary claim is that LC axons projecting to CA1 become activated during novel VR environment presentation. However, the experimental design did not control for the presentation of a familiar environment. As I understand, the presentation order of environments was always familiar, then novel. For this reason, it is unknown whether LC axons are responding to novel environments or environmental change. Did the authors re-present the familiar environment after the novel environment while recording LC-CA1 activity?

      This is an important point to address. While we never varied the presentation order of the familiar vs novel environments, we did record the activity of LC axons in some of the mice in a dark environment (no VR cues) prior to exposure to the familiar environment. We will look at these axons to address whether they respond to initial exposure to the familiar environment. This will allow us to check whether they are responding to environmental change or novelty. We will add this analysis to the next version of the paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study assesses anatomical, behavioral, physiological, and neurochemical effects of early-life seizures in rats, describing a striking astrogliosis and deficits in cognition and electrophysiological parameters. The convincing aspects of the paper are the wide range of convergent techniques used to understand the effects of early-life seizures on behavior as well as hippocampal prefrontal cortical dynamics. While reviewers thought that the scope was impressive, there was criticism of the statistical robustness and number of animals used per study arm, as well as the lack of causal manipulations to determine cause-and-effect relationships. This paper will be of interest to neurobiologists, epileptologists, and behavioral scientists.

      We thank Joseph Gleeson as the Reviewing Editor and Laura Colgin as the Senior Editor for considering this revision of our manuscript for publication in eLife. We appreciate the positive acknowledgment of the study and the critical points raised by the reviewers. We have addressed all the excellent comments of the two reviewers, providing a detailed response for each comment. We believe that these revisions have significantly improved the quality and rigor of our study.

      We want to assure you that our experimental design was meticulously crafted, incorporating adequate control groups, and is grounded in prominent studies in systems neurophysiology focusing on the effects of early-life seizures, especially for capturing mild effects. We conducted statistical tests adhering to established norms and recommendations, ensuring a thorough and transparent description of the employed statistical methods. We welcome any specific suggestions to further improve this aspect.

      In fact, the concerns raised by the reviewers regarding statistical robustness may stem from a misunderstanding of the rat cohorts used in each experiment. Criticism was directed at the use of only 5 animals without a control group for acute electrophysiological recording. It is essential to clarify that this group served the sole purpose of confirming that the injection of lithium-pilocarpine would induce both behavioral and electrographic seizures. Importantly, this was a descriptive result, and no statistical test or further analysis was conducted with these data. In the revised manuscript, we have made adjustments to this description, aiming to eliminate any ambiguity, particularly addressing the issue of sample size in each experiment.

      Regarding the lack of causal manipulations, we fully agree that this approach would provide a deeper mechanistic understanding of our findings and is an essential next step. Still, developmental brain disturbances are linked to manifold intricate outcomes, so an initial observational exploration would offer insights about particular and nuanced relationships for subsequent studies aimed at targeted interventions. In this context, our objective was to provide a comprehensive characterization of ELS effects to serve as a foundation for future research. While we recognize the relevance of causal manipulations, only more sophisticated data analyses were able to reveal more complex aspects, such as specific multivariate associations and non-linear relationships, that would not have been revealed by causally perturbing one factor or another at first. In the revised manuscript, we emphasized the limitation of lacking causal manipulations as well as the advantages of our approach. Also, we mentioned some possible targets for subsequent perturbational investigations based on our findings.

      For a more detailed discussion on these matters, we invite you to review our response to reviewers.

      Reviewer 1

      In this paper, Ruggiero, Leite, and colleagues assess the effects of early-life seizures on a large number of anatomical, physiological, behavioral, and neurochemical measures. They find that prolonged early-life seizures do not lead to obvious cell loss, but lead to astrogliosis, working memory deficits on the radial arm maze, increased startle response, decreased paired pulse inhibition, and increased hippocampal-PFC LTP. There was a U-shape relationship between LTP and cognitive deficits. There is increased theta power during the awake state in ELS animals but reduced PFC theta-gamma coupling and reduced theta HPC-PFC coherence. Theta coherence seems to be similar in ACT and REM states in ELS animals, while it decreases in the active state relative to REM in controls.

      Strengths:

      The main strength of the paper is the number of convergent techniques used to understand how hippocampal PFC neural dynamics and behavior change after early-life seizures. The sheer scale, breadth, and reach of the experiments are praiseworthy. It is clear that the paper is a major contribution to the field as far as understanding the impact of early-life seizures. The LTP findings are robust and provide an important avenue for future study. The experiments are performed carefully and the analysis is appropriate. The paper is well-written and the figures are clear.

      We express our gratitude to Reviewer #1 for conducting a thoughtful and comprehensive review of our manuscript. We sincerely value both the constructive criticisms provided and your acknowledgment of the manuscript's strengths.

      Weaknesses:

      The main weakness of the paper is the lack of causal manipulations to determine whether prevention or augmentation of any of the findings has any impact on behavior or cognition. Alternatively, if other manipulations would enhance working memory in ELS animals, it would be interesting to see the effects on any of these parameters measured in the paper.

      We sincerely appreciate the insightful comments from Reviewer #1 regarding the potential benefits of including causal manipulations in our study. We wholeheartedly agree that such manipulations can provide a deeper understanding of the mechanistic underpinnings of the observed relationships and represent a crucial next step in our research trajectory.

      Our primary objective in this study was to establish a comprehensive framework through observational examinations, exploring intricate relationships across various neurobiological and behavioral variables in the aftermath of early-life seizures (ELS). By identifying these associations, our work aims to provide a foundation for future investigations that can delve into targeted interventions.

      While we acknowledge the importance of causal manipulations, we would like to underscore the advantages of our initial multivariate correlational study. Importantly, developmental brain disturbances have lasting impacts affecting multiple biological outcomes that may have intricate relationships among themselves. Although some neurobiological variables stood out from the comparisons of group means, these comparisons did not reveal some nuanced relationships within the data. The complexity of the relationships we uncovered, involving behavior, cognition, immunohistochemistry, plasticity, neurochemistry, and network dynamics, required a more elaborate analytical approach. Only through sophisticated data analysis techniques were we able to dissect important peculiarities, such as the robust multivariate association between brain-wide astrogliosis and sensorimotor impairments, as well as non-linear relationships, such as the inverted-U relationship between plasticity and working memory. These nuances might not have been fully revealed through causal manipulations, since several variables are strongly related and consequently can affect several outcomes, leading to a false conclusion of direct causality.

      Nevertheless, we acknowledge the understatement of the limitation of lacking causal manipulations in our manuscript. To address this, we have included a dedicated section in the discussion highlighting this limitation. We emphasize the advantages of this exploratory phase, supported by a review of the literature on cause-and-effect studies that align with our findings. Additionally, we speculate on promising targets for future cause-and-effect studies based on our findings. For instance, we hypothesize that enhancing plasticity may improve working memory in control subjects, while attenuating plasticity might have a similar effect in ELS subjects. Furthermore, we propose that reactive astrogliosis and concurrent neuroinflammatory processes likely underlie sensorimotor changes in the ELS group. Lastly, we suggest that dopaminergic antagonism in the ELS group could normalize behavioral deficits, prevent the exaggerated LTP induction of the HPC-PFC pathway, reestablish the state-dependent network dynamics, and desensitize the dopaminergic response.

      [...]Also, I find the sections where correlations and dimensionality reduction techniques are used to compare all possible variables to each other less compelling than the rest of the paper (with the exception of the findings of U-shaped relationship of cognition to LTP). In fact, I think these sections take away from the impact of the actual findings.

      We appreciate the reviewer's feedback and would like to emphasize the significance of the multivariate analysis conducted in our study. Multivariate analysis extends beyond bivariate correlations and is the only type of analysis capable of comprehending the relation of data in a multidimensional way, offering a comprehensive approach to understanding complex relationships among multiple variables. By employing techniques such as principal component analysis (PCA), generalized linear models (GLM), and canonical correlation analysis (CCA), we aimed to unravel intricate patterns of covariance that explore how different variables collectively contribute to the observed outcomes and assess the impact of each independent variable (predictor) on the dependent variable (the variable to be predicted or explained). Importantly, it enables us to control for potential confounding factors by keeping all other variables constant.

      While we acknowledge that these sections may appear intricate, their inclusion is indispensable for a comprehensive understanding of the diverse variables associated with SE outcomes. We believe that these analyses offer valuable insights into the intricate dynamics of our study, providing a more holistic perspective on the altered spectrum induced by early-life seizures (ELS).

      Regarding the reviewer's observations about the impact of the U-shaped relationship between cognition and LTP, we have made graphical and textual adjustments to emphasize the significance of these findings, aiming to enhance their clarity and impact within the broader context of our research. We trust that these modifications contribute to a more compelling presentation of our results.
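
One common way to check for an inverted-U relationship of this kind is to fit a quadratic and inspect the sign of the squared term. The sketch below is only an illustration of that idea, not the analysis used in the manuscript: the function name and the data points are hypothetical, and the fit is an ordinary least-squares quadratic solved in plain Python.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c. Returns (a, b, c)."""
    # Power sums used by the normal equations: s[k] = sum(x**k), t[k] = sum(y * x**k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns in the order (c, b, a).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


# Hypothetical data: performance peaks at an intermediate level of plasticity.
plasticity = [0.0, 1.0, 2.0, 3.0, 4.0]
performance = [-2 * x ** 2 + 4 * x + 1 for x in plasticity]

a, b, c = fit_quadratic(plasticity, performance)
is_inverted_u = a < 0      # a negative squared term means the curve opens downward
peak_x = -b / (2 * a)      # location of the optimum
```

With real data, the squared term would also need a significance test, and the peak should fall inside the observed range before the curve is described as an inverted U.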

      […]Finally, the apomorphine section seemed to hang separately from the rest of the paper and did not seem to fit well.

      We appreciate Reviewer #1's feedback on the apomorphine section. To address this point, we carefully rewrote the rationale preceding the results to clarify our hypothesis and chosen methodology. In our work, we performed the apomorphine experiment as a logical next step from the previous data. We showed that ELS rats display REM-like oscillatory dynamics during active behavior, similar to genetically and pharmacologically hyperdopaminergic mice (Dzirasa et al., 2006). Furthermore, other results also indicated possible dopamine neurotransmission alterations, such as working memory deficits, hyperlocomotion, PPI deficits, aberrant HPC-PFC LTP, and abnormal PFC gamma coordination. Therefore, we hypothesized that ELS animals would present a state of hyperdopaminergic activity. Among the possible methodologies to investigate the hyperdopaminergic state, we chose the apomorphine sensitivity test, which is classically used and induces unambiguous behavioral and neurochemical alterations in hyperdopaminergic rodents (Duval, 2023; Ellenbroek & Cools, 2002).

      Reviewer 1 (Recommendations For The Authors):

      (1) It would be useful to stain for other GABAergic interneuron markers such as somatostatin, VIP, CCK.

      (2) The authors refer to neuroinflammation but they are really referring to reactive astrogliosis. I would also suggest staining for microglial markers.

      (3) The duration of chronic electrographic seizures in ELS animals should also be calculated and presented.

      (4) Word usage: the authors frequently use the word "presents" when "demonstrates" would be more appropriate

      (1) We appreciate your insight into staining for other GABAergic interneuron markers such as somatostatin, VIP, CCK. While investigating additional interneuron types is indeed relevant, it was not the primary focus of this study for several reasons: 1) The overall neuron density, assessed through NeuN immunostaining, revealed no differences between controls and early life seizure (ELS) groups, even in brain regions susceptible to neuron death after SE (i.e., CA1). Therefore, differences in interneurons, which are more resistant to death in SE and constitute approximately 20% of the cells, are unlikely. 2) Among all interneuron subtypes, Parvalbumin-positive (PV+) interneurons represent a substantial population and are susceptible to various stressors. In the hippocampus, 24% of GABAergic neurons are PV+, whereas 14% are SST+, 10% are CCK+, and VIP+ are less than 10% (Freund and Buzsaki, 1996). Consequently, we considered PV+ interneurons to be a more sensitive subpopulation for evaluating the effects of SE. As they showed no significant difference, we do not believe that assessing smaller subtypes, such as VIP+ or CCK+ cells, would yield significant differences.

      (2) While we often see activated microglia in hippocampal sclerosis, these cells are only slightly increased in cases without hippocampal sclerosis (which are similar to our animals), as we previously published (Peixoto-Santos et al., 2012). Astrocytes are a better marker for the epileptogenic zone, as they are increased in epileptogenic zones without neuron loss and are also important for controlling neuronal activity through neurotransmitter recycling and ion buffering. In fact, our present model is very similar to mesial temporal lobe epilepsy patients with gliosis only, who are characterized by increased reactive astrogliosis in the hippocampus without cell loss, and who also present changes in the innate inflammatory response related to the presence of reactive astrocytes (Grote et al., 2023).

      (3) We have performed these calculations and added this information to the revised manuscript.

      (4) We thank the reviewer for the word usage recommendation. Indeed, we frequently used “present” throughout the manuscript to describe the observations and patterns the groups “exhibited” or “showed”. However, we believe this is truly not the most appropriate usage in the Discussion when we describe the multivariate latent factors, as we did not “present” them, but rather, we “demonstrated” their existence and significance through our analysis. We rewrote these sentences and hope this is the point the reviewer was referring to.

      References:

      Duval F. Systematic review of the apomorphine challenge test in the assessment of dopaminergic activity in schizophrenia. Healthcare. 2023 11 (1487): 1-11. doi: 10.3390/healthcare11101487.

      Dzirasa K, Ribeiro S, Costa R, Santos LM, Lin SC, Grosmark A, Sotnikova TD, Gainetdinov RR, Caron MG, Nicolelis MAL. Dopaminergic control of sleep-wake states. Journal of Neuroscience. 2006 26:10577–10589. doi:10.1523/JNEUROSCI.1767-06.2006.

      Freund TF, Buzsáki G. Interneurons of the hippocampus. Hippocampus. 1996;6(4):347-470. doi: 10.1002/(SICI)1098-1063(1996)6:4<347::AID-HIPO1>3.0.CO;2-I. PMID: 8915675.

      Ellenbroek BA & Cools AR. Apomorphine susceptibility and animal models for psychopathology: genes and environment. Behavior Genetics. 2002 32 (5): 349-361. doi: 10.1023/a:1020214322065.

      Grote A, Heiland DH, Taube J, Helmstaedter C, Ravi VM, Will P, Hattingen E, Schüre JR, Witt JA, Reimers A, Elger C, Schramm J, Becker AJ, Delev D. 'Hippocampal innate inflammatory gliosis only' in pharmacoresistant temporal lobe epilepsy. Brain. 2023 Feb 13;146(2):549-560. doi: 10.1093/brain/awac293. PMID: 35978480; PMCID: PMC9924906.

      Peixoto-Santos JE, Galvis-Alonso OY, Velasco TR, Kandratavicius L, Assirati JA, Carlotti CG, Scandiuzzi RC, Serafini LN, Leite JP. Increased metallothionein I/II expression in patients with temporal lobe epilepsy. PLoS One. 2012;7(9):e44709. doi: 10.1371/journal.pone.0044709. Epub 2012 Sep 18. Erratum in: PLoS One. 2016;11(7):e0159122. PMID: 23028585; PMCID: PMC3445538.

      Reviewer 2

      In this manuscript, the authors employ a multilevel approach to investigate the relationship between the hippocampal-prefrontal (HPC-PFC) network and long-term phenotypes resulting from early-life seizures (ELS). Their research begins by establishing an ELS rat model and conducting behavioral and neuropathological studies in adulthood. Subsequently, the manuscript delves into testing hypotheses concerning HPC-PFC network dysfunction. While the results are intriguing, my enthusiasm is tempered by concerns related to the logical flow

      We thank the reviewer for bringing attention to the logical flow of the manuscript. Given the diverse array of behavioral and neurobiological variables examined in our study, obtained through various methods and measures, we fully recognize the importance of a clear and coherent logical flow for conveying the overall narrative.

      Our goal was to articulate the neurobiological findings in a manner that underscores their convergence of mechanisms, revealing a cohesive relationship between early-life seizure, cognitive deficits, sensorimotor impairments, abnormal network dynamics, aberrant plasticity, neuroinflammation and dysfunctional dopaminergic transmission.

      Briefly, an outline of our narrative could be summarized in the highlights:

      (1) ELS induces sensorimotor alterations and working memory deficits.

      (2) ELS does not induce neuronal loss, so neurobiological underpinnings may be molecular and functional.

      (3) ELS induces brain-wide astrogliosis and exaggerated HPC-PFC long-term plasticity.

      (4) Sensorimotor alterations are more correlated to astrogliosis, while cognitive deficits to altered HPC-PFC plasticity.

      (5) ELS-induced functional alterations may also be observable in freely moving subjects. ELS induces state-dependent alterations in the HPC-PFC network dynamics, such as increased hippocampal theta and abnormal PFC gamma coordination during behavioral activity.

      (6) ELS leads to REM-ACT similarity, previously reported in hyperdopaminergic mice, indicating dopaminergic dysfunction.

      (7) ELS exhibits altered dopaminergic transmission and behavioral sensitivity that mirror the initial sensorimotor findings.

      (8) The literature establishes an inverted-U relationship between dopamine and cognition and PFC plasticity, which may explain our finding of an inverted-U relationship between working memory and HPC-PFC LTP across CTRL and ELS rats.

      To address this concern, we have made revisions to enhance the logical flow, ensuring a more seamless transition between the different sections of the Results by presenting clearer links between observations and following investigations. We hope these changes contribute to a more straightforward rationale and easily understandable presentation of our hypotheses and results.

      Focus on Correlations: The manuscript primarily highlights correlations as the most significant findings. For instance, it demonstrates that ELS induces cognitive and sensorimotor impairments. However, it falls short of elucidating why these deficits are specifically linked to HPC-PFC synaptic plasticity/network. Furthermore, the manuscript mentions the involvement of other brain regions like the thalamus in the long-term outcomes of ELS based on immunohistochemistry data.

      Thank you for your insightful comments, which allowed us to provide further clarification on our study's focus and findings. Our primary goal was to delve into the electrophysiological alterations within the HPC-PFC pathway. The rationale behind this choice lies in the hypothesis that, even in the absence of significant neuronal loss, functional changes in circuits closely linked to the cognitive and behavioral aspects under investigation could be identified.

      While we concentrated our electrophysiological investigation on the HPC-PFC pathway due to its well-established functional correlates in existing literature, it is essential to highlight that our data reveal broader alterations in neural circuitry. Notably, we observed an increase in GFAP in the entorhinal cortex and thalamic reticular nucleus, along with changes in the dopaminergic release within the VTA-NAc pathway. These findings suggest that the impact of early-life seizures extends beyond the HPC-PFC circuit.

      While we recognize the relevance of other brain circuits in the outcomes of ELS, we argue for a specific role of the HPC-PFC circuit in the outcomes of ELS. We will detail the supporting evidence and arguments that specifically link the HPC-PFC function to our ELS-related observations in a later comment regarding the "overinterpretation" of the HPC-PFC role. To better convey these important nuances, we have made specific modifications to the results and in the discussion section to underscore the broader implications of our findings, providing a more comprehensive understanding of the study's scope and outcomes.

      […]This raises questions about the subjective nature and persuasiveness of the statistical studies presented.

      All statistical analyses were carefully applied based on the literature and following well-established precepts and precautions. Specifically, we constructed the experimental design for univariate inferential statistics for the data related to behavioral tests, synaptic plasticity, immunohistochemistry, oscillatory activity, and dopaminergic sensitization. However, we also submitted our data to multivariate statistical analysis, which is recommended when there is a considerable amount of data and the aim is to investigate possible hidden effects. In this situation, multivariate analyses are inherently exploratory due to the possibility of using multiple measurements for each phenomenon investigated. Nevertheless, their application is not subjective and follows the same statistical rigor as univariate analyses. We firmly believe that abstaining from exploring these data would forgo the full potential of this analytical method in dissecting the multidimensional associations within our dataset. In order to eliminate any doubt regarding the objectivity in the choice and application of statistics, we carefully rewrote the methods, highlighting the details of statistical rigor even more.

      Sample Size Concerns: The manuscript raises concerns about the adequacy of sample sizes in the study. The initial cohort for acute electrophysiology during ELS induction comprised only 5 rats, without a control group. Moreover, the behavioral tests involved 11 control and 14 ELS rats, but these same cohorts were used for over four different experiments. Subsequent electrophysiology and immunohistochemistry experiments used varying numbers of rats (7 to 11). Clarification is needed regarding whether these experiments utilized the same cohort and why the sample sizes differed. A power analysis should have been performed to justify sample sizes, especially given the complexity of the statistical analyses conducted.

      We appreciate the reviewer's thoroughness and considerations regarding the sample sizes used in our study. The concerns raised about statistical robustness seem to stem from a lack of clarity in delineating the rat cohorts used in each experiment. It is encouraging to note that several studies in the field of neurophysiology, employing similar analyses, utilize a sample size similar to what was used in our research. The choice of the sample size was based on a thorough analysis of the existing literature, considering specific experimental demands, the complexity of employed techniques, and the need to achieve statistically robust results. In response to these concerns and to enhance clarity on the sample sizes, we have made several modifications (highlighted in red) in the text. Below, we provide details for each animal cohort utilized:

      Cohort 1 - Acute Electrophysiology

      The decision to use only 5 animals without a control group for acute electrophysiological recording aimed specifically to confirm that the injection of lithium-pilocarpine would induce both behavioral and electrographic seizures. It is crucial to note that this was a descriptive result and a methodological control of the ELS model. Besides, no statistical test or further analysis was conducted on these data. We maintain the belief that a group of 5 animals is sufficient to demonstrate that the protocol induces electrographic seizures, and introducing a control group was considered unnecessary to show that saline injection does not induce electrographic seizures.

      Cohort 2 - Behavior, LTP Recording, and Immunohistochemistry

      Initially, 14 (ELS) and 11 (CTRL) rats were used for behavior assessment. The reduction in sample size for LTP and immunohistochemistry experiments was influenced by practical challenges, including mortality during LTP surgery and issues with immunohistochemical staining that hindered a proper analysis for some animals.

      Cohort 3 - Chronic Freely-Moving Electrophysiology

      A new cohort of animals (n=6 and 9 for CTRL and ELS, respectively) was used specifically for freely-moving electrophysiological data.

      Cohort 4 - Behavioral Sensitization to Psychostimulants

      A fourth cohort was utilized for assessing behavioral sensitization to psychostimulants (CTRL n=15 and ELS n=14). The reduced sample size for neurotransmitter analysis (CTRL n=8 and ELS n=9) was a deliberate selection of a subsample to ensure a sufficient sample for quantification while maintaining statistical validity.
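      The power analysis requested by the reviewer can be sketched as follows. This is a minimal illustration, not the authors' procedure: it uses the normal approximation for a two-sided, two-group comparison, and the function name `n_per_group` and the effect sizes passed to it are hypothetical, chosen only to show how the required group size scales with the expected effect.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals needed per group for a two-sided, two-sample comparison.

    effect_size is a standardized mean difference (Cohen's d); the value is an
    assumption supplied by the user, not a number taken from the study.
    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes, for illustration only.
large_effect_n = n_per_group(1.0)
medium_effect_n = n_per_group(0.5)
```

      Under this approximation, halving the expected effect size roughly quadruples the required group size; an exact calculation based on the t distribution would give slightly larger numbers.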

      Overinterpretation of HPC-PFC Network Dysfunction: The manuscript potentially overinterprets the role of HPC-PFC network dysfunction based on the results.

      We appreciate the insight from Reviewer #2 regarding the potential overinterpretation of the role of the hippocampal-prefrontal cortex (HPC-PFC) network dysfunction in the various alterations observed after ELS.

      The significance of HPC-PFC plasticity and network function has been extensively documented concerning cognitive, affective, and sensorimotor functions, as well as in models of neuropsychiatric diseases. Our recent review (Ruggiero et al., 2021) compiles these findings. Specifically, the HPC-PFC network has been linked to spatial working memory through a series of causal and correlational studies conducted by Floresco et al. and Gordon et al. These findings make the HPC-PFC pathway a plausible candidate for underlying alterations associated with working memory, consistent with our observation of exaggerated HPC-PFC LTP associated with poorer performance in the ELS group. Regarding the immunohistochemical observations, we concur with Reviewer #2 that these findings suggest broader-scale brain alterations related to sensorimotor dysfunction beyond the HPC-PFC circuitry. We also acknowledge that these large-scale alterations may underlie brain-wide network functional changes.

      In our network dynamics study arm, we investigated HPC-PFC oscillatory activity, allowing us to discuss potential relationships between abnormal plasticity (verified in the second study arm) and network dynamics. It is important to note that while there is some anatomical specificity to the LFPs recorded in the HPC and PFC, these activities may represent larger-scale limbic-cortical dynamics. The intermediate HPC exhibits a significant influence from both dorsal and ventral HPC, and the prelimbic PFC is intricately related to both hippocampal and thalamic oscillations exhibiting under-demand state-dependent synchrony. Additionally, the state maps used in our study were initially described to distinguish states at a global forebrain network level. Even in our past studies, we have described HPC-PFC patterns of network activity (Marques et al., 2022a) that later were found to represent a part of a brain-wide synchrony pattern (Marques et al., 2022b). However, most of our findings on oscillatory dynamics were centered around theta oscillations, a well-established brain-wide activity that originates in and spreads from the hippocampus and is present in the HPC-PFC circuit during activity.

      In conclusion, we believe the correlations between HPC-PFC LTP and working memory, as well as the specific alterations of theta coordinated activity, support a particular role of the HPC-PFC network dysfunction in the effects of ELS. However, the brain-wide immunochemical alterations are plausible indications of larger-scale dysfunctional networks. To address this issue, we emphasized in the discussion of network findings that the immunohistochemical and neurochemical findings endorse the need to investigate ELS effects on larger networks.

      Notably, cognitive deficits are described as subtle, with no evidence of learning deficits and only faint working memory impairments. However, sensorimotor deficits show promise. Consequently, it's essential to justify the emphasis on the HPC-PFC network as the primary mechanism underlying ELS-associated outcomes, especially when enhanced LTP is observed. Additionally, the manuscript seems to sideline neuropathological changes in the thalamus and the thalamus-to-PFC connection. The analysis lacks a direct assessment of the causal relationship between HPC-PFC dysfunction and ELS-associated outcomes, leaving a multitude of multilevel analyses yielding potential correlations without easily interpretable results.

      We thank Reviewer #2 for the thorough review and insightful comments. To better grasp the context, it is crucial to consider this characterization within the scope of our experimental design and expected outcomes. Unlike epilepsy models involving adult animals or interventions causing pronounced neuronal loss and structural modifications, our study was intentionally designed to explore moderate behavioral alterations. In fact, the mild behavioral alterations observed in ELS models and the lack of neuronal loss guided our focus on investigating changes in HPC-PFC communication.

      While our observed cognitive deficits may be milder compared to certain models, it is imperative to underscore their robustness and clinical relevance. These findings have been consistently replicated globally across various experimental models, encompassing ELS induced by hyperthermia (Chang et al., 2003; Kloc et al., 2022), kainic acid (Stafstrom et al., 1993), flurothyl (Karnam et al., 2009a; 2009b), and hypoxia (Najafian et al., 2021; Hajipour et al., 2023). Mild cognitive deficits were also reported by other research groups using the pilocarpine model in P12 (Mikulecká et al., 2019; Kubová et al., 2013; Kubová et al., 2004). Furthermore, our group replicated the working memory deficit results using an alternative paradigm (the T-maze) and a different rat strain (Sprague Dawley), enhancing the reliability of our observations (D’Agosta et al., 2023).

      The clinical perspective gains importance, considering that cognitive effects of ELS may be less severe than those in patients with long-term epilepsy. In fact, the majority of patients with childhood epilepsy exhibit mild cognitive impairment as the most common grade of severity - more than two times the rate of severe cognitive impairment (Sorg et al., 2022). Investigating the mechanisms underlying these mild cognitive changes is crucial for shedding light on neurobiological aspects not fully understood, thereby expanding our comprehension of the consequences of ELS.

      We recognize the challenges associated with conducting causal experiments in neuroscience, especially in long-term and chronic alterations as seen in our model. Isolating modifications of specific activities is indeed intricate. However, it's essential to acknowledge that neuroscience progress has not solely relied on causal experiments but has significantly advanced through correlational observations. Our findings serve as a foundational step in comprehending the repercussions of ELS, proposing mechanisms and circuits that necessitate further in-depth dissection and study in the future. We have integrated these considerations into the discussion section of the manuscript to enhance clarity.

      Overall, while the manuscript presents intriguing findings related to the HPC-PFC network and ELS outcomes, it requires a more rigorous experimental design[…]

      We thank the reviewer for acknowledging our intriguing findings. Regarding the experimental design, we are confident that all the manuscript hypotheses, design, and execution of experiments were rigorously based on the literature and carried out with all necessary controls. As stated earlier, we constructed the experimental design for univariate inferential statistics and explored associations between variables using multivariate statistics. Specifically, we achieved a rigorous experimental design by following a series of guidelines. First, the planning of the sample size in each experiment and their respective controls was based on mild effects from the ELS literature. As previously indicated, the only experiment with one group was just the description of the behavioral effects and electrographic seizures after the acute injection of lithium-pilocarpine. Given the exhaustive replication of these data in the ELS literature, this result was presented descriptively as a methodological control. Second, detailed descriptions of statistics were made in both methods and results, always indicating positive and negative results. Notably, the experimental designs used in this work are neither novel nor unconventional; they strictly follow the literature of the field. However, new indications and references about the experimental accuracy were added to the manuscript to resolve any doubts regarding objectivity.

      References:

      Chang YC, Huang AM, Kuo YM, Wang ST, Chang YY, Huang CC. Febrile seizures impair memory and cAMP response-element binding protein activation. Ann Neurol. 2003 Dec;54(6):706-18. doi: 10.1002/ana.10789. PMID: 14681880.

      D'Agosta R, Prizon T, Zacharias LR, Marques DB, Leite JP, Ruggiero RN. Alterations in hippocampal-prefrontal cortex connectivity are associated with working memory impairments in rats subjected to early-life status epilepticus. In: NEWROSCIENCE INTERNATIONAL SYMPOSIUM, 2023, Ribeirão Preto. Poster.

      Hajipour S, Khombi Shooshtari M, Farbood Y, Ali Mard S, Sarkaki A, Moradi Chameh H, Sistani Karampour N, Ghafouri S. Fingolimod Administration Following Hypoxia Induced Neonatal Seizure Can Restore Impaired Long-term Potentiation and Memory Performance in Adult Rats. Neuroscience. 2023 May 21;519:107-119. doi: 10.1016/j.neuroscience.2023.03.023. Epub 2023 Mar 28. PMID: 36990271.

      Karnam HB, Zhou JL, Huang LT, Zhao Q, Shatskikh T, Holmes GL. Early life seizures cause long-standing impairment of the hippocampal map. Exp Neurol. 2009 Jun;217(2):378-87. doi: 10.1016/j.expneurol.2009.03.028. Epub 2009 Apr 2. PMID: 19345685; PMCID: PMC2791529.

      Karnam HB, Zhao Q, Shatskikh T, Holmes GL. Effect of age on cognitive sequelae following early life seizures in rats. Epilepsy Res. 2009 Aug;85(2-3):221-30. doi: 10.1016/j.eplepsyres.2009.03.008. Epub 2009 Apr 22. PMID: 19395239; PMCID: PMC2795326.

      Kubová H, Mareš P. Are morphologic and functional consequences of status epilepticus in infant rats progressive? Neuroscience. 2013 Apr 3;235:232-49. doi: 10.1016/j.neuroscience.2012.12.055. Epub 2013 Jan 7. PMID: 23305765.

      Kloc ML, Marchand DH, Holmes GL, Pressman RD, Barry JM. Cognitive impairment following experimental febrile seizures is determined by sex and seizure duration. Epilepsy Behav. 2022 Jan;126:108430. doi: 10.1016/j.yebeh.2021.108430. Epub 2021 Dec 10. PMID: 34902661; PMCID: PMC8748413.

      Kubová H, Mares P, Suchomelová L, Brozek G, Druga R, Pitkänen A. Status epilepticus in immature rats leads to behavioural and cognitive impairment and epileptogenesis. Eur J Neurosci. 2004 Jun;19(12):3255-65. doi: 10.1111/j.0953-816X.2004.03410.x. PMID: 15217382.

      Marques DB, Ruggiero RN, Bueno-Junior LS, Rossignoli MT, and Leite JP. Prediction of Learned Resistance or Helplessness by Hippocampal-Prefrontal Cortical Network Activity during Stress. The Journal of Neuroscience. 2022a 42 (1): 81-96. https://doi.org/10.1523/jneurosci.0128-21.2021.

      Marques DB, Rossignoli MT, Mesquita BDA, Prizon T, Zacharias LR, Ruggiero RN and Leite JP. Decoding fear or safety and approach or avoidance by brain-wide network dynamics abbreviated. bioRxiv. 2022b https://doi.org/10.1101/2022.10.13.511989.

      Mikulecká A, Druga R, Stuchlík A, Mareš P, Kubová H. Comorbidities of early-onset temporal epilepsy: Cognitive, social, emotional, and morphologic dimensions. Exp Neurol. 2019 Oct;320:113005. doi: 10.1016/j.expneurol.2019.113005. Epub 2019 Jul 3. PMID: 31278943.

      Najafian SA, Farbood Y, Sarkaki A, Ghafouri S. FTY720 administration following hypoxia-induced neonatal seizure reverse cognitive impairments and severity of seizures in male and female adult rats: The role of inflammation. Neurosci Lett. 2021 Mar 23;748:135675. doi: 10.1016/j.neulet.2021.135675. Epub 2021 Jan 28. PMID: 33516800.

      Ruggiero RN, Rossignoli MT, Marques DB, de Sousa BM, Romcy-Pereira RN, Lopes-Aguiar C and Leite JP. Neuromodulation of Hippocampal-Prefrontal Cortical Synaptic Plasticity and Functional Connectivity: Implications for Neuropsychiatric Disorders. Frontiers in Cellular Neuroscience. 2021 15 (October): 1–23. https://doi.org/10.3389/fncel.2021.732360.

      Sorg AL, von Kries R, Borggraefe I. Cognitive disorders in childhood epilepsy: a comparative longitudinal study using administrative healthcare data. J Neurol. 2022 Jul;269(7):3789-3799. doi: 10.1007/s00415-022-11008-y. Epub 2022 Feb 15. PMID: 35166927; PMCID: PMC9217877.

      Stafstrom CE, Chronopoulos A, Thurber S, Thompson JL, Holmes GL. Age-dependent cognitive and behavioral deficits after kainic acid seizures. Epilepsia. 1993 May-Jun;34(3):420-32. doi: 10.1111/j.1528-1157.1993.tb02582.x. PMID: 8504777.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      This is a short but important study. Basically, the authors show that α-synuclein overexpression's negative impact on synaptic vesicle recycling is mediated by its interaction with E-domain containing synapsins. This finding is highly relevant for synuclein function as well as for the pathophysiology of synucleinopathies. While the data is clear, functional analysis is somewhat incomplete.

      (1) The authors should present a clearer dissociation of endocytosis and exocytosis under the various conditions they study. They should quantify the rate of rise and decay of pHluorin signals. (2) In addition, I strongly recommend a few additional experiments with and without a vATPase inhibitor such as bafilomycin to estimate the relative effects on exo- vs. endocytosis. As the authors are aware, bafilomycin will mask the re-acidification/endocytosis component, thus revealing pure exocytosis and thus enabling quantification of endocytosis with minimal contamination from exocytosis.

      In the revised version, we analyzed and quantified exocytosis and endocytosis separately, with bafilomycin experiments, as the reviewer suggested (new data, Fig. 1- Fig. Supp. 1A-B). Overexpression of human alpha-synuclein only attenuated exocytosis in neurons that also expressed synapsins (WT neurons and synapsin TKO neurons transduced with synapsin Ia). In parallel, we also examined endocytosis by calculating the time-constant of the decay in the fluorescence of sypHy during the endocytotic phase (Fig. 1- Fig. Supp. 1C-E). Previous studies have shown that after brief stimulus-trains – like those used in our study (20Hz/300AP) – most endocytosis occurs after the cessation of stimulation (1). Expression of human alpha-synuclein did not alter the endocytosis time-constant in any of our experiments. To summarize, the interaction of alpha-synuclein with the synapsin E domain was required for alpha-synuclein induced attenuation of exocytosis, but not endocytosis.
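      The time-constant calculation mentioned above can be illustrated with a short sketch. It assumes, as a simplification that the response does not confirm, that the post-stimulus sypHy signal is baseline-subtracted, strictly positive, and follows a single exponential F(t) = A·exp(−t/τ); the function name `fit_decay_tau` and the synthetic trace are hypothetical and are not the authors' analysis code or data.

```python
import math


def fit_decay_tau(times, fluorescence):
    """Estimate tau of F(t) = A * exp(-t / tau) by least squares on log(F).

    Assumes a baseline-subtracted, strictly positive trace sampled after the
    end of stimulation (a simplifying assumption for this illustration).
    """
    logs = [math.log(f) for f in fluorescence]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx      # slope of log(F) versus t equals -1 / tau
    return -1.0 / slope


# Synthetic, noise-free trace with a made-up time constant of 20 s.
times = [float(t) for t in range(0, 60, 2)]
trace = [math.exp(-t / 20.0) for t in times]
tau = fit_decay_tau(times, trace)
```

      With noisy recordings, a nonlinear fit on the raw trace is usually preferable, since taking logarithms over-weights the low-amplitude tail of the decay.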

      Reviewer #2

      ...The paper will be improved significantly if additional experiments are added to expand and provide a more mechanistic understanding of the effect of α-syn and the intricate interplay between synapsin, α-syn, and the SV. For an enthusiastic reader, the manuscript as it looks now with only 3 figures, ends prematurely. Some of the experiments above or others could complement, expand and strengthen the current manuscript, moving it from a short communication describing the phenomenon to a coherent textbook topic. Nevertheless, this work provides new and exciting evidence for the regulation of neurotransmitter release and its regulation by synapsin and α-syn.

      (1) Did the authors try to attach E-domain for example to synapsin Ib and restore α-syn inhibition with synapsin Ib-E?

      This is an interesting idea, but in previous studies, we found that synapsin Ib does not associate with synaptic vesicles (2), so it will not be present at the right location to be able to restore alpha-synuclein induced synaptic attenuation. We have also seen that this mis-localization alters synaptic properties (unpublished).

      (2) Was the expression level of Synapsin-IaScrE examined and compared to WT Synapsin-Ia in Fig 3?

      Yes, this data is now shown in Fig. 3-Fig. Supp. 1.

      (3) Were SVs dispersed in α-syn overexpression as predicted?

      We interpret the reviewer’s question and reasoning as follows. If alpha-synuclein binds to the E-domain of synapsin, a prediction in the alpha-synuclein over-expression scenario is that the overabundance of alpha-synuclein molecules would bind to and sequester the E-domain synapsins away from synaptic vesicles. In the absence of E-domain synapsins, the synaptic-vesicle clustering effects of synapsins would be lost, and there would be dispersion of synaptic vesicles. We tested this prediction, which is now shown in an additional figure (new data, Fig. 4). Indeed, the AAV-mediated over-expression of alpha-synuclein leads to a dispersion of synaptic vesicles, and this dispersion is dependent on synapsins Ia and Ib, but not IIa and IIb (please see Fig. 4D-E in the revised manuscript). Appropriate text, starting with “Previous studies have shown that loss of all synapsins...”, has also been added to present and interpret these data.

      (4) How does this study coincide with the effects of α-syn on fusion pore and endocytosis? This should be at least discussed. It is also possible that the effects of α-syn on endocytosis might affect the results as if endocytosis is affected, SVs number and distribution will be also affected.

      It is difficult to reconcile our data with the idea that alpha-synuclein facilitates fusion-pore opening, as proposed by the Edwards lab (3). In fact, it is difficult to reconcile this concept with their own previous data, showing that alpha-synuclein over-expression attenuates SV-recycling (4). As mentioned above, modulation of endocytosis does not seem to be a major factor in our experiments, though this does not rule out a physiologic role for alpha-synuclein in endocytosis, since all our experiments are based on over-expression paradigms. Future experiments looking at phenotypes after acute alpha-synuclein knockdown may provide more clarity. In any case, there are many purported roles of alpha-synuclein, and this is now mentioned in the last paragraph (starting with “Additionally, α-syn has been implicated…”).

      (5) What happened after stimulation when synapsin is detached from SV, does α-syn continues to be linked to it?

      The fate of alpha-synuclein after stimulation is unclear in our experiments. Previous experiments suggest that while both synapsin and alpha-synuclein detach from the SV cluster during stimulation, synapsin returns to synapses while alpha-synuclein does not (5). However, our more recent experiments (unpublished) suggest that the activity-induced dispersion of alpha-synuclein might be phosphorylation-dependent, and that over-expression of alpha-synuclein may not be the best setting to evaluate protein dispersion. We hope to answer this question more rigorously using alpha-synuclein knock-in constructs.

      (6) The experiment with E-domain fused to syPhy assumes that α-syn will still be bound to the SV. So how does α-syn inhibit ST?

      The goal of this experiment was to force the synapsin E-domain to be in a location where it would normally be present – i.e. surface of the synaptic vesicle – by tagging it to sypHy (sypHy-E), and ask if this forced-retention would be sufficient to reinstate the alpha-synuclein mediated attenuation of SV-recycling (as shown in Fig. 3F, it does). Please note that the sypHy-E in these experiments does target to the synapses (new data, Fig. 3-Fig. Supp. 2D). In this context, we are not sure what the reviewer means by “So how does a-syn inhibit synaptic transmission?” We don’t think that alpha-synuclein needs to unbind from the SVs in order to inhibit synaptic transmission. Overall, we think that alpha-synuclein needs to cooperate with synapsins to perform its function, but as mentioned above and in the manuscript, the precise role of alpha-synuclein in this process is still unclear.

      (7) An interesting experiment will be the expression of the isolated E-domain and examining blockage of α-syn inhibition and disruption of synapsin- α-syn interaction. Have the authors examined it as was done in other models?

      We did do the experiment where we only over-expressed the isolated synapsin E-domain in neurons. We were thinking that perhaps the E-domain would have a dominant-negative effect on SV-clustering, as it did in the lamprey and other model-systems, where the E-peptide was directly injected into the axon. However, we found that in cultured hippocampal neurons, the over-expressed E-domain behaves like a soluble protein and is not enriched in synapses (see new data, Fig. 3-Fig. Supp. 2B). Also, the over-expressed E-domain cannot reinstate the synaptic attenuation induced by alpha-synuclein (new data, Fig. 3-Fig. Supp. 2C), likely because the E-domain does not target to synapses. Actually, this is why we did the sypHy-E domain experiment in the first place, to ensure that the E-domain was in the right location to have an effect.

      (8) A schematic model/scheme providing a mechanistic view of the interplay between the proteins is essential and can improve the paper.

      The only model we can confidently make right now would be stick-figures showing the site where alpha-synuclein C-terminus binds to synapsin, which is obviously not very insightful. As noted above (and in the revised version), several different functions have been attributed to alpha-synuclein, and the precise role of alpha-synuclein/synapsin interactions in regulating the SV-cycle is unclear. We hope to create a better model after getting some more data from us and our colleagues working on this challenging problem.

      References

      (1) Kononenko NL & Haucke V. (2015) Molecular mechanisms of presynaptic membrane retrieval and synaptic vesicle reformation. Neuron 85, 484-496.

      (2) Gitler D, Xu Y, Kao H-T, Lin D, Lim S, Feng J, Greengard P & Augustine GJ. (2004) Molecular Determinants of Synapsin Targeting to Presynaptic Terminals. J. Neurosci. 24, 3711-3720.

      (3) Logan T, Bendor J, Toupin C, Thorn K & Edwards RH. (2017) α-Synuclein promotes dilation of the exocytotic fusion pore. Nat Neurosci 20, 681-689.

      (4) Nemani VM, Lu W, Berge V, Nakamura K, Onoa B, Lee MK, Chaudhry FA, Nicoll RA & Edwards RH. (2010) Increased expression of alpha-synuclein reduces neurotransmitter release by inhibiting synaptic vesicle reclustering after endocytosis. Neuron 65, 66-79.

      (5) Fortin DL, Nemani VM, Voglmaier SM, Anthony MD, Ryan TA & Edwards RH. (2005) Neural activity controls the synaptic accumulation of alpha-synuclein. J Neurosci 25, 10913-10921.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1: I would have preferred to see more figures with brain images showing the cellular abundance maps and the atrophy maps. Without being able to see these figures, it's difficult for the reader to 1) validate the atrophy patterns or 2) gain intuition about how the cellular abundance maps vary across the brain. The images in Figure 1C give a small preview, but I'd like to see these maps in their entirety on the brain surface or axial image slices.

      (1) We added brain surface visualization plots of the voxel-wise cellular abundance maps to Figure 1 (lateral, dorsal, and ventral views of both hemispheres). To illustrate how their spatial distributions are associated with brain tissue damage, in Figure 2, we have also added brain surface visualizations of regional values from the atrophy t-statistic maps for the thirteen neurodegenerative conditions and the cell-type map most strongly associated with each condition. These plots allow us to observe variability across the cell-type density and atrophy maps, as well as to visually validate and compare how the patterns vary across the brain.

      Reviewer 1: FTD is an umbrella category for a family of distinct clinical syndromes with different atrophy patterns. It doesn't seem a good idea to take the average of all subjects in this group to form a single atrophy map. Instead, different average maps for each syndrome should be provided.

      (2) Considering the heterogeneity of clinical FTD syndromes, we addressed the reviewers' concerns about using the averaged atrophy map across all patients with an FTD diagnosis. As suggested, we accessed different atrophy maps for each major variant of clinical FTD, including behavioral FTD (n = 70), as well as the semantic (n = 36) and nonfluent variants of primary progressive aphasia (n = 30). These maps are based on data from the participants from the same dataset of the Frontotemporal Lobar Degeneration Neuroimaging Initiative (FTLDNI) that we originally used. Similar to our previous results using the atrophy map averaged over all FTD patients, the analysis showed significant associations of atrophy patterns with cell type densities in all three major variants (see Figure 3A). Notably, these new findings offer insights into specific differences in spatial vulnerability of different cell-types across the variants of FTD, each characterized by unique symptoms, clinical manifestations, and atrophy patterns. In response to these additions, we have updated all figures, results, and interpretations accordingly.

      Reviewer 2: In the abstract, the list of neurodegenerative disorders should be edited: frontotemporal dementia is an umbrella clinical syndrome, not a neurodegenerative disorder. Frontotemporal lobar degeneration (FTLD) is a neurodegenerative disorder, and many tauopathies are FTLDs. While the authors grab their definitional classes from various sources (i.e., published cohort, and other studies), the reader fatigues to understand the population that is being assessed.

      (3) To address potential confusion arising from the inclusion of atrophy maps from FTLD patients across two different studies, stratified based on both clinical and pathological criteria, we added clarifications regarding the assessed population and the definitions used. We used the term FTD when addressing the clinical syndromes, and the term FTLD when referencing the histologically confirmed neurodegenerative pathologies. In addition, we added details on the diagnostic criteria employed for participant recruitment in the FTLDNI cohort, whose data we used for atrophy maps in clinical subtypes of FTD. Lastly, throughout the text and within the figures, we systematically refined the nomenclature for FTLD pathological types, categorizing them based on their known definitions used in the literature and the type of proteinaceous inclusions (FTLD 3-repeat and 4-repeat tauopathies and FTLD-TDP types A and C).

      Reviewer 1: The results section contains perhaps too much interpretation. While the information that's provided serves as an interesting review (e.g., the discussion of the blood-brain barrier), the discussion may be a better place for this.

      (4) We removed sentences with excessive interpretation but insisted on including those outlining the fundamental functions of cell types and their literature-based relevance to neurodegenerative diseases in the Results section, clarifying the significance of our findings to the readers.

      Reviewer 2: The authors based their methodology on the use of a deconvolutional cell classifier; however, do not extensively recognize that their data on gene expression are based on normal brain levels rather than on diseased ones.

      (5) We acknowledged that the gene expression data are based on normal human brain levels in figure titles and all sections of the paper (Introduction, Results, Discussion, Methods) to remind the readers that the analysis shows how changes in gray matter tissue in diseased brains correlate with healthy reference levels of cellular density.

      Reviewer 2: More information in the text needs to be provided regarding the method used to infer gene expression levels at non-sampled brain locations. The reader should not be forced to read reference 40 or investigate the methods section. Figure 1 schematics do not sufficiently explain the used method.

      (6) We added clarifications/references about the Gaussian process regression used for imputing gene expression (Results and figure titles).

      Reviewer 2: Also, while predicted levels are uniquely based on patterns of brain atrophy, it is not possible to know whether this strategy is generalizable to all diseases (for instance, it is known that pure DLB, PD and ALS are not associated with extensive brain atrophy), or even adequately comparable between subtypes of diseases within the same class (e.g., different forms of FTLD). The authors do not acknowledge that only data based on true neuropathological assessment may prove whether their findings are true.

      (7) Although diagnoses of most dementia conditions used in our study were histologically confirmed, we added acknowledgement about the importance of neuropathological assessment (Discussion section).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      One criticism the authors have made of previous studies was that they have not distinguished between 'tonic' and 'phasic' LC activity and could not demonstrate 'time-locked phasic firing'. This has not been achieved in the present report, as an examination of Fig 1 C,D and 2 C,D shows. Previous reports using unit recording in rats and monkeys clearly show that the latency of LC 'phasic' responses to salient or behaviorally relevant stimuli is in the range of tens of milliseconds, with a very short duration, often followed by a long-lasting inhibition. This kind of temporal precision concerning the phasic response cannot be gleaned from the time scale shown in the Figures (assuming the time scale is in seconds). We can discern a long-lasting increase in tonic firing level for the more salient stimuli (Fig 1C) (although the authors state in the discussion that "we did not observe obvious changes in tonic LC-HPC activity"). This calcium imaging methodology as used in the present experiments can give us a general idea of the temporal relation of LC response to the stimulus, but apparently does not afford the millisecond resolution necessary to capture a phasic response, at least as the data are presented in the Figures.

      While we understand the reviewer’s concern with our use of the terms phasic and tonic, we believe we have represented them as accurately as possible given our data. Unfortunately, the distinction between tonic and phasic activity is somewhat arbitrary, in that there is no strict definition, to our knowledge, of the exact parameters that activity must fall into to be categorized as tonic or phasic. While it is true that phasic LC activity has typically been studied with electrophysiological approaches that afford millisecond resolution and that observed phasic responses are often extremely short, there are numerous differences between those studies and this one. Most prominently, the stimuli used to elicit a phasic response are generally extremely short (often 1ms or less) and therefore generate extremely short phasic responses (Aston-Jones and Bloom, 1981a; Aston-Jones and Cohen, 2005), but this is not to say that phasic responses might not be longer in response to a longer lasting stimulus. Moreover, tonic activity is reported to track with behavioral state on the order of dozens of seconds to minutes and is not reported in response to specific stimuli (Aston-Jones and Bloom, 1981b). The “phasic” responses we report generally decay in less than 5 seconds in our fluorescence signals. Given the slow time course of decay for GCaMP6s (a single action potential can generate a response that lasts 3 or more seconds (Chen et al., 2013)) and the GRAB sensors (GRAB-DA2h τoff = 7.2s (Sun et al., 2020)), the underlying neural responses would have lasted for a significantly shorter period. Therefore, we believe the responses we observed are much more consistent with phasic responses to long-lasting sensory stimuli (20-second tone, 1-2 second shock), than with increases in tonic activity associated with a change in behavioral state.
      Finally, regardless of whether these responses are exactly the same as previously reported phasic responses, our photometry and optogenetics studies provide insight about a form of LC activity that is fundamentally different than what can be gleaned from much slower dialysis, lesion, and pharmacology studies. Nonetheless, we added the following to the discussion section to clarify the limitations of our interpretation:

      “…given their relatively short duration and the fact that they are elicited specifically by salient sensory stimuli, we refer to these responses as “phasic responses.” However, because of the comparatively slow dynamics of fluorescent sensors relative to electrophysiology, we cannot rule out the possibility that these responses are somehow different in nature to previously reported phasic LC responses. Thus, some care must be taken in conflating the characteristics and/or function of the relatively short-lasting responses presented here and the extremely fast phasic responses to very brief (μs to ms) sensory stimuli reported previously.”
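
      The kinetic argument in this response (slow sensor decay implies that the underlying neural response is shorter than the fluorescence transient) can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes an instantaneous rise, a single-exponential decay, and linear summation across spikes, and the 1-second, 10 Hz burst is a hypothetical input. Only the decay constant of 7.2 s is taken from the text above (GRAB-DA2h τoff).

```python
import math


def simulate_fluorescence(spike_times_s, tau_off_s, dt_s=0.01, duration_s=30.0):
    """Toy sensor model: each spike adds a unit transient that decays
    exponentially with time constant tau_off_s (instant rise, linear sum)."""
    n_samples = int(round(duration_s / dt_s))
    trace = []
    for i in range(n_samples):
        t = i * dt_s
        value = sum(
            math.exp(-(t - s) / tau_off_s) for s in spike_times_s if s <= t
        )
        trace.append(value)
    return trace


def time_above_fraction(trace, dt_s, fraction=0.1):
    """Total time (s) the trace spends at or above `fraction` of its peak."""
    peak = max(trace)
    return sum(1 for v in trace if v >= fraction * peak) * dt_s


# Hypothetical input: ten spikes at 10 Hz starting 5 s into the recording.
burst = [5.0 + 0.1 * k for k in range(10)]
burst_duration = burst[-1] - burst[0]

# tau_off of 7.2 s is the GRAB-DA2h value quoted in the response.
trace = simulate_fluorescence(burst, tau_off_s=7.2)
signal_duration = time_above_fraction(trace, dt_s=0.01)

print(f"burst lasts {burst_duration:.1f} s")
print(f"fluorescence stays above 10% of peak for {signal_duration:.1f} s")
```

      Under these assumptions the simulated fluorescence outlasts the burst by many seconds, which is the direction of the inference made in the response. A real deconvolution would also need the sensor's rise time, nonlinearity, and noise.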

      Much of the data presented here can be regarded as 'proof of concept' i.e. demonstrating that photometric imaging of calcium signalling yields similar results concerning LC responses to salient or behaviorally relevant stimuli as has been previously reported using electrophysiological unit recording. The role of dopamine as the principal player in hippocampal-dependent learning also corroborates previous reports.

      Although some of the data presented in this study could be seen as “proof of concept” or “confirmatory” of previous results, we believe this work extends previous results by showing 1) the importance of hippocampal dopamine to aversive hippocampus-dependent learning and trace fear conditioning specifically, 2) that LC responses are important at the specific times of learning (i.e. CS/US onset/termination), and 3) that dopamine in the hippocampus is likely important for learning in a way that is not tied to prediction error or memory consolidation.

      No attempt was made to address the important current question of the modular organisation of Locus Coeruleus, although the authors recognize the importance of this question and propose future experiments using their methodology to record simultaneously in several LC projection sites.

      While we do recognize the importance of this modular organization, which is addressed in the discussion as the reviewer mentions, experiments addressing this organization are beyond the scope of the present study. Future work will address the possibility that LC projections to different regions show differential responses during learning.

      The phasic-tonic issue has not been resolved by these experiments. Phasic responses of LC single units are short-latency, short-lived (just 3-4 action potentials), and followed by a relatively long refraction period. Multiunit responses will have a more jittery latency and longer-lasting response (but still only tens to hundreds of milliseconds). Your figures clearly show long-lasting increases in tonic firing levels, even though you state the contrary in the discussion. Therefore, I strongly recommend removing the word 'phasic' from the title.

      Addressed above.

      Yohimbine, the Alpha 2 antagonist, administered systemically, induces a massive increase in the rate of firing of LC cells (through blockade of autoinhibition at the cell body level at terminals). I guess its effect on the receptor 'backbones' overrides the massive release of NE and/or DA, but you might want to mention this; also include the dose of all drug treatments.

      Yes, yohimbine’s effect on the GRAB-NE signal is somewhat counter-intuitive given the known effect of yohimbine on norepinephrine levels. However, our result is consistent with previous reports (Feng et al., 2019). We have added the following to the results section to clarify:

      “Thus, even though yohimbine is known to increase NE levels in the hippocampus (Abercrombie et al., 1988), its blockade effect on the GRAB-NE sensor should result in a decrease in fluorescence after administration.”

      Include time scale units on all figures (I assume it is seconds in Figs 1 &2).

      Thank you for pointing out this issue, we have added units on all figures.

      • Is it possible to have a better quality example of staining? Fig 1 B in particular is very blurry. Is the yellow double staining? Please indicate. Most of the GCaMP seems to be outside the main area of TH staining. Fig 4 B is much nicer--and it looks morphologically, like LC.

      Unfortunately, the GCaMP6s staining was very dim in our hands and resulted in relatively blurry images. Yes, in this case, yellow is double staining. Regarding the morphology, the GCaMP image is taken from a sagittal section and the shape of expression is consistent with images of LC in the sagittal plane. However, given the quality of our ChR2 images, we are confident in the specificity of expression in these mice.

      Reviewer 2

      The claim that dopamine release in dHPC is caused by LC neurons is not directly tested. Unfortunately, the most critical experiment for the claims that dopamine release comes from LC during conditioning is not tested. A lack of dopamine signal in dHPC caused by inhibition of LC during TFC would show this. It is indeed an interesting observation that chemogenetic activation of LC causes dopamine release in the dHPC. However, in the absence of concurrent VTA inhibition or lesion, it remains a possibility that the dopamine release is mediated through indirect actions on other dopamine-expressing neurons. The authors do a good job of arguing against this interpretation in the discussion, and the literature seems appropriate for this. However, the title is still an overstatement of the data presented in this study.

      We agree with the reviewer’s comments. As indicated in the discussion, it is possible that hippocampal dopamine is increased indirectly via LC projections to dopaminergic midbrain regions. We believe that our title is consistent with this possibility. When phasic stimulation was delivered to the LC, dopamine levels increased in the hippocampus and trace fear conditioning was enhanced. The observed increase in dopamine could be direct or indirect. As the reviewer notes, we argue for the former in the discussion section. A number of experiments would be needed to show this directly (record dopamine while: inhibiting the LC, inhibiting the VTA, stimulating LC while simultaneously inhibiting the VTA etc.) and we are planning to do these in the future.

      The primary alternative interpretations of the phasic activation experiment are whether only stimulation to the cue events (both on and off), or whether only stimulation to the shock. Thus this experiment would benefit from additional data showing either a no shock control, to show that enhanced activity of the LC to the tone is not inherently aversive, or manipulations to the tone but not to the shock.

      Future work will explore whether the contribution of LC to learning is primarily due to its activation during the CS or the US. However, this is beyond the scope of this manuscript.

      Specificity of the GRAB-NE and GRAB-DA sensors should be either justified through additional experiments testing the alternative antagonist (i.e. GRAB-NE CNO+eticlopride / GRAB-DA CNO+yohimbine) or additional citations that have tested this already. It is critical for the claims of the paper to show that these sensors are specific to dopamine or norepinephrine.

      Although sensitivity is a potential concern, these sensors have been thoroughly vetted and used by many groups since their generation. In particular, the creators of these sensors provided extensive data showing their specificity. The GRAB-DA sensor is ~10 fold more sensitive to DA than to NE (Sun et al., 2020, cited 239 times) and the GRAB-NE sensor is ~37 fold more sensitive to NE than to DA (Feng et al., 2019, cited 371 times).

      The role of dopamine in prediction error was tested through a series of conditions whereby the shock was presented either signaled (i.e. predicted), or not. However, another way that prediction error is signaled is through the absence of an expected outcome. Admittedly it might not be possible to observe a decrease in dopamine signaling with this methodology.

      Although this is a strong point, given that the study is not primarily focused on error prediction and the low likelihood of observing the typically small decrease in signaling during expected outcome omission, we feel that additional error prediction studies are beyond the scope of this manuscript. However, further experiments as suggested by the reviewer could prove interesting in future studies.

      The difference between Fig. 6E and 6H needs to be clarified. What is shown in Fig. 6E is that the response to the shock decreases through experience (i.e. by the 10th trial). However in Fig 6H, there is no difference between signaled and unsignaled shock, but this is during conditioning, and not after learning (based on my understanding of the methods, line 482).

      We are not sure we fully understand what point of clarification the reviewer is asking for. However, we have clarified in the methods that the signaled vs unsignaled shock experiment took place in animals that had already been trained on TFC. Thus, all of the trials took place after the animals had learned the tone-shock association. Therefore, although the drop in shock response could be taken as an indicator of a prediction-error-like signal, all the other data point to this not being the case (no change in tone response over training, no difference in signaled vs. unsignaled responses after training).

      Unless I missed it, at no point in the manuscript is the number of subjects described. Please add the n per experiment within each section describing each experiment in the methods (Behavioral procedures). Some more details in the photometry statistical analysis would be helpful. For example, what is the n per group for every data set that is presented? How many trials per analysis?

      We thank the reviewer for pointing this out. Animal numbers have been added in the methods section in the Behavioral Procedures, Optogenetics, and Drugs sub-sections and in the figure legends. Trial numbers are included in these sections and all trials were used for analysis.

      What is the difference in experimental procedure between Fig. 2D and Fig. 3B? It seems that they are the same, and yet the LC response to the conditioned CS is not.

      Fig. 3B is simply the Day 1 data from Fig 2D presented at a different scale because the shock response is included in Fig. 3B which necessitates a larger scale on both axes. Close inspection of the figures will show that the shapes of these two curves and the error around them is the same, but the different scaling obfuscates this slightly.

      Typo in the legend of Figure 2 - D should be E.

      Thank you, we have corrected this.

      • Anatomical localization of the virus injections, and more importantly the fiber placements, is not shown. Including this information helps with replication and understanding where exactly the observations were made in dHPC to contrast with prior studies.

      Representative examples are included in the manuscript in figure 1B, 3F, 4B, and 5B.

      Reviewer 3

      While the optogenetic study was lovely, a control using the same stimulation but delivered at different time points would have been a good addition to show how critical the neural signal at tone onset, tone offset, and shock is.

      We agree that it would be interesting in future studies to delineate the specific times when LC stimulation produces a learning enhancement. It could be that LC activity is most important during one specific time period (e.g. just during shock) or that all three periods of activation are required. It would also be useful to know whether stimulation at other times during learning can produce an enhancement given the potentially long-lasting effects of dopamine on HPC plasticity and learning.

      Justification for the focus on D1 receptors was lacking.

      We chose to focus on D1 receptors because previous studies have shown that these receptors are critical for memory formation or consolidation in the hippocampus. We have added a sentence justifying this in the results section.

      “To test whether dopamine is required for trace fear memory formation, we administered the dopamine D1 receptor antagonist SCH23390 (0.1mg/kg) 30 minutes before training, as D1/D5 receptors have previously been shown to be critical for other types of hippocampus dependent memory and plasticity (Frey et al., 1990; Huang and Kandel, 1995; O’Carroll et al., 2006; Wagatsuma et al., 2018).”

      The manuscript provides convincing evidence that the neural signal is not an error- correcting one by including a predicted (by a tone) and unpredicted shock. One possibility is that perhaps the unpredicted shock could be predicted by the context. Some clarification on the behavioural procedures would help understand if indeed the unsignaled shock could be predicted by the context or not.

      Mice always exhibit freezing in the training environment, so the context is definitely a predictor of shock. However, the tone is a much better predictor because it is always followed by shock while the mice spend a large amount of time in the context without being shocked. This is demonstrated by the fact that the same procedure used in the current experiments consistently produces more tone fear than context fear (Wilmot et al., 2019). While we did not do long-term memory tests here, we assume the same dissociation occurred as it has been observed very consistently across studies (Chowdhury et al., 2005; Kitamura et al., 2014; Wilmot et al., 2019). Nonetheless, it is possible that a difference between signaled and unsignaled groups was obscured by the context. We should note however, that differences between dopaminergic responses to cued and uncued rewards and aversive outcomes have been observed and these animals were also trained in the same context (Eshel et al., 2016; Matsumoto and Hikosaka, 2009; Pan et al., 2005; Schultz, 1998). Therefore, we believe this experiment does differentiate the observed dopamine response in the hippocampus from previously reported VTA dopamine prediction error signaling.

      Figure 2 - tone termination in Tone only group - no change? Stats?

      Thank you for pointing out this omission. We have added the stats to the figure legend. Although the response to tone termination decreased numerically, it did not change significantly across days. This is one point we may seek to clarify in future studies, as the difference between tone onset and termination responses is unexpected. Given the relatively small responses, it’s possible future studies with stronger signal (e.g. GCaMP8) may find differences in the tone termination response across training days. This is one of the reasons we focused primarily on the responses to tone onset and shock in the rest of the manuscript.

      Fig 4 data - stimulation at time incongruent with the signal as a control for the timing of stim.

      This is addressed above.

      Fig 5 - GRAB-NE - yohimbine seems to suppress the signal below the vehicle. Not the case for GRAB-DA. Is this sig? post-hoc stats?

      Yes, this does appear to be the case for GRAB-NE, and would not be entirely surprising given that there is likely a baseline level of NE (and dopamine) in the hippocampus that produces some degree of baseline fluorescence in the vehicle group. This signal could be reduced/abolished by blocking the sensor and preventing this baseline level of NE from binding and producing fluorescence. This may not be the same for the GRAB-DA for a variety of reasons – different sensor binding affinities, different baseline neurotransmitter levels, potentially non-equivalent drug doses, etc. Because of the large number of pairwise comparisons in this data (18), we did not make post-hoc pairwise comparisons.
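
      The concern about the number of pairwise comparisons can be made concrete with a short sketch of two standard corrections. This is an illustration only, not an analysis the authors ran: the count of 18 comparisons comes from the response above, the familywise alpha of 0.05 is an assumed conventional value, and the p-values passed to the Holm function are hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the familywise error rate at alpha."""
    return alpha / n_comparisons


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans in the original order, True where the
    null hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is tested at alpha/m, the next at alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# 18 comparisons (from the response) at an assumed familywise alpha of 0.05.
per_test_alpha = bonferroni_alpha(0.05, 18)
print(f"Bonferroni per-comparison threshold: {per_test_alpha:.4f}")

# Hypothetical p-values, for illustration only.
example = holm_reject([0.001, 0.04, 0.03])
print(example)
```

      With 18 comparisons, each individual test has to clear a threshold well below 0.01, which is why uncorrected pairwise tests on modest sample sizes are hard to interpret.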

      Shock response curve - lines 466-474 - some explanation of what the pseudorandom order of shock presentation means.

      We have added the following explanation to this section:

      “…pseudorandom order, such that the shocks did not occur in ascending or descending order or follow the same pattern in each block,…”

      Line 126 - the extinction came out of the blue, it needs some introduction such as a statement that the animals were exposed to extinction training following conditioning.

      We have added the following earlier in that same paragraph:

      “On the second and third days, mice underwent extinction trials in which no shocks were administered.”

      References in Response

      Abercrombie ED, Keller RW, Zigmond MJ. 1988. Characterization of hippocampal norepinephrine release as measured by microdialysis perfusion: Pharmacological and behavioral studies. Neuroscience 27:897–904. doi:10.1016/0306-4522(88)90192-3

      Aston-Jones G, Bloom FE. 1981a. Norepinephrine-containing locus coeruleus neurons in behaving rats exhibit pronounced responses to non-noxious environmental stimuli. Journal of Neuroscience 1:887–900. doi:10.1523/JNEUROSCI.01-08-00887.1981

      Aston-Jones G, Bloom FE. 1981b. Activity of norepinephrine-containing locus coeruleus neurons in behaving rats anticipates fluctuations in the sleep-waking cycle. J Neurosci 1:876–886. doi:10.1523/JNEUROSCI.01-08-00876.1981

      Aston-Jones G, Cohen JD. 2005. AN INTEGRATIVE THEORY OF LOCUS COERULEUS-NOREPINEPHRINE FUNCTION: Adaptive Gain and Optimal Performance. Annual Review of Neuroscience 28:403–450. doi:10.1146/annurev.neuro.28.061604.135709

      Chen T-W, Wardill TJ, Sun Y, Pulver SR, Renninger SL, Baohan A, Schreiter ER, Kerr RA, Orger MB, Jayaraman V, Looger LL, Svoboda K, Kim DS. 2013. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499:295–300. doi:10.1038/nature12354

      Chowdhury N, Quinn JJ, Fanselow MS. 2005. Dorsal hippocampus involvement in trace fear conditioning with long, but not short, trace intervals in mice. Behavioral Neuroscience 119:1396–1402. doi:10.1037/0735-7044.119.5.1396

      Eshel N, Tian J, Bukwich M, Uchida N. 2016. Dopamine neurons share common response function for reward prediction error. Nat Neurosci 19:479–486. doi:10.1038/nn.4239

      Feng J, Zhang C, Lischinsky JE, Jing M, Zhou J, Wang H, Zhang Y, Dong A, Wu Z, Wu H, Chen W, Zhang P, Zou J, Hires SA, Zhu JJ, Cui G, Lin D, Du J, Li Y. 2019. A Genetically Encoded Fluorescent Sensor for Rapid and Specific In Vivo Detection of Norepinephrine. Neuron 102:745-761.e8. doi:10.1016/j.neuron.2019.02.037

      Frey U, Schroeder H, Matthies H. 1990. Dopaminergic antagonists prevent long-term maintenance of posttetanic LTP in the CA1 region of rat hippocampal slices. Brain Research 522:69–75. doi:10.1016/0006-8993(90)91578-5

      Huang YY, Kandel ER. 1995. D1/D5 receptor agonists induce a protein synthesis-dependent late potentiation in the CA1 region of the hippocampus. Proceedings of the National Academy of Sciences 92:2446–2450. doi:10.1073/pnas.92.7.2446

      Kitamura T, Pignatelli M, Suh J, Kohara K, Yoshiki A, Abe K, Tonegawa S. 2014. Island Cells Control Temporal Association Memory. Science 343:896–901. doi:10.1126/science.1244634

      Matsumoto M, Hikosaka O. 2009. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459:837–841. doi:10.1038/nature08028

      O’Carroll CM, Martin SJ, Sandin J, Frenguelli BG, Morris RGM. 2006. Dopaminergic modulation of the persistence of one-trial hippocampus-dependent memory. Learning & memory 13:760–769.

      Pan W-X, Schmidt R, Wickens JR, Hyland BI. 2005. Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network. J Neurosci 25:6235–6242. doi:10.1523/JNEUROSCI.1478-05.2005

      Schultz W. 1998. Predictive Reward Signal of Dopamine Neurons. Journal of Neurophysiology 80:1–27. doi:10.1152/jn.1998.80.1.1

      Sun F, Zhou J, Dai B, Qian T, Zeng J, Li X, Zhuo Y, Zhang Y, Wang Y, Qian C, Tan K, Feng J, Dong H, Lin D, Cui G, Li Y. 2020. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 17:1156–1166. doi:10.1038/s41592-020-00981-9

      Wagatsuma A, Okuyama T, Sun C, Smith LM, Abe K, Tonegawa S. 2018. Locus coeruleus input to hippocampal CA3 drives single-trial learning of a novel context. Proceedings of the National Academy of Sciences 115:E310–E316. doi:10.1073/pnas.1714082115

      Wilmot JH, Puhger K, Wiltgen BJ. 2019. Acute Disruption of the Dorsal Hippocampus Impairs the Encoding and Retrieval of Trace Fear Memories. Frontiers in Behavioral Neuroscience 13. doi:10.3389/fnbeh.2019.00116

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors conducted two tasks at 300 days of separation. First, a social perception task, where Ps responded whether a pictured person either deserved or needed help. Second, an altruism task, where Ps are offered monetary allocations for themselves and a partner. Ps decide whether to accept, or a default allocation of 20 dollars each. The partners differed in perceived merit, such that they were highly deserving, undeserving, or unknown. This categorisation was decided on the basis of a prisoner's dilemma game the partner played beforehand. "Need" was also manipulated, by altering the probability that the partner must have their hand in cold water at the end of the experiment and this partner can use the money to buy themselves out. These two tasks were conducted to assess the perception of need/merit in the first instance, and how this relates to social behaviour in the second. fMRI data were collected alongside behavioural.

      The authors present many analyses of behaviour (including DDM results) and fMRI. E.g., they demonstrate that they could decode across the mentalising network whether someone was making a need or deserving judgement vs control judgement but couldn't decode need vs deserving. And that brain responses during merit inferences (merit - control) systematically covaried with participants' merit sensitivity scores in the rTPJ. They also found relationships between behaviour and rTPJ in the altruism task. And that merit sensitivity in the perception task predicted the influence of merit on social behaviour in the altruism task.

      Strengths:

      This manuscript represents a sensible model to predict social perceptions and behaviours, and a tidy study design with interesting findings. The introduction introduced the field especially brilliantly for a general audience.

      Response: We are pleased that the reviewer found the model sensible and the findings interesting! Below, we respond to each of the reviewer’s comments/critiques.

      Weaknesses: (1) The authors do acknowledge right at the end that these are small samples. This is especially the case for the correlational questions. While the limitation is acknowledged at the end, it is not truly acknowledged in the way that the data are interpreted. I.e. much is concluded from absent relationships, where the likelihood of Type II error is high in this scenario. I suggest that throughout the manuscript, authors play down their conclusions about absence of effects.

      Response: We agree with the reviewer that the limitation of small samples should be adequately reflected in the interpretation of the data. We have therefore added cautionary language to the interpretation of the correlational effects in several places of the revised manuscript. For example, we now state: “However, this absence of effects for need ought to be interpreted with caution, given the comparatively small sample size.” (pg. 33) and “As mentioned above, we cannot rule out the possibility that null findings may be due to the comparatively small sample size and should be interpreted cautiously (also see discussion)” (pg. 34-35).

      (2) I found the results section quite a marathon, and due to its length I started to lose the thread concerning the overarching aims - which had been established so neatly in the introduction. I am unsure whether all of these analyses were necessary for addressing the key questions or whether some were more exploratory. E.g. it's unclear to me what one would have predicted upfront about the decoding analyses.

      Response: We acknowledge and share the reviewer’s concern about the length of the results section and potential loss of clarity. Regarding the decoding analyses, we want to clarify that they were conducted as a sanity check to compare against the results of the univariate analysis. We didn’t have a priori hypotheses regarding these supplemental decoding analyses. We have clarified this issue in the revised version of the manuscript and moved the decoding analyses fully to the supplemental material to streamline the main text. The remaining results reported in the manuscript are indeed all based on a priori, key questions (unless specified otherwise, for example, supplemental analyses for other regions of interest for the sake of completeness). The only exception is the final set of results (Neural markers of merit sensitivity predict merit-related behavioral changes during altruistic choice) which represent post hoc tests to clarify the role of activation in the right temporoparietal junction (rTPJ) in merit-related changes in other-regard in altruistic decisions. While we acknowledge that this is a complex paper, after careful consideration we couldn’t identify any other parts of the results section to remove or report in the supplemental material.

      (3) More specifically, the decoding analyses were intriguing to me. If I understand the authors, they are decoding need vs merit, and need+merit vs control, not the content of these inferences. Do they consider that there is a distributed representation of merit that does not relate to its content but is an abstracted version that applies to all merit judgements? I certainly would not have predicted this and think the analyses raise many questions.

      Response: We thank the reviewer for sharing their thoughts on the decoding analyses and agree that this set of analyses are intriguing, yet raise additional questions, such as the neural computations required to assess content. However, we wish to clarify that the way we view our current results is very much analogous to results obtained from studies of perception in other fields. For example, in the face perception literature, it is often observed that the fusiform face area is uniformly more active, not only when a face (as opposed to an object) is on the screen, but when a compound stimulus consisting of features of a face and other features (e.g. of objects) is on the screen, but participants are instructed to attend to and identify solely the face. Moreover, multivariate activity in the FFA (but not univariate activity) is sufficient to decode the identity of the face. We view the results we report in the manuscript as more akin to the former types of analyses, where any region that is involved in the computation is uniformly more active when attention is directed to judgment-specific features. Unfortunately, the present data are not sufficient to properly answer the latter questions, about which areas enable decoding of specific intensity or identity of merit-related content. Follow-up experiments with a more optimized design are needed. Although interesting, we thus refrain from further discussing the decoding analyses in the manuscript to avoid distracting from the main findings based on the univariate comparison of brain responses observed while participants make merit or need inferences in the social perception task.

      Reviewer #2 (Public Review):

      When people help others is an important psychological and neuroscientific question. It has received much attention from the psychological side, but comparatively less from neuroscience. The paper translates some ideas from a social Psychology domain to neuroscience using a neuroeconomically oriented computational approach. In particular, the paper is concerned with the idea that people help others based on perceptions of merit/deservingness, but also because they require/need help. To this end, the authors conduct two experiments with an overlapping participant pool:

      (1) A social perception task in which people see images of people that have previously been rated on merit and need scales by other participants. In a blockwise fashion, people decide whether the depicted person a) deserves help, b) needs help, and c) whether the person uses both hands (== control condition).

      (2) In an altruism task, people make costly helping decisions by deciding between giving a certain amount of money to themselves or another person. How much the other person needs and deserves the money is manipulated.

      The authors use a sound and robust computational modelling approach for both tasks using evidence accumulation models. They analyse behavioural data for both tasks, showing that the behaviour is indeed influenced, as expected, by the deservingness and the need of the shown people. Neurally, the authors use a block-wise analysis approach to find differences in activity levels across conditions of the social perception task (there is no fMRI data for the other task). The authors do find large activation clusters in areas related to the theory of mind. Interestingly, they also find that activity in TPJ that relates to the deservingness condition correlates with people's deservingness ratings while they do the task, but also with computational parameters related to helping others in the second task, the one that was conducted many months later. Also, some behavioural parameters correlate across the two tasks, suggesting that how deserving of help others are perceived reflects a relatively stable feature that translates into concrete helping decisions later-on.

      The conclusions of the paper are overall well supported by the data.

      Response: We thank the reviewer for the positive evaluation of our study and the comprehensive summary of our main findings. We would like to clarify, though, that we did originally collect fMRI data for the independent altruism task. Unfortunately, due to COVID-19-related interruptions, only 25 participants from the sample that performed the social perception task also completed the fMRI altruism task (see pg. 18). Given the limited sample size and noise level of fMRI data, we moved anything related to the neuroimaging data of the altruism task to the supplemental material (see Note S7) and decided to focus solely on the behavior of the altruism task to address our research objectives. We apologize for any confusion.

      (1) I found that the modelling was done very thoroughly for both tasks. Overall, I had the impression that the methods are very solid with many supplementary analyses. The computational modelling is done very well.

      Response: We are pleased that the reviewer found the computational model sensible.

      (2) A slight caveat, however, regarding this aspect, is that, in my view, the tasks are relatively simplistic, so even the complex computational models do not do as much as they can in the case of more complex paradigms. For example, the bias term in the model seems to correspond to the mean response rate in a very direct way (please correct me if I am wrong).

      Response: We agree that the Bias term relates to mean responding (although it is not the sole possibility: thresholds and starting default biases can also produce changes in mean levels of responding that, without the computational model, are not possible to dissociate). However, we think that the primary value of this parameter comes not from the analysis of the social judgment task (where the reviewer is correct that the bias relates in a quite straightforward way to the mean response rate), but in the relationship of this parameter to the un-contextual generosity response in the altruism task. Here, we find that this general bias term relates not to overall generosity, but rather to the overall weight given to others’ outcomes, a finding that makes sense if the tendency to perceive others as deserving overall yields an increase in overall attention/valuation of their outcomes. Thus, a simple finding in one task relates to a more nuanced finding in another. However, we agree it is important to acknowledge the point raised by the reviewer, and now do so on pg. 20: “It is worth noting that the Bias parameters are strongly associated with (though not the sole determinant of) the mean response rate.”
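
      The point that a bias term and a starting-point shift can both move the mean response rate can be illustrated with a generic two-boundary accumulator. The sketch below is a textbook-style drift diffusion simulation, not the authors' fitted model: the function names, the way `bias` is added to the drift, and all parameter values are assumptions chosen for illustration.

```python
import random


def simulate_choice(drift, bias, start=0.0, threshold=1.0,
                    noise_sd=1.0, dt=0.01, rng=random, max_steps=10000):
    """Simulate one trial; return True if the upper ("yes") bound is hit first.

    `bias` is a constant added to the drift on every step (an assumed
    parameterisation), and `start` shifts the starting point of evidence.
    """
    x = start
    for _ in range(max_steps):
        x += (drift + bias) * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        if x >= threshold:
            return True
        if x <= -threshold:
            return False
    return x > 0  # fallback if no bound is reached within max_steps


def yes_rate(bias, start=0.0, n_trials=1000, seed=0):
    """Proportion of "yes" choices with zero stimulus-driven drift."""
    rng = random.Random(seed)
    hits = sum(
        simulate_choice(0.0, bias, start=start, rng=rng)
        for _ in range(n_trials)
    )
    return hits / n_trials


neutral = yes_rate(bias=0.0)
drift_biased = yes_rate(bias=1.0)
start_biased = yes_rate(bias=0.0, start=0.5)

print(f"no bias:          {neutral:.2f}")
print(f"drift bias:       {drift_biased:.2f}")
print(f"start-point bias: {start_biased:.2f}")
```

      Both manipulations raise the proportion of "yes" responses relative to the neutral case, so the mean response rate alone cannot tell them apart. Separating them requires the response-time distributions that the model is fitted to.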

      (3) Related to the simple tasks: The fMRI data is analysed in a simple block-fashion. This is in my view not appropriate to discern the more subtle neural substrates of merit/need-based decision-making or person perception. Correspondingly, the neural activation patterns (merit > control, need > control) are relatively broad and unspecific. They do not seem to differ in the classic theory of mind regions, which are the focus of the analyses.

      Response: The social perception task is modified from a well-established social inference task (Spunt & Adolphs, 2014; 2015) designed to reliably localize the mentalizing network in the brain. As such, we acknowledge that it is not optimally designed to discern the intrinsic complexities of social perception, or the specific appraisals or computations that yield more or less perception (of need or merit) in a given context. Instead, it was designed to highlight regions that are more generally recruited for performing these social perceptions/inferences.

      We heartily agree with the reviewer that it would be interesting and informative to analyze this task in a trial-wise way, with parametric variation in evidence for each image predicting parametric variation in brain activity. Unfortunately, the timing of this task is not optimal for this kind of analysis, since trials were presented in rapid and blocked fashion. We were also limited in the amount of time we could devote to this task, since it was collected in conjunction with a number of other tasks as part of a larger effort to detail the neural correlates of social inference (reported elsewhere). Thus, we were not able to introduce the kind of jittered spacing between trials that would have enabled such analysis, despite our own wish to do so. We hope that this work will thus be a motivator for future work designed more specifically to address this interesting question, and now include a statement to this effect on pgs. 22-23: “Future research may reveal additional distinctions between merit and need appraisals in trial-wise (compared to our block-wise) fMRI designs.”

      References:

      Spunt, R. P. & Adolphs, R. Validating the Why/How contrast for functional MRI studies of Theory of Mind. Neuroimage 99, 301-311, doi:10.1016/j.neuroimage.2014.05.023 (2014).

      Spunt, R. P. & Adolphs, R. Folk explanations of behavior: a specialized use of a domain-general mechanism. Psychological Science 26, 724-736, doi:10.1177/0956797615569002 (2015).

      (4) However, the relationship between neural signal and behavioural merit sensitivity in TPJ is noteworthy.

      Response: We agree with this assessment and thank the reviewer for their positive assessment; we feel that linking individual differences in merit sensitivity with variance in TPJ activity during merit judgments is one of the key findings of the study.

      (5) The latter is even more the case, as the neural signal and aspects of the behaviour are correlated across subjects with the second task that is conducted much later. Such a correlation is very impressive and suggests that the tasks are sensitive for important individual differences in helping perception/behaviour.

      Response: Again, we share the reviewer’s impression that this finding is more noteworthy for appearing in tasks separated both by considerable conceptual/paradigmatic differences, and by such a long temporal distance. These findings make us particularly excited to follow up on these results in future research.

      (6) That being said, the number of participants in the latter analyses are at the lower end of the number of participants that are these days used for across-participant correlations.

      Response: We fully agree with this assessment. Unfortunately, COVID-related disruptions in data collection, as well as the expiration of grant funds due to the delay, severely limited our ability to complete assessments in a larger sample. Future research needs to replicate these results in a larger sample. We comment on this issue in the discussion on pg. 40. If the editor or reviewer has suggestions for other ways in which we could more fully acknowledge this, we would be happy to include them.

      Reviewer #3 (Public Review):

      Summary:

      The paper aims to provide a neurocomputational account of how social perception translates into prosocial behaviors. Participants first completed a novel social perception task during fMRI scanning, in which they were asked to judge the merit or need of people depicted in different situations. Secondly, a separate altruistic choice task was used to examine how the perception of merit and need influences the weights people place on themselves, others, and fairness when deciding to provide help. Finally, a link between perception and action was drawn in those participants who completed both tasks.

      Strengths:

      The paper is overall very well written and presented, leaving the reader at ease when describing complex methods and results. The approach used by the author is very compelling, as it combines computational modeling of behavior and neuroimaging data analyses. Despite not being able to comment on the computational model, I find the approach used (to disentangle sensitivity and biases, for merit and need) very well described and derived from previous theoretical work. Results are also clearly described and interpreted.

      Response: We thank the reviewer for their positive comments regarding presentation, approach, and content.

      Weaknesses:

      My main concern relates to the selection of the social perception task, which to me is the weakest point. This weakness has also been addressed by the authors themselves in the limitations section, and relates to the fact that merit and need are evaluated by means of very different cues that rely on different cognitive processes (more abstract thinking for merit than need). I wonder whether and how such a difference can bias the overall computational model and the interpretation of the results (e.g., ideally you would vary merit and need while leaving all other aspects invariant).

      Response: We agree with the reviewer on the importance of future research to more fully unpack the differences in this task, and develop better ways to manipulate need and merit in more comparable fashion. However, we point out that the issue of differences in abstractness of cues for need and merit does not actually seem to have a strong influence on the parameters retrieved by the computational model. Participants seem to be equally sensitive to BOTH merit and need information, despite that information deriving from different sources, as evidenced by the fact that the magnitude of the sensitivity parameters for need and merit in the social judgment task were nearly identical, and not statistically distinguishable. Nor were other parameters related to non-decision time or threshold statistically different (see Supplemental Table S2). If our results were driven purely by differences in the difficulty or abstractness of these judgments, we would have expected to see some evidence of this in the computational model, in the form of longer non-decision times, higher thresholds, or both. We do not. Likewise, the neural underpinnings evoked by both need and merit perceptions in this task (in the mentalizing brain network) were comparable. This is not to say that there aren’t real differences in the cues that might signal these quantities in our social perception task - just that there is little direct evidence for this difference in computational parameters or evoked brain responses, and thus it is unlikely that our results (which rely on an analysis of computational parameters) are driven solely by computational model biases, or the inability of the model to adequately assess participant sensitivity to need as opposed to merit.

      A second weakness is related to the sample size, which is quite small for study 2. I wonder, given that study 2 fMRI data are not analyzed, whether it is possible to recover some of the participants' behavioral results, at least the ones excluded because of bad MR image quality.

      Response: We fully agree with the reviewer that increasing the sample size for the cross-task correlations would be desirable. Unfortunately, the current sample size already presents the maximum of ‘usable’ data; the approach suggested by the reviewer won’t affect the sample size. We used all participants whose behavioral data in the altruism task suggested they were performing the task in good faith and conscientiously.

      Finally, on a theoretical note, I would elaborate more on the distinction of merit and need. These concepts tap into very specific aspects of morality, which I suspect have been widely explored. At the moment I am missing a more elaborate account of this.

      Response: Need and merit are predominantly studied in separate lines of research (Molouki & Bartels, 2020) so there is relatively little theoretical research on the distinction between the two. Consequently, Siemoneit (2023) states that the relation between the concepts of need and merit in allocative distributions remains diffuse. To emphasize the distinct concepts of morality in the introduction we have now added to pg. 3: “Need and deservingness (merit) are two distinct principles of morality. The need principle involves distributing resources to those who require them, irrespective of whether they have earned them, while the "merit principle" focuses on allocating resources based on individuals' deservingness, regardless of their actual need (Wilson, 2003).”

      One of the added values of our paper to the research literature is in adding to the clarification of computational and neural underpinnings of broad concepts like merit and need. To highlight the latter point, we have added the following statement on pg. 5 to the manuscript: “Examining need and merit concurrently in this task will also help clarify the computational and neural underpinnings of related, but distinct concepts, distinguishing between them more effectively.”

      References:

      Molouki, S., & Bartels, D. M. (2020). Are future selves treated like others? Comparing determinants and levels of intrapersonal and interpersonal allocations. Cognition, 196, 104150.

      Siemoneit, A. (2023). Merit first, need and equality second: hierarchies of justice. International Review of Economics, 70(4), 537-567.

      Wilson, C. (2003). The role of a merit principle in distributive justice. The Journal of ethics, 7, 277-314.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I acknowledge the difficulty with respect to recruitment, especially in the age of covid, but is it possible for the authors to collect larger samples for their behavioural questions via online testing? Admittedly, I'm sure they don't want to wait 300 days to have the complete dataset, but I would be in favour of collecting a sample in the hundreds on these behavioural tasks, completed at a much shorter separation (if any). I believe this would strengthen the authors' conclusions considerably if they could both replicate the effects they have and check these null effects in a sample where they could draw conclusions from them. Indeed, Bayesian stats to provide evidence for the null would also help here.

      Response: We share the reviewer’s desire to see these results replicated (ideally in a sample of hundreds of participants). We have seriously considered the possibility of trying to replicate our results online, even before submitting the first version of the paper. However, it is difficult to fully replicate this paradigm online, given the elaborate story and context we engaged in to convince participants that they were playing with real others, as well as the use of physical pain (Cold Pressor Task) for the need manipulation in the altruism task. Moreover, given comments by this reviewer that the results are already a little long, adding a new, behavioral replication would likely only add to the memory burden for the reader. We have thus opted not to include a replication study in the current work. However, we are actively working on a replication that can be completed online, using a modified experimental paradigm and different ways to manipulate need and merit. Because of the differences between that paradigm and the one described here, which would require considerable additional exposition, we have opted not to include the results of this work in the current paper. We hope to be able to publish this work as a separate replication attempt in the future.

      Given the difficulty of wading through the results section while keeping track of the key question being answered, I would suggest moving any analyses that are less central to the supplementary. And perhaps adding some more guiding sentences at the start and end of each section to remind the reader how each informs the core question.

      Response: We deliberated for quite some time about what results could be removed, but in the end, felt that nearly all results that we already described need to be included in the paper, since each piece of the puzzle contributes to the central finding (relating parameters and behavior to neural and choice data across two separate tasks). However, we did move the decoding analysis results to the supplemental (see point below). We also take the reviewer's point that the results can be made clearer. We thus have worked to include some guiding sentences at the start and end of sections to remind the readers how each analysis informs the core questions.

      I think it needs unpacking more for the reader what they should conclude from the significant need+merit vs control decoding analyses, and what they would have expected in terms of cortical representation from the decoding analyses in general.

      Response: We agree with the reviewer that, given the decoding results' position in the main manuscript, they would need unpacking. After considering the reviewer's prior suggestion, we have reevaluated the placement of these supplemental results. Consequently, we have relocated them to the supplemental materials, as they were deemed less relevant to directly addressing the core research questions in the main manuscript. On pg. 23, the main manuscript now only states “We also employed supplemental multivariate decoding analyses (searchlight analysis 85-87), as commonly used in social perception and neuroscience research 7,58,82,88,89, corroborating our univariate findings (see Supplemental Note S6, Supplemental Table S10).”

      Reviewer #2 (Recommendations For The Authors):

      (1) I would suggest moving information on how the computational models were fitted to the main text.

      Response: The computational models are a key element of the paper and we deliberated about the more central exposure of the description of how the models were fitted in the main manuscript. However, we are concerned about the complexity and length of the article, which requires quite a lot from readers to keep in mind (as also commented on by reviewer 1). Those readers who are particularly interested in details of model fitting can still find an extensive discussion of the procedures we followed in the supplements. We thus have opted to retain the streamlined presentation in the main manuscript. However, if the editor feels that including the full and extensive description of model fitting in the main paper would significantly improve the flow and exposition of ideas, we are happy to do so.

      (2) For the fMRI analyses: Could it be worth analysing the choices in the different conditions? They could be modelled as a binary regressor (yes/no) and this one might be different across conditions (merit/need/hands). Maybe this won't work because of the tight trial timeline, but it could be another avenue to discern differences across fMRI conditions.

      Response: We thank the reviewer for this interesting suggestion! Unfortunately, the block design and rapid presentation of stimuli within each condition make it challenging to distinguish the different choices (within or across conditions). While we see the merit in the suggested analytical approach (in fact, we discussed it before the initial submission of the article), it would require some modifications of the task structure (e.g., longer inter-trial-intervals between individual stimuli) and an independent replication fMRI study. We were not able to have such a long inter-trial interval in the original design due to practical constraints on the inclusion of this paradigm in a larger effort to examine a wide variety of social judgment and inference tasks. We hope to investigate this kind of question in greater detail in future fMRI work.

      (3) The merit effects seem to be more stable across time than the need conditions. Would it be worthwhile to test if the tasks entailed a similar amount of merit and need variation? Maybe one variable varied more than the other in the task design, and that is why one type of effect might be stronger than the other?

      Response: We thank the reviewer for drawing attention to this important point. We used extensive pilot testing to select the stimuli for the social perception task, ensuring an overall similar amount of need and merit variation. For example, the social perception ratings of the independent, normative sample suggest that the social perception task entails a similar amount of need and merit variation (normative participant-specific percentage of yes responses for merit (mean ± standard deviation: 53.95 ± 13.87) and need (45.65 ± 11.07)). The results of a supplemental paired t-test (p = 0.122) indicate comparable SD for need and merit judgments. Moreover, regarding the actual fMRI participant sample, Figure S3 illustrates comparable levels of variations in need and merit perceptions (participant-specific percentage of yes responses for merit (56.70 ± 11.91) and need (48.69 ± 10.81) in the social perception task). Matching the results for the normative sample, the results of a paired t-test (p = 0.705) suggest no significant difference in variation between need and merit judgments. With respect to the altruism task, we manipulated the levels of merit and need externally (high vs. low).

      Reviewer #3 (Recommendations For The Authors):

      (1) It would be good to provide the demographics of each remaining sample.

      Response: We appreciate the attention to detail and agree with the reviewer’s suggestion. We have now added the demographics for each remaining sample to the revised manuscript.

      (2) The time range from study 1 to study 2, is quite diverse. Did you use it as a regressor of no interest?

      Response: We thank the reviewer for this interesting suggestion. We have examined this in detail in the context of our cross-task analyses (i.e., via regressions and partial correlations). Interestingly, variance in the temporal delay between both tasks does not account for any meaningful variation, and results don’t qualitatively change controlling for this factor.

      For example, when we controlled for the delay between both separate tasks (partial correlation analysis), we confirmed that variance in merit sensitivity (social perception task) still reflected merit-induced changes in overall generosity (altruism task; p = 0.020). Moreover, we confirmed that variance in merit sensitivity reflected individuals’ other-regard (p = 0.035) and self-regard (p = 0.040), but not fairness considerations (p = 0.764) guiding altruistic choices. Regarding people’s general tendency to perceive others as deserving, we found that the link between merit bias (social perception task) and overall other-regard (p = 0.008) and fairness consideration (p = 0.014) (altruism task) holds when controlling for the time range (no significant relationship between merit bias and self-regard, p = 0.191, matching results of the main paper).

      We refer to these supplemental analyses in the revised manuscript on pgs. 33 and 35: “Results were qualitatively similar when statistically controlling for the delay between both tasks (partial correlations).”
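
      For readers unfamiliar with the technique, a first-order partial correlation can be sketched as follows. This is only an illustration of the general formula, not the analysis code used for the manuscript; the variable names and the toy values are hypothetical and are not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy values only (hypothetical, NOT data from the study).
merit_sensitivity = [1.0, 2.0, 3.0, 4.0]
generosity_change = [2.0, 1.0, 4.0, 3.0]
# Delay constructed to be uncorrelated with both variables above.
delay_days = [300.0, 100.0, 100.0, 300.0]

r_partial = partial_corr(merit_sensitivity, generosity_change, delay_days)
```

      In this toy case the delay is unrelated to either variable, so the partial correlation equals the plain correlation; when the control variable drives both measures, the partial correlation shrinks toward zero.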

      (3) Why in study 1 a dichotomous answer has been used? Would not have been better (also for modeling) a continuous variable (VAS)?

      Response: We appreciate the reviewer's thoughtful feedback. In Study 1, opting for a dichotomous response format in the social perception task (Figure 1a) was a deliberate methodological choice. This decision, driven by the study's model requirements, aligns with the common use of a computational model employing two-alternative forced choices ("yes" and "no") as decision boundaries. While drift-diffusion models for multiple-alternative forced-choice designs exist, our study's novel research questions were effectively addressed without their complexity. Finally, our model cannot accept continuous response variables as input unless they are transformed into categorical variables.
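
      To make the two-boundary structure concrete, the following is a minimal simulation sketch of a generic drift-diffusion process with "yes" and "no" boundaries. It is not the model fitted in the manuscript: the parameter names (`drift`, `threshold`, `start_frac`, `non_decision`) and all default values are assumptions chosen for illustration.

```python
import random

def simulate_trial(drift, threshold=1.0, start_frac=0.5,
                   non_decision=0.3, dt=0.001, noise=1.0, rng=random):
    """Simulate one two-boundary diffusion trial.

    Returns (choice, rt): choice is "yes" if the upper boundary is hit
    first and "no" if the lower boundary (0) is hit first.
    """
    x = start_frac * threshold          # starting point between the boundaries
    t = 0.0
    step_sd = noise * dt ** 0.5         # standard deviation of one noise step
    while 0.0 < x < threshold:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return ("yes" if x >= threshold else "no"), t + non_decision

def yes_rate(drift, n_trials=200, seed=0, **kwargs):
    """Proportion of "yes" choices over n_trials simulated trials."""
    rng = random.Random(seed)
    hits = sum(simulate_trial(drift, rng=rng, **kwargs)[0] == "yes"
               for _ in range(n_trials))
    return hits / n_trials
```

      In this sketch, both a higher drift rate and a starting point closer to the upper boundary raise the proportion of "yes" responses, which is why response rates alone cannot separate the two without a model. The output is a binary choice plus a response time, so a continuous rating has no direct counterpart.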

      (4) In the fMRI analyses, when you assess changes in brain activity as a function of merit, I would control for need (and the other way round), to see whether such association is specific.

      Response: Regarding the reviewer’s suggestion on controlling for need when assessing changes in brain activity as a function of merit, and vice versa, we would like to clarify the nature of our fMRI analyses in the social perception task. Our focus is on block-wise assessments (need vs. control, merit vs. control, need vs. merit blocks, following the fMRI task design from which our social perception task was modified). We don’t assess changes in brain activity as a function of the level of perceived merit or need (i.e., “yes” vs. “no” trials within or across task blocks). Blocks are clearly defined by the task instruction given to participants prior to each block (i.e., need, merit, or control judgments). Thus, unfortunately, given the short inter-stimulus-intervals of each block, the task design is not optimal to implement the suggested approach.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1, in both the public review and recommendations to authors, raises the important question of generalizability of the new technique to other brain areas, to analysis with sorters other than Kilosort, and in the absence of reference data. Specifically, how can experimenters working in brain areas other than visual cortex understand if the tracking is functioning, and set the parameters in the tracking pipeline.

      We agree that generalizability of the tracking procedure is a serious issue, especially with respect to other brain areas with varying degrees of measured waveform preservation over time. Because the number of potential recording conditions is too large to test experimentally, we instead address these issues in the manuscript by providing a general prescription for interpreting the distribution of vertical distances of matched pairs that can be used for data from any recording using any spike-sorter (Methods section 4.2, Supplement section 8.4, figure S9, paragraphs 7-10 of the Discussion section). This extension of the method allows users to estimate the matching success in the context of their own data, even in the absence of reference data. To address the concern of overfitting, we have also added discussion covering adjustment of the two parameters in the procedure (the relative weight of waveform distance vs. physical distance, and the threshold for accepting matches as real) to the Discussion section.

      Reviewer #2 suggested clarification of the following points in the public review. We answer those here and have also clarified these points in the main text where appropriate.

      (1) What is the purpose of testing the drift correction with imposed drift (Figure 2, page 6 in the original manuscript), and how was the value chosen?

      To test the ability of EMD to detect substantial drift, we need examples that resemble experimental data, including error in fit unit positions and units with no correct matches. We chose to create these examples by taking waveform and position sets from real data with modest drift, and adding a fixed shift to one dataset. The value of 12 um in the figure is arbitrary, simply an example in the range of real drift. These tests allow us to demonstrate the success of EMD for detection of drift in real data.

      (2) How is performance affected by using a different weighting of the 2 measures (physical distance and waveform distance) in the EMD?

      Recovery rate (number of reference units successfully matched in EMD) vs weighting of the waveform distance is shown in Supplement section 8.10. Recovery rate increases with low values of waveform weighting, leveling off at a value of 1500. We selected that inflection point for the analysis in this paper, to avoid coincidental matching of physically distant units with similar waveforms.
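
      The imposed-drift test and the role of the waveform weighting can be illustrated with a toy sketch. This is not the tracking pipeline from the manuscript: it assumes equal numbers of units on both days, in which case an earth mover's distance with unit masses reduces to a one-to-one assignment, and it solves that assignment by brute force. The unit positions, waveform features, and function names are all hypothetical.

```python
from itertools import permutations
from statistics import median

def pair_cost(u, v, waveform_weight):
    """Physical distance plus weighted waveform distance between two units."""
    (z1, w1), (z2, w2) = u, v
    physical = abs(z1 - z2)
    waveform = sum((a - b) ** 2 for a, b in zip(w1, w2)) ** 0.5
    return physical + waveform_weight * waveform

def match_units(day1, day2, waveform_weight=1.0):
    """Brute-force minimum-cost one-to-one matching (small inputs only)."""
    best = min(permutations(range(len(day2))),
               key=lambda p: sum(pair_cost(day1[i], day2[j], waveform_weight)
                                 for i, j in enumerate(p)))
    return list(enumerate(best))

def estimate_drift(day1, day2, pairs):
    """Median vertical displacement across matched pairs."""
    return median(day2[j][0] - day1[i][0] for i, j in pairs)

# Hypothetical units: (vertical position in um, toy waveform features).
day1 = [(100.0, (1.0, 0.2)), (200.0, (0.5, 0.9)),
        (300.0, (0.1, 0.4)), (400.0, (0.8, 0.8))]
# Impose a fixed 12 um shift and shuffle the unit order.
day2 = [(z + 12.0, w) for z, w in reversed(day1)]

pairs = match_units(day1, day2)
drift = estimate_drift(day1, day2, pairs)
```

      Raising `waveform_weight` makes waveform similarity count more relative to physical distance when units are paired. Brute force is only practical for a handful of units, so real data would need a proper assignment or transport solver.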

      (3) Should the intervals measured in the survival plot in Figure 5 be identical for the three different classes of tracked neurons?

      The plot includes all chains of tracked neurons, which can start on arbitrary days in the set of all recordings (see the definition of chains in section 2.4). As a result, the gaps between days, which determine where there is a point on the plot, can be different for the three sets of neurons (reference, putative, and mixed). We have added a comment to the Figure 5 caption to ensure this is clear.

      (4) Would other metrics of the similarity of visual responses work better?

      The similarity metric we use was adopted from the original paper using this data (reference 7). We chose to use the same metric both to take advantage of the original authors’ expertise about the data and allow for reasonable comparison of the new technique to theirs. It is correct that this similarity metric alone does not allow for unique matching (see Discussion and Supplement section 8.2). However, the agreement of EMD with reference pairs determined from the combination of position and visual response similarity is very high, suggesting there are few incorrect reference pairs. Any incorrect reference pairs cause an underestimate of the tracking accuracy.

      (5) Add a definition of ROC.

      Added this definition to the text.

      Reviewer #1 Recommendation to authors:

      The main text needs proofreading.

      We agree that the manuscript needed more thorough proofreading, and we have made corrections of typos and minor language errors throughout.

      Additional comment from the authors:

      Since the posting of this manuscript, another method for tracking neurons has been introduced:

      Enny H. van Beest, Célian Bimbard, Julie M. J. Fabre, Flóra Takács, Philip Coen, Anna Lebedeva, Kenneth Harris, Matteo Carandini, Tracking neurons across days with high-density probes, bioRxiv 2023.10.12.562040; doi: https://doi.org/10.1101/2023.10.12.562040

    1. Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors compared four types of hiPSCs and four types of hESCs at the proteome level to elucidate the differences between hiPSCs and hESCs. Semi-quantitative calculations of protein copy numbers revealed increased protein content in iPSCs. Particularly in iPSCs, proteins related to mitochondrial and cytoplasmic were suggested to reflect the state of the original differentiated cells to some extent. However, the most important result of this study is the calculation of the protein copy numbers per cell, and the validity of this result is problematic. In addition, several experiments need to be improved, such as using cells of different genders (iPSC: female, ESC: male) in mitochondrial metabolism experiments.

      Strengths:

      The focus on the number of copies of proteins is exciting and appreciated if the estimated calculation result is correct and biologically reproducible.

      Weaknesses:

      The proteome results in this study were likely obtained by simply looking at differences between clones, and the proteome data need to be validated. First, there were only a few clones for comparison, and the gender and number of cells did not match between ESCs and iPSCs. Second, no data show the accuracy of the protein copy number per cell obtained by the proteome data.

      We agree with the reviewer in their assessment that more independent stem cell clones and an equal gender balance would be preferable. We will mention these considerations as limitations of our study and encourage a larger-scale follow-up.

      Regarding the estimated copy numbers, we would like to highlight that they have been used extensively in the field, with direct validation of the differences in copy numbers with orthogonal methods like FACS 2-4,7,10. Furthermore, the original paper directly compared the copy numbers estimated using the “proteomic ruler” to spike-in protein epitope signature tags and found remarkable concordance. This was performed with a much older generation mass spectrometer with reduced peptide coverage, and the author predicted that higher coverage would increase the quantitative performance.
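
      As a rough illustration of the "proteomic ruler" idea, the sketch below scales a protein's MS signal by the summed histone signal, on the assumption that total histone mass approximates the DNA mass per cell. The function name, argument names, and the default of about 6.5 pg DNA per diploid human cell are assumptions for illustration, not values taken from the manuscript.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def copies_per_cell(protein_intensity, histone_intensity,
                    molar_mass_da, dna_mass_pg=6.5):
    """Estimate protein copies per cell from MS intensities.

    protein_intensity: summed MS signal of the protein of interest
    histone_intensity: summed MS signal of all histones (the "ruler")
    molar_mass_da:     molar mass of the protein in Da (g/mol)
    dna_mass_pg:       assumed DNA mass per cell in picograms
    """
    # Protein mass per cell in grams, scaled by the histone "ruler".
    protein_mass_g = (protein_intensity / histone_intensity) * dna_mass_pg * 1e-12
    return protein_mass_g * AVOGADRO / molar_mass_da
```

      The estimate is linear in the intensity ratio, so any systematic error in the histone signal or in the assumed DNA content rescales every copy number by the same factor.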

      Reviewer #2 (Public Review):

      Summary:

      Pluripotent stem cells are powerful tools for understanding development, differentiation, and disease modeling. The capacity of stem cells to differentiate into various cell types holds great promise for therapeutic applications. However, ethical concerns restrict the use of human embryonic stem cells (hESCs). Consequently, induced human pluripotent stem cells (ihPSCs) offer an attractive alternative for modeling rare diseases, drug screening, and regenerative medicine.

      A comprehensive understanding of ihPSCs is crucial to establish their similarities and differences compared to hESCs.

      This work demonstrates systematic differences in the reprogramming of nuclear and non-nuclear proteomes in ihPSCs.

      We thank the reviewer for the positive assessment.

      Strengths:

      The authors employed quantitative mass spectrometry to compare protein expression differences between independently derived ihPSC and hESC cell lines. Qualitatively, protein expression profiles in ihPSC and hESC were found to be very similar. However, when comparing protein concentration at a cellular level, it became evident that ihPSCs express higher levels of proteins in the cytoplasm, mitochondria, and plasma membrane, while the expression of nuclear proteins is similar between ihPSCs and hESCs. A higher expression of proteins in ihPSCs was verified by an independent approach, and flow cytometry confirmed that ihPSCs had larger cell sizes than hESCs. The differences in protein expression were reflected in functional distinctions. For instance, the higher expression of mitochondrial metabolic enzymes, glutamine transporters, and lipid biosynthesis enzymes in ihPSCs was associated with enhanced mitochondrial potential, increased ability to uptake glutamine, and increased ability to form lipid droplets.

      Weaknesses:

      While this finding is intriguing and interesting, the study falls short of explaining the mechanistic reasons for the observed quantitative proteome differences. It remains unclear whether the increased expression of proteins in ihPSCs is due to enhanced transcription of the genes encoding this group of proteins or due to other reasons, for example, differences in mRNA translation efficiency. Another unresolved question pertains to how the cell type origin influences ihPSC proteomes. For instance, whether ihPSCs derived from fibroblasts, lymphocytes, and other cell types all exhibit differences in their cell size and increased expression of cytoplasmic and mitochondrial proteins. Analyzing ihPSCs derived from different cell types and by different investigators would be necessary to address these questions.

      We agree with the Reviewer that our study does not provide a mechanistic reason for the quantitative differences between the two cell types. However, we will include an expanded section in the discussion where we discuss the potential causes. We also agree that studying hiPSCs reprogrammed from different cell types, such as blood lymphocytes, would be of great interest, and we will include a section about this within the discussion to encourage further research into the area.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Brenes and colleagues carried out proteomic analysis of several human induced pluripotent stem cell (hiPSC) and human embryonic stem cell (hESC) lines. The authors found quantitative differences in the expression of several groups of cytoplasmic and mitochondrial proteins. Overall, hiPSC expressed higher levels of proteins such as glutamine transporters, mitochondrial metabolism proteins, and proteins related to lipid synthesis. Based on the protein expression differences, the authors propose that hiPSC lines differ from hESC in their growth and metabolism.

      Strengths:

      The number of generated hiPSC and hESC lines continues to grow, but potential differences between hiPSC and hESC lines remain to be quantified and explained. This study is a promising step forward in understanding of the differences between different hiPSC and hESC lines.

      Weaknesses:

      It is unclear whether changes in protein levels relate to any phenotypic features of cell lines used. For example, the authors highlight that increased protein expression in hiPSC lines is consistent with the requirement to sustain high growth rates, but there is no data to demonstrate whether hiPSC lines used indeed have higher growth rates.

      We respectfully disagree with the reviewer on this point. Our data show that hESCs and hiPSCs differ significantly in protein mass and cell size, as validated by the EZQ assay and FACS, while having no significant differences in their cell cycle profiles. Thus, increased size and protein content would require higher growth rates to sustain the increased mass, which is what we show.

      The authors claim that the cell cycle of the lines is unchanged. However, no details of the method for assessing the cell cycle were included so it is difficult to appreciate if this assessment was appropriately carried out and controlled for.

      We apologise for this omission; the details will be included in the revised version of the document.

      Details and characterisation of iPSC and ESC lines used in this study were overall lacking. The lines used are merely listed in methods, but no references are included for published lines, how lines were obtained, what passage they were used at, their karyotype status, etc. For details of basic characterisation, the authors should refer to the ISSCR Standards for the use of human stem cells in research. In particular, the authors should consider whether any of the changes they see may be attributed to copy number variants in different lines.

      We agree with the reviewer on this. The hiPSC lines were generated by the HipSci consortium in the Wellcome Sanger Centre as described in the flagship HipSci paper (ref. 13). We cite the flagship paper, which specifies in great detail the reprogramming protocols and quality control measures, including looking at copy number variations (ref. 13). However, we agree that we did not make this information easily accessible for readers. We also believe it is relevant to include this information explicitly in our manuscript instead of expecting readers to look at the flagship paper. These details will be added to the revised version.

      The expression data for markers of undifferentiated state in Figure 1a would ideally be shown by immunocytochemistry or flow cytometry as it is impossible to tell whether cultures are heterogeneous for marker expression.

      We agree with the reviewer on this. FACS is indeed much more quantitative and a better method to study heterogeneity. However, we did not have protocols to study these markers using FACS.

      TEM analysis should ideally be quantified.

      We agree with the reviewer that it would be nice to have a quantitative measure.

      All figure legends should explicitly state what graphs are representing (e.g. average/mean; how many replicates (biological or technical), which lines)? Some data is included in Methods (e.g. glutamine uptake), but not for all of the data (e.g. TEM).

      We agree with the reviewer completely. These points will be remediated in the revised version of the manuscript.

      Validation experiments were performed typically on one or two cell lines, but the lines used were not consistent (e.g. wibj_2 versus H1 for respirometry and wibj_2, oaqd_3 versus SA121 and SA181 for glutamine uptake). Can the authors explain how the lines were chosen?

      We will include these details within the updated manuscript.

      The authors should acknowledge the need for further functional validation of the results related to immunosuppressive proteins.

      We agree with the reviewer and will add a clear sentence in the discussion making this point explicitly.

      Differences in H1 histone abundance were highlighted. Can the authors speculate as to the meaning of these differences?

      Regarding H1 histones, neither our study of the literature nor our discussions with chromatin and histone experts, both within our institute and externally, have shed light on what the differences could imply. We think this is an interesting result that merits further study, but we don’t have a clear hypothesis on the consequences.

      In summary, we thank the reviewers for their comments and will prepare a revised version that addresses their suggestions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study uses a multi-pronged empirical and theoretical approach to advance our understanding of how differences in learning relate to differences in the ways that male versus female animals cope with urban environments, and more generally how reversal learning may benefit animals in urban habitats. The work makes an important contribution and parts of the data and analyses are solid, although several of the main claims are only partially supported or overstated and require additional support.

      Public Reviews:

      We thank the Editor and both Reviewers for their time and for their constructive evaluation of our manuscript. We worked to address each comment and suggestion offered by the Reviewers in our revision—please see our point-by-point responses below.

      Reviewer #1 (Public Review):

      Summary:

      In this highly ambitious paper, Breen and Deffner used a multi-pronged approach to generate novel insights on how differences between male and female birds in their learning strategies might relate to patterns of invasion and spread into new geographic and urban areas.

      The empirical results, drawn from data available in online archives, showed that while males and females are similar in their initial efficiency of learning a standard color-food association (e.g., color X = food; color Y = no food) scenario, when the associations are switched (now, color Y = food, X = no food), males are more efficient than females at adjusting to the new situation (i.e., faster at 'reversal learning'). Clearly, if animals live in an unstable world, where associations between cues (e.g., color) and what is good versus bad might change unpredictably, it is important to be good at reversal learning. In these grackles, males tend to disperse into new areas before females. It is thus fascinating that males appear to be better than females at reversal learning. Importantly, to gain a better understanding of underlying learning mechanisms, the authors use a Bayesian learning model to assess the relative role of two mechanisms (each governed by a single parameter) that might contribute to differences in learning. They find that what they term 'risk sensitive' learning is the key to explaining the differences in reversal learning. Males tend to exhibit higher risk sensitivity which explains their faster reversal learning. The authors then tested the validity of their empirical results by running agent-based simulations where 10,000 computer-simulated 'birds' were asked to make feeding choices using the learning parameters estimated from real birds. Perhaps not surprisingly, the computer birds exhibited learning patterns that were strikingly similar to the real birds. Finally, the authors ran evolutionary algorithms that simulate evolution by natural selection where the key traits that can evolve are the two learning parameters. They find that under conditions that might be common in urban environments, high-risk sensitivity is indeed favored.

      Strengths:

      The paper addresses a critically important issue in the modern world. Clearly, some organisms (some species, some individuals) are adjusting well and thriving in the modern, human-altered world, while others are doing poorly. Understanding how organisms cope with human-induced environmental change, and why some are particularly good at adjusting to change is thus an important question.

      The comparison of male versus female reversal learning across three populations that differ in years since they were first invaded by grackles is one of few, perhaps the first in any species, to address this important issue experimentally.

      Using a combination of experimental results, statistical simulations, and evolutionary modeling is a powerful method for elucidating novel insights.

      Thank you—we are delighted to receive this positive feedback, especially regarding the inferential power of our analytical approach.

      Weaknesses:

      The match between the broader conceptual background involving range expansion, urbanization, and sex-biased dispersal and learning, and the actual comparison of three urban populations along a range expansion gradient was somewhat confusing. The fact that three populations were compared along a range expansion gradient implies an expectation that they might differ because they are at very different points in a range expansion. Indeed, the predicted differences between males and females are largely couched in terms of population differences based on their 'location' along the range-expansion gradient. However, the fact that they are all urban areas suggests that one might not expect the populations to differ. In addition, the evolutionary model suggests that all animals, male or female, living in urban environments (that the authors suggest are stable but unpredictable) should exhibit high-risk sensitivity. Given that all grackles, male and female, in all populations, are both living in urban environments and likely come from an urban background, should males and females differ in their learning behavior? Clarification would be useful.

      Thank you for highlighting a gap in clarity in our conceptual framework. To answer the Reviewer’s question: yes, even with this shared urban ‘history’, it seems plausible that males and females could differ in their learning. For example, irrespective of population membership, such sex differences could come about via differential reliance on learning strategies mediated by an interaction between grackles’ polygynous mating system and male-biased dispersal system, as we discuss in L254–265 (now L295–306). Population membership might, in turn, differentially moderate the magnitude of any such sex-effect since an edge population, even though urban, could still pose novel challenges, for example by requiring grackles to learn novel daily temporal foraging patterns such as when and where garbage is collected (grackles appear to track this food resource: Rodrigo et al. 2021 [DOI: 10.1101/2021.06.14.448443]). We now introduce this important conceptual information; please see L89–96.

      Reinforcement learning mechanisms:

      Although the authors' title, abstract, and conclusions emphasize the importance of variation in 'risk sensitivity', most readers in this field will very possibly misunderstand what this means biologically. Both the authors' use of the term 'risk sensitivity' and their statistical methods for measuring this concept have potential problems.

      Please see our below responses concerning our risk-sensitivity term.

      First, most behavioral ecologists think of risk as predation risk which is not considered in this paper. Secondarily, some might think of risk as uncertainty. Here, as discussed in more detail below, the 'risk sensitivity' parameter basically influences how strongly an option's attractiveness affects the animal's choice of that option. They say that this is in line with foraging theory (Stephens and Krebs 2019) where sensitivity means seeking higher expected payoffs based on prior experience. To me, this sounds like 'reward sensitivity', but not what most think of as 'risk sensitivity'. This problem can be easily fixed by changing the name of the term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by pay-off variance i.e., risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We have added this information to our manuscript in L494–497. We further apologise for not clearly explaining how our lambda parameter estimates such risk-sensitive foraging. To do so here, we need to consider our Bayesian reinforcement learning model in full. This model uses observed choice-behaviour during reinforcement learning to infer our phi (information-updating) and lambda (risk-sensitivity) learning parameters. Thus, payoffs incurred through choice simultaneously influence estimation of each learning parameter; that is, in a sense, they are both sensitive to rewards. But phi and lambda differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions. Glossing over the mathematics, for phi, stronger reward sensitivity (bigger phi values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For lambda, stronger reward sensitivity (bigger lambda values) means stronger internal determinism about seeking the non-risk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching i.e., ‘playing it safe’. We hope this information, which we have incorporated into our revised manuscript (please see L153–161), clarifies the rationale and mechanics of our reinforcement learning model, and why lambda measures risk-sensitivity.

      In addition, however, the parameter does not measure sensitivity to rewards per se - rewards are not in equation 2. As noted above, instead, equation 2 addresses the sensitivity of choice to the attraction score which can be sensitive to rewards, though in complex ways depending on the updating parameter. Second, equations 1 and 2 involve one specific assumption about how sensitivity to rewards vs. to attraction influences the probability of choosing an option. In essence, the authors split the translation from rewards to behavioral choices into 2 steps. Step 1 is how strongly rewards influence an option's attractiveness and step 2 is how strongly attractiveness influences the actual choice to use that option. The equation for step 1 is linear whereas the equation for step 2 has an exponential component. Whether a relationship is linear or exponential can clearly have a major effect on how parameter values influence outcomes. Is there a justification for the form of these equations? The analyses suggest that the exponential component provides a better explanation than the linear component for the difference between males and females in the sequence of choices made by birds, but translating that to the concepts of information updating versus reward sensitivity is unclear. As noted above, the authors' equation for reward sensitivity does not actually include rewards explicitly, but instead only responds to rewards if the rewards influence attraction scores. The more strongly recent rewards drive an update of attraction scores, the more strongly they also influence food choices. While this is intuitively reasonable, I am skeptical about the authors' biological/cognitive conclusions that are couched in terms of words (updating rate and risk sensitivity) that readers will likely interpret as concepts that, in my view, do not actually concur with what the models and analyses address.

      To answer the Reviewer’s question: yes, these equations are very much standard and the canonical way of analysing individual reinforcement learning (see: Ch. 15.2 in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky 2018 [DOI: 10.1017/CBO9781316272503]; McElreath et al. 2008 [DOI: 10.1098/rstb/2008/0131]; Reinforcement Learning by Sutton & Barto 2018). To provide a “justification for the form of these equations”, equation 1 describes a convex combination of previous values and recent payoffs. Latent values are updated as a linear combination of both factors; there is no simple linear mapping between payoffs and behaviour as suggested by the reviewer. Equation 2 describes the standard softmax link function. It converts a vector of real numbers (here latent values) into a simplex vector (i.e., a vector summing to 1) which represents the probabilities of different outcomes. Similar to the logit link in logistic regression, the softmax simply maps the model space of latent values onto the outcome space of choice probabilities which enter the categorical likelihood distribution. We can appreciate how we did not make this clear in our manuscript by not highlighting the standard nature of our analytical approach; we now do so in our revised manuscript (please see L148–149). As far as what our reinforcement learning model measures, and how it relates cognition and behaviour, please see our previous response.
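
      As a rough illustration of the two steps described in this response, the sketch below writes them out in Python. It is not the authors' code: the function names, the starting values, and the exact placement of phi and lambda are assumptions based only on the verbal description here (a convex combination for the update, a lambda-scaled softmax for choice).

```python
import math

def update_attraction(attraction, payoff, phi):
    """Equation-1-style update (assumed form): a convex combination of the
    previous latent value and the most recent payoff, weighted by phi."""
    return (1.0 - phi) * attraction + phi * payoff

def choice_probabilities(attractions, lam):
    """Equation-2-style softmax (assumed form): map latent values onto a
    simplex of choice probabilities, with lam scaling the latent values."""
    top = max(attractions)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: two options with equal starting values.
values = [0.1, 0.1]
values[0] = update_attraction(values[0], payoff=1.0, phi=0.5)
probs = choice_probabilities(values, lam=4.0)
```

      With lam set to 0 the two options are chosen with equal probability whatever their latent values, and larger lam values concentrate choice on the option with the higher latent value, which is the sense in which lambda governs how strongly attraction drives choice.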

      To emphasize, while the authors imply that their analyses separate the updating rate from 'risk sensitivity', both the 'updating parameter' and the 'risk sensitivity' parameter influence both the strength of updating and the sensitivity to reward payoffs in the sense of altering the tendency to prefer an option based on recent experience with payoffs. As noted in the previous paragraph, the main difference between the two parameters is whether they relate to behaviour linearly versus with an exponential component.

      Please see our two earlier responses on the mechanics of our reinforcement learning model.

      Overall, while the statistical analyses based on equations (1) and (2) seem to have identified something interesting about two steps underlying learning patterns, to maximize the valuable conceptual impact that these analyses have for the field, more thinking is required to better understand the biological meaning of how these two parameters relate to observed behaviours, and the 'risk sensitivity' parameter needs to be re-named.

      Please see our earlier response to these suggestions.

      Agent-based simulations:

      The authors estimated two learning parameters based on the behaviour of real birds, and then ran simulations to see whether computer 'birds' that base their choices on those learning parameters return behaviours that, on average, mirror the behaviour of the real birds. This exercise is clearly circular. In old-style, statistical terms, I suppose this means that the R-square of the statistical model is good. A more insightful use of the simulations would be to identify situations where the simulation does not do as well in mirroring behaviour that it is designed to mirror.

      Based on the Reviewer’s summary of agent-based forward simulation, we can see we did a poor job explaining the inferential value of this method, and we apologise. Agent-based forward simulations are posterior predictions, and they provide insight into the implied model dynamics and overall usefulness of our reinforcement learning model. R-squared calculations are retrodictive, and they say nothing about the causal dynamics of a model. Specifically, agent-based forward simulation allows us to ask: what would a ‘new’ grackle ‘do’, given our reinforcement learning model parameter estimates? It is important to ask this question because, in parameterising our model, we may have overlooked a critical contributing mechanism to grackles’ reinforcement learning. Such an omission is invisible in the raw parameter estimates; it is only betrayed by the parameters in actu. Agent-based forward simulation is ‘designed’ to facilitate this call to action, not to mirror behavioural results. The simulation has no a priori ‘opinion’ about the behavioural outcomes of computer ‘birds’; rather, it simply assigns these agents random phi and lambda draws (whilst maintaining their correlation structure), and tracks their reinforcement learning. The exercise only appears circular if no critical contributing mechanism(s) went overlooked; in this case computer ‘birds’ should behave similarly to real birds. A disparate mapping between computer ‘birds’ and real birds, however, would mean more work is needed with respect to model parameterisation that captures the causal, mechanistic dynamics behind real birds’ reinforcement learning (for an example of this happening in the human reinforcement learning literature, see Deffner et al. 2020 [DOI: 10.1098/rsos.200734]).

      In sum, agent-based forward simulation does not assess goodness-of-fit (we assessed the fit of our model a priori in our preregistration: https://osf.io/v3wxb), but it does assess whether one did a comprehensive job of uncovering the mechanistic basis of target behaviour(s). We have worked to make the above points on the method and the insight afforded by agent-based forward simulation explicitly clear in our revision; please see L192–207 and L534–537.
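
      For readers unfamiliar with the method, the idea of forward simulation can be sketched as follows. This is a minimal toy version under stated assumptions, not the simulation used in the study: the two-option task, the starting values of 0.1, the single reversal, and the uniform parameter ranges are all hypothetical, and a real analysis would draw phi and lambda jointly from the fitted posterior.

```python
import math
import random

def simulate_agent(phi, lam, n_trials, reversal_at, rng):
    """Forward-simulate one agent on a two-option task.
    Returns a list of booleans: was the rewarded option chosen on each trial?"""
    attractions = [0.1, 0.1]      # assumed equal starting values
    rewarded = 0                  # index of the currently rewarded option
    correct = []
    for trial in range(n_trials):
        if trial == reversal_at:  # swap the rewarded option once
            rewarded = 1 - rewarded
        # Two-option softmax, written as a logistic on the scaled value difference.
        p_option_1 = 1.0 / (1.0 + math.exp(-lam * (attractions[1] - attractions[0])))
        choice = 1 if rng.random() < p_option_1 else 0
        payoff = 1.0 if choice == rewarded else 0.0
        # Update only the chosen option's latent value (an assumption of this sketch).
        attractions[choice] = (1.0 - phi) * attractions[choice] + phi * payoff
        correct.append(choice == rewarded)
    return correct

rng = random.Random(2024)
# Hypothetical parameter draws, independent and uniform purely for illustration.
draws = [(rng.uniform(0.02, 0.10), rng.uniform(2.0, 6.0)) for _ in range(100)]
runs = [simulate_agent(phi, lam, 100, 50, rng) for phi, lam in draws]
```

      Comparing summaries of such simulated runs (for example, trials needed to finish reversal) against the observed birds is what reveals whether the fitted parameters alone are enough to reproduce the behaviour.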

      Reviewer #2 (Public Review):

      Summary:

      The study is titled "Leading an urban invasion: risk-sensitive learning is a winning strategy", and consists of three different parts. First, the authors analyse data on initial and reversal learning in Grackles confronted with a foraging task, derived from three populations labeled as "core", "middle" and "edge" in relation to the invasion front. The suggested difference between study populations does not surface, but the authors do find moderate support for a difference between male and female individuals. Secondly, the authors confirm that the proposed mechanism can actually generate patterns such as those observed in the Grackle data. In the third part, the authors present an evolutionary model, in which they show that learning strategies as observed in male Grackles do evolve in what they regard as conditions present in urban environments.

      Strengths:

      The manuscript's strength is that it combines real learning data collected across different populations of the Great-tailed grackle (Quiscalus mexicanus) with theoretical approaches to better understand the processes with which grackles learn and how such learning processes might be advantageous during range expansion. Furthermore, the authors also take sex into account revealing that males, the dispersing sex, show moderately better reversal learning through higher reward-payoff sensitivity. I also find it refreshing to see that the authors took the time to preregister their study to improve transparency, especially regarding data analysis.

      Thank you—we are pleased to receive this positive evaluation, particularly concerning our efforts to improve scientific transparency via our study’s preregistration (https://osf.io/v3wxb).

      Weaknesses:

      One major weakness of this manuscript is the fact that the authors are working with quite low sample sizes when we look at the different populations of edge (11 males & 8 females), middle (4 males & 4 females), and core (17 males & 5 females) expansion range. I think that when all populations are pooled together, the sample size is sufficient to answer the questions regarding sex differences in learning performance and which learning processes might be used by grackles, but it is insufficient when taking the different populations into account.

      In Bayesian statistics, there is no strict lower limit of required sample size as the inferences do not rely on asymptotic assumptions. With inferences remaining valid in principle, low sample size will of course be reflected in rather uncertain posterior estimates. We note all of our multilevel models use partial pooling on individuals (the random-effects structure), which is a regularisation technique that generally reduces the inference constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using a similarly modest N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F) to our actual final N, that we anticipated to be our final N at that time. This a priori analysis shows our reinforcement learning model: (i) detects sex differences in phi values >= 0.03 and lambda values >= 1; and (ii) infers a null effect for phi values < 0.03 and lambda values < 1 i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Thus, both of these points together highlight how our reinforcement learning model allows us to say that across-population null results are not just due to small sample size. Nevertheless, the Reviewer is not wrong to wonder whether a bigger N might change our population-level results (it might; so might much-needed population replicates, see L310), but our Bayesian models still allow us to learn a lot from our current data. We now explain this in our revised manuscript; please see L452–457.

      Another weakness of this manuscript is that it does not set up the background well in the introduction. Firstly, are grackles urban dwellers in their natural range and expand by colonising urban habitats because they are adapted to it? The introduction also fails to mention why urban habitats are special and why we expect them to be more challenging for animals to inhabit. If we consider that one of their main questions is related to how learning processes might help individuals deal with a challenging urban habitat, then this should be properly introduced.

      In L74–75 (previously L53–56) we introduce that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth (e.g., moving into more arid, agricultural environments) are the estimated driver of their rapid North American colonisation. We hope this information sufficiently answers the Reviewer’s question. We have worked towards fleshing out how urban-imposed challenges faced by grackles, such as the wildlife management efforts introduced in L64–65 (now L85–86), may apply to animals inhabiting urban environments more broadly; for example, we now include an entire paragraph in our Introduction detailing how urban environments may be characterised differently to non-urban environments, and thus why they are perhaps more challenging for animals to inhabit (please see L56–71).

      Also, the authors provide a single example of how learning can differ between populations from more urban and more natural habitats. The authors also label the urban dwellers as the invaders, which might be the case for grackles but is not necessarily true for other species, such as the Indian rock agama in the example which are native to the area of study. Also, the authors need to be aware that only male lizards were tested in this study. I suggest being a bit more clear about what has been found across different studies looking at: (1) differences across individuals from invasive and native populations of invasive species and (2) differences across individuals from natural and urban populations.

      We apologise for not including more examples of such learning differences. We now include three examples (please see L43–49), and we are careful to call attention to the fact that these data cover both resident urban and non-urban species as well as urban invasive species (please see L49–50). We also revised our labelling of the lizard species (please see L44). We are aware only male lizards were tested but this information is not relevant to substantiating our use of this study; that is, to highlight that learning can differ between urban-dwelling and non-urban counterparts. We hope the changes we did make to our manuscript satisfy the Reviewer’s general suggestion to add biological clarity.

      Finally, the introduction is very much written with regard to the interaction between learning and dispersal, i.e. the 'invasion front' theme. The authors lay out four predictions, the most important of which is No. 4: "Such sex-mediated differences in learning to be more pronounced in grackles living at the edge, rather than the intermediate and/or core region of their range." The authors, however, never return to this prediction, at least not in a transparent way that clearly pronounces this pattern not being found. The model looking at the evolution of risk-sensitive learning in urban environments is based on the assumption that urban and natural environments "differ along two key ecological axes: environmental stability 𝑢 (How often does optimal behaviour change?) and environmental stochasticity 𝑠 (How often does optimal behaviour fail to pay off?). Urban environments are generally characterised as both stable (lower 𝑢) and stochastic (higher 𝑠)". Even though it is generally assumed that urban environments differ from natural environments the authors' assumption is just one way of looking at the differences which have generally not been confirmed and are highly debated. Additionally, it is not clear how this result relates to the rest of the paper: The three populations are distinguished according to their relation to the invasion front, not with respect to a gradient of urbanization, and further do not show a meaningful difference in learning behaviour possibly due to low sample sizes as mentioned above.

      Thank you for highlighting a gap in our reporting clarity. We now take care to transparently report our null result regarding our fourth prediction; more specifically, that we did not detect credible population-level differences in grackles’ learning (please see L130). Regarding our evolutionary model, we agree with the Reviewer that this analysis is only one way of looking at the interaction between learning phenotype and apparent urban environmental characteristics. Indeed, in L282–288 (now L325–329) we state: “Admittedly, our evolutionary model is not a complete representation of urban ecology dynamics. Relevant factors—e.g., spatial dynamics and realistic life histories—are missed out. These omissions are tactical ones. Our evolutionary model solely focuses on the response of reinforcement learning parameters to two core urban-like (or not) environmental statistics, providing a baseline for future study to build on”. But we can see now that ‘core’ is too strong a word, and instead ‘supposed’, ‘purported’ or ‘theorised’ would be more accurate—we have revised our wording throughout our manuscript to say as much (please see, for example, L24; L56; L328). We also further highlight the preliminary nature of our evolutionary model, in terms of allowing a narrow but useful first-look at urban eco-evolutionary dynamics—please see L228–232. Finally, we now detail the theorised characteristics of urban environments in our Introduction (rather than in our Results; please see L56–71), and we hope that by doing so, how our evolutionary results relate to the rest of our paper is now better set up and clear.

      In conclusion, the manuscript was well written and for the most part easy to follow. The format of eLife having the results before the methods makes it a bit harder to follow because the reader is not fully aware of the methods at the time the results are presented. It would, therefore, be important to more clearly delineate the different parts and purposes. Is this article about the interaction between urban invasion, dispersal, and learning? Or about the correct identification of learning mechanisms? Or about how learning mechanisms evolve in urban and natural environments? Maybe this article can harbor all three, but the borders need to be clear. The authors need to be transparent about what has and especially what has not been found, and be careful to not overstate their case.

      Thank you, we are pleased to read that the Reviewer found our manuscript to be generally digestible. We have worked to add further clarity, and to temper our tone (please see our above and below responses).

      Reviewer #1 (Recommendations For The Authors):

      Several of the results are based on CIs that overlap zero. Tone these down somewhat.

      We apologise for overstating our results, which we have worked to tone down in our revision. For instance, in L185–186 we now differentiate between estimates that did or did not overlap zero (please also see our response to Reviewer 2 on this tonal change). We note we do not report confidence intervals (i.e., the range of values expected to contain the true estimate if one redoes the study/analysis many times). Rather, we report 89% highest posterior density intervals (i.e., the most likely values of our parameters over this range). We have added this definition in L459, to improve clarity.
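      The distinction drawn above between a confidence interval and a highest posterior density interval can be made concrete with a minimal sketch. It assumes the posterior is available as a list of draws; the function name `hpdi` and the hand-made draws are hypothetical and are not taken from the authors' code.

```python
import math


def hpdi(samples, mass=0.89):
    """Return the narrowest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of draws that must fall inside the interval.
    window = max(1, math.ceil(mass * n))
    # Slide a window of that size over the sorted draws and keep the narrowest.
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


# Hypothetical, hand-made "posterior draws" with one extreme value.
draws = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 10.0]
lower, upper = hpdi(draws)
print(lower, upper)  # 0.0 0.8
```

      In this sketch the interval excludes the single extreme draw because the narrowest window holding 89% of the draws does not need it. An equal-tailed interval would instead cut the same share from each end.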

      The literature review suggesting that urban environments are more unpredictable is not convincing. Yes, they have more noise and light pollution and more cars and planes, but does this actually relate to the unpredictability of getting a food reward when you choose an option that usually yields rewards?

      To answer the Reviewer’s question—yes. But we can see that by not including empirical examples from the literature, we did a poor job of arguing such links. In L43–49 we now give three empirical examples; more specifically, we state: “[...] experimental data show the more variable are traffic noise and pedestrian presence, the more negative are such human-driven effects on birds' sleep (Grunst et al., 2021), mating (Blickley et al., 2012), and foraging behaviour (Fernández-Juricic, 2000).” We note we now detail such apparently stable but stochastic urban environmental characteristics in our Introduction rather than our Results section, to hopefully improve the clarity of our manuscript (please see L56–71). We further note that we cite three literature reviews—not one—suggesting urban environments are stable in certain characteristics and more unpredictable in others (please see L59–60). Finally, we appreciate such characterisation is not certain, and so in our revision we have qualified all writing about this potential dynamic with words such as “apparent”, “supposed”, “theorised”, “hypothesised” etc.

      It would be interesting to see if other individual traits besides sex affect their learning/reversal learning ability and/or their learning parameters. Do you have data on age, size, condition, or personality? Or, the habitat where they were captured?

      We do not have these data. But we agree with the Reviewer that examining the potential influence of such covariates on grackles’ reinforcement learning would be interesting in future study, especially habitat characteristics (please see L306–309).

      For most levels of environmental noise, there appears to be an intermediate maximum for the relationship between environmental stability and the risk sensitivity parameter. What does this mean?

      There is indeed an intermediate maximum for certain values of environmental stochasticity (although the differences are rather small). The most plausible reason for this is that for very stable environments, simulated birds essentially always “know” the rewarded solution and never need to “relearn” behaviour. In this case, differences in latent values will tend to be large (because they consistently get rewarded for the same option), and different lambda values (in the upper range) will produce the same choice behaviour, which results in very weak selection. While in very unstable environments, optimal choice behaviour should be more exploratory, allowing learners to track frequently-changing environments. We now note this pattern in L240–248.

      Reviewer #2 (Recommendations For The Authors):

      L2: I'd encourage the authors to reconsider the term "risk-sensitive learning", at least in the title. It's not apparent to me how 'risk' relates to the investigated foraging behaviour. Elsewhere, risk-reward sensitivity is used which may be a better term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by pay-off variance i.e., risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We have added this information to our manuscript in L494–497. In explaining our reinforcement model, we also now detail how risk relates to foraging behaviour. Specifically, in L153–161 we now state: “Both learning parameters capture individual-level internal response to incurred reward-payoffs, but they differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions (full mathematical details in Materials and methods). For 𝜙, stronger reward sensitivity (bigger values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For 𝜆, stronger reward sensitivity (bigger values) means stronger internal determinism about seeking the nonrisk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching i.e., ‘playing it safe’.” We hope this information clarifies why lambda measures risk-sensitivity, and why we continue to use this term.
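      The roles of the two parameters described above can be illustrated with a minimal sketch. It assumes the common textbook form of a reinforcement learning model (a Rescorla-Wagner style value update plus a softmax choice rule). The function names and the example values are hypothetical, and the authors' exact equations are in their Materials and methods rather than reproduced here.

```python
import math


def update_value(value, reward, phi):
    """Move the latent value toward the latest payoff at rate phi (0 to 1)."""
    return (1 - phi) * value + phi * reward


def choice_probabilities(values, lam):
    """Softmax over latent values; larger lam means more deterministic choice."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical latent values for two options after some experience.
values = [0.7, 0.3]
low_lambda = choice_probabilities(values, lam=1.0)
high_lambda = choice_probabilities(values, lam=10.0)
```

      Under these assumptions, a larger phi moves the latent value further toward the most recent payoff on each trial, and a larger lambda concentrates choice on the option with the higher latent value, which corresponds to less switching between options.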

      L1-3: The title is a bit misleading with regard to the empirical data. From the data, all that can be said is that male grackles relearn faster than females. Any difference between populations actually runs the other way, with the core population exhibiting a larger difference between males and females than the mid and edge populations.

      It is customary for a manuscript title to describe the full scope of the study. In our study, we have empirical data, cognitive modelling, and evolutionary simulations of the background theory all together. And together these analytical approaches show: (1) across three populations, male grackles—the dispersing sex in this historically urban-dwelling and currently urban-invading species—outperform female counterparts in reversal learning; (2) they do this via risk-sensitive learning, so they’re more sensitive to relative differences in reward payoffs and choose to stick with the ‘safe’ i.e., rewarding option, rather than continuing to ‘gamble’ on an alternative option; and (3) risk-sensitive learning should be favoured in statistical environments characterised by purported urban dynamics. So, we do not feel our title “Leading an urban invasion: risk-sensitive learning is a winning strategy” is misleading with regard to our empirical data; it just doesn’t summarise only our empirical data. Finally, as we now state in L312–313, we caution against speculating about any between-population variation, as we did not infer any meaningful behavioural or mechanistic population-level differences.

      L13: "Assayed", is that correctly put, given that the authors did not collect the data?

      Merriam-Webster defines assay as “to analyse” or “examination or determination as to characteristics”, and so, to answer the Reviewer’s question: yes, we feel this is correctly put. We note we explicitly introduce in L102–103 that we did not collect the data, and we have an explicit “Data provenance” section in our methods (please see L342–347).

      L42-46: The authors provide a single example of how learning can differ between populations from more urban and more natural habitats. I would like to point out that many of these studies do not directly confirm that the ability in question has indeed led to the success of the species tested (e.g. show fitness consequences). Then the authors could combine these insights to form a solid prediction for the grackles. As of now, this looks like cherry-picking supportive literature without considering negative results.

      Here are some references that might be helpful in identifying relevant literature to cite:

      Szabo, B., Damas-Moreira, I., & Whiting, M. J. (2020). Can cognitive ability give invasive species the means to succeed? A review of the evidence. Frontiers in Ecology and Evolution, 8, 187.

      Griffin AS, Tebbich S, Bugnyar T, 2017. Animal cognition in a human-dominated world. Anim Cogn 20(1):1-6.

      Kark, S., Iwaniuk, A., Schalimtzek, A., & Banker, E. (2007). Living in the city: Can anyone become an "urban exploiter"? Journal of Biogeography, 34(4), 638-651.

      We apologise for not including more examples of such learning differences. We now include three examples (please see L43–49). We are aware that direct evidence of fitness consequences is entirely lacking in the scientific literature on cognition and successful urban invasion; hence why such data is not present in our paper. But we now explicitly point out a role for likely fitness-affecting anthropogenic disturbances on sleep, mate, and foraging behaviour on animals inhabiting urban environments (please see L63–68). We hope these new data bolster our predictions for our grackles. Finally, the Reviewer paints a (in our view) inaccurate picture of our use of available literature. Nevertheless, to address their comment, we now highlight a recent meta-analysis advocating for further research to confirm apparent ‘positive’ trends between animal ‘smarts’ and successful ‘city living’ (please see L43).

      L64: Is their niche historically urban, or have they recently moved into urban areas?

      In L74–75 (previously L53–56) we introduce that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth—e.g., moving into more arid, agricultural environments—is the estimated driver of their rapid North American colonisation. We hope this included information sufficiently answers the Reviewer’s question.

      L66-67: This is an important point that is however altogether missing from the discussion.

      We thank the Reviewer for highlighting a gap in our discussion regarding population-level differences in grackles’ reinforcement learning. In L310–312 we now state: “The lack of spatial replicates in the existing data set used herein inherently poses limitations on inference. Nevertheless, the currently available data do not show meaningful population-level behavioural or mechanistic differences in grackles’ reinforcement learning, and we should thus be cautious about speculating on between-population variation”.

      L68-71: The paper focuses on cognitive ability. The whole paragraph sets up the prediction of why male grackles should be better learners due to their dispersal behaviour. This example, however, focuses on aggression, not cognition. Here is a study showing differences in learning in male and female mynas that might be better suited:

      Federspiel IG, Garland A, Guez D, Bugnyar T, Healy SD, Güntürkün O, Griffin AS, 2017. Adjusting foraging strategies: a comparison of rural and urban common mynas (Acridotheres tristis). Anim Cogn 20(1):65-74.

      We thank the Reviewer for suggesting this paper. We feel it is better suited to substantiating our point in the Discussion about reversal learning not being indicative of cognitive ability—please see L276–277.

      L73: Generally, I suggest not writing "for the first time" as this is not a valid argument for why a study should be conducted. Furthermore, except for replication studies, most studies investigate questions that are novel and have not been investigated before.

      The Reviewer makes a fair point—we have removed this statement.

      L80-81: Here again, this is left undiscussed later on.

      By ‘this’ we assume the Reviewer is referring to our hypothesis, which is that sex differences in dispersal are related to sex differences in learning in an urban invader, grackles. At the beginning of our Discussion, we state how we found support for this hypothesis (please see L250–261); and in our ‘Ideas and speculation’ section, we discuss how these hypothesis-supporting data fit into the literature more broadly (please see L294–331). We feel this is therefore sufficiently discussed.

      L77-81: This sentence is very long and therefore hard to read. I suggest trying to split it into at least 2 separate sentences which would improve readability.

      Per the Reviewer’s useful suggestion, we have split this sentence into two separate sentences—please see L97–115.

      L83: Please explain choice-option switches. I am not aware of what that is and it should be explained at first mention.

      We apologise for this operational oversight. We now include a working definition of speed and choice-option switches at first mention. Specifically, in L107–108 we state: “[...] we expect male and female grackles to differ across at least two reinforcement learning behaviours: speed (trials to criterion) and choice-option switches (times alternating between available stimuli)”.

      L83-87: Again, a very long sentence. Please split.

      We thank the Reviewer for their suggestion. In this case we feel it is important to not change our sentence structure because we want our prediction statements to match between our manuscript and our preregistration.

      L96-97: Important to not overstate this. It merely demonstrates the potential of the proposed (not detected) mechanism to generate the observed data.

      As in any empirical analysis, our drawn conclusions depend on causal assumptions about the mechanisms generating behaviour (Pearl, J. (2009). Causality). Therefore, we “detected” specific learning mechanisms assuming a certain generative model, namely reinforcement learning. As there is overwhelming evidence for the widespread importance of value-based decision making and Rescorla-Wagner updating rules across numerous different animals (Sutton & Barto (2018) Reinforcement Learning), we would argue that this assumed model is highly plausible in our case. Still, we changed the text to “inferred” instead of “detected” learning mechanisms to account for this concern—please see L123–124.

      L99: "urban-like settings" again a bit confusing. The authors talk about invasion fronts, but now also about an urbanisation gradient. Is the main difference between the size and the date of establishment, or is there additionally a gradient in urbanisation to be considered?

      We now include a paragraph in our Introduction detailing apparent urban environmental characteristics (please see L56–71), and we now refer to this dynamic specifically when we define urban-like settings (please see L126–127). To answer the Reviewer’s question: we consider both differences. Specifically, we consider the time since population establishment in our paper (with respect to our behavioural and mechanistic modelling), as well as how statistical environments that vary in how similar they are to apparently characteristically urban-like environments, might favour particular learning phenotypes (with respect to our evolutionary modelling). We hope the edits to our Introduction as a whole now make both of the aims clear.

      L11-112: Above the authors talk about a comparable number of switches (10.5/15=0.7), and here of fewer number of switches (25/35=0.71), even though the magnitude of the difference is almost identical and actually runs the other way. The authors are probably misled by their conservative priors, which makes the difference appear greater in the second case than in the first. Using flat priors would avoid this particular issue.

      Mathematically, the number of trials-to-finish and the number of choice-option-switches are both a Poisson distributed outcome with rate λ (we note lambda here is not our risk-sensitivity parameter; just standard notation). As such, our Poisson models infer the rate of these outcomes by sex and phase, not the ratio of these outcomes by sex and phase. So comparing the magnitude of divided medians of choice-option-switches between the sexes by phase, as the Reviewer does above, is not a meaningful metric with respect to the distribution of our data. For perspective, 1 vs. 2 switches provides much less information about the difference in rates of a Poisson distribution than 50 vs 100 (for the former, no difference would be inferred; for the latter, it would), but both exhibit a 1:2 ratio. To hopefully prevent any such further confusion, and to focus on the fact that our Poisson models estimate the expected value i.e., the mean, we now report and graph (please see Fig. 2) mean and not median trials-to-finish and total-switch-counts. Finally, we can see that our use of the word “conservative” to describe our weakly informative priors is confusing, because conservative could mean either strong priors with respect to expected effect size (not our parameterisation) or weak priors with respect to such assumptions (our parameterisation). To address this lack of clarity, we now state that we use “weakly informative priors” in L457–458.
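      The point that equal ratios can carry very different amounts of evidence about Poisson rates can be checked with a minimal sketch. It is not the authors' multilevel model. It assumes a single count per group, equal exposure, and independent Gamma priors with a shared rate and an integer shape of 1, so that the posterior probability can be computed exactly. The function name is hypothetical.

```python
from math import comb


def prob_second_rate_higher(count_1, count_2, prior_shape=1):
    """Posterior probability that rate 2 exceeds rate 1.

    Assumes equal exposure and independent Gamma(prior_shape, b) priors with a
    shared rate b, so rate_1 / (rate_1 + rate_2) ~ Beta(a1, a2) a posteriori.
    prior_shape must be a positive integer for the exact sum below.
    """
    a1 = prior_shape + count_1
    a2 = prior_shape + count_2
    n = a1 + a2 - 1
    # For integer shapes, the Beta CDF at 0.5 equals a binomial tail sum.
    return sum(comb(n, j) for j in range(a1, n + 1)) / 2 ** n


small = prob_second_rate_higher(1, 2)      # 1:2 ratio, very little data
large = prob_second_rate_higher(50, 100)   # same 1:2 ratio, much more data
print(small)  # 0.6875
```

      Under these assumptions, the 1 vs. 2 comparison leaves the direction of the difference quite uncertain, whereas the 50 vs. 100 comparison gives a posterior probability very close to 1, even though the ratio of the counts is identical.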

      L126: It is not clear what risk sensitivity means in the context of these experiments.

      Thank you for pointing out our lack of clarity. In L153–161 we now state: “Both learning parameters capture individual-level internal response to incurred reward-payoffs, but they differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions (full mathematical details in Materials and methods). For 𝜙, stronger reward sensitivity (bigger values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For 𝜆, stronger reward sensitivity (bigger values) means stronger internal determinism about seeking the nonrisk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching i.e., ‘playing it safe’.” We hope this information clarifies what risk sensitivity means and measures, with respect to our behavioural experiments.

      L128-129: I find this statement too strong. A plethora of other mechanisms could produce similar patterns, and you cannot exclude these by way of your method. All you can show is whether the mechanism is capable of producing broadly similar outcomes as observed

      In describing the inferential value of our reinforcement learning model, we now qualify that the insight provided is of course conditional on the model, which is tonally accurate. Please see L161.

      L144: As I have already mentioned above, here is the first time we hear about unpredictability related to urban environments. I suggest clearly explaining in the introduction how urban and natural environments are assumed to be different which leads to animals needing different cognitive abilities to survive in them which should explain why some species thrive and some species die out in urbanised habitats.

      Thank you for this suggestion. We now include a paragraph in our Introduction detailing as much—please see L56–71.

      L162: "almost entirely above zero" again, this is worded too strongly.

      In reporting our lambda across-population 89% HPDI contrasts in L185–186, we now state: “[...] across-population contrasts that lie mostly above zero in initial learning, and entirely above zero in reversal learning”. Our previous wording stated: “[...] across-population contrasts that lie almost entirely above zero”. The Reviewer was correct to point out that this previous wording was too strong if we considered the contrasts together, as, indeed, we find the range of the contrast in initial learning does minimally overlap zero (L: -0.77; U: 5.61), while the range of the contrast in reversal learning does not (L: 0.14; U: 4.26). This rephrasing is thus tonally accurate.

      L178-179: I think it should be said instead that the model accounts well for the observed data.

      We have rephrased in line with the Reviewer’s suggestion, now stating in L217–218 that “Such quantitative replication confirms our reinforcement learning model results sufficiently explain our behavioural sex-difference data.”

      L188-190: I am not convinced this is a general pattern. It is quite a bold claim that I don't find to be supported by the citations. Why should biotic and abiotic factors differ in how they affect behavioural outcomes? Also, events in urban environments such as weekend/weekday could lead to highly regular optimal behaviour changes.

      Please see our response to Reviewer 1 on this point. We note we now touch on such regular events in L94–96.

      L209-211: The first sentence is misleading. The authors have found that males and females differ in 'risk sensitivity', that their learning model can fit the data rather well, and that under certain, not necessarily realistic assumptions, the male learning type is favoured by natural selection in urban environments. A difference between core, middle, and edge habitats however is barely found, and in fact seems to run the other way than expected.

      In our study, we found: (1) across three populations, male grackles (the dispersing sex in this historically urban-dwelling and currently urban-invading species) outperform female counterparts in reversal learning; (2) they do this via risk-sensitive learning, so they’re more sensitive to relative differences in reward payoffs and choose to stick with the ‘safe’ i.e., rewarding option, rather than continuing to ‘gamble’ on an alternative option; (3) we are sufficiently certain risk-sensitive learning generates our sex-difference data, as our agent-based forward simulations replicate our behavioural results (not because our model ‘fits’ the data, but because we inferred meaningful mechanistic differences; see our response to Reviewer 1 on this point); and (4) under theorised dynamics of urban environments, natural selection should favour risk-sensitive learning. We therefore do not feel it is misleading to say that we mapped a full pathway from behaviour to mechanisms through to selection and adaptation. Again, as we now state in L311–313, we caution against speculating about any between-population variation, as we did not infer any meaningful behavioural or mechanistic population-level differences. And we note the Reviewer is wrong to assume an interaction between learning, dispersal, and sex requires population-level differences on the outcome scale; please see our discussion on phenotypic plasticity and inherent species trait(s) in L313–324.

      L216: "indeed explain" again worded too strongly.

      We have tempered our wording. Specifically, we now state in L218: “sufficiently explain”. This wording is tonally accurate with respect to the inferential value of agent-based forward simulations—please see L192–207 on this point.

      L234: "reward-payoff sensitivity" might be a better term than risk-sensitivity?

      Please see our earlier response to this suggestion. We note we have changed this text to state “risk-sensitive learning” rather than “reward-payoff sensitivity”, to hopefully prevent the reader from concluding only our lambda term is sensitive to rewards—a point we now include in L153–154.

      L234-237: I think these points may be valuable, but come too much out of the blue. Many readers will not have a detailed knowledge of the experimental assays. It therefore also does not become clear how they measure the wrong thing, what this study does to demonstrate this, or whether a better alternative is presented herein. It almost seems like this should be a separate paper by itself.

      We apologise for this lack of context. We now explicitly state in L275 that we are discussing reversal learning assays, to give all readers this knowledge. In doing so, we hope the logic of our argument is now clear: reversal learning assays do not measure behavioural flexibility, whatever that even is. The Reviewer’s suggestion of a separate paper focused on what reversal learning assays actually measure, in terms of mechanism(s), is an interesting one, and we would welcome this discussion. But any such paper should build on the points we make here.

      L270-288: Somewhere here the authors have to explain how they have not found differences between populations, or that in so far as they found them, they run against the originally stated hypothesis.

      We thank the Reviewer for these suggestions. In L310–313 we now state: “The lack of spatial replicates in the existing data set used herein inherently poses limitations on inference. Nevertheless, the currently available data do not show meaningful population-level behavioural or mechanistic differences in grackles’ reinforcement learning, and we should thus be cautious about speculating on between-population variation”.

      L284: should be "missing" not "missed out"

      We have made this change.

      L290-291: It is unclear what "robust interactive links" were found. A pattern of sex-biased learning was found, which can potentially be attributed to evolutionary pressures in urban environments. An interaction e.g. between learning, dispersal, and sex can only be tentatively suggested (no differences between populations). Also "fully replicable" is a bit misleading. The analysis may be replicable, but the more relevant question of whether the findings are replicable we cannot presently answer.

      We apologise for our lack of clarity. By “robust” we mean “across population”, which we now state in L333. We again note the Reviewer is wrong to assume an interaction between learning, dispersal, and sex requires population-level differences on the outcome scale; please see our discussion on phenotypic plasticity and inherent species trait(s) in L313–324. Finally, the Reviewer makes a good point about our analyses but not our findings being replicable. In L334 we now make this distinction by stating “analytically replicable”.

      L306-315: I think you have a bit of a sample size issue not so much when populations are pooled but when separated. This might also factor in the fact that you do not really find differences across the populations in your analysis. When we look at the results presented in Figure 2 (and table d), we can see a trend towards males having better risk sensitivity in core (HPDI above 0) and middle populations (HPDI barely crossing 0) but the difference is very small. Especially the results on females are based on the performance of only 8 and 4 females respectively. I suggest making this clear in the manuscript.

      In Bayesian statistics, there is no strict lower limit of required sample size as the inferences do not rely on asymptotic assumptions. With inferences remaining valid in principle, low sample size will of course be reflected in rather uncertain posterior estimates. We note all of our multilevel models use partial pooling on individuals (the random-effects structure), which is a regularisation technique that generally reduces the inference constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using a similarly modest N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F) to our actual final N, that we anticipated to be our final N at that time. This a priori analysis shows our reinforcement learning model: (i) detects sex differences in phi values >= 0.03 and lambda values >= 1; and (ii) infers a null effect for phi values < 0.03 and lambda values < 1 i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Thus, both of these points together highlight how our reinforcement learning model allows us to say that across-population null results are not just due to small sample size. Nevertheless, the Reviewer is not wrong to wonder whether a bigger N might change our population-level results; it might; so might much-needed population replicates (see L310). But our Bayesian models still allow us to learn a lot from our current data, and, at present, we infer no meaningful population-level behavioural or mechanistic differences in grackles’ behaviour. To make clear the inferential sufficiency of our analytical approach, we now include some of the above points in our Statistical analyses section in L452–457. Finally, we caution against speculating on any between-population variation, as we now highlight in L311–313 of our Discussion.
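      The regularising effect of partial pooling mentioned above can be illustrated with a minimal sketch. It assumes the simplest normal-normal case with known standard deviations, not the authors' Stan models. The function name and all numbers are hypothetical.

```python
def partially_pooled_mean(group_mean, n, grand_mean, sigma, tau):
    """Posterior mean of one group's effect in a normal-normal model.

    sigma: assumed known within-group standard deviation
    tau:   assumed known between-group standard deviation
    """
    precision_data = n / sigma ** 2      # information from the group's own data
    precision_prior = 1 / tau ** 2       # information shared across groups
    weight = precision_data / (precision_data + precision_prior)
    return weight * group_mean + (1 - weight) * grand_mean


# Hypothetical groups with the same raw mean but different sample sizes.
few = partially_pooled_mean(group_mean=2.0, n=4, grand_mean=0.0, sigma=1.0, tau=1.0)
many = partially_pooled_mean(group_mean=2.0, n=22, grand_mean=0.0, sigma=1.0, tau=1.0)
```

      In this sketch the estimate based on fewer observations is pulled more strongly toward the overall mean, which is the sense in which partial pooling guards against over-interpreting small samples.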

      Figure 2: I think the authors should rethink their usage of colour in this graph. It is not colour-blind friendly or well-readable when printed in black and white.

      We used the yellow (hex code: #fde725) and green (hex code: #5ec962) colours from the viridis package. As outlined in the viridis package vignette (https://cran.r-project.org/web/packages/viridis/index.html), this colour package is “designed to improve graph readability for readers with common forms of color blindness and/or color vision deficiency. The color maps are also perceptually-uniform, both in regular form and also when converted to black-and-white for printing”.
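      As a rough check on how the two quoted hex codes separate when printed in black and white, the following sketch converts each colour to an approximate gray level. It assumes the Rec. 601 luma weights as a simple stand-in for a grayscale conversion. It is not how the viridis package itself evaluates its colour maps, and the function name is hypothetical.

```python
def gray_level(hex_code):
    """Approximate gray level (0-255) using Rec. 601 luma weights."""
    hex_code = hex_code.lstrip("#")
    red, green, blue = (int(hex_code[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * red + 0.587 * green + 0.114 * blue


yellow_gray = gray_level("#fde725")  # the yellow quoted above
green_gray = gray_level("#5ec962")   # the green quoted above
```

      Under this approximation the yellow maps to a lighter gray than the green. Whether that separation is large enough for a given figure remains a judgement about the printed result.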

      Figure 3B: Could the authors turn around the x-axis and the colour code? It would be easier to read this way.

      We appreciate that aesthetic preferences may vary. In this case, we prefer to have the numbers on the x-axis run the standard way i.e., from small to large. We note we did remove the word ‘Key’ from this Figure, in line with the Reviewer’s point about these characteristics not being totally certain.

      I also had a look at the preregistration. I do think that there are parts in the preregistration that would be worth adding to the manuscript:

      L36-40: This is much easier to read here than in the manuscript.

      We changed this text generally in the Introduction in our revision, so we hope the Reviewer will again find this easier to read.

      L49-56: This is important information that I would also like to see in the manuscript.

      We no longer have confidence in these findings, as our cleaning of only one part of these data revealed considerable experimenter oversight (see ‘Learning criterion’).

      L176: Why did you remove the random effect study site from the model? It is not part of the model in the manuscript anymore.

      The population variable is part of the RL_Comp_Full.stan model that we used in our manuscript to assess population differences in grackles’ reinforcement learning, the estimates from which we report in Table C and D (please note we never coded this variable as “study site”). But rather than being specified as a random effect, in our RL_Comp_Full.stan model we index phi and lambda by population as a predictor variable, to explicitly model population-level effects. Please see our code:

      https://github.com/alexisbreen/Sex-differences-in-grackles-learning/blob/main/Models/Reinforcement%20learning/RL_Comp_Full.stan
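
      To make the idea of indexing phi and lambda by population concrete, here is a minimal Python sketch. It is not a translation of RL_Comp_Full.stan: the update rule and softmax choice rule are a generic textbook form of reinforcement learning, and the population labels and parameter values are hypothetical placeholders chosen only for illustration.

```python
import math

# Hypothetical per-population parameters (illustrative values only).
PHI = {"population_A": 0.05, "population_B": 0.10}    # updating rate
LAMBDA = {"population_A": 3.0, "population_B": 5.0}   # choice sensitivity


def update_attraction(attraction, payoff, phi):
    """Blend the old attraction with the newest payoff at rate phi."""
    return (1 - phi) * attraction + phi * payoff


def choice_probabilities(attractions, lam):
    """Softmax over attractions, scaled by lambda."""
    weights = [math.exp(lam * a) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


def learn_one_trial(attractions, chosen, payoff, population):
    """Update the chosen option using the parameters of one population."""
    updated = list(attractions)
    updated[chosen] = update_attraction(
        attractions[chosen], payoff, PHI[population]
    )
    return updated, choice_probabilities(updated, LAMBDA[population])


attractions, probs = learn_one_trial([0.1, 0.1], 0, 1.0, "population_B")
print(attractions, probs)
```

      In this sketch each population has its own fixed phi and lambda, looked up by index, rather than a population-level offset drawn from a shared distribution as a random effect would imply.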

      L190-228: I am wondering if the model validation should also be part of the manuscript as well, rather than just being in the preregistration?

      We are not sure how the files were presented to the Reviewer for review, but our study preregistration, which includes our model validation, should be part of our manuscript as a supplementary file.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1.1) This work introduces a new method of imaging the reaction forces generated by small crawling organisms and applies this method to understanding locomotion of Drosophila larva, an important model organism. The force and displacement data generated by this method are a qualitative improvement on what was previously available for studying the larva, improving simultaneously the spatial, temporal, and force resolution, in many cases by an order of magnitude. The resulting images and movies are quite impressive.

      We thank the reviewer for their recognition of the achievements our work presents and for their feedback with regard to what they consider our most important findings and the points raised in their review. We will address these points individually below.

      (1.2) As it shows the novel application of recent technological innovations, the work would benefit from more detail in the explanation of the new technologies, of the rationales underlying the choice of technology and certain idiosyncratic experimental details, and of the limitations of the various techniques. In the methods, the authors need to be sure to provide sufficient detail that the work can be understood and replicated. The description of the results and the theory of motion developed here focus only on forces generated when the larva pushes against the substrate and ignores the equally strong adhesive forces pulling the larva onto the substrate.

      As the reviewer correctly points out, our present work adapts a recently developed set of methods (namely, ERISM and WARP) for use with small soft-bodied animals. The foundational methods have been described in detail in previous publications (refs 23 and 26). However, upon reflection, we agree that more information can be provided to ensure our work is more accessible and reproducible. We also agree that some additional clarifying information on our approach could be helpful. We have addressed this in the following ways:

      (1) We have included a detailed Key Resources table in the methods section to allow for maximum transparency on equipment and reagent sourcing. This can now be found on Pages 16-19.

      (2) We have modified the ‘Freely behaving animals force imaging’ section of the Materials and Methods section to include more detailed information on practical aspects of conducting experiments. These changes can be found on pages 23-24 (lines 566-567, 571-577).

      (3) We have re-ordered the Materials and Methods section, such that microcavity fabrication and microcavity characterisation occur prior to the description of ERISM and WARP experiments - this change should hopefully aid replication. Details regarding the application of a silicone well to the surface of microcavities have also been added (lines 472-474).

      (4) We have added additional text in the Introduction and Results (Pages 3-4 and 7, lines 56-86, and 152-153) to explain our rationale for using ERISM/WARP and additional text in the discussion that discusses the potential role(s) of adhesive forces in larval locomotion (Page 12, lines 301-307).

      (1.3) The substrate applies upward, downward, and horizontal forces on the larva, but only upward and downward forces are measured, and only upward forces are considered in the discussions of "Ground Reactive Forces." An apparent weakness of the WARP technique for the study of locomotion is that it only measures forces perpendicular to the substrate surface ("vertical forces" in Meek et al.), while locomotion requires the generation of forces parallel to the substrate ("horizontal forces"). It should be clarified that only vertical forces are studied and that no direct information is provided about the forces that actually move the larva forward (or about the forces which impede this motion and are also generated by the substrate). Along with this clarification, it would be helpful to include a discussion of other techniques, especially micropillar arrays and traction force microscopy, that directly measure horizontal forces and of why these techniques are inappropriate for the motions studied here.

      We attempted to provide a streamlined Introduction in our initial submission and then compared ERISM/WARP to other methods in our discussion. We are happy to provide a brief overview of substrate force measurement methods in the introduction to help set the stage for readers. The Introduction section of our revised manuscript now contains the following comparison of different mechanobiological imaging techniques on pages 3-4 lines 56-86:

      ‘However, in the field of cellular mechanobiology, many new force measuring techniques have been developed which allow measurement of comparatively small forces from soft structures exhibiting low inertia (15–17) often with relatively high spatial-resolution. Early methods such as atomic force microscopy required the use of laser-entrained silicon probes to make contact with a cell of interest (15). This approach is problematic for studying animal behaviour due to the risk of the laser and probe influencing behaviour. Subsequently, techniques have been developed which allow indirect measurement of substrate interactions. One such approach is Traction Force Microscopy (TFM) in which the displacement of fluorescent markers suspended in a material with known mechanical properties relative to a zero-force reference allows for indirect measurement of horizontally aligned traction forces (17–19). This technique allows for probe-free measurement of forces, but the need to obtain a precise zero-force reference would make time-lapse measurements on behaving animals challenging; further, depending on the version used, it has insufficient temporal resolution for the measurement of forces produced by many behaving animals, despite recent improvements (20). A second approach revolves around the use of micropillar arrays; in this technique, horizontally-aligned traction forces are measured by observing the deflection of pillars made of an elastic material with known mechanical properties. This approach can be limited in spatial resolution and introduces a non-physiological substrate that may influence animal behavior (21,22).

      Recently we have introduced a technique named Elastic Resonator Interference Stress Microscopy (ERISM) which allows for the optical mapping of vertically aligned GRFs in the pico and nanonewton ranges with micrometre spatial resolution by monitoring local changes in optical resonances of soft and deformable microcavities. This technique allows reference-free mapping of substrate deformations and calculation of vertically directed GRFs; it has been used to study a range of questions related to exertion of cellular forces (23–25). Until recently, this technique was limited by its low temporal resolution (~10s), making it unsuitable for recording substrate interaction during fast animal movements, but a further development of ERISM known as wavelength alternating resonance pressure microscopy (WARP), has been demonstrated to achieve down to 10 ms temporal resolution (26). Given ERISM/WARP allows for probe-free measurement of vertical ground reaction forces with high spatial and temporal resolution, it becomes an attractive method for animal-scale mechanobiology.’

      (1.4) The larvae studied are about 1 mm long and 0.1 mm in cross-section. Their volumes are therefore on order 0.01 microliter, their masses about 0.01 mg, and their weights in the range of 0.1 micronewton. This contrasts with the force reported for a single protopodium of 1 - 7 micronewtons. This is not to say that the force measurements are incorrect. Larvae crawl easily on an inverted surface, showing gravitational forces are smaller than other forces binding the larva to the substrate. The forces measured in this work are also of the same magnitude as the horizontal forces reported by Khare et al. (ref 32) using micropillar arrays.
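
      The order-of-magnitude estimate above can be reproduced with a short calculation. The sketch below assumes the larva is a cylinder 1 mm long and 0.1 mm in diameter with the density of water; both the cylindrical shape and the density are simplifying assumptions, not measurements from the study.

```python
import math

WATER_DENSITY_KG_PER_M3 = 1000.0  # assumption: larva is about as dense as water
G_M_PER_S2 = 9.81                 # gravitational acceleration


def larva_weight_newtons(length_m=1e-3, diameter_m=1e-4):
    """Weight of a water-filled cylinder with the stated larval dimensions."""
    volume_m3 = math.pi * (diameter_m / 2) ** 2 * length_m
    mass_kg = WATER_DENSITY_KG_PER_M3 * volume_m3
    return mass_kg * G_M_PER_S2


weight_uN = larva_weight_newtons() * 1e6  # convert N to micronewtons

# Ratio of the reported protopodial force range (1 to 7 uN) to the weight.
ratio_low = 1.0 / weight_uN
ratio_high = 7.0 / weight_uN
print(f"weight ~ {weight_uN:.2f} uN; force/weight ~ {ratio_low:.0f} to {ratio_high:.0f}")
```

      With these assumptions the weight comes out slightly below 0.1 micronewton, so a single protopodium force of 1 to 7 micronewtons exceeds the weight by roughly one to two orders of magnitude, consistent with the reviewer's point.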

      I suspect that the forces adhering the larva to the substrate are due to the surface tension of a water layer. This would be consistent with the ring of upward stress around the perimeter of the larva visible in S4D, E and in video SV3. The authors remark that upward deflection of the substrate may be due to the Poisson's ratio of the elastomer, but the calibration figure S5 shows that these upward deflections and forces are much smaller than the applied downward force. In any case, there must be a downward force on the larva to balance the measured upward forces and this force must be due to interaction with the substrate. It should be verified that the sum of downward minus upward forces on the gel equals the larva's weight (given the weight is negligible compared to the forces involved, this implies that the upward and downward forces should sum to 0).

      We have carefully calculated the forces exerted by protopodia and are confident in the accuracy of our measurements as reported. We further agree with the reviewer’s suggestion that gravitational forces can be largely neglected.

      As the reviewer points out, one would expect forces due to upward and downward deflections to cancel when considering the entire system. However, we see indications that the counteracting / balancing force often acts over a much larger area than the acting force, e.g. a sharp indentation by a protopodium might be counteracted by an upward deflection over a 10-20 fold larger radius and hence 100 to 400-fold larger area, thereby reducing the absolute value of the upward deflection at any given pixel surrounding the indentation. This in turn increases error in determining the integrated upward deformation, making it difficult to perform an absolute comparison of acting and counteracting force. Further, recording the entire counteracting force induced deformation would require acquiring data with a prohibitively large field of view.
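
      The scaling argument above follows from area growing with the square of radius. The sketch below makes this explicit under a simplifying assumption that is ours rather than the authors': the balancing force is spread uniformly over a disc whose radius is a fixed multiple of the indentation radius. The stress value used is a hypothetical placeholder.

```python
def area_ratio(radius_ratio):
    """Ratio of disc areas for a given ratio of radii (area scales as r squared)."""
    return radius_ratio ** 2


def mean_balancing_stress(indentation_stress_pa, radius_ratio):
    """Mean stress if the same total force acts over the larger disc."""
    return indentation_stress_pa / area_ratio(radius_ratio)


# Hypothetical indentation stress of 1000 Pa, for illustration only.
for k in (10, 20):
    print(k, area_ratio(k), mean_balancing_stress(1000.0, k))
```

      A 10 to 20-fold larger radius gives a 100 to 400-fold larger area, so the per-pixel balancing signal is reduced by the same factor, which is why it is harder to integrate reliably.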

      We agree that in some situations, water surface tension may be adhering animals to the substrate. Importantly, this is a challenge that the animal faces outside the lab in its natural environment of moist rotting fruit and yeast. The intricate force patterns seen in our study in the presence of water surface tension are therefore ecologically relevant. In other situations (e.g. preparing for pupation), larvae are able to stick to dry surfaces, suggesting that other adhesive forces such as mucoid adhesion can also come into play in certain behavioural contexts. A full characterization of the effects of water tension and mucoid adhesion are beyond the scope of this study. However, we have now added a sentence on pages 8 and 12 commenting on these other biomechanical forces at play:

      ‘We also observed that the animals travel surrounded by a relatively large water droplet (lines 189-190).’

      ‘We observed that larvae travel surrounded by moisture from a water droplet, which produces a relatively large upwardly directed force in a ring around the animal. The surface tension produced by such a water droplet likely serves a role in adhering the animal to the substrate. However, during forward waves, we found that protopodia detached completely during SwP, suggesting this surface tension-related adhesion force can be easily overcome by the behaving animal (lines 301-307).’

      (1.5) Much of the discussion and the model imply that the sites where the larva exerts downward force on the gel are the sites where horizontal propulsion is generated. This assumption should be justified. Can the authors rule out that the larva 'pulls' itself forward using surface tension instead of 'pushing' itself forward using protopodia?

      Determining the exact ‘sites’ where horizontal propulsion is generated is challenging. In our conceptual model, movement is not initiated by protopodia per se, but rather by a constellation of muscle contractions, which act upon the hydrostatic skeleton, which in turn causes visceral pistoning that heaves larvae forward. This is based on previous findings in Ref 31. While there are indeed downward protopodial ‘vaulting’ forces prior to initiation of swing, we propose that the main function of protopodia is not to push the larvae forward, but rather to provide anchoring to counteract opposing forces generated by muscles. We agree that water surface tension could also be sculpting biomechanical interactions; however, a full characterization of how water surface tension shapes larval locomotion is beyond the scope of this study.

      Since we have observed larvae move over dry terrain (e.g. glass) without an encasing water bubble, we do not believe that an encasing water bubble is strictly required for locomotion. We have also seen no obvious locomotion related modulations in the pulling forces created by water bubbles encasing larva, which would be expected if animals were somehow using water tension to pull themselves forward. Overall, the most likely explanation is that larvae use a mixture of biomechanical tactics to suit the moment in a given environment. This represents a challenge but also an opportunity for future research.

      We have now added additional text in the ‘Functional subdivisions within protopodia’ subsection to discuss these nuances (page 14, lines 382-387):

      ‘This increased force transmitted into the substrate is unexpected as the forces generated for the initiation of movement should arise from the contraction of the somatic muscles. We propose that the contraction of the musculature responsible for sequestration acts to move haemolymph into the protopodia thus exerting an increased pressure onto the substrate while the contact area decreases as a consequence of the initiation of sequestration.’

      and (page 15, lines 398-399):

      ‘Water surface films appear to facilitate larval locomotion in general but the biomechanical mechanisms by which they do this remain unclear.’

      (1.6) More detail should be provided about the methods, their limitations, and the rationale behind certain experimental choices.

      We thank the reviewer for this comment. As this significantly overlaps with a point raised earlier, we kindly direct them to our answer to comment #1.2 above.

      (1.7) Three techniques are introduced here to study how a crawling larva interacts with the substrate: standard brightfield microscopy of a larva crawling in an agarose capillary, ERISM imaging of an immobilized larva, and WARP imaging of a crawling larva. The authors should make clear why each technique was chosen for a particular study - e.g. could the measurements using brightfield microscopy also be accomplished using WARP? They should also clarify how these techniques relate to and possibly improve on existing techniques for measuring forces organisms exert on a substrate, particularly micropillar arrays and Traction Force Microscopy.

      Indeed, each of the three methods used has a specific merit. The brightfield microscopy was selected to track features on the animal’s body and to provide a basic control for the later measurements. However, this technique cannot directly measure the substrate interaction; it only allows inferences to be made from tracked features at the substrate interface. ERISM provides high resolution maps of the indentation induced by the larva; it is also extensively validated for mapping cell forces and the data analysis is robust against defects on the substrate (refs 23, 24 and 25). However, as we explain in the manuscript, ERISM lacks the temporal resolution needed to monitor mechanical activity of behaving larvae. Its use was therefore limited to the study of anaesthetised animals. For mapping forces exerted by behaving larvae, we used WARP, which is a further development of ERISM that offers higher frame rates but at the cost of requiring more extensive calibration (Supplementary Figure S4). The streamlined introduction of the different methods in our original manuscript originates from our attempt to be as concise as possible. However, as stated in response to comment #1.2, we agree that additional explanation and discussion will be helpful for readers and that it will be helpful to briefly refer to other methods for force mapping. We have now added references to a variety of techniques in the Introduction (Pages 3-4, lines 56-86) as stated in a prior response.

      (1.8) As written, "(ERISM) (19) and a variant, Wavelength Alternating Resonance Pressure microscopy (WARP) (20) enable optical mapping of GRFs in the nanonewton range with micrometre and millisecond precision..." (lines 53-55) may generate confusion. ERISM as described in this work has a much lower temporal resolution (requires the animal to be still for 5 seconds - lines 474-5); In this work, WARP does not appear to have nanonewton precision (judging by noise on calibration figures) and it is not clear that it has millisecond precision (the camera used and its frame rate should be specified in the methods).

      Previous studies have demonstrated the capabilities and limitations of ERISM and WARP. Upon reflection, we agree that our wording here could be more precise. To clarify our claim, we now separate the statements on ERISM and WARP in the introduction as follows (page 4, lines 78-83):

      “Until recently, this technique was limited by its low temporal resolution (~10s) making it unsuitable for use in recording substrate interaction during fast animal movements, but a further development of ERISM known as wavelength alternating resonance pressure microscopy (WARP), has been demonstrated to achieve down to 10 ms temporal resolution (26)”

      While WARP can achieve comparable force resolution as ERISM when used in a cellular context (c.f. Ref 26), we agree that for the present study, the resolution was in the 10s of nanonewton range, due to the need to use stiffer substrates and larger fields of view.

      The camera used in our work was specified in the appropriate subsection of the Materials and Methods (“All WARP and ERISM images were acquired using an Andor Zyla 4.2 sCMOS camera (Andor Technology, Belfast, UK)”). We apologise that the exact frame rate used in our current work was not mentioned in our original manuscript; this has now been added to the ‘Freely behaving animals force imaging’ section of the Materials and Methods (page 23, lines 574-577).

      (1.9) It would be helpful to have a discussion of the limits of the techniques presented and tradeoffs that might be involved in overcoming them. For instance, what is the field of view of the WARP microscope, and could it be increased by choosing a lower power objective? What would be required to allow WARP microscopy to measure horizontal forces? Can a crawling larva be imaged over many strides by recentering it in the field of view, or are there only particular regions of the elastomer where a measurement may be made?

      We agree with the reviewer that some discussion of the limitations of our technique will allow readers to have a more informed appreciation of what we are capable of measuring using WARP. However, as this is the first work to ever demonstrate such measurements, the limitations and tradeoffs cannot all be known with certainty at the present stage.

      To answer your individual questions:

      (1) There is a trade-off between numerical aperture and the ability to resolve individual interference fringes. Since our approach to calculate displacement from reflection maps relies upon counting of individual fringe transitions, going to a lower powered objective risks having these fringes blend and thus the identification of the individual transitions becoming impossible. The minimum numerical aperture of the objective will therefore generally depend on the steepness of indentations produced by the animals; the steeper an indentation, the closer the neighbouring fringes and thus the higher the required magnification to resolve them.

      (2) From WARP and ERISM data, one can make inferences about horizontal forces, as is described in detail in our earlier publications about ERISM (ref 23). However, quantitation of horizontal forces at sufficient temporal resolution to allow the investigation of behaving Drosophila larvae is currently not possible.

      (3) Many strides can indeed be imaged using our technique, however, this comes with additional technical challenges. Whether or not the animal itself can be recentred is an ongoing challenge. We have found that the animals are amenable to recentring themselves within the field of view if chasing an attractive odorant. However, manual recentering using a paintbrush risks destroying the top surface of the soft elastic resonator and recentering the microscope stage would require real-time object tracking which has been outside the scope of this original work, given the other challenging requirements on hardware and optics for obtaining high quality force maps.
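
      The trade-off described in point (1) can be sketched numerically. The model below is our own simplification, not the authors' analysis pipeline: it assumes one fringe transition per thickness change of wavelength / (2 x refractive index), as in a simple thin-film cavity, and at least two camera pixels per fringe. All numerical values (wavelength, refractive index, pixel size, slopes) are hypothetical placeholders.

```python
def fringe_spacing_um(wavelength_nm, refractive_index, slope):
    """Lateral distance between neighbouring fringes on a tilted surface.

    slope is the local surface gradient (height change per unit distance).
    """
    if slope == 0:
        raise ValueError("a flat surface produces no fringe transitions")
    height_step_um = wavelength_nm / (2 * refractive_index) / 1000
    return height_step_um / abs(slope)


def min_magnification(wavelength_nm, refractive_index, slope,
                      camera_pixel_um, pixels_per_fringe=2):
    """Smallest magnification that still samples each fringe adequately."""
    spacing = fringe_spacing_um(wavelength_nm, refractive_index, slope)
    return pixels_per_fringe * camera_pixel_um / spacing


# Hypothetical values: 630 nm light, index 1.4, 6.5 um camera pixels.
for slope in (0.05, 0.1, 0.2):
    print(slope, round(min_magnification(630, 1.4, slope, 6.5), 2))
```

      In this simplified picture the required magnification rises in proportion to the steepness of the indentation, which matches the qualitative statement that steeper indentations need higher magnification to keep neighbouring fringes separable.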

      To provide more information on limitations of our technique, we have added the following text into the discussion (pages 13-14, lines 356-370).

      ‘Despite the substantial advances they have provided, the use of WARP and ERISM also brings challenges and has several technical limitations. For example, fabrication of resonators is much more challenging than preparation of the agarose substrates conventionally used for studying locomotion of Drosophila. This problem is compounded by the fragility of the devices owing to the fragility of the thin gold top mirror. This becomes problematic when placing animals onto the microcavities, as often the area local to the initial placement of the animal is damaged by the paintbrush used to move the animals. Further, as a result of the combining of the two wavelengths, the effective framerate of the resultant displacement and stress maps is equal to half of the recorded framerate of the interference maps. To be able to monitor fast movements, recording at very high framerates is therefore necessary which, depending on hardware, might require imaging at reduced image size, but this in turn reduces the number of peristaltic waves that can be recorded before the animal escapes the field of view. A further limitation is that WARP and ERISM are sensitive mainly to forces in the vertical direction; this is complementary to TFM, which is sensitive to forces in horizontal directions. Using WARP in conjunction with high speed TFM (possibly using the tuneable elastomers presented here) could provide a fully integrated picture of underlying vertical and horizontal traction forces during larval locomotion.’

      And further on page 13, lines 337-341:

      ‘More detailed characterisation of this behaviour remains a challenge owing to the changing position of the mouth hooks. Due to their rigid structure and the relatively large forces produced in planting, mouth hooks produce substrate interaction patterns which our technique struggles to map accurately due to overlapping interference fringes ambiguating the fringe transitions.’
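
      The frame-rate limitation quoted above (each displacement map combines two wavelength images, halving the effective rate) can be expressed as a short calculation. The function names are hypothetical, and the 10 ms figure is the WARP temporal resolution cited elsewhere in this response rather than the frame rate used in the present experiments.

```python
def effective_frame_rate(recorded_fps, images_per_map=2):
    """Rate of displacement maps when each map needs several raw images."""
    return recorded_fps / images_per_map


def required_recorded_fps(temporal_resolution_s, images_per_map=2):
    """Raw camera frame rate needed to reach a target map interval."""
    return images_per_map / temporal_resolution_s


print(effective_frame_rate(200))      # maps per second from 200 raw frames/s
print(required_recorded_fps(0.010))   # raw frames/s for one map every 10 ms
```

      Under this simple accounting, a map every 10 ms requires the camera to record at roughly twice the map rate, which is why reduced image sizes may be needed on some hardware.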

      We trust that the above discussion and our modifications to our manuscript resulting from these will address the reviewer’s concerns.

      Reviewer #2 (Public Review):

      (2.1) With a much higher spatiotemporal resolution of ground dynamics than any previous study, the authors uncover new "rules" of locomotory motor sequences during peristalsis and turning behaviors. These new motor sequences will interest the broad neuroscience community that is interested in the mechanisms of locomotion in this highly tractable model. The authors uncover new and intricate patterns of denticle movements and planting that seem to solve the problem of net motion under conditions of force-balance. Simply put, the denticulated "feet" or tail of the Drosophila larva are able to form transient and dynamic anchors that allow other movements to occur.

      We thank the reviewer for their feedback and the information regarding which of our results is likely to resonate most impactfully with readers from a biological background.

      The biology and dynamics are well-described. The physics is elementary and becomes distracting when occasionally overblown. For example, one doesn't need to invoke Newton's third law, per se, to understand why anchors are needed so that peristalsis can generate forward displacements. This is intuitively obvious.

      We are sorry to hear that the reviewer found some of the physics details distracting. To address this concern, we have simplified some of the language while still attempting to keep the core arguments intact. For context and analogy, we still believe that including a brief reference to the laws of motion is helpful for some readers to explain some of our results and highlight their general implications, especially with regard to anchoring against reaction forces.

      One of our objectives is to make this article accessible and interesting for biologists and physicists at all levels. We feel it is important to reach out to both communities and try to be as inclusive as possible in our writing. Newton’s 3rd law is clearly relevant for our study and it is a common point of reference for anyone with a high school education, and so we feel it is appropriate to mention it as a way to help readers across disciplines understand the biophysical challenges faced by the animals we study.

      (2.2) Another distracting allusion to "physics" is correlating deformation areas with displaced volume, finding that "volume is a consequence of mass in a 2nd order polynomial relationship". I have no idea what this "physics" means or what relevance this relationship has to the biology of locomotion.

      Upon reflection, we agree that this language may be overly complex and distracts from what is, at its core, a simple, but important principle governing how Drosophila larvae interact with their substrates. The point we are trying to make is that our data show that forces exerted by an animal are proportional in a non-linear way to contact area. This suggests that to increase force exerted on the substrate, an animal must increase contact area. We do not observe contact area remaining constant while force increases, or vice versa. To make this result more clear, we have made several changes in our revised manuscript. Figure 5B no longer shows the relationship between the protopodial contact area and the displaced volume of the elastic resonator, but instead now shows the protopodial contact area and recorded force transmitted into the substrate. This then shows that in order to increase force transmitted into the substrate, these animals must increase their contact area. We have made changes to the figure legend of Figure 5 and the statements in the Results section accordingly (Page 9, lines 220-222).

      (2.3) The ERISM and WARP methods are state-of-the-art, but aside from generally estimating force magnitudes, the detailed force maps are not used. The most important new information is the highly accurate and detailed maps of displacement itself, not their estimates of applied force using finite element calculations. In fact, comparing displacements to stress maps, they are pretty similar (e.g., Fig 4), suggesting that all experiments are performed in a largely linear regime. It should also be noted that the stress maps are assumed to be normal stresses (perpendicular to the plane), not the horizontal stresses that are the ones that actually balance forces in the plane of animal locomotion.

      We largely agree with the statement made by the reviewer here. However, we have found that in many contexts, audiences appreciate having the absolute number of the forces and stresses involved reported. Therefore, where possible, we have used stress maps, rather than displacement maps. We also observe that while stress and displacement maps show similar patterns, features sometimes appear sharper in the stress map, which is a result of the finite element algorithm being able to attribute a broad indentation to a somewhat more localised downward force. We have thus opted to keep the original stress maps. We have been more explicit about WARP and ERISM being more tuned to recording vertically directed forces throughout the revised manuscript (lines 75, 78, 86, 162, 301, 305, 336).

      We have also modified our Discussion section to encourage further investigation of our proposed model using a technique more tuned to horizontal stresses (pages 12-13, lines 324-328):

      ‘However, WARP microscopy is best suited to measurements of forces in the vertical direction, and though we can make inferences such as this as they are a consequence of fundamental laws of physics, we present this conclusion as a testable prediction which could be confirmed using a force measurement technique more tuned to horizontally directed forces relative to the substrate.’

      (2.4) But none of this matters. The real achievements are the new locomotory dynamics uncovered with these amazing displacement measurements. I'm only asking the authors to be precise and down-to-earth about the nature of their measurements.

      We thank the reviewer for their perceptiveness in finding that though the forces are interesting, the interactions themselves are the most noteworthy result here. We trust that with the changes made in our revised manuscript, the description is now more “down-to-earth”, more concise where appropriate, and accurate as to which results are particularly important and novel.

      (2.5) It would be good to highlight the strength of the paper -- the discovery of new locomotion dynamics with high-resolution microscopy -- by describing it in simple qualitative language. One key discovery is the broad but shallow anchoring of the posterior body when the anterior body undertakes a "head sweep". Another discovery is the tripod indentation at the tail at the beginning of peristalsis cycles.

      We thank the reviewer for this recommendation. We agree that including a more explicit statement of some of our findings, especially with regards to these new posterior tripod structures and the whole-abdomen preparatory anchoring prior to head sweeps, would make the paper more impactful. As a result, we have modified the discussion section to include a statement for each new result and have also amended our abstract as a result (lines 407-416):

      “Here we have provided new insights into the behaviour of Drosophila larval locomotion. We have provided new quantitative details regarding the GRFs produced by locomoting larvae with high spatiotemporal resolution. This mapping allowed the first detailed observations of how these animals mitigate friction at the substrate interface and thus provide new rules by which locomotion is achieved. Further, we have ascribed new locomotor function to appendages not previously implicated in locomotion in the form of tripod papillae, providing a new working hypothesis of how these animals initiate movement. These new principles underlying the locomotion outlined here may serve as useful biomechanical constraints as called for by the wider modelling community (39).”

      (2.6) As far as I know, these anchoring behaviors are new. It is intuitively obvious that anchoring has to occur, but this paper describes the detailed dynamics of anchoring for the first time. Anchoring behavior now has to be included in the motor sequence for Drosophila larva locomotion in any comprehensive biomechanical or neural model.

      We agree with the reviewer on this. We think it is best to let our colleagues reflect on our findings and then decide how best to include them in future models.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please be sure to describe in a figure caption or in the methods the details of the optical setup, especially the focal lengths of all the lenses, including the objective, and part numbers of the LEDs and filters. It would be helpful to have a figure in the main paper explaining the principles of ERISM/WARP microscopy along with the calibration measurements and computational pipeline (this would mainly combine elements already in the supplement). Such a figure should also include details of the setup that are alluded to in the methods but not fully explained (for instance, a "silicone well" is referred to in the methods but never described). The calibration of elastomer stiffness that now appears in the main text could be made a supplementary figure, unless there is some new art in the fabrication of the elastomers that should be highlighted as an advance in the main text.

      We appreciate the importance of explaining our methods to readers.

      In response to the public comments, we have added further details in our methods section to clarify practical aspects and ensure that readers will be able to reproduce our work.

      In Supplemental Figure 2, we show the full optical light path for ERISM and WARP along with named components. In addition, the principles of ERISM and WARP microscopy have already been extensively described in previous publications (See Refs 23-26). In light of this, we feel that the best approach in this paper is to direct readers to those publications.

      We feel that it is appropriate to present the calibration of elastomer stiffness in the main text because this is indeed an innovation: it concerns not just making the elastomers but making force sensors based on these different materials. This matters because it shows how researchers can tune the stiffness of an ERISM/WARP elastomer to match the type of tissue or organism under study. It is the key technical advance that enables whole animal biomechanics across a range of animal sizes, so we think it is appropriate to keep it in the main text.

      We want to make sure that we do not oversell this point, and we feel that we make it sufficiently clear in the main text of our manuscript that making elastomer-based force sensors of appropriate stiffness is important, when we state:

      “First, we developed optical microcavities with mechanical stiffnesses in the range found in hydrogel substrates commonly used for studying Drosophila larval behaviour, i.e. Young’s modulus (E) of 10-30kPa (36–38).” (p. 5, ll. 124) and later

      “Here we used Drosophila larvae as a test case, but our methods now allow elastic optical resonators to be tuned to a wide range of animal sizes and thus create new possibilities for studying principles of neuro-biomechanics across an array of animals.” (p. 12, ll. 337)

      I would appreciate a description of the "why" behind some experimental choices, as understanding the motivation would be helpful for other researchers looking to adopt these techniques.

      We have now added text in the introduction and discussion that explains the rationale behind our experimental choices in more detail. Please see our response to Reviewer 1’s public comments on the same point.

      (1) The WARP and ERISM experiments were conducted on a collagen coated gold surface rather than agarose. Why? EG does agarose not adhere to the gold, or would its thickness interfere with the measurement?

      The gold layer is applied above the elastomer, and the collagen on top of the gold layer makes the gold a more natural biological surface for the animals. Agarose is unsuitable as an elastomer because it would dry during the vacuum-based deposition of the gold. It is also unsuitable as a surface coating on top of the gold, as the coating on the gold needs to be very thin to preserve the spatial and mechanical resolution of our sensors. Further, processing of agarose generally requires temperatures of 60°C and higher, which we find can damage the elastomer / gold films.

      (2) The ERISM measurements are made on a cold anesthetized animal right as it starts to wake up (visible mouth-hooks movement), which presents some difficulty. Why not start imaging while the animal is still completely immobile? Or why not use a dead larva?

      This approach allowed us to get measurements of forces exerted by denticles that are physiologically and biomechanically accurate. In dead or fully anesthetized animals, one cannot be sure that the forces exerted by denticles and denticle bands are representative of the forces exerted by an animal with active hydrostatic control.

      (3) In the ERISM setup the monochromator is spatially filtered by focusing through pinhole, while in the WARP setup, the LEDs are not.

      Yes that’s correct. The LED light sources used in WARP have better spatial homogeneity than the tungsten filament used in ERISM and so a pinhole is not required in WARP.

      (4) SV4 shows the interference image of a turning larva (presumably from one illumination wavelength) rather than a reconstruction of the displacement or stresses. Why?

      We felt that in this particular case the interference images provided a clearer representation of the behavioural sequence, showing both the small indentations generated by individual denticles and the larger indentations of the animal overall.

      Lines 49-50 "a lack of methods with sufficient spatiotemporal resolution for measuring GRFs in freely behaving animals has limited progress." This needs a discussion of what sufficient spatial and temporal resolutions would be and how existing methods fall short of these goals.

      We have now rewritten the introduction to include an overview of other alternative approaches and of what we see as the requirements here. See our response to the public comments.

      Figure caption 1B (line 789) refers to "concave areas of naked cuticle (black line) which generally do not interact with the substrate" While I think this might be supported by later WARP images, it's not clear how the technique of figure 1 measures interaction, which could e.g. be mediated by surface tension of a transparent fluid.

      The technique of Figure 1 provides qualitative information which as the reviewer points out is validated by WARP measurements later.

      Lines 184-189 "However, unexpectedly, we observed an additional force on the substrate when protopodia leave the substrate (SI) and when they are replanted (ST). To investigate whether this force was due to an active behaviour or due to shifting body mass, we plotted integrated displacement (i.e. displaced volume) against the contact area for each protopodium, combining data from multiple forwards waves (Figure 5B). Area is correlated with displaced volume for most time points, indicating that volume is a consequence of mass in a 2nd order polynomial relationship." I couldn't follow this argument at all.

      We have now reworded this section and explained our rationale. Also see our response to a similar critique in Reviewer 2’s public comments.

      Generally the authors might reconsider their use of acronyms. e.g. (244-246) "SI latencies were much more strongly correlated with wave duration across most segments than ST latencies. SIs scale with SwP and this could be mediated by proprioceptor activity in the periphery" is made more difficult to parse by the abbreviations.

      As we need to refer to these terms multiple times throughout the manuscript, we feel the use of acronyms is appropriate here.

      The video captions are inadequate. Please expand on them to explain clearly what is shown, and also describe in the methods how the data were acquired and processed. For instance, it seems that in SV3 a motion correction algorithm is applied so that the larva appears stationary even as it crawls forward. I think "fourier filtered" means that the images were processed with a spatial high pass filter - this should be explained and the parameters noted.

      We have revisited the video captions provided in the supplementary information document and conclude that these contain the important information. The modes of acquisition are described in the methods: for Videos 1 and 2, see the Methods section on “Denticle band kinematic imaging”; for Videos 3 and 4, see the Methods section on WARP. Supplementary Video 3 does not make use of motion correction; indeed, one can see the larvae moving upwards/forwards in the field of view. We apologize for not explaining the Fourier filtering process for Video 3. We have now modified the video caption to read as follows:

      Video SV3. WARP imaging during forwards peristalses.

      Video showing high frame rate displacement maps produced by a freely behaving Drosophila larva. Displacement maps were Fourier filtered to make denticulated cuticle more readily visible and projected in 3D to show the effects of substrate interaction. Details of the Fourier filtering procedure were described elsewhere [Kronenberg et al, Nat Cell Biol 19, 864–872 (2017)].

      What were the reflectances of the bottom (10 nm Au/Cr) and top (15nm Au) metal layers at the wavelengths used? I imagine the bottom layer should be less than 38%, the top layer higher, and the product of the square of the bottom transmission and the top reflectance coefficients equal to the bottom reflectance (to make the two paths of the interferometer contribute equal intensity), but none of this is stated.

      The reflectance of the gold mirrors was studied in detail in prior work on ERISM. See Kronenberg et al, Nat Cell Biol 19, 864–872 (2017). We therefore refrained from adding a complete optical characterization of the ERISM sensors again here. In brief, we found that a reflectance >13% at each Au mirror is required for reliable ERISM measurements.

      The description of the gold coated elastomer as a microcavity is confusing to me. Does the light really make multiple round trips between the plates before returning to the detector? The loss of light on each round trip would depend on the reflectance and parallelism of the top and bottom mirrors. From the WARP calculation it appears that there is only one round trip - a pi/2 phase shift results from the calculation for one round trip: 2pi*2nL 5nm/(630nm)^2, with n = 1.4 and L = 8 microns - if there were two round trips, the phase shift would be pi etc. Would this better be described as a mostly common path interferometer?

      The physics of our devices is best described within the framework of thin film interference and (weak) microcavity optics. Indeed, light can make multiple roundtrips, though it gets attenuated with each reflection. The complete calculation of the multiple roundtrips is only required to obtain quantitative information on the amount of light that is reflected. The spectral position of minima in reflectance can also be obtained from assuming one roundtrip which is what is done in the description of the WARP calculations.
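
      The single-roundtrip estimate quoted in the reviewer's question can be checked numerically. The sketch below is only an illustration: it assumes the roundtrip phase is 2π·2nL/λ, so that a small wavelength change Δλ shifts it by roughly 2π·2nL·Δλ/λ² in magnitude. The values n = 1.4, L = 8 µm, λ = 630 nm and Δλ = 5 nm are the ones given in the reviewer's comment, and the function name is hypothetical.

```python
import math


def roundtrip_phase_shift(n, thickness_nm, wavelength_nm, delta_wavelength_nm):
    """Magnitude of the change in single-roundtrip phase (radians).

    Assumes phase = 2*pi * (2*n*L) / wavelength, linearised for a small
    wavelength change. This is an illustrative estimate, not the full
    multi-reflection thin-film calculation.
    """
    optical_path_nm = 2.0 * n * thickness_nm  # one trip down and back
    return 2.0 * math.pi * optical_path_nm * delta_wavelength_nm / wavelength_nm ** 2


# Values taken from the reviewer's comment.
shift = roundtrip_phase_shift(
    n=1.4, thickness_nm=8000.0, wavelength_nm=630.0, delta_wavelength_nm=5.0
)
print(f"{shift:.2f} rad = {shift / math.pi:.2f} pi")
```

      With these numbers the shift comes out at about 0.56π, i.e. close to the π/2 mentioned in the comment.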

      Figure 2 e,f: the line fits appear to be dominated by the data points at 2 s. If these are removed, do the fits change? To support the argument that 2e shows a correlation and 2f does not, some kind of statistical test, ideally a hierarchical bootstrap, should be conducted to compare between the two measurements.

      If we remove the data points at 2 s, then R^2’s for swing initiation latencies change as follows: A2: 0.35 to 0.005; A4: 0.78 to 0.31; A6: 0.61 to 0.01. The data in 2e,f are the averages from 3 waves in each animal and so the data points at 2 s are not simply the result of single ‘rogue’ waves but rather averages of several trials. Further, if all individual waves are plotted, we can see that the overall trends are still visible.

      We don’t think it is appropriate to remove the data at 2 s from our analysis, but we take the point regarding statements about presence or absence of correlation in a formal sense. We have therefore changed the wording in the description of 2e,f to refer simply to the fact that wave duration can ‘largely determine’ latencies in some instances, but is less able to do so in other instances, as is suggested by the R^2 (coefficient of determination) data. In the discussion, we have also adjusted our wording.
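
      The kind of sensitivity check discussed here, comparing R^2 with and without the longest-duration points, can be sketched as follows. This is a minimal illustration in plain Python: the numbers are hypothetical and are NOT data from the manuscript, and the function name is an assumption. For a straight-line least-squares fit, R^2 equals the squared Pearson correlation, which is what the function computes.

```python
def r_squared(xs, ys):
    """Coefficient of determination for an ordinary least-squares line fit.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Hypothetical wave durations (s) and latencies (s); illustrative only.
durations = [0.8, 0.9, 1.0, 1.1, 2.0]
latencies = [0.30, 0.25, 0.33, 0.27, 0.70]

r2_all = r_squared(durations, latencies)

# Drop the points at the longest duration and refit.
kept = [(x, y) for x, y in zip(durations, latencies) if x < 2.0]
r2_trimmed = r_squared([x for x, _ in kept], [y for _, y in kept])

print(f"R^2 with all points: {r2_all:.2f}")
print(f"R^2 without the 2 s point: {r2_trimmed:.2f}")
```

      In this made-up example a single high-leverage point carries most of the fit, which is the situation the reviewer asks about; it says nothing about whether that is true of the real measurements.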

      Figure 4 - please provide in the main figure or as a supplement the full images (i.e. not cropped to the assumed shape of the larva)

      We do not feel that it is necessary or helpful to provide the full images given that the focus of the analysis is on dynamics of protopodia movements.

      Figure 5e top: single data points around wave duration 0.6s appear to dominate fit lines. Does removing these points alter the fits? To support the argument that 5e top shows a correlation and 5e bottom does not, some kind of statistical test, ideally a hierarchical bootstrap, should be conducted to compare between the two measurements.

      In Figure 5e, we are showing all waves analysed across animals. If we remove the datapoints at 0.6 s, A2 R^2 changes from 0.24 to 0.05, A4 R^2 changes from 0.48 to 0.11, and A6 R^2 changes from 0.69 to 0.34; however, we don’t feel it is appropriate to remove these data from our analysis. We take the point about needing to be cautious about making claims about correlation versus no correlation and have now reworded the description of these results along the same lines as for Figure 4.

      It appears from the methods (467-489) that animals were kept wet for warp imaging but not for ERISM imaging. Please confirm or explain further the presence or absence of a water layer in these two sets of measurements, as this could affect the adhesion forces.

      In each case, the animals were transferred onto experimental substrates with a moistened paintbrush. We have added text explicitly stating this in the methods section.

      Kim et al. Nature Methods 2017 (10.1038/nmeth.4429) describes recording two images separated by less than 60 microseconds using a scientific CMOS camera with a frame rate of 200 Hz. This is accomplished by triggering a pulsed LED once at the end of one frame's capture window and then a second time at the beginning of the next frame's window (see Supplementary Figure 10). I'm not sure if this trick is widely known, but it's worth considering if the authors are running into a problem with movement between the two wavelength exposures in their WARP setup.

      Thank you for this tip. We will take this under consideration for future work.

      Is the setup compatible with optogenetics? (EG is the red light dim enough that it wouldn't activate CsChrimson, or could a longer wavelength led be used for interferometry?) If so, activation of mooncrawler descending neuron (MDN) could be used to study backward crawling (or thermogenetic activation of MDN), e.g. to contrast the sites and order of "anchoring" between the two directions of crawling.

      The set-up is potentially compatible with optogenetics. We are in the process of exploring this in current ongoing work.

      Reviewer #2 (Recommendations For The Authors):

      Simplify/reduce the commentary about force measurements, and highlight the clear, qualitative descriptions of the novel locomotion patterns that they have observed. The microscopy and movements seem to matter more than the ground force estimations.

      We have addressed these issues in our responses to Reviewer 2’s public comments.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank the reviewers for their valuable feedback which has improved this work greatly from its original form, and are elated to have such glowing reviews of the revised work published alongside the revised preprint. Reviewer 3 raises some final salient points, which deserve a brief address here.

      Teeth: We thank the reviewer for clarifying their points. We do make the assumption that the ecological parameter space of toothed and beaked organisms will be comparable. Both are governed by the same set of physical principles and have the jaw bone as the most likely point of failure (teeth are harder than bone, and keratinous rhamphothecae are malleable and can be regrown with relative ease when deformed). Differences in stress/strain distribution between toothed and beaked organisms will occur but are already accounted for in our methods as we model both the teeth and rhamphotheca and will observe these different effects. We have added an explicit statement of this hypothesis to the Methods section of the manuscript.

      Cranial kinesis: In our opinion, it is a safe assumption that the lower jaws of extant birds and enantiornithines are comparable. We do not see why the acquisition of kinesis in the upper jaw would generally affect the functional role of or constraints on the lower jaw. One possibility we discussed is that a quickly-moving kinetic premaxilla could let the lower jaw move a shorter distance during effective prey capture and lower the selection for speed (i.e. allow jaw-closing MA to remain higher). While we have added this possibility to our call for the investigation of cranial kinesis, we consider it too speculative to begin altering interpretations of fossil taxa. All raw measurement data remains available so that, if evidence is found for cranial kinesis having predictable effects on our measured parameters, future researchers can re-analyse our data and update any ecological predictions accordingly.

      Organization: To our knowledge eLife format incorporates what one would think of as a Conclusions section into the Discussion. Our Discussion section currently contains 18 subheadings which should guide a reader to any specific topic of interest. The Discussion also progresses from a more narrow to broad focus which we and several colleagues find intuitive.

      We thank all three reviewers once again for their feedback that has improved this work and their kind words throughout the process.


      The following is the authors’ response to the original reviews.

      We thank all three reviewers for their detailed reviews, and generally agree with their feedback. To accompany the reviewed preprint of this manuscript, we wished to respond to comments from the reviewers so that they (and the public) will know what we are planning to incorporate in the revised manuscript we are currently preparing. If there are any comments on our plans in the meantime, please let us know.

      • Reviewer 1, on concerns regarding identification of ontogenetic stage and comparison of taxa from different ontogenetic stages: It is fair to say that enantiornithine ontogeny is still poorly understood, though we believe all current evidence points to each specimen used in this study being adequately mature for comparison to the extant birds used in the study. Stages of skeletal fusion are the standard method of assessing enantiornithine ontogeny (Hu and O'Connor 2017), and our comparison of histological work (Atterholt, Poust et al. 2021) to skeletal stages in Table S4 suggests a transition from juvenile to subadult in stage 0 or 1 and from subadult to adult within stage 3. Thus, the specimens we quantitatively examine in this study, all at stages 2 or 3 (Figure S10), are advanced subadults or adults. It is well-known that many living animals considered “adults” would be considered subadults or even juveniles by a palaeontologist (Hone, Farke et al. 2016). So, even if some individuals in this study are not fully skeletally mature, they should have obtained the morphology which they would possess for most of their lives and thus the morphology which undergoes selective pressure. We will add this context to the “Bohaiornithid Ontogeny” section and thank the reviewer for seeking more detail for this point.

      • Reviewer 2, on need of a context figure: We have an artistic life reconstruction of a bohaiornithid in preparation, and can include that in the revised manuscript as a figure.

      • Reviewer 2, on raptor claw categories: We explain these categories in depth in a previous work (Miller, Pittman et al. 2023). However, we will now add a short summary of that explanation to this work so that this manuscript will become self-contained in this regard. In short, the “large raptor” category includes extant birds with records of regularly taking prey which cannot be encircled with the pes, while birds in the “small raptor” category have no such records. As Reviewer 2 points out, this does often follow phylogenetic lines, but not always. E.g. most owls specialise in taking small prey, but the great horned owl Bubo virginianus regularly takes mammals and birds larger than its pes (Artuso, Houston et al. 2020); and conversely we can only find reports of the common black hawk Buteogallus anthracinus taking prey small enough for the pes to encircle (Schnell 2020) despite other accipiters frequently taking large prey. In both cases these taxa plot in PCA nearer to other large or small raptors (respectively) than to their phylogenetic relatives.

      • Reviewer 3, on teeth vs beaks: We are not aware of any foods which are exclusive to toothed or beaked animals. There are some aspects of extant bird biology that may affect the way a certain diet may need to be adapted to, which we do comment on, e.g. discussion of alternatives to the crop and ventriculus for processing plant matter in the Bohaiornithid Ecology and Evolution section. For functional studies, e.g. FEA, we have included the rhamphotheca in toothless models, which serves the same role as teeth: to be a feeding surface. It should not matter, in theory, if the feeding surface is hard or soft, as mechanical failure occurs in high stress/strain states regardless of the medium. If having teeth necessarily increases or decreases overall stress/strain relative to a beak (and from our work this does not appear to be the case), this would in turn necessarily limit dietary options. So, all models in our work should be directly comparable.

      As an additional note on this topic, we address tooth shape in bohaiornithids at the end of the Bohaiornithid Ecology and Evolution section. We specifically note that their tooth shape is likely controlled by phylogeny in the current version, though we will add a note in the upcoming version that the morphospace of bohaiornithid teeth overlaps that of many other clades with purportedly diverse diets, which is consistent with a hypothesis of diverse diets within the clade.

      • Reviewer 3, on cranial kinesis: Our FE models should be unaffected by cranial kinesis, as these are two-dimensional and model the akinetic lower jaw only. Some mediolateral kinesis may be relevant in the mandible in the form of “wishboning” in different taxa, but its prevalence in extant birds is currently unknown. The preservation of enantiornithines (two-dimensionally and typically in lateral view) limits the ability to capture any mediolateral function regardless.

      Our models of mechanical advantage do not account for any cranial kinesis. This is a necessary simplification. The nature of cranial kinesis in extant birds, and the role that it plays in feeding, is poorly understood. Cranial kinesis will increase gape, but we don’t yet know how/if it affects jaw closing force and speed (moreover, given the variation in quadrate and hinge morphology present in extant birds, this is also something that is likely to be highly diverse). We have therefore modelled the extant birds’ jaw closing systems as having one, akinetic out lever (the jaw joint to the bite point), to match the situation in our fossil taxa. This is a common simplification that has been used previously with success (Corbin, Lowenberger et al. 2015, Olsen 2017). However, we acknowledge that this simplification may introduce some error. Unfortunately, until the mechanics of cranial kinesis – and the variation in the anatomy and performance of kinetic structures in extant birds – are better understood, we cannot determine exactly what that error looks like. We therefore have greater confidence in the inter-species comparability of this conservative, akinetic approach (in other words, we may not be making assumptions that are 100% accurate, but we are at least making the same assumption across all taxa, so it should be comparable in its error). We will add a section in the Mechanical Advantage and Functional Indices discussion calling for further research into the mechanics of cranial kinesis so future mechanical advantage work in birds can take this matter into account.
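
      For readers unfamiliar with the quantity, the single-lever simplification described above can be sketched as a ratio of two lengths. This is a generic illustration and not the authors' analysis script: it uses the standard lever definition (in-lever length divided by out-lever length), the function name is hypothetical, and the lengths are made-up values rather than measurements from the study.

```python
def jaw_closing_mechanical_advantage(in_lever, out_lever):
    """Mechanical advantage of a single akinetic lever.

    in_lever: distance from the jaw joint to the muscle's line of action.
    out_lever: distance from the jaw joint to the bite point.
    Both must be positive and in the same units.
    """
    if in_lever <= 0 or out_lever <= 0:
        raise ValueError("lever lengths must be positive")
    return in_lever / out_lever


# Hypothetical lengths in millimetres, for illustration only.
ma_posterior_bite = jaw_closing_mechanical_advantage(in_lever=10.0, out_lever=20.0)
ma_anterior_bite = jaw_closing_mechanical_advantage(in_lever=10.0, out_lever=40.0)

print(ma_posterior_bite, ma_anterior_bite)
```

      Because every taxon is reduced to the same two lengths, any error introduced by ignoring kinesis enters all taxa through the same assumption, which is the comparability argument made above.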

      • Reviewer 3, on skull reconstruction: This issue is partly addressed in the Bohaiornithid Skull Reconstruction section, though we agree that adding more mentions of it in the MA and FEA Discussion sections and the Bohaiornithid Ecology and Evolution sections will benefit the manuscript. Most notably Shenqiornis and Sulcavis have similar ecological interpretations, but much of the Shenqiornis skull reconstruction uses Sulcavis bones. Longusunguis is the only other taxon which takes more than two bones from a different taxon, and in this case all but the quadrate are not used in any quantitative measurements. We have ensured that the skull reconstructions presented in Figure 2 show what portions of the skull come from what specimen so that as new material is discovered and phylogenetic relationships are updated it will be clear to future readers which parts of reconstructions will need to be updated.

      • Reviewer 3, on data availability: All data including FEA models and raw measurement data are included in the same repository as the scripts, which we will make clear in the manuscript. Good catch on the data link being dead; we will publish it now.

      As a final note, it was brought to our attention by another colleague that the original manuscript’s ancestral state reconstruction lacked an outgroup. An updated reconstruction using Sapeornis as an outgroup will be included in the revised manuscript. The addition of the outgroup does not change any conclusions of the manuscript.

      We once again thank our reviewers for their valuable feedback and will submit a revised version of this manuscript for publication shortly. Please let us know if you have any additional comments after reading our response that we can take onboard in our revision.

      References

      Artuso, C., C. S. Houston, D. G. Smith and C. Rohner (2020). Great Horned Owl (Bubo virginianus), version 1.0. Birds of the World. A. F. Poole. Ithaca, NY, USA, Cornell Lab of Ornithology.

      Atterholt, J., A. W. Poust, G. M. Erickson and J. K. O'Connor (2021). "Intraskeletal osteohistovariability reveals complex growth strategies in a Late Cretaceous enantiornithine." Frontiers in Earth Science 9: 640220.

      Corbin, C. E., L. K. Lowenberger and B. L. Gray (2015). "Linkage and trade‐off in trophic morphology and behavioural performance of birds." Functional ecology 29(6): 808-815.

      Hone, D. W. E., A. A. Farke and M. J. Wedel (2016). "Ontogeny and the fossil record: what, if anything, is an adult dinosaur?" Biology letters 12(2): 20150947.

      Hu, H. and J. K. O'Connor (2017). "First species of Enantiornithes from Sihedang elucidates skeletal development in Early Cretaceous enantiornithines." Journal of Systematic Palaeontology 15(11): 909-926.

      Miller, C. V., M. Pittman, X. Wang, X. Zheng and J. A. Bright (2023). "Quantitative investigation of Mesozoic toothed birds (Pengornithidae) diet reveals earliest evidence of macrocarnivory in birds." iScience 26(3): 106211.

      Olsen, A. M. (2017). "Feeding ecology is the primary driver of beak shape diversification in waterfowl." Functional Ecology 31(10): 1985-1995.

      Schnell, J. H. (2020). Common Black Hawk (Buteogallus anthracinus), version 1.0. Birds of the World. A. F. Poole and F. B. Gill. Ithaca, NY, USA, Cornell Lab of Ornithology.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The current work by Kulich et al. examines the dynamic relocalization of NGR1 (LAZY2), a member of the LAZY protein family, which is key for auxin redistribution during gravitropic responses. After gravistimulation of the triple mutant ngr123 (lazy234), the PIN3 activating kinase D6PK is not polarized in the columella cells.

      Strengths:

      The authors show a thorough characterization of NGR1 relocalization dynamics after gravistimulation.

      Weaknesses:

      Genetically the relocalization of D6PK depends on the LAZY protein family, but some essential details are missing in this study. On the one hand, NGR1-GFP does not associate with the BFA compartments and maintains its association with the PM and amyloplasts. On the other hand, D6PK relies on GNOM, via vesicle trafficking sensitive to BFA, suggesting that D6PK follows a different relocalization route than NGR1 which is BFA-insensitive. Based on these observations, D6PK relocalization requires the LAZY proteins, but D6PK and NGR1 relocalize through independent routes. How can this be interpreted or reconciled?

      Response: Since we demonstrated that D6PK does not relocalize in the absence of NGR proteins, we conclude that NGR1 acts upstream of D6PK. The molecular mechanism driving this interaction is not fully understood; however, it is evident that NGR1 triggers the mobilization of D6PK. Despite previous investigations into D6PK mobility, the underlying mechanisms remain elusive. Notably, despite its sensitivity to BFA, D6PK does not localize to BFA bodies and does not undergo conventional endocytosis (https://doi.org/10.1016/j.devcel.2014.05.006). We fully acknowledge the importance and interest in gaining a better understanding of these processes, and it will be a focal point of our future research.

      Two other works (now published) provide valuable and fundamental findings related to the mechanism examined in the current manuscript and display complementary and similar results to the ones shown in the current manuscript. Given the similarities in the examined mechanisms, these preprints should be referenced, recognized, and discussed in the manuscript under review. It is assumed that the three projects were independently developed, but the results of these previous works should be addressed and taken into account at least during the discussion and when drawing any conclusions. This does not mean that this work is less relevant. On the contrary, some of the observations that seem to be redundant are more solid, and firm conclusions can now be drawn from them.

      Response: We have included and discussed these works in the revised discussion.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript addresses what rapid molecular events underlie the earliest responses after gravity-sensing via the sedimentation of starch-enriched amyloplasts in columella cells of the plant root cap. The LAZY or NEGATIVE GRAVITROPIC RESPONSE OF ROOTS (NGR) protein family is involved in this process and localizes to both the amyloplast and to the plasma membrane (PM) of columella cells.

      The current manuscript complements and extends Nishimura et al., Science, 2023. Kulich and colleagues describe the role of the LZY2 protein, also called NGR1, during this process, imaging its fast relocation and addressing additional novel points such as molecular mechanisms underlying NGR1 plasma membrane association as well as revealing the requirement of NGR1/LZY2, 3,4 for the polar localization of the AGCVIII D6 protein kinase at the PM of columella cells, in which NGR1/LZY2 acts redundantly with LZY3 and LZY4.

      The authors initially monitored relocalization of functional NGR1-GFP in columella cells of the ngr1 ngr2 ngr3 triple mutant after 180-degree reorientation of the roots. Within 10-15 min, the NGR1-GFP signal disappeared from the upper PM after reorientation and reappeared at the lower PM of the reoriented cells in close proximity to the sedimented amyloplasts. Reorientation of NGR1-GFP occurred substantially faster than PIN3-GFP reorientation, at about the same time or slightly later than a rise in a calcium sensor (GCaMP3) just preceding alterations in the D2-Venus auxin sensor. Reorientation of NGR1-GFP proved to be fast and not dependent on brefeldin A-sensitive ARF GEF-mediated vesicle trafficking, unlike the trafficking of PIN proteins, like PIN3, or the AGCVIII D6 protein kinase. Strikingly, the PM association of NGR1-GFP was highly sensitive to pharmacological interference with sterol composition or concentration and to phosphatidylinositol 4-kinase inhibition, as well as to dithiothreitol (DTT) treatment interfering with thioester bond formation, e.g. during S-acylation. Indeed, combined mutation of a palmitoylation site and polybasic regions of NGR1 abolished its PM but not its amyloplast localization and rendered the protein non-functional during the gravitropic response, suggesting NGR1 PM localization is essential for the gravitropic response. Targeting the protein to the PM via an artificially introduced N-terminal myristoylation and a ROP2-derived polybasic region and geranylgeranylation site partially restored its functionality in the gravitropic response.

      Strengths:

      This timely work should be of broad interest to plant, cell and developmental biologists across the field as gravity sensing and signaling may well be of general interest. The point that NGR1 is rapidly responsive to gravistimulation, polarizes at the PM in the vicinity of amyloplasts, and that this is required for repolarization of D6 protein kinase prior to PIN relocation is really compelling. The manuscript is generally well-written and accessible to a general readership. The figures are clear and of high quality, and the methods are sufficiently explained for reproduction of the experiments.

      Weaknesses:

      Statistical analysis has been performed for some figures but is lacking for most of the quantitative analyses in the figure legends.

      Response: We added this information to the figure legends

      The title claims a bit more than what is actually shown in the manuscript: While auxin response reporter alterations are monitored, "rapid redirection of auxin fluxes" are not really directly addressed and, while D6PK can activate PIN proteins in other contexts, it is not explicitly shown in the manuscript that PIN3 is a target in the context of columella cells in vivo. A title such as "Rapid redirection of D6 protein kinase during Arabidopsis root gravitropism relies on plasma membrane translocation of NGR proteins" would reflect the results better.

      Response: We modified the title to "Rapid translocation of NGR proteins driving polarization of PIN-activating D6 protein kinase during root gravitropism".

      Fig. 4: The point that D6PK is transcytosed cannot be made here based on the data of these authors. They should have used a photoswitchable version of NGR1 to show that the same molecules observed at the upper PM are translocated to the lower PM. Nishimura and colleagues actually did that for NGR4. However, this is a lot of work and maybe for NGR1 that fusion would have too low fluorescence intensity (as was the case for NGR3). So, I think a rewording would be sufficient, such as "NGR-dependent reorientation of D6PK plasma membrane localization", as this does not say from where it comes to the lower PM. Theoretically, the signal could also be amyloplast-derived or newly synthesized (or just folded) NGR1-GFP.

      Response: We fully agree and rephrased the text, using "translocation" instead of "transcytosis".

      The authors propose a model in which the AGCVIII kinase D6PK, dependent on NGRs, activates PIN3 to drive auxin fluxes. However, alterations in auxin responses are observed prior to PIN3 reorientation. They should explain this discrepancy better and clearly describe that this is a working hypothesis for the future rather than something that has been explicitly proven.

      Reviewer #3 (Public Review):

      The mechanism controlling plant gravity sensing has fascinated researchers for centuries. It has been clear for at least the past decade that starch-filled plastids (termed statoliths) in specialised gravity-sensing columella cells sense changes in root orientation, triggering an asymmetric auxin gradient that alters root growth direction. Nevertheless, exactly how statolith movement triggers PIN auxin efflux carrier activation and auxin gradient formation has remained unclear until very recently. A series of new papers (in Science and Cell) and this manuscript report how LAZY proteins (also referred to as NEGATIVE GRAVITROPIC RESPONSE OF ROOTS; NGR) play a pivotal role in regulating root gravitropism. In terms of their overall significance, their collective findings provide seminal insights into the very earliest steps for how plant roots sense gravity, and these are arguably the most important papers about root gravitropism in the past decade.

      In the current manuscript, Kulich et al initially report (through creating a functional NGR1-GFP reporter) that "NGR1-GFP displayed a highly specific columella expression, which was most prominent at the PM and the statolith periphery." Is NGR1-GFP expressed in shoot tissues? If yes, is it in starch sheath (the gravity-sensing equivalent of root columella cells)? The authors also note "NGR1-GFP signal from the PM was not evenly distributed, but rather polarized to the lower side of the columella cells in the vicinity of the sedimented statoliths (Fig. 1A)." and (when overexpressing NGR-GFP) "chloroplasts in the vicinity of the PM strongly correlated with NGR1 accumulating at the PM nearby, similar to the scenario in columella" suggesting that NGR1 does not require additional tissue-specific factors (i.e. trafficking proteins or lipids) to assist in its intracellular movement from plastid to PM.

      Response: Yes, NGR1, also called LAZY2, is expressed in the inner hypocotyl tissues, according to https://doi.org/10.1104/pp.17.00942. Unfortunately, we saw very little signal with our NGR-GFP construct, possibly due to the weak NGR1-GFP signal and/or NGR1 being expressed exclusively in the inner tissues.

      Next, the authors study the spatiotemporal dynamics of NGR1-GFP re-localisation together with other early gravitropic signals and/or components: calcium, auxin, and PIN3. The temporal data presented in Figure 1 illustrate how the GCaMP calcium reporter (in panel E) revealed "the first signaling event in the root gravitropic bending is the statolith removal from the top membrane, rather than its arrival at the bottom". It appeared that the auxin DII-VENUS reporter was also changing rapidly (panel G) - was this detectable BEFORE statolith re-sedimentation?

      Response: In our data (Figure 1G), we observe that the increase in signal at the top side begins prior to starch sedimentation, in contrast to the bottom side, where the decrease starts only after starch grains land on the bottom membrane. While this observation aligns with our hypothesis and other data, we refrained from commenting on it due to the small differences between the first 2-3 timepoints, which are obscured by noise. This phenomenon arises because the DII response relies on protein degradation and is relatively slow. Hence, for rapid tracking of the auxin response, we utilized auxin-induced calcium as a proxy, with NPA treatment serving as a negative control.

      Please can the authors explain their NPA result in Fig 1E? Why would treatment with the auxin transport inhibitor NPA block Ca signalling (unless the latter was dependent on the former)?

      Response: Auxin induces rapid calcium transients (e.g., http://dx.doi.org/10.1016/j.cub.2015.10.025). Consequently, when auxin reaches the bottom elongation zone approximately 5-6 minutes after rotation, we observe an increased GCaMP signal at this location. Notably, when we inhibit PIN function using NPA, the GCaMP signal persists, but the difference between the top and bottom diminishes. This validates that the calcium transients at the bottom side can be interpreted as reporting an increase in auxin accumulation resulting from auxin transport.

      They go on to note "This initial auxin asymmetry is mediated by PIN-dependent auxin transport, despite visible polarization of PIN3 can be detected only later" which suggests that PIN activity was being modified prior to PIN polarisation.

      In contrast to other proteins involved in gravity response like RLDs and PINs, NGR1 localization and gravity-induced polarization does not undergo BFA-sensitive endocytic recycling by ARF-GEF GNOM. This makes sense given NGR1 is initially targeted to plastids, THEN the PM. Does NGR1 contain a cleavable plastid targeting signal? The authors go on to elegantly demonstrate that NGR1 PM targeting relies on palmitoylation through imaging and mutagenesis-based transgenic ngr rescue assays.

      Response: Yes, there is a weakly conserved plastid targeting signal on NGR1. Although we also started researching in this direction, we quickly realized that two other groups had shown very comprehensive data regarding NGR plastid localization.

      Finally, the authors demonstrate that gravitropic-induced auxin gradient formation is initially dependent on PIN3 auxin efflux activation (prior to PIN3 re-localisation). This early PIN3 activation process is dependent on NGR1 re-targeting D6PK (a PIN3 activating kinase). This elegant molecular mechanism integrates all the regulatory components described in the paper into a comprehensive root gravity sensing model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Line 83: This construct fully rescued the agravitropic bending phenotype of the ngr1/2/3 triple mutant (see further).

      What does "see further" mean in this context?

      Response: It is a reference to the second part of the manuscript (Fig. 3, Supplementary Fig. S3, Fig. S4), where we extensively address the complementation with wild-type and point-mutated versions of NGR. There we show that the construct we are using is functional. This does not prove, but strongly implies, that the GFP signal we obtain is relevant. We updated the text to point this out.

      Line 101: Timing of events during the gravitropic response

      When describing the equipment employed and the rotation applied to the samples, "the vertical stage microscope and minimized the time required for rotating the sample. 180° rotation..."

      The authors mentioned a travel time of 5 minutes first and later of 15 minutes for the relocalization of NGR1. Are these two different experiments? Were there two different rotation angles or degrees applied? Could the authors please rephrase this part of the description to answer these questions and help the reader understand how the assay was performed?

      Response: We added this explanation to the text.

      Figure 1 E, F, and G.

      Could the authors please provide pictures and/or videos for the PIN3 localization dynamics, intracellular calcium transients, and auxin reporter DII-Venus? In other words, show the complementing images for Figure 1E, 1F, and 1G as the authors did for Figure 2D where authors presented the pictures and the corresponding quantification plots.

      Response: We wanted to avoid overcrowding the figure, but we would also love to show the videos. Therefore, we prepared an additional Supplementary Movie 3, which contains all of these additional observations.

      Line 194: This implies the existence of posttranslational modifications such as S-acylation to associate with PM.

      Why is this specific modification suggested/examined and no other modification? What are the criteria for selecting this kind of modification? Based on what premises? Could the authors elaborate on that? Could the authors please include references?

      Response: Thank you for this comment. We of course first checked the prediction tools, which showed a very strongly conserved S-acylation site. We have now clarified this in the text and added other modifications as examples. Later on, we rule out myristoylation (which happens on glycines) and prenylation (which happens only at the C-terminal CAAX box).

      Line 255: NGR1 PM localization is synergistically mediated by polybasic regions and a palmitoylation site

      Similarly to the previous comment, how and why are these regions examined/analyzed? Likewise, why is the palmitoylation site selected? Please provide some background, criteria, and references.

      Response: Here, we clearly state that the prediction of the palmitoylation site is made based on the GPS lipid prediction tool.

      As for the polybasic regions, these can be seen upon manual inspection of the primary protein sequence. We simply looked at the protein sequence and saw them there. We rephrased the text so that it is clearer.

      Reviewer #2 (Recommendations For The Authors):

      Please, proofread the manuscript for style and minor language errors.

      Statistical analysis has been performed for some figures but is lacking for most of the quantitative analyses in the figure legends. Where it has been performed, the "n" (number of roots, cells, or plasma membranes analyzed for NGR1-GFP) is not given, and no information is provided on whether the data are derived from a representative experiment or pooled from several experiments. This certainly requires revision in Fig. 1D-G, Fig. 2B-D, Fig. S2B,E, Fig. 3B,D,F-H, Fig. S3B,D, Fig. S4E-H, and Fig. 4D.

      Response: Thank you, we added this information to the figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This fascinating paper by M. Alfatah et al. describes work to uncover novel genes affecting lifespan in the budding yeast S. cerevisiae, eventually identifying and further characterizing a gene, YBR238C, now named AAG1 by the authors. The authors began by considering published gene sets pulled from the Saccharomyces genome database that described increases or decreases in either chronological lifespan or replicative lifespan in yeast. They also began with gene sets known to be downregulated upon treatment with the lifespan-extending TOR inhibitor rapamycin.

      YBR238C was unique in being largely uncharacterized, downregulated upon rapamycin treatment, and linked to both increased replicative lifespan and increased chronological lifespan upon deletion.

      The authors show that YBR238C may act to negatively regulate mitochondrial function, in ways that are both dependent on and independent of the stress-responsive transcription factor Hap4, largely by looking at relative expression levels of relevant mitochondrial genes.

      In a hard-to-fully-interpret but well-documented series of experiments, the authors note that the two paralogues YBR238C and RMD9 (which have ~66% similarity) (a) have opposite effects when acting alone, and (b) appear to interact in that some phenotypes of ybr238c are dependent on RMD9.

      A particularly interesting finding in light of the current literature and of the authors' strategy in identifying YBR238C is that changes in electron transport chain genes upon rapamycin treatment appear to be affected via YBR238C.

      Based on a series of experiments the authors move to conclude the existence of "a feedback loop between TORC1 and mitochondria (the TORC1-Mitochondria-TORC1 (TOMITO) signaling process) that regulates cellular aging processes."

      Strengths

      Overall, this study describes a great deal of new data from a large number of experiments, that shed light on the potential specific roles of YBR238C and its paralog RMD9 in aging in yeast, and also underscore the potential of an approach looking for "dark matter" such as uncharacterized genes when seining the increasing deluge of published datasets for new hypotheses to test. This work when revised will become a valuable addition to the field.

      Weaknesses

      A paralog of YBR238C, RMD9, also exists in the yeast genome. While the authors indicate that part of their interest in YBR238C lies in its uncharacterized nature, its paralogue, RMD9, is not uncharacterized but is named for its phenotype, Required for Meiotic nuclear Division, which is not mentioned or discussed anywhere in the manuscript currently.

      In the context of the current work, in addition to the cited Hillen, H.S et al. and Nouet C. et al, the authors might be very interested in the 2007 Genetics paper "Translation initiation in Saccharomyces cerevisiae mitochondria: functional interactions among mitochondrial ribosomal protein Rsm28p, initiation factor 2, methionyl-tRNAformyltransferase and novel protein Rmd9p" (PMID: 17194786), which does not appear to be cited or discussed in the current version of the manuscript.

      Thank you for your thorough and insightful review of our manuscript. We value your positive feedback and recognition of the strengths in our study. Your constructive comments have been carefully considered, leading to the inclusion of RMD9, identified as 'Required for Meiotic Nuclear Division,' and the addition of the relevant reference (PMID: 12586695) in the revised manuscript. This information has been incorporated into the second paragraph of the "The YBR238C paralogue RMD9 deletion decreases the lifespan of cells" results section.

      Furthermore, we appreciate the reviewer's suggestion to include the 2007 Genetics paper on translation initiation in Saccharomyces cerevisiae mitochondria (PMID: 17194786). This citation has been integrated into our revised manuscript.

      We believe that these revisions significantly strengthen the manuscript and address the concerns raised by Reviewer #1. We thank the reviewer for their time and valuable input.

      Reviewer #2 (Public Review):

      The effectors of cellular aging in yeast have not been fully elucidated. To address this, the authors curated gene expression studies to link genes influenced by rapamycin - a well-known mediator of longevity across model systems - to genes known to affect chronological lifespan (CLS) and replicative lifespan (RLS) in yeast. Through their analyses, they find one gene, ybr238c, whose deletion increases both CLS and RLS and that is downregulated by rapamycin. Curiously, despite these selection criteria, the authors only use CLS as a proxy for cellular aging throughout their study and do not explore the effects of ybr238c deletion on RLS. This does not diminish their conclusions, but given the importance of this phenotype in their selection criteria, it is surprising that the authors did not choose to test both types of aging throughout their study.

      Nonetheless, the authors demonstrate that deletion of ybr238c increases CLS across multiple yeast strains and through multiple assays. The authors also test the effects of YBR238C overexpression on lifespan and find the opposite effect, with overexpression yeast showing decreased survival relative to wild-type cells, consistent with "accelerated aging" as the authors propose. The authors also note that ybr238c has a paralog, rmd9, whose deletion decreases CLS and seems to be epistatic to ybr238c, as a double ybr238c/rmd9 mutant has decreased CLS relative to a wild-type strain.

      Collectively, the data presented by the authors convincingly demonstrate that ybr238c influences lifespan in a manner that is distinct from (and likely opposite to) rmd9. However, the authors then link the increased CLS in Δybr238c yeast to mitochondrial function using only a handful of assays that do not directly test mitochondrial function. These include total cellular ATP levels, levels of reactive oxygen species, and the transcript levels of select nuclear-encoded mitochondrial genes. Yeast is well established to generate ATP through non-mitochondrial pathways such as glycolysis in fermentative conditions. While it is possible that the ATP levels assayed in the manuscript were tested in stationary phase, which would more likely reflect "mitochondrial function," neither the methods nor the figure legends contain these details, which are critical for the interpretation of these data. Similarly, ROS can be generated through non-mitochondrial pathways, and the transcription of nuclear-encoded mitochondrial genes is an indirect measure of mitochondrial function at best. Thus, the authors' proposed connection of ybr238c to mitochondrial function is correlative and should be substantiated with assays that more closely align with organellar function, such as respirometry or assaying the activity of oxidative phosphorylation complexes. Finally, the authors attempt to tie the phenotypes of mitochondrial dysfunction caused by the deletion of ybr238c to TORC1 signaling, as the gene is influenced by rapamycin. However, the presentation of the data, such as reporting ATP levels as relative percentages or failing to perform appropriate statistical comparisons between conditions in which the authors derive conclusions, renders the data difficult to interpret. As such, this manuscript establishes that ybr238c is rapamycin responsive and influences CLS, but its influence on mitochondrial activity and ties to TORC1 signaling remain speculative.

      We would like to express our gratitude to Reviewer #2 for the thoughtful feedback on our manuscript. We have carefully considered your comments and have made comprehensive revisions to address the concerns raised.

      We appreciate the suggestion to investigate the role of YBR238C in replicative lifespan (RLS). However, we want to bring to your attention that four previous studies (references 7, 39, 40, and 41) have already identified the involvement of YBR238C in the RLS phenotype. Given the existing body of literature on this aspect, we chose not to duplicate these efforts in our study.

      Instead, we focused our efforts on validating the role of YBR238C in chronological lifespan (CLS) phenotype, a finding reported in only one genome-wide study (reference 38). To enhance the comprehensiveness of our study, we performed analyses on different phenotypes, including mitochondria activity and oxidative stress, under both logarithmic-phase (condition for RLS) and stationary phase (condition for CLS). We now clearly indicate the logarithmic-phase/stationary phase conditions in the figure legends of the manuscript, specifying whether the conditions are relevant to RLS or CLS. Additional results of the new experiments have been included in the revised manuscript as supplementary figures (S3E-S3I).

      To address concerns about the indirect nature of our mitochondrial function assays, we have measured relative mitochondrial content (S3F), quantified ROS levels from fermentative to stationary phase conditions (S3G), and assessed growth in respiratory glycerol medium (S3H), which provides more direct insight into mitochondrial biology. Additionally, we have investigated the resistance of ybr238c∆ cells to H2O2 toxicity and found them to be more resistant than wild-type cells.

      We believe these revisions strengthen the scientific rigor and clarity of our study. We sincerely appreciate the guidance from Reviewer #2, and we hope these modifications address the concerns raised effectively.

      Reviewer #3 (Public Review):

      Summary: The study by Alfatah et al. presented a role for YBR238C in mediating lifespan through improved mitochondrial function in a TOR1-dependent metabolic pathway. The authors used a dataset comparison approach to identify genes that positively modulate yeast chronological (CLS) and replicative (RLS) lifespan when deleted and whose expression is reduced under rapamycin treatment. This approach revealed an uncharacterized, mitochondria-localized yeast gene, YBR238C, and through mechanistic studies, they identified its paralogous gene RMD9 as regulating lifespan in an antagonistic manner.

      Strengths:

      Findings have valuable implications for understanding the YBR238C-mediated, mitochondrial-dependent yeast lifespan regulation, and the interplay between two paralogous genes in the regulation of mitochondrial function represents an interesting case for gene evolution.

      Weaknesses:

      Overall, the implications/findings of this study are restricted only to the yeast model since these two genes do not have any homologs in higher eukaryotes. The primary methods must be carefully designed by considering two different metabolic states: respiration-associated with CLS and fermentation-associated with RLS in a single comparative approach. Yeast CLS and RLS are two completely different processes. It is already known that most genes regulating CLS are not associated with RLS or vice versa. The method section is poorly written and missing important information. The experimental approaches are poorly designed, and variability across the datasets (e.g., media condition "YPD," "SC" etc.) and their experimental conditions are not well described/considered; thus, presented data are not conclusive, which decreases the overall rigor of the study.

      We sincerely appreciate your thorough review of our manuscript and your insightful comments. We acknowledge the limitation of our study being yeast-specific due to the absence of homologous genes in higher eukaryotes. However, we would like to highlight the significance of our findings in revealing a feedback loop between mitochondrial function and TORC1 signaling (TORC1-Mitochondria-TORC1 or TOMITO signaling process) in cellular lifespan regulation.

      Our interpretation of the experimental results is grounded in recent literature. Two studies (references 62 and 63) support our findings by demonstrating TORC1 activation after mitochondrial electron transport chain dysfunction and the delay in brain pathology progression upon TORC1 inhibition, respectively. These studies, discussed in our manuscript, reinforce the relevance of our work in a broader biological context.

      We recognize the importance of carefully designing our primary methods to account for the different metabolic states associated with cellular processes, such as respiration in chronological lifespan (CLS) and fermentation in replicative lifespan (RLS). We want to bring to your attention that four previous studies (references 7, 39, 40, and 41) have already identified the involvement of YBR238C in the RLS phenotype. To avoid duplicating these efforts, we have chosen not to reiterate these findings in our study. However, we have clarified the logarithmic-phase/stationary phase conditions in the figure legends, specifying the relevance of each metabolic state to RLS or CLS. Additionally, we have included new supplementary figures (S3E-S3I) to provide further details on the new experiments conducted.

      We appreciate your feedback regarding the clarity and completeness of our method section. In the revised manuscript, we have invested additional effort to enhance the clarity of the method section, providing a more detailed account of the experimental procedures, including the missing information you identified.

      We believe these revisions strengthen the scientific rigor and clarity of our study. We sincerely appreciate the guidance from Reviewer #3, and we hope these modifications address the concerns raised effectively.

      Reviewer #1 (Recommendations For The Authors):

      Thank you for your detailed review and valuable recommendations. We have carefully addressed each of your comments in the revised manuscript. The specific changes made include:

      (1) "TORC1 positively regulates aging, and its inhibition increases lifespan in various eukaryotic organisms including yeast and mammalian 13,26,27,29,30." Here I would suggest replacing "mammalian" with "mammals".

      We have amended the sentence as recommended.

      (2) "Next, we experimentally tested whether the transcriptome longevity signatures are associated with enhanced mitochondrial metabolism, whether the cellular energy level has gone up and cellular stress responses are induced with a switch to oxidative metabolism 47,48." Here I would replace "transcriptome longevity signatures is" with "transcriptome longevity signatures are".

      We have amended the sentence as recommended.

      (3) "Thus, HAP4-independent mechanism does exist through which YBR238C also affects cellular aging (Figure 3I)." I would replace "Thus, HAP4-independent" with "Thus, a HAP4-independent".

      We have amended the sentence as recommended.

      (4) "We examined other mitochondrial dysfunctional conditions to confirm that suppressive effect of rapamycin is not only specific to YBR238C-OE." I would change "that suppressive effect" to "that the suppressive effect".

      We have amended the sentence as recommended.

      (5) "Understanding the mechanism of aging will also require to understand the role of many genes of yet unknown function as YBR238C at the beginning of this work." I would switch "require to understand" to "require understanding".

      We have amended the sentence as recommended.

      (6) "The gene lists that modulate cellular lifespan in aging model organism yeast Saccharomyces cerevisiae were extracted from database SGD 22 and GenAge 23 (as of 8th November 2022)" "yeast" should not be italicized.

      Corrected.

      (7) Figure 1, panels C and D, ybr238c should be italicized.

      Corrected.

      (8) Figure 2B, top left-most (oxidative phosphorylation) network. I might consider repositioning some labels to make them more readable if possible.

      Thank you for your feedback. The figure labels in Figure 2B are default from Metascape analysis, so repositioning isn't feasible. However, we have indicated in the figure legends that the full set of genes for functional enrichment analysis and the MCODE complex is available in Additional File 3.

      (9) Figure 4E, rmd9, pet100, and cox6 should be italicized.

      Corrected.

      (10) Figure 5C, rmd9 and rmd9 ybr238c should be italicized.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for your detailed review and valuable recommendations. We have carefully addressed each of your comments in the revised manuscript. The specific changes made include:

      (1) The presentation of data as heatmaps (Figures 1F, 3D, 4C, 4G, 5B, 5H, 5L, 6K) obfuscates the quantitative nature of the data. These data would be much stronger if presented as bar graphs with appropriate statistical analysis. If the authors prefer the visual of the heat map, there should be some statistical analysis performed to accompany these figures. This is particularly important for Figure 3D, in which the authors state "We found that HAP4 deletion significantly decrease the ETC complex I-V genes' expression" (bottom of page 8). As no statistical analyses were performed, the authors should refrain from using such language as it is unsupported by the data as analyzed.

      Thank you for your insightful comments and suggestions regarding the presentation of our data. We appreciate the attention you have given to Figures 1F, 3D, 4C, 4G, 5B, 5H, 5L, and 6K.

      In response to your feedback, we have carefully re-evaluated our approach. Considering the large volume of data associated with our lifespan analysis at different time points, we initially chose to visualize it using heatmaps to comprehensively capture the complexity of the results. However, we have now incorporated quantification information into the heatmaps.

      For Figure 3D, which addresses the impact of HAP4 deletion on the expression of ETC complex I-V genes, we have replaced the heatmap with a bar graph. This modification allows for a clearer representation of the quantitative nature of the data. Moreover, we have conducted thorough statistical analyses comparing data between ybr238c∆ and ybr238c∆ hap4∆ to support the statements made in the text. The results of these analyses are now included in the revised figure. Moreover, we also replaced the Figure 6K heatmap with a bar graph.

      We believe that these changes enhance the interpretability and robustness of our findings. We are grateful for your guidance, and we are confident that these adjustments will strengthen the overall quality of our manuscript.

      (2) The presentation of ATP data, given its importance in supporting the core conclusions of this manuscript, is poor. The conditions under which yeast was collected are not reported, making these data impossible to interpret; total cellular ATP levels would be significantly altered and influenced by separate pathways in fermentive versus stationary phases. Minimally, the authors should describe the conditions of yeast growth (e.g., age, culture media) in which these measurements were made. The presentation of relative ATP percentages is problematic, particularly with measurements that deviate so far from wild-type ATP levels in conditions such as those in Figure 6A, in which the authors report that rapamycin induces a 1200% increase in cellular ATP. Previous papers have established that ATP levels in yeast hover around 4 mM and are stable through the cell cycle and across nutrient conditions (PMID: 30858198, 35438635). Given this, the reported ATP levels would be expected to be near 48 mM, which is strongly outside of the typically accepted values of 1-10 mM for this metabolite. Without understanding the contexts in which these measurements are made, as well as the absolute values for these measurements (which would be easily achievable through the use of a standard curve of ATP), these data are uninterpretable. Furthermore, it seems unlikely that yeast would be able to accommodate shifts of ATP levels that span an order of magnitude without dire cellular consequences, particularly during rapamycin treatment.
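
      The arithmetic behind this concern can be sketched as follows. This is a minimal illustration, not the authors' or the reviewer's actual analysis: the 4 mM baseline is the value cited in the comment, the function names and the standard-curve readings are hypothetical, and 1200% is read as a level relative to wild type (12-fold), which is the reading that yields 48 mM.

```python
def relative_to_absolute_mM(relative_percent, baseline_mM=4.0):
    """Convert an ATP level given as % of wild type into mM.

    baseline_mM is the assumed wild-type concentration (4 mM in the comment).
    """
    return baseline_mM * relative_percent / 100.0


def fit_standard_curve(concentrations_mM, signals):
    """Least-squares line signal = slope * concentration + intercept."""
    n = len(concentrations_mM)
    mean_x = sum(concentrations_mM) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations_mM)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations_mM, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def signal_to_concentration(signal, slope, intercept):
    """Invert the fitted line to read a concentration off a raw signal."""
    return (signal - intercept) / slope


# Hypothetical standards (mM) and luminescence readings, for illustration only.
standards_mM = [0.0, 1.0, 2.0, 4.0]
readings = [10.0, 110.0, 210.0, 410.0]
slope, intercept = fit_standard_curve(standards_mM, readings)

implied_mM = relative_to_absolute_mM(1200)            # level implied by "1200%"
sample_mM = signal_to_concentration(310.0, slope, intercept)
```

      Reporting values read off such a curve, rather than percentages of wild type, would let a reader check directly whether a measurement falls inside the 1-10 mM range mentioned above.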

      We appreciate the valuable feedback from the reviewer regarding the importance of providing detailed information on yeast growth conditions for interpreting ATP data. In response to this suggestion, we have enhanced the figure legends associated with the relevant figures to include a comprehensive description of the yeast growth conditions. This now specifies the age of the culture, culture media composition, and other pertinent parameters.

      In addressing the concern raised about the rapamycin-induced ATP increase, we have carefully re-examined our experimental procedures. We performed additional experiments and confirmed the consistency of our findings in logarithmic-treated cultures. The results remain in alignment with our initial observations, reinforcing the reliability and reproducibility of our data.

      (3) As stated above, the inference of mitochondrial function from cellular ATP levels, cellular ROS levels, and gene expression of a handful of nuclear-encoded genes is not sound. The authors should include further experimentation as evidence of mitochondrial functionality, such as respirometry or metabolic flux experiments.

      Thank you for your constructive feedback on our manuscript. We appreciate your careful consideration of our work. In response to your concerns regarding the indirect nature of our mitochondrial function assays, we have implemented the following changes: We have incorporated additional assays to provide a more direct insight into mitochondrial biology. Specifically, we performed relative mitochondria content analysis (S3F) and quantified ROS levels under fermentative to stationary phase conditions (S3G). These assays offer a more direct and comprehensive assessment of mitochondrial function. Furthermore, we conducted experiments in respiratory glycerol medium (S3H) to complement our previous findings.

      To further support our claims, we investigated the resistance of ybr238c∆ cells to H2O2 toxicity. Our results demonstrate that these cells exhibit increased resistance compared to wild-type cells. This additional evidence strengthens the link between mitochondrial function and cellular response to oxidative stress.

      We believe these adjustments address your concerns and significantly enhance the robustness of our study. We hope you find these modifications satisfactory. We are grateful for your valuable input, which has undoubtedly improved the clarity and reliability of our findings.

      (4) Multiple gene expression analyses are performed on n=2 measurements, and this should be bolstered by further replicates. Many bar graphs do not have accompanying statistics; these should be added. Some statistical tests are performed across inappropriate comparisons, such as Figure 3G, in which expression levels of mitochondrial genes in both deletion and overexpression strains should be compared to a wild-type control rather than to each other.

      Thank you for your thorough review and constructive feedback on our manuscript. We appreciate your careful examination of our work. In response to your comments, we have made the following revisions to address your concerns: The multiple gene expression analysis in our study focused specifically on ETC genes. It is important to note that ETC genes themselves represent multiple replicates within the ybr238c deletion and overexpression cells, as illustrated in Figures 4D, 4G, and 6B.

      We acknowledge and appreciate your observation regarding Figure 3G. To address this concern, we have revised the statistical comparisons. The expression levels of mitochondrial genes in the overexpression strain are now appropriately compared to a wild-type control. This correction has been applied in the figure that correctly corresponds to text in the manuscript.

      (5) Figure 2B is uninterpretable as it stands, as most gene symbols are obscured.

      We appreciate the reviewer's attention to Figure 2B and the feedback provided. Regarding the gene labels in Figure 2B, we would like to clarify that these labels are default outputs from the Metascape analysis, and unfortunately, repositioning them within the current figure layout isn't feasible without compromising the integrity of the information.

      However, we have taken the reviewer's concern seriously and have made efforts to address the interpretability issue. To provide readers with access to the full set of genes for functional enrichment analysis and the MCODE complex, we have included this information in Additional File 3. The figure legends have been updated accordingly to guide readers to refer to Additional File 3 for a more detailed examination of the gene symbols and their annotations.

      We hope that this solution addresses the concern raised by the reviewer.

      (6) The conclusions to be drawn from Figure 3A are not clear, and this figure is cited only once in the text along with two other figures (page 8).

      Thank you for your valuable feedback. We have carefully considered your comments and made revisions to improve the clarity of the conclusions drawn from Figure 3A.

      (7) Figure 6K reports a range of 100-200% cell survival - how does a cell have 200% survival? Isn't survival binary (i.e., you survive or you are dead)? Perhaps this is meant to be relative to another condition; this should be more clearly stated in the figure, or the axis should be normalized to a maximum of 100% survival.

      Thank you for your guidance and valuable feedback. Based on your recommendation, we have made significant changes to Figure 6K in the revised manuscript. Specifically, we replaced the heatmap with a bar graph to enhance clarity. Additionally, we would like to highlight that cell survival of combined treated cells is measured relative to the control treatment, which is considered 100% survival. This aims to provide a more accurate and comprehensible representation of the data. We believe these modifications contribute to a clearer presentation of our findings.
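
      The normalization described in this response can be sketched as below. This is only an illustration under stated assumptions: the function name and the example readings are hypothetical, and the survival readout (for example an outgrowth measurement) is assumed to be a positive number.

```python
def relative_survival_percent(treated_readout, control_readout):
    """Express survival of treated cells relative to the control treatment.

    The control is defined as 100%, so values above 100% mean the treated
    culture outperformed the control; they are not absolute survival fractions.
    """
    if control_readout <= 0:
        raise ValueError("control readout must be positive")
    return 100.0 * treated_readout / control_readout


# Hypothetical readouts, for illustration only.
control = 1.0
combined_treatment = 2.0
survival = relative_survival_percent(combined_treatment, control)
```

      Under this definition a value of 200% simply means the readout for the combined treatment is twice that of the control, which is why labeling the axis as "survival relative to control (%)" removes the ambiguity the reviewer points out.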

      (8) The authors state that "TORC1 inhibition in yeast and human cells with mitochondrial dysfunction suppresses their accelerated aging." No studies of aging were done in human cells; survival in response to mitochondrial toxins does not reveal aging phenotypes. To state such is a substantial overstatement and should be amended to perhaps "cellular survival" rather than directly linked to aging.

      We appreciate the careful review of our manuscript and the constructive feedback provided by the reviewer. In response to the concern raised regarding the statement about TORC1 inhibition and accelerated aging in human cells, we have revised the relevant passage as follows: "In turn, TORC1 inhibition in yeast and human cells with mitochondrial dysfunction enhances their cellular survival." We believe that this modification accurately reflects the outcomes of our experiments and addresses the concern raised by the reviewer. We would like to express our gratitude for the valuable feedback, which has contributed to the improvement of our manuscript. Thank you for your thoughtful consideration.

      Reviewer #3 (Recommendations For The Authors):

      Thank you for your detailed review and valuable recommendations. We have carefully addressed each of your comments in the revised manuscript. The specific changes made include:

      The authors should have attempted to fully characterize the RLS and CLS phenotype of strains lacking the YBR238C and RMD9 gene, the single most important gene identified in this study. Before further characterization, its association with aging must be tested to replicate findings from the literature. Although Figure 3 shows partially characterized CLS in SC medium, different media conditions could be tested, and the full spectrum of CLS lifespan curves should be represented. RLS phenotypes of these cells were not analyzed throughout the study.

      We appreciate the suggestion to investigate the role of YBR238C in both Replicative Lifespan (RLS) and Chronological Lifespan (CLS). However, it's essential to note that the involvement of YBR238C in the RLS phenotype has been previously documented in four studies (references 7, 39, 40, and 41). Considering the established literature on this matter, we chose not to duplicate these efforts in our study.

      Our primary focus was on confirming the role of YBR238C in the chronological lifespan (CLS) phenotype, as indicated by a genome-wide study (reference 43). Accordingly, we also conducted an analysis of the role of RMD9 in CLS. The methods and figure legends explicitly state that CLS experiments for prototrophic CEN.PK113-7D strains were conducted in synthetic defined (SD) medium containing 6.7 g/L yeast nitrogen base with ammonium sulfate without amino acids and 2% glucose. For auxotrophic BY4743 strains, SD medium was supplemented with histidine (40 mg/L), leucine (160 mg/L), and uracil (40 mg/L).

      It is important to clarify that SC medium was not used for CLS analysis. Instead, we employed SD medium, recommended for CLS analysis (reference 15; PMID: 22768836). The CLS experiments were conducted using three different methods, providing a comprehensive representation of the entire CLS lifespan (Figures 1C, 1D, 1E, and 1F).

      While we did not present the Replicative Lifespan (RLS) phenotype explicitly, we performed experiments such as mitochondrial activity and ROS production under both CLS and RLS conditions. These additional analyses contribute valuable insights into the broader implications of YBR238C and RMD9 on cellular function.

      We believe that these clarifications and the inclusion of additional experimental details enhance the robustness and validity of our findings. We hope these explanations address the concerns raised by the reviewer and contribute to the overall improvement of our manuscript.

      In addition, authors include RNAseq data from Rapamycin-treated cells to identify differentially expressed genes. Notably, genes with decreased expression were used to compare KO strains' lifespan phenotype. Additional RNAseq analyses were performed on individual KO cells. The methodology section needs to be better written with information on which media and metabolic state that these cells are collected after treatment with rapamycin. If the cells are collected during logarithmic growth, the data can be compared with RLS aging gene sets only. A separate experiment has to be performed on stationary cells (respiratory) to collect RNAseq data after rapamycin treatment, then can be compared to the CLS aging gene set.

      Thank you for your insightful comments and considerations regarding our methodology for obtaining Rapamycin response genes (RRGs). We appreciate the opportunity to address your concerns and provide further clarification on our experimental approach.

      As mentioned in our manuscript, we obtained RRGs by treating logarithmic cells with 50 nM Rapamycin for 1 hour, and the details have been included in supplementary Figure S1C legends. Our primary objective was to compare these RRGs with aging-associated genes that modulate both Replicative Lifespan (RLS) and Chronological Lifespan (CLS). We acknowledge the significance of this comparison and believe that our approach, treating logarithmic cells, is suitable for achieving this goal.

      It is important to note that the use of a higher concentration of Rapamycin for treatment renders the cells less efficient in terms of growth, resulting in a very low optical density (OD) at 72 hours, as illustrated in Figure 6H. Unfortunately, due to this limitation in growth efficiency, obtaining Rapamycin response genes at the stationary phase was not feasible in our experimental setup.

      As the experimental conditions vary among the reports and the gene expression signature significantly changes under different metabolic conditions, the media condition that samples are collected for RNAseq analyses should match the media condition that the lifespans of those KO strains are tested. However, more information needs to be detailed on these methodologies. For example, the transcriptomic signature of the YBR238C KO strain should be done under both fermentative and respiratory conditions to understand the true gene expression signature associated with CLS and RLS. Throughout the manuscript, these two metabolic conditions and associated lifespan types (CLS vs. RLS) are not differentiated and treated as the same, probably causing the biggest confounding effect that resulted in the identification of a single yeast-specific gene.

      We obtained the transcriptomic signature of the YBR238C KO strain from logarithmic phase cultures. This consistency was maintained to align with the Rapamycin Response Genes (RRGs) obtained from logarithmic cells treated with rapamycin. Detailed methodology and metabolic status information is provided in the method section and relevant figure legends.

      To broaden the scope of our study, we conducted analyses on various phenotypes, including mitochondrial activity and oxidative stress, under both logarithmic phase (relevant to Replicative Lifespan, RLS) and stationary phase (relevant to Chronological Lifespan, CLS). We have now explicitly indicated the logarithmic phase/stationary phase conditions in the figure legends of the manuscript, specifying their relevance to RLS or CLS.

      Results from these additional experiments have been incorporated into the revised manuscript as supplementary figures (S3E-S3I). We believe that these clarifications and the inclusion of additional experimental details enhance the robustness and validity of our findings. We trust that these explanations effectively address the concerns raised by the reviewer and contribute to the overall improvement of our manuscript.

      The effect of the YBR238C gene KO on mitochondrial function is missing a comprehensive characterization. Whether the improved mito function is caused by increased mtDNA copy number and/or increased mitochondrial number could be easily tested by normalizing RNAseq reads from mtDNA genes to reads from nucDNA genes. Data could be further combined with western blots specific to mito membrane proteins to analyze mito copy number.

      Thank you for your insightful comments and suggestions. Following your recommendation, we conducted an assessment of relative mitochondrial content (see Figure S3F) and observed significantly higher mtDNA content in the ybr238c∆ compared to the wild type (see Figure S3F). Additionally, we have incorporated the methodology for mitochondrial DNA copy number analysis in the methods section.
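
      The read-normalization idea raised in the comment can be sketched as follows. This is a rough illustration, not the method used for Figure S3F: the function names and all counts are hypothetical, the gene symbols are used only as labels, and a ratio of RNA-seq reads reflects transcript abundance, so it is at best a proxy for mtDNA copy number.

```python
def mito_to_nuclear_ratio(counts, mito_genes):
    """Ratio of reads from mtDNA-encoded genes to reads from nuclear genes."""
    mito = sum(c for gene, c in counts.items() if gene in mito_genes)
    nuclear = sum(c for gene, c in counts.items() if gene not in mito_genes)
    if nuclear == 0:
        raise ValueError("no nuclear reads to normalize against")
    return mito / nuclear


def relative_mito_signal(ko_counts, wt_counts, mito_genes):
    """Fold change of the mito/nuclear read ratio in the KO versus wild type."""
    return (mito_to_nuclear_ratio(ko_counts, mito_genes)
            / mito_to_nuclear_ratio(wt_counts, mito_genes))


# Hypothetical read counts, for illustration only.
mito_genes = {"COX1", "COB"}
wt_counts = {"COX1": 100, "COB": 50, "ACT1": 1000, "TDH3": 500}
ko_counts = {"COX1": 300, "COB": 150, "ACT1": 1000, "TDH3": 500}
fold_change = relative_mito_signal(ko_counts, wt_counts, mito_genes)
```

      A fold change above 1 would be consistent with higher mitochondrial signal in the deletion strain, but it would still need an independent DNA- or protein-level measurement, as the comment suggests.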

      The interaction between the two paralogous genes is an interesting observation. However, in yeast, it is known that deletion of one paralogous gene can cause copy number amplification of the chromosome on which the other paralogous gene is located, resulting in aneuploidy. Many of the observed phenotypes can be associated with increased chromosome copy number and should be carefully tested. However, the authors did not consider this important point. Simply, normalized RNA-seq reads per chromosome could be plotted to analyze the karyotype of YBR238C and RMD9 KO cells.

      We appreciate your thoughtful consideration of our work and the suggestion to investigate chromosome copy number variations. While we did not directly test chromosome copy number, we want to highlight that our study extensively explores the impact of YBR238C on cellular lifespan through an RMD9-dependent mechanism (Figure 5). Deletion of YBR238C increases, whereas overexpression of YBR238C decreases the expression of its paralog, RMD9 (Figure 5F). Furthermore, this phenotype is associated with the lifespan of YBR238C-deleted and overexpressed cells. In our study, we have thoroughly investigated this aspect.
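
      The per-chromosome check proposed by the reviewer could look roughly like the sketch below. It is an illustration only and was not part of the study: the function name, gene identifiers, chromosome assignments, and counts are all hypothetical, and library-size normalization with a pseudocount is an assumed, simplified choice.

```python
import math
from collections import defaultdict
from statistics import median


def chromosome_log2_ratios(ko_counts, wt_counts, gene_to_chrom, pseudocount=1.0):
    """Median log2(KO/WT) of library-size-normalized reads per chromosome.

    A chromosome whose median sits well above the others (about +1 for a
    duplicated chromosome in a haploid) would hint at a copy number gain.
    """
    ko_total = sum(ko_counts.values())
    wt_total = sum(wt_counts.values())
    per_chrom = defaultdict(list)
    for gene, chrom in gene_to_chrom.items():
        ko_norm = (ko_counts.get(gene, 0) + pseudocount) / ko_total
        wt_norm = (wt_counts.get(gene, 0) + pseudocount) / wt_total
        per_chrom[chrom].append(math.log2(ko_norm / wt_norm))
    return {chrom: median(values) for chrom, values in per_chrom.items()}


# Hypothetical genes, chromosome assignments, and counts, for illustration only.
gene_to_chrom = {"g1": "chrI", "g2": "chrI", "g3": "chrII", "g4": "chrII"}
wt_counts = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
ko_counts = {"g1": 100, "g2": 100, "g3": 200, "g4": 200}
ratios = chromosome_log2_ratios(ko_counts, wt_counts, gene_to_chrom)
```

      Only the differences between chromosomes are informative here, since a gained chromosome also shifts the library-size normalization; plotting the per-chromosome medians side by side would make a uniform shift of one chromosome easy to spot.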

    1. Author Response

      The following is the authors’ response to the original reviews.

      We appreciate the care and the detail shown by the Reviewers. Their comments have made our article more focused and more accessible to a general audience.

      We would like to begin with a comment about the last sentence of the “eLife assessment”. The evolution of metamorphosis in insects was a major triumph in animal evolution that subsequently impacted almost every aspect of plant and animal evolution in the terrestrial and freshwater aquatic biospheres. Unlike the metamorphoses of most other groups, whose evolutions are lost in time, insect evolution arose relatively recently (~400 mya) and insect orders have branched off at various points in this evolution and have persisted to modern times. Although these “relic” groups also have undergone millions of years of evolution and specialization, they still provide us with windows into how this progression may have come about. The study of these groups provides a unique opportunity to explore the mechanisms that underlie major life history shifts and should be of interest to anyone interested in evolution – not just entomologists.

      Reviewer #1 (Public Review):

      Summary:

      This paper provides strong evidence for the roles of JH in an ametabolous insect species. In particular, it demonstrates that:

      • JH shifts embryogenesis from a growth mode to a differentiation mode and is responsible for terminal differentiation during embryogenesis. This, and other JH roles, are first suggested as correlations, based on the timing of JH peaks, but then experimentally demonstrated using JH antagonists and rescue thereof with JH mimic. This is a robust approach and the experimental results are very convincing.

      • JH redirects ecdysone-induced molting to direct formation of a more mature cuticle

      • Kr-h1 is downstream of JH in Thermobia, as it is in other insects, and is a likely mediator of many JH effects

      • The results support the proposed model that an ancestral role of JH in promoting and maintaining differentiation was coopted during insect radiations to drive the evolution of metamorphosis. However, alternate evolutionary scenarios should also be considered.

      Strengths:

      Overall, this is a beautiful, in-depth study. The paper is well-written and clear. The background places the work in a broad context and shows its importance in understanding fundamental questions about insect biology. The researchers are leaders in the field, and a strength of this manuscript is their use of a variety of different approaches (enzymatic assays, gene expression, agonists & antagonists, analysis of morphology using different types of microscopy and detection, and more) to attack their research questions. The experimental data is clearly presented and carefully executed with appropriate controls and attention to detail. The 'multi-pronged' approach provides support for the conclusions from different angles, strengthening conclusions. In sum, the data presented are convincing and the conclusions about experimental outcomes are well-justified based on the results obtained.

      Weaknesses:

      This paper provides more detail than is likely needed for readers outside the field but also provides sufficient depth for those in the field. This is both a strength and a weakness. I would suggest the authors shorten some aspects of their text to make it more accessible to a broader audience. In particular, the discussion is very long and accompanied by two model figures. The discussion could be tightened up and much of the text used for a separate review article (perhaps along with Figure 11) that would bring more attention to the proposed evolution of JH roles.

      We appreciate the comments about the strengths and weaknesses of the paper. To deal with the weaknesses, we have condensed some of the Results to make them less cumbersome and the Discussion has been completely revised, keeping a sharp focus on the actions of JH in Thermobia embryos and how these actions relate to the status quo functions of JH in insects with metamorphosis. As part of the revision of the Discussion, we have replaced Figures 10 and 11.

      Reviewer #1 (Recommendations For The Authors):

      In keeping with my public review, this paper is very strong and I have very few suggestions for improvement. They are:

      (1) Thermobia are extant insects and are not ancestral insects. It is likely that they retain features found in an insect ancestor. However, these insects have been evolving for a very long time, and for any one feature, many changes may have occurred, both gain and loss of gene function and morphology. Further, even for morphological features present in an extant species that are the same as an ancestor, genetic pathways regulating this feature may have changed over time (see for examples papers from the Haag and Pick labs). Although I realize this is a small, possibly almost semantic point, I feel it is important to be precise here. For example, in the title, "before" is speculative as there could have been a different role in the ancestor with the role in embryogenesis arising in lineages leading to Thermobia; similarly in the abstract, "this ancestral role of JH" is an overstatement since we cannot actually measure the ancestral role.

      Since the title has already been cited in a Perspectives review, we decided to keep the title as is.

      (2) I don't understand the results in Met and myo in Fig. 3B. Perhaps include them in the explanation of Fig.3 and not after the description of Fig. 4 and explain them in more detail (or perhaps not include them at all?). I don't really understand the statistical analysis of these panels either.

      We have revised the figure legends to explain the statistics.

      (3) Another point regarding language - talking about the embryo being "able" to go through a developmental stage implies decision-making. I would suggest dropping that wording (e.g., in the description of Fig. 5C). Similarly, in explaining Fig. 6B, it would be more correct to say "JH treatment no longer inhibited" than as written "could no longer inhibit" (implying 'no matter how hard it tried, it still couldn't do it').

      We have removed the “can’t” wording. Figure 6 has been revised.

      Reviewer #2 (Public Review):

      The authors have studied in detail the embryogenesis of the ametabolan insect Thermobia domestica. They have also measured the levels of the two most important hormones in insect development: juvenile hormone (JH) and ecdysteroids. The work then focuses on JH, whose occurrence concentrates in the final part (between 70 and 100%) of embryo development. Then, the authors used a precocene compound (7-ethoxyprecocene, or 7EP) to destroy the JH producing tissues in the embryo of the firebrat T. domestica, which allowed to unveil that this hormone is critically involved in the last steps of embryogenesis. The 7EP-treated embryos failed to resorb the extraembryonic fluid and did not hatch. More detailed observations showed that processes like the maturational growth of the eye, the lengthening of the foregut and posterior displacement of the midgut, and the detachment of the E2 cuticle, were impaired after the 7EP treatment. Importantly, a treatment with a JH mimic subsequent to the 7EP treatment restored the correct maturation of both the eye and the gut. It is worth noting that the timing of JH mimic application was essential for correcting the defects triggered by the treatment with 7EP.

      This is a relevant result in itself since the role of JH in insect embryogenesis is a controversial topic. It seems to have an important role in hemimetabolan embryogenesis, but not so much in holometabolans. Intriguingly, it appears important for hatching, an observation made in hemimetabolan and in holometabolan embryos. Knowing that this role was already present in ametabolans is relevant from an evolutionary point of view, and knowing exactly why embryos do not hatch in the absence of JH, is relevant from the point of view of developmental biology.

      The unique and intriguing aspect of juvenile hormone is its status quo action in the control of metamorphosis. Our reason for dealing with an insect group that branched off from the line of insects that eventually evolved metamorphosis, was to gain insight into the ancestral functions of this hormone. Our data from Thermobia as well as that from grasshoppers and crickets indicate that the developmental actions of JH were originally confined to embryogenesis where it promoted the terminal differentiation of the embryo. Its actions in promoting differentiation also included suppressing morphogenesis. This latter function was not pronounced during embryogenesis because JH only appeared after morphogenesis was essentially completed. However, it was a preadaptation that proved useful in more derived insects that delayed aspects of morphogenesis into the postembryonic realm. JH was then used postembryonically to inhibit morphogenesis until late in juvenile growth when JH disappears, and this inhibition is released.

      Then, the authors describe a series of experiments applying the JH mimic in early embryogenesis, before the natural peak of JH occurs, and its effects on embryo development. Observations were made under different doses of JHm, and under different temporal windows of treatment. Higher doses triggered more severe effects, as expected, and different windows of application produced different effects. The most used combination was 1 ng JHm applied 1.5 days AEL, checking the effects 3 days later. Of note, 1.5 days AEL is about 15% embryonic development, whereas the natural peak of JH occurs around 85% embryonic development. In general, the ectopic application of JHm triggered a diversity of effects, generally leading to an arrest of development. Intriguingly, however, a number of embryos treated with 1 ng of JHm at 1.5 days AEL showed a precocious formation of myofibrils in the longitudinal muscles. Also, a number of embryos treated in the same way showed enhanced chitin deposition in the E1 procuticle and showed an advancement of at least a day in the deposition of the E2 cuticle.

      While the experiments and observations are done with great care and are very exhaustive, I am not sure that the results reveal genuine JH functions. The effects triggered by a significant pulse of ectopic JHm when the embryo is 15% of the development will depend on the context: the transcriptome existing at that time, especially the cocktail of transcription factors. This explains why different application times produce different effects. This also explains why the timing of JHm application was essential for correcting the effects of 7EP treatment. In this reasoning, we must consider that the context at 85% development, when the JH peaks in natural conditions and plays its genuine functions, must be very different from the context at 15% development, when the JHm was applied in most of the experiments. In summary, I believe that the observations after the application of JHm reveal effects of the ectopic JHm, but not necessarily functions of the JH. If so, then the subsequent inferences made from the premise that these ectopic treatments with JHm revealed JH functions are uncertain and should be interpreted with caution.

      We disagree with the reviewer. An analogous situation would be in exploring gene function in which both gain-of-function and loss-of-function experiments often provide complementary insights into how a gene functions. We see JH effects only when its receptor, Met, is present and JH can induce its main effector protein, Kr-h1. The latter gives us confidence that we are looking at bona fide JH effects. We have also kept in mind, though, that the nature of the responding tissues is changing through time. Nevertheless, we see a consistent pattern of responses in the embryo and these can be related to its postembryonic effects in metamorphic insects.

      Those inferences affect not only the "JH and the progressive nature of embryonic molts" section, but also, the "Modifications in JH function during the evolution of hemimetabolous and holometabolous life histories" section, and the entire "Discussion". In addition to inferences built on uncertain functions, the sections mentioned, especially the Discussion, I think suffer from too many poorly justified speculations. I love speculation in science, it is necessary and fruitful. But it must be practiced within limits of reasonableness, especially when expressed in a formal journal.

      We have tried to dial back the speculation.

      Finally, in the section "Modifications in JH function during the evolution of hemimetabolous and holometabolous life", the bridge that connects the observations on the embryo of Thermobia and the evolution of modified life cycles, hemimetabolan and holometabolan, is not clear.

      Our Figure 12 should put this into perspective.

      Reviewer #2 (Recommendations For The Authors):

      Main points

      (1) Please, reduce the level of overinterpretation of ectopic treatment experiments with JHm, since the resulting observations represent effects, but not necessarily functions of JH.

      We have revised this section to indicate that the “effects” of ectopic treatments provide insights into the function of JH. Using a genetic analogy, both “loss-of-function” and “gain-of-function” experiments provide insights into a given gene. (see response to Public Comments)

      (2) Especially in the sections "JH and the progressive nature of embryonic molts" and "Modifications in JH function during the evolution of hemimetabolous and holometabolous life histories", and the entire "Discussion", please keep the level of speculation within reasonable limits, avoiding especially the inference of conclusions on the basis of speculation, itself based on previous speculation.

      We have toned down some of the speculation and provided reasons why it is worth suggesting.

      (3) Please revisit the argued roles of myoglianin in the story, in light of its effects as an inhibitor of JH production, repressing the expression of JHAMT, as has been reliably demonstrated in hemimetabolan species (DOI: 10.1073/pnas.1600612113 and DOI: 10.1096/fj.201801511R).

      Our appreciation to the reviewer. We are more explicit about the relationship between JH and myo.

      Minor points

      (4) Please keep the consistency of the scientific binomial nomenclature for the species mentioned. For example, read "Manduca sexta" (in italics) at the first mention, and then "M. sexta" (in italics) in successive mentions (instead of reading "Manduca" on page 17, and then "Manduca sexta" on page 18, for example). The same for "Drosophila" ("Drosophila melanogaster" first, and then "D. melanogaster"), "Thermobia" ("Thermobia domestica" first, and then "T. domestica"), etc. In the figure legends, I recommend using the complete name: Thermobia domestica, in the main heading.

      Where there is no possibility of confusion, we intend to use Thermobia, rather than T. domestica, etc. We think that it is easier for a non-specialist to read and it is commonly done in endocrine papers.

      (5) There is no purpose in evolution and biological processes. Thus, I suggest avoiding expressions that have a teleological aftertaste. For example (capitals are mine), on p. 3 "appears to have been extended into postembryonic life where it acts TO antagonize morphogenic and allow the maintenance of a juvenile state".

      We have tried to avoid teleological wording.

      (6) The title "The embryonic role of juvenile hormone in the firebrat, Thermobia domestica, reveals its function before its involvement in metamorphosis" contains a redundancy ("role" and "function"), and an apparent obviousness ("before its involvement in metamorphosis"). I suggest a more straightforward title. Something like "Juvenile hormone plays developmental functions in the embryo of the firebrat Thermobia domestica, which predate its status quo action in metamorphosis".

      As noted above, we are retaining the title since it has already been cited.

      (7) Page 2. "The transition from larva to adult then occurred through a transitional stage, the pupa, thereby providing the three-part life history diagnostic of the "complete metamorphosis" exhibited by holometabolous insects (reviews: Jindra, 2019; Truman & Riddiford, 2002, 2019)". I suggest adding the reference ISBN: 978-0-12-813020-9, as the most comprehensive and recent review on complete metamorphosis.

      Done

      (8) Page 3. "These severe developmental effects suggest that the developmental role of JH in insects was initially CONFINED to the embryonic domain" (capitals are mine). This appears contradictory with the observations of Watson, 1967, on the relationships between the apparition of scales and JH, mentioned shortly before by the authors.

      This is explained in the Discussion. Although JH can suppress scale appearance in the J4 stage, we have not been able to show that scales appearance is caused by changes in the juvenile JH titer.

      (9) Page 4. "we measured JH III levels during Thermobia embryogenesis at daily intervals starting at 5 d AEL". Why not before, like in the case of ecdysteroids? The authors might perhaps argue that the levels of Kr-h1 expression are consistently low from the very beginning, according to Fernandez-Nicolas et al, 2022 (reference cited later in the manuscript).

      (10) Page 4. "Ecdysteroid titers through embryogenesis and the early juvenile instars were measured using the enzyme immunoassay method (Porcheron et al., 1989) that is optimized for detecting 20-hydroxyecdysone (20E)". The antibody generated by Porcheron (and now sold by Cayman) recognizes ecdysone and 20-hydroxyecdysone alike. But that's not relevant here. I would refer to "ecdysteroids" when mentioning measurements. Also in figure 2B (and "juvenile hormone III" without the formula, in Panel A, for harmonization). And I would not expand on specifications, like those at the beginning of page 5, or towards the end of page

      We thank the reviewer for this important correction.

      (12) ("the fact that we detected only a slight rise in ecdysteroids at this time (Fig 2B) is likely due to the assay that we used being designed to detect 20E rather than ecdysone").

      Omitted.

      (11) Page 5. "Low levels of Kr-h1 transcripts were present at 12 hr after egg deposition, but then were not detected until about 6 d AEL when JH-III first appeared". There is a very precise Kr-h1 pattern in Fernandez-Nicolas et al. 2023 (reference mentioned later in the manuscript).

      (12) Page 5. "notably myoglianin (myo), have become prominent as agents that promote the competence and execution of metamorphosis in holometabolous and hemimetabolous insects (He et al., 2020; Awasaki et al., 2011)". See my note 3 above.

      The myoglianin issue has been revised.

      (13) Page 5. "a drug that suppresses JH production". Rather, "a drug that destroys the JH producing tissues". By the way, do the authors know when the CA are formed in T. domestica embryo development?

      We prefer to keep our original wording. There have been some cases in which precocene has blocked JH production but did not kill the CA cells. We do not have observations that show that 7EP kills the CA cells in Thermobia embryos.

      (14) Page 5. "subsequent treatment with a JHm". I would say here that the JHm is pyriproxyfen, not on page 6 or page 7. Thus, to be consistent, after the first mention of "pyriproxyfen (JHm)" on page 5, I'd consistently use the abbreviation "JHm".

      (15) Page 9. "Limb loss in such embryos was often STOCHASTIC, i.e., in a given embryo some limbs were completely lost while others were maintained in a reduced state" (capitals are mine). The meaning of "stochastic" is random, involving a random variable; it is a concept usually associated to probability theory and related fields. I suggest using the less specialized word "variable", since to ascertain that the values are really stochastic would require specific mathematical approaches.

      We are still using stochastic because the loss is random.

      (16) Page 10. "9E). Indeed, the JH treatment redirects the molt to be more like that to the J2 stage, rather than to the E2 (= J1) stage". Probably too assertive given the evidence available (see my points 1 and 2 above).

      We do not see a problem with our conclusion. In response to the JHm treatment, the embryo produced a smooth, rather than a “pebbly” cuticle, failed to make the J1-specific egg tooth, and attempted to make cuticular lenses (a J2 feature). This ability of premature JH exposure to cause embryos to “skip” a stage is also seen in locusts (Truman & Riddiford, 1999) and crickets (Erezyilmaz et al., 2004).

      (17) Page 11. "early JHM treatment", read "early JHm treatment".

      Corrected

      (18) Page 11. "likely. A target of JH, and likely Kr-h1, in Thermobia is myoglianin...". Please see my notes 1, 2, and especially 3, above.

      This has been revised

      (19) Page 13. "the locust, Locusta americana (Aboulafia-Baginshy et al.,1984)". Please read "the locust, Locusta migratoria (Aboulafia-Baginshy et al.,1984)".

      Corrected

      (20) Page 13 "Acheta domesticus" three times. The correct name now is "Acheta domestica", after harmonizing the declension of the specific name with the generic one. See additionally my note 4 above.

      Acheta domesticus has been used in hundreds (thousands?) of papers since it was originally named by Linnaeus. We will continue to use it.

      (21) Page 15, "(also called the vermiform larva (Bernays, 1971) redirects embryonic development to form an embryo with proportions, cuticular pigmentation, cuticular sculpturing and bristles characteristic of a nymph, while pronymph modifications, such as the cuticular surface sculpturing (Bernays, 1971)". The reference "Bernays, 1971" is indeed "Bergot et al., 1971".

      There was a mistake in the references. The Bernays reference was omitted from the revised Discussion

      (22) Page 16. "Since JH also induces Kr-h1 in embryos of many insects, including Thermobia". I'm not sure that this has been studied in many insects. In any case, any reference would be useful.

      (23) Page 17. "Tribolium casteneum". Please read "Tribolium castaneum".

      Changed

      (24) Page 17. "...results in a permanent larva that continues to molt well after it has surpassed its critical weight (He et al., 2019)". The paper of He et al., 2019 is preceded by two key papers that previously demonstrated (and in hemimetabolan insects) that myoglianin is a determining factor in the preparation for metamorphosis: DOI: 10.1073/pnas.1600612113 and DOI: 10.1096/fj.201801511R. See my note 3 above.

      Corrected in revision

      (25) Page 18. "These persisting embryonic primordia join the wing primordia in delaying their morphogenesis into postembryonic life". This reader does not understand this sentence.

      Made clearer in the revision.

      (26) Page 18. "is first possible in the commercial silkworm (Daimon et al., 2015)". Please mention the scientific Latin name of the species, Bombyx mori.

      (27) Page 19. "The functioning of farnesol derivatives in growth versus differentiation control extends deep into the eukaryotes.../... this capacity was eventually exploited by the insects to provide the hormonal system that regulates their metamorphosis". This information appears quite out of place.

      We have retained this point.

      (28) Page 21. Heading "Hormones". I suggest using the heading "Bioactive compounds", as neither pyriproxyfen nor 7-ethoxyprecocene are hormones.

      Done

      (29) Page 29, legend of figure 1. "Photomicrographs" is somewhat redundant. The technical word is "micrographs". "Thermobia domestica" appears in the explanation of panel C, but this is not necessary, as the name appears in the main heading of the legend.

      Done

      (30) Page 30, legend of figure 2. Panel B, see my comment 10 above. Why embryonic age is expressed in % embryo development in panel C (and in days in panels A and B)?

      All have been converted to days AEL

      (31) Page 35, legend of figure 5. "Photomicrograph" see my note 29 above.

      Done

      (32) Page 40, figure 10. In panel A, the indication of the properties of JH is misleading. The arrow going to promoting differentiation and maturation is OK, but the repression sign that indicates suppression of morphogenetic growth and cell determination seems to suggest that JH has retroactive effects. In panel B, I suggest to label "Flies" instead of "Higher Diptera", which is an old-fashioned term. In any case, see my general comments 1 and 2, above, about speculation.

      Figure has been completely revised

      (33) Figure 11. See my general comments 1 and 2, above, about speculation.

      Figure has been revised

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors use inhibitors and mimetics of juvenile hormone (JH) to demonstrate that JH has a key role in late embryonic development in Thermobia, specifically in gut and eye development but also resorption of the extraembryonic fluid and hatching. They then exogenously apply JH early in development (when it is not normally present) to examine the biological effects of JH at these stages. This causes a plethora of defects including developmental arrest, deposition of chitin, limb development, and enhanced muscle differentiation. The authors interpret these early effects on development as JH being important for the shift from morphogenetic growth to differentiation - a role that they speculate may have facilitated the evolution of metamorphosis (hemi- and holo-metaboly). This paper will be of interest to insect evo-devo researchers, particularly those with interests in the evolution of metamorphosis.

      Strengths:

      The experiments are generally conducted very well with appropriate controls and the authors have included a very detailed analysis of the phenotypes.

      The manuscript significantly advances our understanding of Thermobia development and the role of JH in Thermobia development.

      The authors interpret this data to present some hypotheses regarding the role of JH in the evolution of metamorphosis, some aspects of which can be addressed by future studies.

      Weaknesses:

      The results are based on using inhibitors and mimetics of JH and there was no attempt to discern immediate effects of JH from downstream effects. The authors show, for instance, that the transcription of myoglianin is responsive to JH levels, it would have been interesting to see if any of the phenotypic effects are due to myoglianin upregulation/suppression (using RNAi for example). These kinds of experiments will be necessary to fully work out if and how the JH regulatory network has been co-opted into metamorphosis.

      We agree completely; this should be a feature of future work.

      The results generally support the authors' conclusions. However, the discussion contains a lot of speculation and some far-reaching conclusions are made about the role of JH and how it became co-opted into controlling metamorphosis. There are some interesting hypotheses presented and the authors' speculations are consistent with the data presented. However, it is difficult to make evolutionary inferences from a single data point as although Thermobia is a basally branching insect, the lineage giving rise to Thermobia diverged from the lineages giving rise to the holo- and hemimetabolous insects approx. 400 mya and it is possible that the effects of JH seen in Thermobia reflect lineage-specific effects rather than the 'ancestral state'. The authors ignore the possibility that there has been substantial rewiring of the networks that are JH responsive across these 400 my. I would encourage the authors to temper some of the discussion of these hypotheses and include some of the limitations of their inferences regarding the role of JH in the evolution of metamorphosis in their discussion.

      We have tried to be less all-encompassing in the Discussion. The strongest comparisons can be made between ametabolous and hemimetabolous insects and we have focused most of the Discussion on the role of JH in that transition. We still include some discussion of holometabolous insects because the ancestral embryonic functions of JH may be somehow related to the unusual reappearance of JH in the prepupal period. We have reduced this discussion to only a few sentences.

      Reviewer #3 (Recommendations For The Authors):

      (1) The overall manuscript is very long (especially the discussion), and the main messages of the manuscript get lost in some of the details. I would suggest that the authors move some of the results to the supplementary material (e.g. it might be possible to put a lot of the detail of Thermobia embryogenesis into the supplementary text if the authors feel it is appropriate). The discussion contains a lot of speculation and I suggest the authors make this more concise. One example: At the moment there is a large section on the modification in JH function during the evolution of holo and hemi-metabolous life history strategies. There are some interesting ideas in this section and the authors do a good job of integrating their findings with the literature - but I would encourage the authors to limit the bulk of their discussion to the specific things that their results demonstrate. E.g. The first half of p17 contains too much detail, and the focus should be on the relationship with Thermobia (as at the bottom of p17).

      Section has been revised and is more focused

      (2) I would also suggest a thorough proofread of the manuscript, I have highlighted some of the errors/points of confusion that I found in the list below - but this list is unlikely to be exhaustive.

      We appreciate catching the errors. Hopefully the final version is better proofed.

      (3) It might be me, but I found the wording in the second half of the abstract a bit confusing. Particularly the statement about the redeployment of morphogen systems - could this be stated more clearly?

      Abstract has been revised.

      (4) Introduction

      a. "powered flight" rather than 'power flight'

      Done

      b. 'brought about a hemimetabolous lifecycle' implies causality which hasn't been shown and directionality to evolution - suggest 'facilitated the evolution of a hemi...". Similar comment for 'subsequent step to complete metamorphosis'.

      c. Bottom of p2 - unclear whether you are referring to hemi- holo- or both

      d. Suggest removing sentence beginning "besides its effects..." as the relevance of the role of JH in caste isn't clear.

      Kept sentence but removed initial clause

      e. State that Thermobia is a Zygentoma.

      Done

      f. Throughout - full species names on first usage only, T. domestica on subsequent usages.

      We will continue to use genus names for the reason given above.

      Gene names e.g. kr-h1 in italics.

      g. 'antagonise morphogens'? rather than 'antagonise morphogenic'.

      Done

      (5) Results

      a. Unclear why drawings are provided rather than embryonic images in Fig. 1A

      We think that the points can be made better with diagrams.

      b. Top of p4, is 'slot' the correct word?

      Corrected

      c. Unclear why JHIII wasn't measured before 5 days AEL, especially given that many of the manipulative experiments are at earlier time points than this. I appreciate that, based on kr-h1 levels, JHIII is also likely to be low.

      d. Reference for the late embryonic peak of 20E being responsible for the J2 cuticle?

      Clarified that this is an assumption

      e. Clarify "some endocrine related transcripts" why were these ones in particular picked? Kr-h1 is a good transcriptional proxy for JH and Met is the JH-receptor, why myoglianin and not some of the other transcriptional proxies of neuroendocrine signalling?

      Hopefully, the choice is clearer.

      f. Fig 2C rather than % embryo development for the gene expression data please represent this in days (to be consistent with your other figures).

      It is now consistent with other parts of figure.

      g. In Fig. 3 the authors do t-tests, because there are three groups there needs to be some correction for multiple testing (e.g. Bonferroni) can the authors add this to the relevant methods section?

      We think that pair-wise comparisons are appropriate.
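      For readers unfamiliar with the correction the reviewer mentions, the following is a minimal sketch of a Bonferroni adjustment for the three pairwise comparisons that arise from three groups. The group names and raw p-values are hypothetical placeholders, not values from the manuscript, and the sketch takes no position on whether the correction is required here.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical group labels: three groups give three pairwise comparisons.
groups = ["control", "treatment_1", "treatment_2"]
pairs = list(combinations(groups, 2))

# Hypothetical raw p-values, one per pairwise t-test (illustration only).
raw_p = [0.010, 0.030, 0.200]
adjusted = bonferroni(raw_p)

for (a, b), p, p_adj in zip(pairs, raw_p, adjusted):
    print(f"{a} vs {b}: raw p = {p:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

      Equivalently, each raw p-value can be compared against a per-test threshold of 0.05 / 3 instead of 0.05.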

      h. Fig. 3 legend: you note that you treat stage 2 juveniles with 7EP - I couldn't tell what AEL this corresponded to.

      This is after hatching so AEL does not apply.

      i. Top of p7 'deformities' rather than 'derangements'?

      Done

      j. Regarding the dosage effects of embryonic abnormalities - it would be good to include these in the supp material, as it convinces the reader that the effects you have seen aren't just due to toxicity.

      It is not clear what the objection is.

      k. Bottom of p7 'problematic' not 'problematical'

      Done

      l. P8 Why are the clusters of Its important? - provide a bit more interpretation for the reader here.

      This is clear in the revised version.

      m. P9 Why is the modulation of transcription of kr-h1, met, and myo important in this context

      Explained

      n. P9 'fig. 7F'? there is no Fig. 5F

      Thanks for catching the typo.

      o. Fig. 7B add to the legend which treatment the dark and light points correspond to.

      We think it is obvious from the labeling on Fig 7B.

      (6) Discussion:

      a. What do we know about how terminal differentiation is controlled in non-insect arthropods? Most of the discussion is focused on insects (which makes sense as JH is an insect-specific molecule), but if the authors are arguing the ancestral role of JH it would be useful to know how their findings relate to non-insect arthropods.

      We have not been able to find any information about systemic signals being involved in non-insect arthropods.

      b. There is no Fig. 5E (are they referring to 7E?)

      Yes, it should have been Fig. 7E.

      c. Is myoglianin a direct target of JH in other species?

      Other reports are in postembryonic stages and show that myoglianin suppresses JH production. Our paper is the first examination in embryos and we find that the opposite is true – i.e., that JH treatment suppresses myoglianin production. We suspect that these two signaling systems are mutually inhibitory. It would be interesting to see whether treatment of a post-critical weight larva with JH (which would induce a supernumerary larval molt) would also suppress myoglianin production (as we see in Thermobia embryos).

      d. P12 What is the evidence that JH interacts with the first 20E peak to alter the embryonic cuticle?

      We are not sure what the issue is. The experimental fact is that treatment with JH before the E1 ecdysteroid peak causes the production of an altered E1 cuticle. We are faced with the question of why this molt is sensitive to JH when the latter will not appear until 3 or 4 days later. A possible answer is that the ecdysone response pathway has a component that has inherent JH sensitivity. The mosquito data suggest that Taiman provides another link between JH and ecdysone action.

      e. Top of p13 - this paragraph can be cut down substantially. Although this is evidence that JH can alter ecdysteroids - it is in a species that is 400 my derived from the target species. Is it likely to be the exact same mechanism? I would encourage the authors to distil and retain the most important points.

      This paragraph has been shortened and focused.

      f. Bottom of p13 - what does this study add to this knowledge?

      The response of Thermobia embryos to JH treatment is qualitatively the same as seen in other short germband embryos. This similarity supports the assumption that the same responses would have been seen in their last common ancestor.

      g. P19 the last paragraph in the conclusions is really peripherally relevant to the paper and is a bit of a stretch, I would encourage the authors to leave this section out.

      We agree that it is a stretch. JH and its precursor MF are the only sesquiterpene hormones. How did they come to acquire this function? We think it is worth pointing out that farnesol metabolites have been associated with promoting differentiation in various eukaryotes. An ancient feature of these molecules in promoting (maintaining?) differentiation may have been exploited by the insects to develop a unique class of hormones. It is worth putting the idea out to be considered.

      h. P19 "conclusions" rather than 'concluding speculations'.

      Changed as suggested.

      Methods:

      It is standard practice to include at least two genes as reference genes for RT-qPCR analysis (https://doi.org/10.1186/gb-2002-3-7-research0034, https://doi.org/10.1373/clinchem.2008.112797). If there are large-scale differences in the tissues being compared (e.g. as there are here during development) then more than two reference genes may be required and a reference gene study (such as https://doi.org/10.3390/genes12010021) is appropriate. Have the authors confirmed that rp49 is stably expressed during the stages of Thermobia development that they assay here?

      We have explained our choice in the Methods.
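      As an illustration of the multi-reference-gene normalization the reviewer refers to, the sketch below divides a target gene's relative quantity by the geometric mean of the relative quantities of several reference genes. The Ct values and the second reference gene are hypothetical, and a perfect amplification efficiency of 2 is assumed; none of this reflects the authors' actual data or analysis.

```python
import math


def relative_quantity(ct, ct_calibrator, efficiency=2.0):
    """Relative quantity versus a calibrator sample, assuming a fixed efficiency."""
    return efficiency ** (ct_calibrator - ct)


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_expression(target_rq, reference_rqs):
    """Target quantity divided by the geometric mean of the reference quantities."""
    return target_rq / geometric_mean(reference_rqs)


# Hypothetical Ct values given as (sample, calibrator); "ref_gene_2" is a placeholder.
target = relative_quantity(ct=24.0, ct_calibrator=27.0)        # target gene
references = [
    relative_quantity(ct=18.0, ct_calibrator=19.0),            # e.g. rp49
    relative_quantity(ct=21.0, ct_calibrator=22.0),            # ref_gene_2
]
print(normalized_expression(target, references))
```

      If the reference genes drift in opposite directions between stages, the geometric mean damps the effect of either one, which is the motivation for using more than one reference.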

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work describes a new method for sequence-based remote homology detection. Such methods are essential for the annotation of uncharacterized proteins and for studies of protein evolution.

      Strengths:

      The main strength and novelty of the proposed approach lies in the idea of combining state-of-the-art sequence-based (HHpred and HMMER) and structure-based (Foldseek) homology detection methods with recent developments in the field of protein language models (the ESM2 model was used). The authors show that features extracted from high-dimensional, information-rich ESM2 sequence embeddings can be suitable for efficient use with the aforementioned tools.

      The reduced features take the form of amino acid occurrence probability matrices estimated from ESM2 masked-token predictions, or structural descriptors predicted by a modified variant of the ESM2 model. However, we believe that these should not be called "embeddings" or "representations". This is because they don't come directly from any layer of these networks, but rather from their final predictions.

      We agree that there is some room for discussion about whether the amino acid probabilities returned by pre-trained ESM-2 and the 3Di sequences returned by ESM-2 3B 3Di can be properly referred to as “embeddings”. The term “embedding” doesn’t have a formal definition, other than some kind of alternative vector representation of the input data which, preferably, makes the input data more suitable for some downstream task. In that simple sense of the word “embedding”, amino acid probabilities and 3Di sequences output by our models are, indeed, types of embeddings. We posed the question on Twitter (https://twitter.com/TrichomeDoctor/status/1715051012162220340) and nobody responded, so we are left to conclude that the community is largely ambivalent about the precise definition of “embedding”.

      We’ve added language in our introduction to make it more clear that this is our working definition of an “embedding”, and why that definition can apply to profile HMMs and 3Di sequences.

      The benchmarks presented suggest that the approach improves sensitivity even at very low sequence identities <20%. The method is also expected to be faster because it does not require the computation of multiple sequence alignments (MSAs) for profile calculation or structure prediction.

      Weaknesses:

      The benchmarking of the method is very limited and lacks comparison with other methods. Without additional benchmarks, it is impossible to say whether the proposed approach really allows remote homology detection and how much improvement the discussed method brings over tools that are currently considered state-of-the-art.

      We thank the reviewer for the comment. To address the question, we’ve expanded the results by adding a new benchmark and added a new figure, Figure 4. In this new content, we use the SCOPe40 benchmark, originally proposed in the Foldseek paper (van Kempen et al., 2023), to compare our best method, ESM-2 3B 3Di coupled to Foldseek, with several other recent methods. We find our method to be competitive with the other methods.

      We are hesitant to claim that any of our proposed methods are state-of-the-art because of the lack of a widely accepted standard benchmark for remote homology detection, and because of the rapid pace of advancement of the field in recent years, with many groups finding innovative uses of pLMs and other neural-network models for protein annotation and homology detection.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a number of exploratory applications of current protein representations for remote homology search. They first fine-tune a language model to predict structural alphabets from sequence and demonstrate using these predicted structural alphabets for fast remote homology search both on their own and by building HMM profiles from them. They also demonstrate the use of residue-level language model amino acid predicted probabilities to build HMM profiles. These three implementations are compared to traditional profile-based remote homology search.

      Strengths:

      • Predicting structural alphabets from a sequence is novel and valuable, with another approach (ProstT5) also released in the same time frame further demonstrating its application for the remote homology search task.

      • Using these new representations in established and battle-tested workflows such as MMSeqs, HMMER, and HHBlits is a great way to allow researchers to have access to the state-of-the-art methods for their task.

      • Given the exponential growth of data in a number of protein resources, approaches that allow for the preparation of searchable datasets and enable fast search are of high relevance.

      Weaknesses:

      • The authors fine-tuned ESM-2 3B to predict 3Di sequences and presented the fine-tuned model ESM-2 3B 3Di with a claimed accuracy of 64% compared to a test set of 3Di sequences derived from AlphaFold2 predicted structures. However, the description of this test set is missing, and I would expect repeating some of the benchmarking efforts described in the Foldseek manuscript as this accuracy value is hard to interpret on its own.

      The preparation of training and test sets is described in the methods under the heading “Fine tuning ESM-2 3B to convert amino acid sequences into 3Di sequences”. Furthermore, there is code in our github repository to reproduce the splits, and the entire model training process: https://github.com/seanrjohnson/esmologs#train-esm-2-3b-3di-starting-from-the-esm-2-3b-pre-trained-weights

      We didn’t include the training/validation/test splits in the Zenodo repository because they are very large: train 33,924,764; validation 1,884,709; test 1,884,710 sequences, times 2 because there are both amino acid and 3Di sequences. It comes out to about 30 Gb total, and is easily rebuilt from the same sources we built it from.
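      The split sizes quoted above can be checked with a few lines of arithmetic. The sketch below uses only the three sequence counts stated in the response; the proportions it prints are derived from those counts and are not stated by the authors.

```python
# Sequence counts as quoted in the response above.
split_sizes = {
    "train": 33_924_764,
    "validation": 1_884_709,
    "test": 1_884_710,
}

total = sum(split_sizes.values())
fractions = {name: n / total for name, n in split_sizes.items()}

# Each entry has both an amino acid and a 3Di sequence, hence "times 2".
total_records = 2 * total

for name, frac in fractions.items():
    print(f"{name}: {split_sizes[name]:,} sequences ({frac:.1%})")
print(f"total: {total:,} entries, {total_records:,} sequences overall")
```

      On these counts the split works out to roughly 90% training, 5% validation, and 5% test.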

      We’ve added the following sentence to the main text to clarify:

      “Training and test sets were derived from a random split of the Foldseek AlphaFold2 UniProt50 dataset (Jumper et al., 2021; van Kempen et al., 2023; Varadi et al., 2022), a reduced-redundancy subset of the UniProt AlphaFold2 structures (see Methods for details).”

      To address the concern about comparing to Foldseek using the same benchmark, we’ve expanded the results section and added a new figure, Figure 4, which uses the SCOPe40 benchmark originally presented in the Foldseek paper, and subsequently in the ProstT5 paper, to compare Foldseek with ESM-2 3B 3Di to Foldseek with ProstT5, AlphaFold2, and experimental structures.

      • Given the availability of predicted structure data in AFDB, I would expect to see a comparison between the searches of predicted 3Di sequences and the "true" 3Di sequences derived from these predicted structures. This comparison would substantiate the innovation claimed in the manuscript, demonstrating the potential of conducting new searches solely based on sequence data on a structural database.

      See response above. We’ve now benchmarked against both ProstT5 and AF2.

      • The profile HMMs built from predicted 3Di appear to perform sub-optimally, and those from the ESM-2 3B predicted probabilities also don't seem to improve traditional HMM results significantly. The HHBlits results depicted in lines 5 and 6 in the figure are not discussed at all, and a comparison with traditional HHBlits is missing. With these results and presentation, the advantages of pLM profile-based searches are not clear, and more justification over traditional methods is needed.

      We thank the reviewer for pointing out the lack of clarity in the discussion of lines 5 and 6.

      We’ve re-written that section of the discussion, and reformatted Figure 3 to enhance clarity.

      We agree that a comparison to traditional HHBlits could be interesting, but we don’t expect to see stronger performance from the pLM-predicted profiles than from traditional HHBlits, just as we don’t see stronger performance from pLM-hmmscan or pLM-Foldseek than from the traditional variants. We think that the advantage of pLM-based amino acid HMM searches is primarily speed. Many variables can influence the speed of generating an MSA and HMM profile, but in general we expect that route to be much slower than generating an HMM profile from a pLM.

      We don’t know why making profiles of 3Di sequences doesn’t improve search sensitivity; we just think it’s an interesting result that is worth presenting to the community. Perhaps someone can figure out how to make it work better.

      • Figure 3 and its associated text are hard to follow due to the abundance of colors and abbreviations used. One figure attempting to explain multiple distinct points adds to the confusion. Suggestion: Splitting the figure into two panels comparing (A) Foldseek-derived searches (lines 7-10) and (B) language-model derived searches (lines 3-6) to traditional methods could enhance clarity. Different scatter markers could also help follow the plots more easily.

      We thank the reviewer for this helpful comment. We’ve reformatted Figure 3 as suggested, and we think it is much easier to read now.

      • The justification for using Foldseek without amino acids (3Di-only mode) is not clear. Its utility should be described, or it should be omitted for clarity.

      To us, the use of 3Di-only mode is of great theoretical interest. From our perspective, this is one of our most significant results. Previous methods, such as pLM-BLAST and related methods, have made use of very large positional embeddings to achieve sensitive remote homology search. We show that with the right embedding, you don’t need very many bits per position to get dramatically improved search sensitivity from Smith-Waterman, compared to amino acid searches. We also doubt that predicted 3Di sequences are the optimal small encoding for remote homology detection. Together, this result and observation open up an exciting avenue for future research in developing small, learned positional embeddings that are optimal for remote homology detection and amenable to SIMD-optimized pre-filtering and Smith-Waterman alignment steps.

      We’ve expanded the discussion, explaining why we are excited about this result.

      • Figure 2 is not described, unclear what to read from it.

      It's just showing that ESM-2-derived amino acid probabilities closely resemble amino acid frequencies in MSAs. We think it gives readers some visual intuition about why predicted profile HMMs perform as well as they do. We’ve added some additional explanation of it in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The paper would mainly benefit from a more comprehensive benchmark:

      We suggest that the authors extend the benchmark by including the reference methods (HHpred and Foldseek) run with their original representations, i.e., MSAs obtained with 2-3 iterations of hhblits (for HHpred) and experimental or predicted structures (for Foldseek). HHpred profile-profile comparisons and Foldseek structure-structure comparisons would be important reference points for assessing the applicability of the proposed approach in distant homology detection. It is also essential to compare the method with other emerging tools such as EBA (DOI: 10.1101/2022.12.13.520313), pLM-BLAST (DOI: 10.1101/2022.11.24.517862), DEDAL (DOI: 10.1038/s41592-022-01700-2), etc.

      We also suggest using an evolutionary-oriented database for the benchmark, such as ECOD or CATH (these databases classify protein domains with known structures, which is important in the context of including Foldseek in the benchmark). We ran a cursory benchmark using the ECOD database and generated HH-suite .hhm files (using the single_seq_to_hmm.py and hhsearch_multiple.py scripts). Precision and recall appear to be significantly lower compared to "vanilla" hhsearch runs with MSA-derived profiles. It would also be interesting to see benchmarks for speed and alignment quality.

      The pLM-based methods for homology detection are an emerging field, and it would be important to evaluate them in the context of distinguishing between homology and analogy. In particular, the predicted Foldseek representations may be more likely to capture structural similarity than homology. This could be investigated, for example, using the ECOD classification (do structurally similar proteins from different homology groups produce significant matches?) and/or resources such as MALISAM that catalog examples of analogy.

      We’ve added the SCOPe40 benchmark, which we think at least partially addresses these comments, adding a comparison to pLM-BLAST, ProstT5, and AF2 followed by Foldseek. The question of analogy vs homology is an interesting one. It could be argued that the SCOPe40 benchmark addresses this in the difference between Superfamily (distant homology) and Fold (analogy, or very distant homology).

      Our focus is on remote homology detection applications rather than alignment quality, so we don’t benchmark alignment quality, although we agree that those benchmarks would be interesting.

      Page 2, lines 60-67. This paragraph would benefit from additional citations and explanations to support the superiority of the proposed approach. The fact that flattened embeddings are not suitable for annotating multidomain proteins seems obvious. Also, the claim that "current search implementations are slow compared to other methods" should be supported (tools such as EBA or pLM-BLAST have been shown to be faster than standard MSA-based methods). Also, as we mentioned in the main review, we believe that the generated pseudo-profiles and fine-tuned ESM2 predictions should not be called "smaller positional embeddings".

      Discriminating subdomains was a major limitation of the influential and widely-cited PfamN paper (Bileschi et al., 2022); we’ve added a citation to that paper in that paragraph for readers interested in diving deeper.

      To address the question of speed, we’ve included data preparation and search benchmarks as part of our presentation of the SCOPe40 benchmark.

      Finally, we were not sure why exactly every 7th residue is masked in a single forward pass. Traditionally, pseudo-log likelihoods are generated by masking every single token and predicting probabilities from logits given the full context - e.g. https://arxiv.org/pdf/1910.14659.pdf. Since this procedure is crucial in the next steps of the pipeline, it would be important to either experiment with this hyperparameter or explain the logic used to choose the mask spacing.

      We’ve added discussion of the masking distance to the Methods section.
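
      The mask spacing under discussion can be illustrated with a small sketch. It assumes, as the reviewer's comment suggests, that one forward pass masks every 7th residue, and further assumes that shifting the starting offset across passes covers every position exactly once; `mask_schedule` is a hypothetical helper and not part of the authors' code.

```python
def mask_schedule(seq_len, spacing=7):
    """Return one list of masked positions per forward pass.

    Pass k masks positions k, k + spacing, k + 2 * spacing, ...
    (assumed scheme; the spacing of 7 comes from the reviewer's comment).
    """
    n_passes = min(spacing, seq_len)  # short sequences need fewer passes
    return [list(range(offset, seq_len, spacing)) for offset in range(n_passes)]
```

      Under these assumptions a sequence needs at most 7 forward passes rather than one pass per residue, at the cost of each prediction being made with several other positions masked at the same time.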

      Reviewer #2 (Recommendations For The Authors):

      • While the code and data for the benchmark are available, the generation of searchable databases using the methods described for a popular resource such as Pfam, AFDB, SCOP/CATH which can be used by the community would greatly boost the impact of this work.

      3Di sequences predicted by ESM-2 3B 3Di can easily be used as queries against any Foldseek database, such as PDB, AFDB, etc. We’ve added Figure 4E to demonstrate this possibility, and added some related discussion.

      • Minor: In line 114, the text should likely read "compare lines 7 and 8" instead of "compare lines 6 and 7."

      We’ve clarified the discussion of Figure 3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for their tremendously helpful comments. We outline below changes we have made to the manuscript in response to each point. These include new analyses and a substantial rewrite to address the concerns about lack of clarity.

      We believe the revisions strengthen the evidence for our conclusion that grid fields can be either anchored to or independent from a task reference frame, and that anchoring is selectively associated with successful path integration-dependent behaviour. Our additional analyses of non-grid cells indicate that while some are coherent with the grid population, many are not, suggesting cell populations within the MEC may implement grid-dependent and grid-independent computations in parallel.

      We hope the reviewers will agree that our novel experimental strategy complements and avoids limitations of perturbation-based approaches, and by providing evidence to dissociate the two major hypotheses for whether and when grid cells contribute to behaviour our results are likely to have a substantial impact on the field.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Clark et al. uncovered an association between the positional encoding of grid cell activity with good performance in spatial navigation tasks that require path integration, highlighting the contribution of grid firing to behaviour… The conclusions of this paper are mostly well supported by data, the finding about the association between grid cell encoding and behaviour in spatial memory tasks is important. However, some aspects of the analysis need to be clarified or extended.

      Thank you for the overview and constructive comments.

      (1) While the current dataset aims to demonstrate a "correlation" between grid cell encoding and task performance, the other variables that could confound this correlation should be carefully examined.

      (1.1) The exact breakdown of the fraction of beaconed/non-beaconed/probe trials is never shown. If the session makeup has a significant effect on the coding scheme or other results, this variable should be accounted for.

      The lack of information about the trial organisation was a substantial oversight in our preparation of the first version of the manuscript. Session makeup cannot account for effects on grid stability and its relationship to behavioural outcome, but this was not made at all clear.

      In all sessions trial types were varied in a fixed repeating sequence. Therefore, continuous blocks of trials on which grid firing is anchored to (or independent from) the track cannot be explained by the mouse experiencing a particular trial type. We have revised the manuscript to make this clearer, e.g. p 5, ‘These switches could not be explained by variation between trials in the availability of cues or rewards, as these were interleaved in blocks that repeated throughout a session (see Methods), whereas periods in which grid cell activity was in a given mode extended across the repeating blocks (e.g. Figures 3D,E, 4A, 5E,F).’ and methods p 12, ‘Trials were delivered in repeating blocks throughout a recording session…’

      (1.2) The manuscript did not provide information about whether individual mice experienced sessions with different combinations of the three trial types, and whether they show different preferences in position or distance encoding even in comparable sessions. This leads to the question of whether different behaviour and activity encoding were dominated by experimental or natural differences between individual mice. Presenting the data per mouse will be helpful.

      As we note above, because trial types were interleaved in a fixed sequence, experience of a particular trial type cannot account for switching between task-anchored and task-independent firing modes. This was insufficiently clear in the first version of the manuscript.

      We varied the proportions of trials of a particular type between sessions with the aim of maximising the number of non-beaconed and probe trials. This was necessary because we find that if we introduce too high a proportion of these trials early in training then mice appear to ‘lose interest’ in the task and their performance drops off. We therefore used an approach in which we increased the proportions of non-beaconed and probe trials over training days as mice became familiar with the task. This is now described in the methods (p 12).

      Because the decision for when to vary the proportion of trial types was based on the previous day’s performance, the experimental design was not optimised for addressing the reviewer’s question about dissociating experimental from natural differences in mice. To provide some initial insight we have analysed the relationship between task-anchored coding and the proportion of beaconed trials in a session (Figure 3, Figure Supplement 7). While on average there is a higher proportion of trials in which grid fields are task-anchored in sessions with more beaconed trials, this effect is small and most of the variance is independent from the proportion of beaconed trials.

      (1.3) Related to the above point, in Figure 5, the mice appeared to behave worse in probe trials than non-beaconed trials. If the mouse did not know if a trial is a probe or a non-beacon trial, they should behave equivalently until the reward location and thus should stop an equal amount. If this difference is because multiple probe trials are placed consecutively, did the mouse learn that it will not get a reward and then stop trying to get rewards? Did this affect switching between position and distance coding?

      Thank you for flagging this. This reflected an inconsistency arising from the way we detected stops that we have now corrected. Briefly, the temporal resolution of the processed location data against which the stop detection threshold was applied was insufficiently high. As a result, stops in the non-beaconed group were picked up, as they tended to be longer because mice remained still to consume rewards, whereas some stops in the probe group were missed because they were relatively short. We have corrected this by repeating the analyses on raw position data at the highest temporal resolution available. This analysis is now clearly described in the Methods (see p 13, “A stop was registered in Blender3D if the speed of the mouse dropped below 4.7 cm/s. Speed was calculated on a rolling basis from the previous 100 ms at a rate of 60 Hz.”).
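
      The stop criterion quoted from the Methods can be sketched in Python. This is an illustration only: the original detection ran in Blender3D, and the function name `detect_stops`, the use of absolute displacement, and the skipping of the first samples (which lack a full 100 ms history) are assumptions.

```python
def detect_stops(positions_cm, rate_hz=60.0, window_s=0.1, threshold_cm_s=4.7):
    """Return sample indices at which rolling speed is below the stop threshold.

    Speed at sample i is the absolute displacement over the previous
    window_s seconds divided by that duration. The threshold, window and
    sampling rate are taken from the quoted Methods text; the rest is assumed.
    """
    window = max(1, round(window_s * rate_hz))  # 100 ms at 60 Hz is 6 samples
    duration_s = window / rate_hz
    stops = []
    for i in range(window, len(positions_cm)):
        speed = abs(positions_cm[i] - positions_cm[i - window]) / duration_s
        if speed < threshold_cm_s:
            stops.append(i)
    return stops
```

      Because the window spans only six samples, a short pause is registered a few samples after the animal stops moving, which is the kind of brief stop that a coarser time base can miss.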

      (1.4) It is not shown how the behaviours (e.g., running speed away from the reward zone, licking for reward) in beaconed/non-beaconed/probe trials were different and whether the difference in behaviours led to the different encoding schemes.

      Because trial types were interleaved and repeated with a period less than the length of typical trial sequences during which grid cell activity remained either task-anchored or task-independent, differences between trial types are unlikely to explain use of the different coding schemes. Hopefully, this is clarified by the comments above.

      To further describe the relationship between behavioural outcomes, trial types and grid anchoring, we now also show running speed as a function of location for each combination of trial types and trial outcomes (Figure 6, Figure Supplement 1). This illustrates and replicates our previous findings (Tennant et al. 2018) that running speed profiles are similar for a given trial outcome regardless of trial type (Figure 6, Figure Supplement 1A), and further shows that the behavioural profile for a given trial outcome and trial type does not differ when grid cells are in task-anchored and task-independent modes (Figure 6, Figure Supplement 1B). This further argues against the possibility that differences in behaviour lead to the different encoding schemes.

      (2) Regarding the behaviour and activity encoding on a trial-by-trial basis, did the behavioural change occur first, or did the encoding switch occur first, or did they happen within the same trial? This analysis will potentially determine whether the encoding is causal for the behaviour, or the other way around.

      This is a good question but our experimental design lacks sufficient statistical power to address the timing of mode switches within a trial. This is because mode switching is relatively infrequent (so the n for switching is low) and only a subset of trials are uncued (making the relevant n even lower), while at a trial level the behavioural outcome is variable (increasing the required n for adequate power).

      (3) The author determined that the grid cell coding schemes were limited to distance encoding and position encoding. However, there could be other schemes, such as switching between different position encodings (with clear spatial fields but at different locations), as indicated by Low et al., 2021, and switching between different distance encodings (with different distance periods). If these other schemes indeed existed in the data, they might contribute to the variation of the behaviours.

      Switching between position encoding schemes appears to be rare within our dataset and unlikely to contribute to variation in behaviour. In most sessions we did not observe switching between grid phases / position encodings (e.g. Figures 2A-B, 3B-E, 4A, 5C-D, F). In one session we found switching between different phases when grid cells were task-anchored. Because the grid period was unchanged, the spatial periodograms remained similar. We report this example in the revised manuscript (Figure 5E).

      (4) The percentage of neurons categorised in each coding scheme was similar between non-grid and grid cells. This implies that non-grid cells might switch coding schemes in sync with grid cells, which would mean the whole MEC network was switching between distance and position coding. This raises the question of whether the grid cell coding scheme was important per se, or just the MEC network coding scheme.

      We very much appreciate this suggestion. We note first that while the proportion of task-anchored grid and non-grid cells is similar, task-independent periodic firing of non-grid cells is much rarer than for grid cells (Figure 2E), suggesting a dissociation between the populations. To further address the question we have included additional analyses of non-grid cells (Figure 3, Figure Supplement 5). These show that while some non-grid cells have anchoring that switches coherently with simultaneously recorded grid cells, others do not. Figures 4 and 5 now show examples of non-grid cell activity recorded simultaneously with grid cells.

      Together, our data suggest that the MEC implements multiple coding schemes: one that is associated with the grid network and includes some non-grid cells; and one (or more) that can be independent from the grid network. This dissociation adds to the insights into MEC function that are provided by our study and is now highlighted in the abstract and discussion.

      (5) In Figure 2 there are several cell examples that are categorised as distance or position coding but have a high fraction of the other coding scheme on a per-trial basis. Given this variation, the full session data in F should be interpreted carefully, since this included all cells and not just "stable" coding cells. It will be cleaner to show the activity comparison only between the stable cells.

      We have now included examples in Figure 2A-C where the grid mode is stable throughout a session. As the view of activity at a session level is important, we have not updated Figure 2F, but have clarified the terminology to now clearly refer to classification at either session or trial levels. In addition, we have repeated the analyses shown in Figure 2F but after grouping cells according to whether their firing has a single mode on >85% of the trials (Figure 3 Figure Supplement 4). This analysis supports similar conclusions to those of Figure 2F.
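
      The >85% grouping criterion can be made concrete with a short sketch. It assumes each trial has already been assigned a mode label; the function name `stable_mode` and the label strings are hypothetical, and only the 85% threshold comes from the response above.

```python
from collections import Counter


def stable_mode(trial_modes, min_fraction=0.85):
    """Return the dominant per-trial mode if it covers more than
    min_fraction of trials; otherwise return None (cell not stable)."""
    if not trial_modes:
        return None
    mode, count = Counter(trial_modes).most_common(1)[0]
    return mode if count / len(trial_modes) > min_fraction else None
```

      A cell that is task-anchored on 9 of 10 trials would be grouped as stable, whereas one that is task-anchored on exactly 85% of trials would not, because the comparison is strict.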

      (6) The manuscript is not well written. Throughout the manuscript, there are many unexplained concepts (especially in the introduction) and methods, mis-referenced figures, and unclear labels.

      We very much appreciate the feedback and have substantially rewritten the manuscript. We have paid particular attention to explaining key concepts in the introduction and have carefully checked the figures. We welcome further feedback on whether this is now clearer.

      Reviewer #2 (Public Review):

      Clark and Nolan's study aims to test whether the stability of grid cell firing fields is associated with better spatial behaviour performance on a virtual task… This study is very timely as there is a pressing need to identify/delimitate the contribution of grid cells to spatial behaviours. More studies in which grid cell activity can be associated with navigational abilities are needed.

      Thank you for the supportive comments and highlighting the importance of the question.

      The link proposed by Clark and Nolan between "virtual position" coding by grid cells and navigational performance is a significant step toward better understanding how grid cell activity might support behaviour. It should be noted that the study by Clark and Nolan is correlative. Therefore, the effect of selective manipulations of grid cell activity on the virtual task will be needed to evaluate whether the activity of grid cells is causally linked to the behavioural performance on this task. In a previous study by the same research group, it was shown that inactivating the synaptic output of stellate cells of the medial entorhinal cortex affected mice's performance of the same virtual task (Tennant et al., 2018). Although this manipulation likely affects non-grid cells, it is still one of the most selective manipulations of grid cells that are currently available.

      Again, thank you for the supportive comments. We recognise the previous version of the manuscript did not sufficiently clarify the motivation for our approach, or the benefits of capitalising on behavioural variability as a complementary strategy to perturbation approaches. We now make this clearer in the revised introduction (p 2, paragraphs 2 and 3).

      When interpreting the "position" and "distance" firing mode of grid cells, it is important to appreciate that the "position" code likely involves estimating distance. The visual cues on the virtual track appear to provide mainly optic flow to the animal. Thus, the animal has to estimate its position on the virtual track by estimating the distance run from the beginning of the track (or any other point in the virtual world).

      We appreciate the ambiguity here was confusing. We have re-named the groups to ‘task-anchored’, corresponding to when grid cells encode position on the track (as well as distance as the reviewer correctly points out), and ‘task-independent’, corresponding to the group we previously referred to as distance encoding.

      It is also interesting to consider how grid cells could remain anchored to virtual cues. Recent work shows that grid cell activity spans the surface of a torus (Gardner et al., 2022). A run on the track can be mapped to a trajectory on the torus. Assuming that grid cell activity is updated primarily from self-motion cues on the track and that the grid cell period is unlikely to be an integer of the virtual track length, having stable firing fields on the virtual track likely requires a resetting mechanism taking place on each trial. The resetting means that a specific virtual track position is mapped to a constant position on the torus. Thus, the "virtual position" mode of grid cells may involve 1) a trial-by-trial resetting process anchoring the grid pattern to the virtual cues and 2) a path integration mechanism. Just like the "virtual position" mode of grid cell activity, successful behavioural performance on non-beaconed trials requires the animal to anchor its spatial behaviour to VR cues.
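
      The reviewer's resetting argument can be illustrated with a small calculation. If the grid pattern advanced purely with distance run and were never reset, its phase at the start of trial n would be (n × track length) mod grid period. The sketch below computes this; the function name and the example lengths are hypothetical and are not taken from the study.

```python
def start_phases(track_length_cm, grid_period_cm, n_trials):
    """Grid phase (fraction of a period, 0 to 1) at the start of each trial,
    assuming pure distance-based updating with no resetting."""
    return [
        ((trial * track_length_cm) % grid_period_cm) / grid_period_cm
        for trial in range(n_trials)
    ]
```

      With a hypothetical 200 cm track and an 80 cm grid period, `start_phases(200, 80, 4)` gives `[0.0, 0.5, 0.0, 0.5]`, so fields would shift on alternate trials. Only when the track length is a whole number of grid periods (for example 200 cm and 100 cm) does the phase stay fixed without a reset.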

      Reviewer #3 (Public Review):

      This study addresses the major question of 'whether and when grid cells contribute to behaviour'. There is no doubt that this is a very important question. My major concern is that I'm not convinced that this study gives a significant contribution to this question, although this study is well-performed and potentially interesting. This is mainly due to the fact that the relation between grid cell properties and behaviour is exclusively correlative and entirely based on single cell activity, although the introduction mentions quite often the grid cell network properties and dynamics. In general, this study gives the impression that grid cells exclusively support the cognitive processes involved in this task. This problem is in part related to the text.

      Thank you for the comments. We recognise now that the previous text was insufficiently clear. We have modified the introduction to clarify the value of an approach that takes advantage of behavioural variability. Importantly, this approach is complementary to perturbation strategies we and others have used previously. In particular it addresses critical limitations of perturbation strategies, which can be confounded by off-target effects and possible adaptation, both of which are extremely difficult to fully rule out. We hope that with this additional clarification it is now clear, first, that as for any important question multiple and complementary testing strategies are required to make progress, and second, that our study makes a new and important contribution by introducing a novel experimental approach and by following this up with careful analyses that clearly distinguish competing hypotheses.

      However, it would be interesting to look at the population level (even beyond grid cells) to test whether at the network level, the link between behavioural performance and neural activity is more straightforward compared to the single-cell level. This approach could reconcile the present results with those obtained in their previous study following MEC inactivation.

      We’re unclear here about what the reviewer means by ‘more straightforward’ as clear relationships between activity of single grid cells and populations of grid cells are well established (Gardner et al., 2021; Waaga et al., 2021; Yoon et al., 2013).

      To give a clearer indication of the corresponding population level representations, as mentioned in response to Reviewer #1, we now include additional data showing many simultaneously recorded neurons, and analyses of non-grid as well as grid cells (Figures 4, 5, Figure 5 Figure Supplement 2).

      To reconcile results with our previous study of MEC inactivation we have paid additional attention to the roles of non-grid cells (following suggestions by Reviewer #1). We show that while some non-grid cells show transitions between task-anchored and task-independent firing that are coherent with the grid population, many others have more stable firing that is independent of grid representations. This is consistent with the idea that the MEC supports localised behaviour in the cued and uncued versions of the task (Tennant et al., 2018), and suggests that while grid cells preferentially contribute when cues are absent, non-grid cells could also support the cued version. We make this additional implication clear in the revised abstract and discussion.

      The authors used a statistical method based on the computation of the frequency spectrum of the spatial periodicity of the neural firing to classify grid cells as 'position-coding' (with fields anchored to the virtual track) and 'distance-coding' (with fields repeating at regular intervals across trials). This is an interesting approach that nonetheless has the drawback of being based exclusively on autocorrelograms. It would be interesting to compare with a different method based on the similarities between raw maps.

      While our main analyses use a periodogram-based method to identify when grid cells are / are not anchored to the task environment, we validate these analyses by examination of the rate maps in each condition (Figures 2-4). For example, when grid cells are task-anchored, according to the periodogram analysis, the rate maps clearly show spatially aligned peaks, whereas when grid cells are not anchored the peaks in their rate maps are not aligned (Figure 2A vs 2B; Figure 3B-E; Figure 4C). We provide further validation by showing that spatial information (in the track reference frame) is substantially higher when grid cell activity is task-anchored vs task-independent (Figures 2F, 3G, 4F and Figure 3 Figure Supplement 4).

      To further address this point we have carried out additional complementary analyses in which we identify task-anchored vs task-independent modes using a template matching method applied to the raw rate maps (Figure 6, Figure Supplement 2). These analyses support similar conclusions to our periodogram-based analyses.
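
      One generic way to match raw rate maps against a template is to correlate each trial's rate map with a reference map. The sketch below shows that idea only; it is not the authors' implementation, and the function names, the use of Pearson correlation, and the 0.5 threshold are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def classify_trials(rate_maps, template, r_threshold=0.5):
    """Label each trial by how well its rate map matches the template.

    The threshold is an illustrative assumption, not a value from the study.
    """
    return [
        "task-anchored" if pearson(rate_map, template) > r_threshold
        else "task-independent"
        for rate_map in rate_maps
    ]
```

      A trial whose firing peaks line up with the template scores a high correlation, whereas a trial with the same periodic firing shifted along the track scores low or negative, which mirrors the aligned versus misaligned peaks described for the rate maps.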

      Beyond this minor point, cell categorization is performed using all trial types.

      Each trial type (i.e. beacon or non-beacon) is supposed to force mice to use different strategies and should induce different spatial representations within the entorhinal-hippocampal circuit (and not only in the grid cell system). In that context, since all trials are mixed, it is difficult to extrapolate general information.

      We recognise that the description of the task design was insufficiently clear but are unsure why ‘it is difficult to extrapolate general information’. Before addressing this point, we should first be clear that mice are not ‘forced’ to adopt any particular strategy. Rather, on uncued trials a path integration strategy is the most efficient way to solve the task. However, mice could instead use a less efficient strategy, for example by stopping at short intervals they still obtain rewards. Detailed behavioural analyses indicate that such random stopping strategies are used by naive mice, while with training mice learn to use spatial stopping strategies (Tennant et al. 2018).

      In terms of ‘extracting general information’ from the task, the following findings lead to general predictions: 1) Grid cells can exist in either task-anchored or task-independent periodic firing modes; 2) These modes can be stable across a session, but often mode-switching occurs within a session; 3) While some non-grid cells show task-independent periodic firing, this is much less common than for grid cells, which suggests a model in which many non-grid MEC neurons operate independently from the grid network; 4) When a marker cue is available mice locate a reward equally well when grid cells are in task-anchored versus task-independent modes, which argues against theories in which grid cells are a key part of a general system for localisation; 5) When marker cues are absent, task-anchored grid firing is associated with successful reward localisation, which corroborates a key prediction of theories in which grid cells contribute to path integration.

      In revising the manuscript we have attempted to improve the writing to make these advances clearer, and have clarified methodological details that made interpretation more challenging than it should have been. For example, as noted in our response to Reviewer #1, we have included additional details to clarify the organisation of trials and relationships between trials, behavioural outcomes and neural codes observed.

      On page 5 the authors state that 'Since only position representations should reliably predict the reward location, ..., we reasoned that the presence of positional coding could be used to assess whether grid firing contributes to the ongoing behaviour'. I do not agree with this statement. First of all, position coding should be more informative only in a cue-guided trial. Second, distance coding could be as informative as position coding since at the network level may provide information relevant to the task (such as distance from the reward).

      Again, this point perhaps reflects a lack of clarity on our part in writing the manuscript. When grid cells are anchored to the track reference frame (now called ‘task-anchored’, previously ‘position encoding’), then the location of the rate peaks in grid firing is reliable from trial to trial. This is the case whether or not the trial is cued. When grid cells are independent of the track reference frame (now called ‘task-independent’, previously ‘distance encoding’), then the location of the firing rate peaks varies from trial to trial. In the latter case, position cannot be read out directly from trial to trial.

      In principle, in the task-independent mode track position could be calculated by storing the grid network configuration at the start of the track, which would differ on each trial, and then implementing a mechanism to read out relative distance as mice move along the track. However, if mice do use this computation, we would expect them to do so equally well on cued and uncued trials. By contrast, our results clearly show a dissociation between trial types in the relationship between grid firing and behavioural outcome. We highlight and discuss this possibility in the revised manuscript (p 10, ‘Alternatively, mice could in principle estimate track location with a system that utilises information about distance travelled obtained from task-independent grid representations’).

      Third, position-coding is interpreted as more relevant because it predominates in correct trials. However, this does not imply that this coding scheme is indeed used to perform correct trials.

      We have revised the manuscript to clarify our goal of distinguishing major hypotheses for the roles of grid cells in behaviour (Introduction, ‘On the one hand, theoretical arguments that grid cell populations can generate high capacity codes imply that they could in principle contribute to all spatial behaviours (Fiete et al., 2008; Mathis et al., 2012; Sreenivasan and Fiete, 2011). On the other hand, if the behavioural importance of grid cells follows from their hypothesised ability to generate position representations by integrating self-motion signals (McNaughton et al., 2006), then their behavioural roles may be restricted to tasks that involve path integration strategies.’).

      By showing that performance on cued trials is similar regardless of whether grid cells are task-anchored or not, we provide strong evidence against the idea that grid firing is in general necessary for location-based behaviours. By showing that task anchoring is associated with successful localisation when cues are absent we corroborate a key prediction of hypothesised roles for grid cells in path integration-dependent behaviour. Therefore, we substantially reduce the space of behaviours to which grid cells might contribute. Importantly, this space is much larger for the MEC, which is required for cued and uncued versions of the task. We have revised the introduction and discussion to make these points clearer.

      While we believe our results add a key piece of evidence to the puzzle of when and where grid cells contribute to behaviour, we agree that further work will be required to develop and test more refined hypotheses. Alternative models also remain plausible, for example perhaps the behaviourally relevant computations are implemented elsewhere in the brain with grid anchoring to the track as an indirect consequence. Nevertheless, explanations of this kind are more difficult to reconcile with evidence that inactivation of stellate cells in the MEC impairs learning of the task, and other manipulations that modify grid firing impair performance on similar tasks. We now discuss these possibilities (discussion p 10, ‘mice could in principle estimate track location with a system that utilises information about distance travelled obtained from task-independent grid representations’).

      It could be more informative to push forward the correlative analysis by looking at whether behavioural performance can be predicted by the coding scheme on a trial-by-trial basis.

      The previous version of the manuscript showed these analyses (now in Figure 6). Thus, task-anchored grid firing predicts more successful performance on uncued trials at the session level (Figure 6A-B) and at the trial level (Figure 6C-D).

      Reviewer #1 (Recommendations For The Authors):

      (1) The author particularly mentioned that the 1D tracks are different from the "cue-rich environments that are typically used to study grid cells". It is not clear what conclusions would hold for a cue-rich environment or a track, which may require relatively less path integration compared to the cue-sparse environment. This point should be discussed.

      This is an important point that we did not pay sufficient attention to in the previous version of the manuscript. Our finding of successful localisation in the cued environment when grid cells are not task anchored implies that grid anchoring is not required to solve cued tasks. The implication here is that cue rich environments may then not be the most suitable for investigation of grid roles in behaviour as non-grid mechanisms may suffice, although this does not rule out the possibility that anchored grid codes may play important roles in learning about cue rich environments. We now address this point in the discussion (p 10, ‘An implication of this result is that cue rich tracks often used to investigate grid activity patterns may not engage behaviours that require anchored grid firing.’).

      (2) It would be good to see the statistics for the number of different cells (stable position or distance encoding, and unstable cells) identified per mouse/session and the number of grid cells per session.

      These are now added to Supplemental Data 2 and will also be accessible through code and datasets that we will make available alongside the version of record.

      (3) Figure 2F: any explanation about why AG cells had high spatial information?

      Previously the calculation used bits per spike and as aperiodic cells have low firing rates the spatial information was high. We have replaced this with bits per second, which provides a more intuitive measure and no longer implies high spatial information. We have amended this in the methods (p 15, ‘Spatial information was calculated in bits per second…’).
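      The difference between the two measures can be illustrated with a minimal sketch. It assumes the standard Skaggs formulation of spatial information (bits per second = Σ p_i λ_i log2(λ_i / λ), with bits per spike obtained by dividing by the mean rate λ); the function name, the binning and the example numbers are hypothetical and are not taken from the manuscript's analysis code.

```python
import numpy as np

def spatial_information(occupancy_s, spike_counts):
    """Return (bits_per_second, bits_per_spike) for one cell.

    occupancy_s  : time spent in each spatial bin, in seconds
    spike_counts : number of spikes fired in each spatial bin
    Assumes the Skaggs formulation; bins with zero rate contribute zero.
    """
    occupancy_s = np.asarray(occupancy_s, dtype=float)
    spike_counts = np.asarray(spike_counts, dtype=float)

    p = occupancy_s / occupancy_s.sum()                 # occupancy probability per bin
    rate = np.divide(spike_counts, occupancy_s,
                     out=np.zeros_like(spike_counts),
                     where=occupancy_s > 0)             # firing rate per bin (Hz)
    mean_rate = spike_counts.sum() / occupancy_s.sum()  # overall mean rate (Hz)
    if mean_rate == 0:
        return 0.0, 0.0

    valid = rate > 0                                    # avoid log2(0)
    bits_per_second = np.sum(p[valid] * rate[valid] * np.log2(rate[valid] / mean_rate))
    bits_per_spike = bits_per_second / mean_rate
    return float(bits_per_second), float(bits_per_spike)

# Hypothetical low-rate cell: a handful of spikes confined to one bin.
sparse = spatial_information([10, 10, 10, 10], [4, 0, 0, 0])
# Hypothetical high-rate cell with modest spatial modulation.
active = spatial_information([10, 10, 10, 10], [400, 200, 200, 200])
```

      In this toy case the low-rate cell scores higher in bits per spike but lower in bits per second than the high-rate cell, which is the pattern described above for aperiodic cells.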

      (4) The following methods sections should provide additional details:

      (4.1) Details of the training protocol are largely left to reference papers. The reference papers give a general outline of the training protocol, but the details are not completely comparable given the single experiment performed on these mice. More details should be given on training stages and experience at the time of the experiment.

      The task is more clearly described in the introduction (p 3), and additional details of the training protocol are now provided in the methods (p 12-13).

      (4.2) The methods reference mean speed across sessions, but it is not clear where this was used.

      This was very poor wording. We have now changed this to ‘For each session the mean speed was calculated for each trial outcome’.

      (4.3) The calculation of the spatial autocorrelogram on a per-trial basis should be more explicitly stated. Is it the average of each 10 cm increment with the centre trial?

      We have added additional information to the methods (p 16-18).

      (4.4) 1D field detection is not sufficiently explained in Figure 1/S2. This information should also appear in the methods section.

      This is now clarified on page 16 in section ‘Analysis of neural activity and behaviour during the location memory task’.

      (5) The data in Figure 4A and B only shows speed vs. location for one example mouse. The combined per mouse or per session data should also be shown.

      This is now shown in Figure 5A and Figure 5, Figure Supplement 2.

      (6) Figure 5 is somewhat confusing. Why are A/B by session and C/D by trial? The methods imply that A/B are originally averaged by cell, but that duplicate cells in the same session are excluded because behaviour versus session type is identical. This method should be valid if all grid cells within a session are all "stable". This is likely given the synchrony of code-switching between grid cells, but not all co-active grid cells behaved identically.

      It is understandable that C/D are performed by trial, but it should be made clear that it is not a comparable analysis to A/B. It is unclear what N refers to in C. The figure says by trial, but the legend says the error bar is by cell. If data is calculated by trial and then averaged by cell, this should be more clearly stated.

      In Figure 6A/B (previously Figure 5A/B) we focus our analysis on sessions in which the mode of grid firing, either task-anchored or task-independent, was relatively stable on a trial-to-trial basis (see Figure 3F for definitions). This enables us to then compare behaviour averaged across each session, with sessions categorised as task-anchored and task-independent. This analysis has the advantage that it focuses on large blocks of time (whole sessions) in which the mode of grid firing is unambiguous, but the disadvantage is that it excludes many sessions in which grid firing switches between task-anchored and task-independent modes.

      Figure 6C/D (previously Figure 5C/D) addresses this limitation by carrying out similar analyses with behaviour sorted into task-anchored versus task-independent groups at the level of trials. A potential limitation for this analysis is that grid firing is somewhat variable on a trial-by-trial basis and so some trials may be mis-classified. We don’t expect this to lead to systematic bias, but it may make the data more noisy. Nevertheless, these analyses are important to include as they allow assessment of whether conclusions from 6A/B hold when all sessions are considered.

      We have added additional clarification of the rationale for these analyses to the main text (p 7-8, ‘We addressed this by using additional trial-level comparisons’). We have also added clarification in the methods section for categorisation of task-anchored versus task-independent trials when multiple grid cells were recorded simultaneously (p 17, ‘When assigning a common classification across a group of cells recorded simultaneously...’) and an explanation for the N in the figure legend. We also clarify that the analyses use a nested random effects design to account for dependencies at the levels of sessions and mice (methods, p 20, ‘Random effects had a nested structure to account for animals and sessions…’).

      (7) Panels E and F of Figure 5 are not explained in the main text.

      This is now corrected (see p8, ‘Additional analyses…’).

      (8) Figure 5: Since stable grid cells and all grid cells are shown, it will be better to show unstable cells, which can be compared with grid cells.

      Given that the rationale for differences between Figure 6A/B and C/D (previously Figure 5A-D) was not previously clear, the reason for focussing on stable grid cells here was likely also not clear (see point 6 above). We don’t show unstable grid cells in Figure 6A-B as the behaviour averaged at the level of a session would be a mix of trials when they are task-anchored and when they are task-independent. Therefore, the analysis would not test predictions about the relationship between task-anchored vs task-independent modes and behaviour. We hope this is now clear in the manuscript given the revisions introduced to address point 6 above.

      (9) The methods describing the statistics for these experiments are also confusing. The methods section should be written more clearly, and it should be made clear in the text or figure legend whether this data is the "original" data or is processed in relation to the model, such as excluding duplicate grid cells within a session. The figure legend should also state that a GLMM was used to calculate the statistics.

      We have revised the methods section with the goal of improving clarity, adding detail and removing ambiguity. This includes updates of the methods for the GLMM analysis, which are referred to within the Figure 6 legend. A clear definition of a stable session is now also added to the Figure 6 legend.

      Reviewer #2 (Recommendations For The Authors):

      When grid fields are anchored to the virtual world (position mode), there is probably small trial-to-trial variability in the firing location of the firing fields. Is this trial-to-trial variability related to the variability in the stop location? This would provide a more direct link between path integration in grid cell networks and behaviour that depends on path integration.

      When attempting to address this we find that the firing of individual grid cells is too variable to allow sufficiently precise decoding of their fields at a single trial level. This is expected given the Poisson statistics of spike generation and previous evaluations of grid coding (e.g. (Stemmler et al., 2015)).

      The conclusion of the abstract is: "Our results suggest that positional anchoring of grid firing enhances the performance of tasks that require path integration." This statement is slightly confusing. The task requires 1) anchoring the behaviour to the visual cues presented at the start of the trial and 2) path integration from thereon to identify the rewarded location. The performance is higher when grid cells anchor to the visual cues presented at the start of the trial. What the results show is that the anchoring of grid firing fields to visual landmarks enhances the performance of tasks that require path integration from visual landmarks (i.e. grid cells being anchored to the reference frame that is behaviorally relevant).

      To try to more clearly explain the logic and conclusion we have rewritten the abstract, including the final sentence.

      Similar comment for the title of Figure 5: "Positional grid coding is not required for cued spatial localisation but promotes path integration-dependent localisation." Positional coding means that grid cells are anchored to the behaviorally relevant reference frame.

      To address the lack of clarity we have modified the title of Figure 6 (previously Figure 5) to read ‘Anchoring of grid firing to the task reference frame promotes localisation by path integration but is not required for cued localisation’.

      In Figure 1, there is a wide range of beaconed (40-80%) and non-beaconed (10-60%) trials given. It is not 100% clear whether these refer to the percentage of trials of a given type within the recording sessions. Was the proportion of non-beaconed trials manipulated? If so, was the likelihood of position and distance coding changing according to the percentage of non-beaconed trials?

      The ranges given refer to proportions across different behavioural sessions. Within any given behavioural session the proportion was constant. We now make this clear in the figure legend and in the results and methods sections.

      We did not manipulate proportions of trial types during a session. Manipulations between sessions were carried out with the goal of maximising the numbers of uncued trials that the mice would carry out (see response to public comments above). While the effect of trial type at the session level is not relevant to the hypotheses we aim to test here, we have included an additional analysis of the relationship between task anchoring and the proportions of trial types in a session (Figure 3, Figure Supplement 7) (also discussed above). As disentangling the effects of learning and motivation will be complex and likely require new experimental designs, we have not drawn strong conclusions or pursued the analysis further.

      I was not convinced that the labels "position" and "distance" were appropriate for the two grid cell firing modes. My understanding is that the "position" code also requires the grid cell network to estimate distance. It seems that the main difference between the "position" and "distance" modes is that when in the "position" mode, the activity on the torus is reset to a constant toroidal location when the animal reaches a clearly identifiable location on the virtual track. In the "distance" mode, this resetting does not take place.

      As previously mentioned, we agree these terms weren’t the best and have since relabelled these as “task-anchored” and “task-independent”.

      There are a few sections in the manuscript that implicitly suggest that a causal link between grid cell activity and behaviour was demonstrated. For instance: "It has been challenging to directly test whether and when grid cells contribute to behaviour.": The assumption here is that the manuscript overcomes this challenge, but the study is correlative.

      We have modified the wording to be clear that we are introducing new tests of predictions made by hypotheses about causal relationships between grid coding and behaviour (introduction, p 1-2). We also clarify that our results argue against the hypothesis that grid cells provide a general code for behaviour, but corroborate predictions of hypotheses in which they are specifically important for path integration (discussion, p 10).

      We have modified the title, abstract and main text to try to treat claims about causality with care. We now more thoroughly introduce and contrast the approach we report here with previous experiments that use perturbations (introduction, p 2). While it is tempting to make stronger claims for causality with these approaches, there are also logical limitations with perturbation-based approaches, for example the challenges of fully excluding off-target effects and adaptation. We now explain how these strategies are complementary. Our view is that both strategies will be required to develop strong arguments for whether and when grid cells contribute to behaviour. From this perspective, it is encouraging that our conclusions are in agreement with what are probably the most specific perturbations of grid cells reported to date (Gil et al. 2017), while perturbations that more generally affect MEC function appear to impair cued and path integration-dependent behaviours (Tennant et al. 2018). We now discuss these points more clearly (introduction, p 2).

      I am slightly confused by the references to the panels in Figure 4.

      "In some sessions, localization of the reward occurred almost exclusively when grid cells were anchored to position and not when they encoded distance (Figure 4C)." Figure 4C only shows position coding.

      "In other sessions, animals localised the reward when grid firing was anchored to position or distance, but overall performance was improved on positional trials (Figure 4D-E)." The reference should probably point to Figure 4E-F or just to 4E.

      "In a few sessions, we observed spatial stopping behaviour comparable to cued trials, even when grid firing almost exclusively encoded distance rather than position (Figure 4F)." From Figure 4F, it seems that the performance on non-beaconed trials is better during "position" coding.

      We have now updated Figure 5 (Figure 4 in the original manuscript) and references to the Figure in the text. Now Figure 5 shows the activity of cells recorded in stable and unstable task-anchored and task-independent sessions (see Figure 5C-F).

      Minor issues:

      Is this correct: (Figure 4A and Figure 4, Figure Supplement 1).

      This has been corrected.

      Figure 4B: There could be an additional label for position and distance.

      Figure 4B from the original manuscript has now been removed.

      Figure 4C-F. The panels on the right side should be explained in the Figure Legend.

      Legends for Figure 5C-F (previously Figure 4C-F) have now been updated.

      Reviewer #3 (Recommendations For The Authors):

      Specific questions :

      (1) Position coding reflects a coding scheme in which fields are spaced by a fixed distance; previous studies have shown that a virtual track grid map is a slice of the 2D classic grid. In that case, the fields are still anchored to the track but would produce a completely different map. Did the authors check whether it is the case at least for some cells? If not, what could explain such a major difference?

      To avoid confusion we now use the term ‘task-anchored’ rather than ‘position coding’ (see comments above). We should further clarify that our conclusions rest on whether or not the grid fields are anchored to the track. Task-anchored firing does not require that grid fields maintain their spacing from 2D environments, only that fields are at the same track position on each trial. Thus, whether the spacing of the fields corresponds to a slice through a 2D grid makes no difference to the hypotheses we test here.

      We agree that the relationship between 1D and 2D field organisation could be an interesting future direction, for example anchoring could involve resetting the grid phase while maintaining a stable period, or it could be achieved through local distortions in the grid period. However, since these outcomes would not help distinguish the hypotheses we test here we have not included analyses to address them.

      (2) Previous studies have highlighted the role of grid cells in goal coding. Here there is an explicit reward in a particular area. Are there any grid modifications around this area? This question is not addressed in this study.

      Again, we note that the hypotheses we test here relate to the firing mode of grid cells - task-anchored or task-independent - and interpretation of our results is independent from the specific pattern of grid fields on the track. This question nevertheless leads to an interesting prediction that if grid fields cluster in the goal area then this clustering should be apparent in the task-anchored but not the task-independent firing mode.

      We test this by considering the average distribution of firing fields across all grid cells in each firing mode (Reviewer Figure 1). We find that when grid firing is task-anchored there is a clear peak around the reward zone, which is consistent with previous work by Butler et al. and Boccara et al. Consistent with our other prediction, this peak is reduced when grid cells are in the task-independent mode.

      Author response image 1.

      Plot shows the grid field distribution during stable grid cell sessions (> 85 % task-anchored or task-independent) (A) or during task-anchored and task-independent trials (B). Shaded regions in A and B represent standard error of the mean measured across sessions and epochs respectively.
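      The session-level criterion quoted in the legend above can be sketched as follows. This is only an illustration under stated assumptions: the > 85 % threshold comes from the legend, whereas the function name, the label strings and the choice to count every trial in the denominator are hypothetical and may differ from the definitions in the manuscript's methods.

```python
from collections import Counter

def classify_session(trial_modes, threshold=0.85):
    """Label a session from per-trial grid firing modes.

    trial_modes : sequence of strings, one per trial, e.g. "task-anchored",
                  "task-independent", or any other label for ambiguous trials
    threshold   : fraction of trials that must share a mode (assumed 0.85)
    """
    n_trials = len(trial_modes)
    if n_trials == 0:
        return "unclassified"

    counts = Counter(trial_modes)
    for mode in ("task-anchored", "task-independent"):
        # A session is "stable" only if strictly more than `threshold` of
        # its trials are in the same mode.
        if counts[mode] / n_trials > threshold:
            return f"stable {mode}"
    return "unstable"

# Hypothetical sessions.
mostly_anchored = ["task-anchored"] * 18 + ["task-independent"] * 2
mixed = ["task-anchored"] * 10 + ["task-independent"] * 10
```

      Under these assumptions, `classify_session(mostly_anchored)` would be labelled as a stable task-anchored session and `classify_session(mixed)` as unstable, so only the former would enter a session-level comparison.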

      (3) The behavioural procedure during recording is not fully explained. Do trial types alternate within the same session by blocks? How many trials are within a block? Is there any relation between trial alternation and the switch in the coding scheme observed in a large subset of the grid cells?

      We agree this wasn’t sufficiently clear in the previous version of the manuscript. Trial types were interleaved in a fixed order within each session. We have updated the results and methods sections to provide details (see responses above).

      (4) From the examples in Figure 2 it seems that firing fields tend to shift toward the start position. Is it the case in all cells? Could this reflect some reorganisation at the network level with cells signalling the starting as time progresses?

      This is inconsistent between cells. To make this variability clear we have included additional examples of spiking profiles from different grid cells (Figure 2 - 5). Because quantification of the phenomenon would not, so far as we can tell, help distinguish our core hypotheses we have not included further analyses here.

      (5) Are grid cells with different coding properties recorded in different parts of the MEC? Are there any differences between these cell categories in the 2D map?

      The recordings we made are from the dorsal region of the MEC (stated at the start of the results section). We don’t have data to speak to other parts of the MEC.

      Minor:

      There are very few grid cell examples that repeat in the different figures. I would suggest showing more examples both in the main text and supplementary material.

      We have now provided multiple additional examples in Figures 2, 4 and 5. Grid cell examples repeat in the main figures twice, in both cases only when additional examples are shown from the same recording session (Figure 2A example #1 with Figure 5C, Figure 3E with Figure 4A). Further similar repeats are found in the supplemental figures (Figure 3D with Figure 5, Figure Supplement 2A, Figure 3C with Figure 5, Figure Supplement 2F).

      Fig1 A-B shows the predictions in a 1D track based on distance or position coding. The A inset represents the modification of field distribution from a 2D arena to a 1D track, as performed in this study. The inset B is misleading since it represents the modifications expected from a circular track to a 1D track as in Jacob et al 2019, that is not what the authors studied. It would be better to present either the predictions based on the present study or the prediction based on previous studies. In that case, they should mention the possibility that the 1D map is a slice of the 2D map.

      The goal of Figure 1A-B is to illustrate predictions (right) based on conclusions from previous studies (left). Figure 1A shows predicted 1D track firing given anchoring to the environment typically observed in grid cell studies in 2D arenas. Figure 1B shows predicted 1D track firing given the shifting firing patterns observed by Jacob et al. in a circular 2D track. To improve clarity, we have modified the legend to make clear that the schematics to the right are predictions given the previous evidence summarised to the left. As we outline above, the critical prediction relates to whether the representations anchor to the track. Whether the 1D representation is a perfect slice isn’t relevant to the hypotheses tested and so isn’t included in the schematic (see comments above).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study is valuable as it sheds light on the pivotal role played by alterations in glycan metabolism within chondrocytes in the onset of cartilage degeneration and early onset of osteoarthritis (OA) through the process of hypertrophic differentiation of chondrocytes, giving insights into the identification of nascent markers for early-stage OA. Although the methods, data, and analyses broadly support the claims, the data shown by the authors are incomplete because the mechanism by which cartilage degeneration induced by changes in glycometabolism occurs has not been fully elucidated. The authors' deductions stand to gain further credence through undertaking additional experiments aimed at analyzing the mechanisms underlying the changes in glycometabolism in cartilage, such as the meticulous identification of the target glycan molecules bearing core fucose and analysis of endochondral ossification in cartilage-specific Fut8 KO mice.

      We wish to express our strong appreciation to the Reviewer for his or her insightful comments on our paper. We feel the comments have helped us significantly improve the paper. In particular, we wish to acknowledge the Reviewer’s highly valuable comments on the effect of Fut8 on endochondral ossification.

      Reviewer #1 (Public Review):

      Summary:

      This study is valuable in that it may lead to the discovery of future OA markers, etc., in that changes in glycan metabolism in chondrocytes are involved in the initiation of cartilage degeneration and early OA via hypertrophic differentiation of chondrocytes. However, more robust results would be obtained by analyzing the mechanisms and pathways by which changes in glycosylation lead to cartilage degeneration.

      Strengths:

      This study is important because it indicates that glycan metabolism may be associated with pre-OA and may lead to the elucidation of the cause and diagnosis of pre-OA.

      We thank reviewer #1 for their interest in our work and their overall positive report.

      Weaknesses:

      More robust results would be obtained by analyzing the mechanism by which cartilage degeneration induced by changes in glycometabolism occurs.

      To understand the mechanisms of cartilage degeneration induced by changes in glycometabolism, we performed additional rescue experiments with external administration of TGF-β. We had shown that the addition of mannosidase to an organ culture system of normal wild-type mouse cartilage increased TGF-β gene expression from 6 hours (Fig. 3E) and that TGF-β expression was even suppressed in chondrocytes from Fut8 cKO mice (Fig. 4D). In addition to these results, an early OA model in which mannosidase is added to the cartilage was used to test the effect of exogenous TGF-β. As a result, under TGF-β treated conditions, no degenerative changes occurred when high-mannose type N-glycans were trimmed, and proteoglycan leakage during the recovery period was significantly reduced. We considered this a very useful finding and have included the experimental results in Figure 4F rather than in the supplementary data.

      Reviewer #2 (Public Review):

      Summary:

      This paper consists of mostly descriptive data, judged from alpha-mannosidase-treated samples, in which they found an increase in core fucose, a product of Fut 8.

      Strengths:

      This paper is interesting in the clinical field, but unfortunately, the data is mostly descriptive and does not have a significant impact on the scientific community in general.

      We thank reviewer #2 for their interest in our work and their overall positive report. In response to your comment about our attempts to show that glycan changes occur at the precursor stage of cartilage substrate degeneration and that this glycosylation is also what triggers substrate degeneration, we would like to add that reversing cartilage substrate degeneration is a very ambitious challenge. We are currently in the preparatory stages of characterizing the appropriate glycan-substrate relationships to 'rescue' cartilage tissue from degeneration, and we hope to use this approach to provide information on the pre-developmental stages of OA.

      Weaknesses:

      If core fucose is increased, at least the target glycan molecules of core fucose should be evaluated. They also found an increase in NO, suggesting that inflammatory processes also play an important role in OA in addition to glycan changes.

      As the increase in NO was observed in the organ culture system and cartilage is a tissue without vascular invasion, we thought that the involvement of immune cells could be excluded. On the other hand, our research group has reported that chondrocytes themselves have inflammatory circuits (Ota et al., Arthritis Rheum. 2019. DOI:10.1002/art.41182), but as we did not find increased expression of NF-κB, an indicator of inflammatory amplifier activation, we concluded that inflammation was not involved in this study.

      It has already been reported that core fucose is decreased by administration of alpha-mannosidase inhibitors. Therefore, it is expected that alpha-mannosidase administration increases core fucose.

      The report by Toegel et al. that the synthesis of complex-type N-glycans (Man2a1, Mgat2) is predicted in human OA chondrocytes along with the expression of Fut8 also led to the expectation that administration of α-mannosidase would increase core fucose. However, there was no conclusive evidence that administration of α-mannosidase increased core fucose. In 1987, Vignon et al. performed an enzyme assay on experimental OA cartilage (rabbit ACLT model) and showed that mannosidase was very high in operated joints and that its activity increased and decreased with the severity of fibrosis in the cartilage. The results suggest that glycoprotein hexose degradation is an early transient event in the enzymatic process of cartilage destruction. These findings led to the conception of a novel 'pre-OA model' in which mannosidase is added to the joint. The present study is valuable in its demonstration that glycometabolism is a driver of degeneration.

      (see manuscript REF. 25, 9)

      Toegel et al., Arthritis Res. Ther. 2013. DOI:10.1186/ar4330

      Vignon et al., Clin Rheumatol. 1987. DOI:10.1007/BF02201026

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript "Articular cartilage corefucosylation regulates tissue resilience in osteoarthritis", the authors investigate the glycan structural changes in the context of pre-OA conditions. By mainly conducting animal experiments and glycomic analysis, this study clarified the molecular mechanism of N-glycan core fucosylation and Fut8 expression in the extracellular matrix resilience and unrecoverable cartilage degeneration. Lastly, a comprehensive glycan analysis of human OA cartilage verified the hypothesis.

      Strengths:

      Generally, this manuscript is well structured with rigorous logic and clear language. This study is valuable and important in the early diagnosis of OA patients in the clinic, which is a great challenge nowadays.

      We thank reviewer #3 for their interest in our work and their mainly positive report. This is precisely the purpose of our study, as we are primarily interested in the detection of conditions prior to the onset of OA.

      Weaknesses:

      I recommend minor revisions:

      (1) I would suggest the authors prepare an illustrative scheme for the whole study, to explain the complex mechanism and also to summarize the results.

      We would like to thank the reviewer for this comment and have created a new Figure 7 for the overall study scheme.

      We included the following statement in the opening discussion part:

      "The objective of this work was to provide novel and translational insights into pathogenesis of OA associated with changes in glycan structure. A graphical abstract summarizing our findings is shown in Fig. 7." (line199-201, p9)

      (2) Including but not limited to Figures 2A-C, Figures 3A and C, Figure 4B, and Figures 5A and D. The texts in the above images are too small to read, I would suggest the authors remake these images.

      The font size of the figures has been reviewed and revised throughout.

      (3) The paper is generally readable, but the language could be polished a bit. Several writing errors should be realized during the careful check.

      Thanks to your suggestion, we have noticed several writing errors. In addition, we have had the manuscript rewritten by an experienced scientific editor, who has improved the grammar and stylistic expression of the paper.

      (4) As several species and OA models were conducted in this study, it would be better if the authors could note the reason behind their choice for it.

      The authors agree with the reviewer's argument that since several species and OA models were performed in this study, it would be better to note the reason for their choice.

      We first attempted to inject mannosidase into rabbits, matching the animal species to a previous paper showing that N-glycans are altered prior to degeneration of the cartilage matrix. Next, we checked whether similar changes occur in mouse cartilage after mannosidase treatment, assuming that we would verify this in genetically engineered mice. We then used the integrated glycome in human cartilage to see if the corefucosylation phenomenon detected was conserved across species.

      For the modeling of OA in Fut8 cKO mice, the instability-induced OA model and the age-associated OA model were adapted. The former emphasizes mechanical stress factors in OA, the latter aging factors. OA is a multifactorial disease. Therefore, we thought it was appropriate to validate both aspects of OA.

      We included the following statements in each Methods part:

      "We injected mannosidase into rabbit knee joints in accordance with a previous paper showing that N-type glycans are altered prior to cartilage matrix degeneration." (line289-290, p12)

      "Organ culture experiments in mice were established to study the effects of mannosidase on articular cartilage without immunoreaction and in anticipation of later candidate gene research using transgenic mice." (line326-328, p14)

      "To determine whether the glycosylation detected is conserved across species, we analyzed the total glycome in human cartilage." (line407-408, p17)

      We included the following statements in the Discussion part:

      "For the modeling of OA in Fut8 cKO mice, the instability-induced OA model and the age-associated OA model were adapted. The former emphasizes mechanical stress factors in OA, the latter aging factors. OA is a multifactorial disease. Therefore, we thought it was appropriate to validate both aspects of OA." (line254-257, p11)

      Reviewer #1 (Recommendations For The Authors):

      (1) The cited literature states that core fucosylation by FUT8 has a chondroprotective effect via the TGF-β pathway and that the loss of these chondroprotective effects in Fut8 led to cartilage degeneration, but these need to be proven by experiment.

      We agree that corefucosylation and the TGF-β signaling pathway are important lines of investigation. We have now acknowledged this and added in the revised manuscript that additional experiments have shown that TGF-β restores the protective effects of Fut8 cKO cartilage by external administration.

      We included the following statements in the Results part:

      "To evaluate whether TGF-β1 decreases cartilage degeneration after mannosidase stimulation, TGF-β1 was exogenously added to Col2-Fut8−/− cartilage in the presence of α-mannosidase stimulation for 24 h. The samples treated with TGF-β1 leaked significantly less PG following mannosidase stimulation compared to samples not treated with TGF-β1 (Fig. 4F)." (line143-147, p6-7)

      We included the following statements in the Discussion part:

      "Here, the exogenous addition of TGF-β1 rescued them from cartilage degeneration." (line274-275, p12)

      (2) There are skeletal differences in cartilage-specific Fut8 KO mice compared to WT, and the effect of Fut8 on endochondral ossification should also be analyzed.

      We agree that Fut8 is associated with various endochondral ossification processes (for example by the TGF-β signaling pathway). Moreover, we would like to thank the reviewer for the proposed experiment.

      The growth curve was normal at birth, with differences beginning around weaning (~3 w for mice). Therefore, we evaluated the epiphyseal line of 4-week-old mice stained with toluidine, type 10 collagen, and proliferating cell nuclear antigen. This is similar to the epiphyseal growth plate phenotype of Smad3ex8/ex8 mice by Yang et al. and is consistent with the finding that Smad3 deficiency does not affect chondrogenesis during developmental stages, but the hypertrophic zone is increased in 3-4 week-old Smad3 KO mice. Tgf-β expression was suppressed in chondrocytes of Fut8 cKO mice (Fig. 4D), suggesting that inhibition of TGF-β signaling, which is suppressive for late hypertrophic chondrocyte differentiation, led to the increased height of the hypertrophic zone.

      The results suggested that the growth plate of Fut8 cKO mice had an enlarged hypertrophic layer and decreased primary trabecular bone. Because these results have important implications for the content of the paper, we have included the staining results in Figure 5 and added a graph quantitatively assessing the extent of the hypertrophic zone as supplementary Figure S6.

      We included the following statement in the Results part:

      "To assess the role of FUT8 in endochondral ossification, we performed an epiphyseal plate analysis of 4-week-old Col2-Fut8−/− mice. This uncovered a significant enlargement of the zone of hypertrophic chondrocytes in the growth plates of the long bones of Col2-Fut8−/− mice compared to controls (Fig. 5C, S6 Figure)." (line154-158, p7)

      We included the following statement in the Discussion part:

      "The high-mannose/corefucosylation relationship estimated function to maintain formed cartilage. In endochondral ossification, the Fut8 cKO growth plate had an enlarged hypertrophic zone and reduced primary spongiosa because it is involved in the next process of cartilage replacement into bone rather than the process of cartilage formation." (line214-217, p9)

      Literature mentioned above (not included in manuscript):

      Yang X, et al. TGF-beta/Smad3 signals repress chondrocyte hypertrophic differentiation and are required for maintaining articular cartilage. J Cell Biol. 2001;153(1):35–46.

      (3) The DMM model analysis is performed with n=5 for each group. Please consider if the sample size is sufficient.

      In the literature, the sample sizes for DMM models have varied in previous studies (Doyran et al., n=5; Liao et al., n=6-7; Ouhaddi et al., n=8). Therefore, we performed a preliminary test of the DMM in WT and Flox mice with n=3 each and a power analysis with the outcome set to the OARSI score at 8 weeks. This resulted in n=4. The sample size for this study was increased to n=5 to account for attrition. The summed OARSI score of the WT in this study was comparable to that of Ouhaddi et al. and the model was judged to be working accurately.
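
      The kind of power calculation described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard normal-approximation formula for comparing two group means, and the function name `per_group_sample_size`, the standardized effect size, the significance level, and the target power are all hypothetical. The pilot effect size and the method actually used by the authors are not given in the response.

```python
from math import ceil
from statistics import NormalDist


def per_group_sample_size(effect_size, alpha=0.05, power=0.8):
    """Approximate animals per group for a two-sided, two-group comparison.

    effect_size is the standardized difference (mean difference divided by
    the pooled standard deviation). It is a hypothetical input here, not a
    value taken from the study.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = standard_normal.inv_cdf(power)           # quantile for target power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, shown only to illustrate the trend.
    for d in (1.0, 1.5, 2.0):
        print(d, per_group_sample_size(d))
```

      The normal approximation tends to understate the requirement for very small groups, so a t-distribution based calculation would probably give a somewhat larger n. Adding animals to cover attrition, as the authors did, is a separate step applied after this estimate.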

      Literature mentioned above (not included in manuscript):

      (1) Doyran B, Tong W, Li Q, Jia H, Zhang X, Chen C, et al. Nanoindentation modulus of murine cartilage: a sensitive indicator of the initiation and progression of post-traumatic osteoarthritis. Osteoarthr Cartil. 2017;25(1):108–17.

      (2) Liao L, Zhang S, Gu J, Takarada T, Yoneda Y, Huang J, et al. Deletion of Runx2 in Articular Chondrocytes Decelerates the Progression of DMM-Induced Osteoarthritis in Adult Mice. Sci Rep. 2017 24;7(1):2371.

      (3) Ouhaddi Y, Nebbaki SS, Habouri L, Afif H, Lussier B, Kapoor M, et al. Exacerbation of Aging-Associated and Instability-Induced Murine Osteoarthritis With Deletion of D Prostanoid Receptor 1, a Prostaglandin D2 Receptor. Arthritis Rheum. 2017;69(9):1784–95.

      Reviewer #2 (Recommendations For The Authors):

      This paper is suitable for publication in clinical Journals related to osteoarthritis and cartilage.

      Identification of core fucosylated glycans from chondrocytes is essential for this type of paper.

      We mentioned that we had identified similar corefucosylated glycans in isolated mouse chondrocytes from the cartilage (line117-118, p5), but we have now also added the following to the subtitle of the Results section to avoid any potential confusion: "Corefucosylated N-glycan was formed in resilient cartilage and its isolated chondrocyte" (line109, p5)

      Thank you again for your comments on our paper. We trust that the revised manuscript is suitable for publication.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript describes fundamental single-molecule correlative force and fluorescence microscopy experiments to visualize the 1D diffusion dynamics and long-range nucleosome sliding activity of the yeast chromatin remodelers, RSC and ISW2. Compelling evidence shows that both remodelers exhibit 1D diffusion on bare DNA but utilize different mechanisms, with RSC primarily hopping and ISW2 mainly sliding on DNA. These results will be of interest to researchers working on chromatin remodeling.

      Reviewer #1 (Public Review):

      Single-molecule visualization of chromatin remodelers on long chromatin templates-a long sought-after goal-is still in its infancy. This work describes the behaviors of two remodelers RSC and ISW2, from SWI/SNF and ISWI families respectively, with well-conducted experiments and rigorous quantitative analysis, thus representing a significant advance in the field of chromatin biology and biophysics. Overall, the conclusions are supported by the data and the manuscript is clearly written. However, there are a few occasions where the strength of the conclusion suffers from low statistics. Some of the statements are too strong given the evidence presented.

      We thank the reviewer for the thorough and considerate review of our manuscript. We have increased the statistics when possible and have toned down the conclusions wherever further experimentation to improve statistics could not be done expeditiously.

      Specific Comments:

      (1) It is confusing what is the difference between the "non-diffusive" behavior of the remodeler upon nucleosome encounter and the nucleosome-translocating behavior in the presence of ATP. For example, in Figure 3F, readers can see a bit of nucleosome translocation in the first segment. Is the lower half-life of "non-diffusive" ISW2 with ATP on a nucleosome array because it is spending more time translocating nucleosomes? The solid and dashed green lines in Figure 3F and 3G are not explained. It is also not explained why Figure 3H and 3I are fit by double exponentials.

      We thank the reviewer for calling upon us to clarify these points. In both the case of translocation and stable non-translocating colocalization, the chromatin remodeler is marked as “non-diffusive” because the molecule is not moving quickly enough to be detected by our rolling-window (20 frames considered) diffusion coefficient analysis. We have updated the text to point out the translocation that is occurring in the panels indicated and noted that this type of motion is not detected by our automated analysis. Thus, translocation events were manually segmented for analysis from kymographs; a note of this was added to the results section (Results section # 1; Paragraph # 2).
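
      To make the rolling-window classification described above concrete, here is a minimal sketch. It assumes a 1D position trace sampled at a fixed frame interval and estimates a diffusion coefficient in each 20-frame window from the mean squared frame-to-frame displacement (MSD = 2Dt in one dimension). The function names, the frame interval, and the classification threshold are hypothetical; the authors' actual estimator is not specified in this response.

```python
def rolling_diffusion(positions, dt, window=20):
    """Estimate a 1D diffusion coefficient for every sliding window of frames.

    positions: particle coordinates, one per frame.
    dt: time between frames (assumed constant).
    window: number of frames considered per estimate.
    """
    coefficients = []
    for start in range(len(positions) - window + 1):
        segment = positions[start:start + window]
        steps = [b - a for a, b in zip(segment, segment[1:])]
        mean_squared_step = sum(s * s for s in steps) / len(steps)
        # In one dimension, MSD at lag dt equals 2 * D * dt.
        coefficients.append(mean_squared_step / (2.0 * dt))
    return coefficients


def label_windows(coefficients, threshold):
    """Label each window; the threshold is a hypothetical cut-off."""
    return ["non-diffusive" if d < threshold else "diffusive"
            for d in coefficients]


if __name__ == "__main__":
    trace = [0.1 * (i % 2) for i in range(30)]  # synthetic jittering trace
    print(label_windows(rolling_diffusion(trace, dt=0.05), threshold=0.01))
```

      Under this kind of estimator, a remodeler that translocates slowly with a nucleosome produces very small frame-to-frame steps, so its windows fall below the threshold just as a stably bound molecule's would. This is consistent with the need to segment translocation events manually from kymographs.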

      To address the question of whether the half-life of “non-diffusive” ISW2 with ATP on the nucleosome array is because of increased time spent in translocation, we have computed the percentage of “non-diffusive” time spent translocating in the presence of ATP for both remodelers; for ISW2, 14% of “non-diffusive” times are translocation whereas for RSC, 28% of “non-diffusive” times are translocation. Given that these percentages are not negligible, the reviewer helped identify an important parameter that better describes the effects of ATP hydrolysis on nucleosome binding for ISW2. In addition, we computed and compared the half-life of translocation times for both remodelers to the “non-diffusive” times and found that RSC translocates with a half-life of 20 s (similar to the half-life of “non-diffusion”) whereas ISW2 translocates with a half-life of 17 s (longer than the half-life of “non-diffusion”). We believe that this new information improves understanding of the role of ATP hydrolysis in turning over ISW2-nucleosome binding interactions, which result in the shorter “non-diffusive” lifetime as well as the shorter and more rarely observed ISW2 translocation events. We have updated the text to include these observations and our interpretation (Results section # 3; Paragraph # 3). As was already included in the text (Results section # 3; Final Paragraph), we speculate that this behavior may be due to a hydrolysis-dependent turnover of the ISW2-nucleosome bound state and refer the reader to Tim Richmond’s 2004 EMBO paper titled “Reaction cycle of the yeast Isw2 chromatin remodeling complex” in which bulk experiments show that ATP hydrolysis affects ISW2-nucleosome bound lifetimes.

      We thank the reviewer for also pointing out where details were missing from the figure legend and results section regarding Figure 3. We have added a description of the dashed and solid lines to the figure legend (Figure 3; Legend). We have also described why Figures 3H and I are fit to double exponentials to the results section (Results section # 3; Paragraph # 2).

      (2) What is the fraction of 1D vs. 3D nucleosome encountered by the remodelers? This is an important parameter to compare between RSC and ISW2.

      We thank the reviewer for raising this point. We agree that this is an important parameter to compare between RSC and ISW2; knowledge of this parameter would enable quantitative predictions to be made from our data regarding target localization efficiency increases owed to 1D scanning for each remodeler. We regretfully could not quantify this due to technical limitations of our measurements. A note about this limitation along with an explanation for why we were unable to quantify this parameter have been added to the main text (Results section # 3; end of Paragraph # 1).

      (3) A major conclusion stated repeatedly in the manuscript is that nucleosome translocation by a remodeler is terminated by a downstream nucleosome. But this is based on a total of 4 events. The problem of dye photobleaching was mentioned, which is a bit surprising considering that the green excitation was already pulsed. The authors should try to get more events by lowering the laser power or toning down the conclusion that translocation termination is prominently due to blockage by a downstream nucleosome. Quantifying the translocation distances before termination, in addition to the durations (Figure 4G and 4H), would also be helpful.

      We thank the reviewer for these observations and feedback. We agree that only 4 observations of direct visualization of remodeler translocation termination by a downstream nucleosome is a small n-value, and have chosen to omit presentation of these rare events in the manuscript.

      (4) The claim on nucleosome translocation directionality is also based on a small number of events, particularly for RSC. 6/9 is hardly over 50% if one considers the Poisson counting error (RSC was also found to switch directions.) If the authors would like to make a firm statement to support the "push-pull" model, they should obtain more events.

      We thank the reviewer for this critique and agree with the reviewer’s concern. In addition to adding data from two additional experimental replicates of RSC nucleosome translocation (which had the smaller n-value), we have also re-evaluated all events containing translocation for additional evidence in support or against the “push-pull” model. Previously we were only considering events where 1D diffusion on DNA leads immediately to translocation. Now we add the following categories to the count: (1) events where translocation terminates with the remodeler dissociating from the nucleosome and performing a 1D diffusive search, (2) events where 1D diffusion on DNA leads to association with a nucleosome and after a paused colocalization we observe translocation, and (3) the inverse scenario of (2) (see schematics in Figure 5 – figure supplement 1). These new results, detailed below, are now included in place of the older results in (Results Section # 5; Paragraph # 2). Furthermore, we toned down our argument and clarified that a larger n-value would be needed to be definitive, especially since we observe RSC switching directions, as the reviewer points out.

      After aggregating the new RSC data and using only events where 1D diffusion leads immediately to translocation, we observe 10/12 events in support of the “push” model. If we include these other categories in addition to aggregating the previous data with the new data, a total of 20/25 events are in support of the “push” model. For RSC, the breakdown in the other categories was as follows: (1) 7/10 events, (2) 1/1 events with a paused time of 5 seconds, and (3) 2/2 events with paused times of 36 and 50 seconds.

      For ISW2, we had previously reported 12/13 events where 1D search led immediately to translocation. After combing through the data a second time, we decided to omit two events which were less clear; we now report 10/11 events in support of the “pull” model from this initial category. If we include these other categories in addition to the original, a total of 19/21 events are in support of the “pull” model. For ISW2, the breakdown in the other categories was as follows: (1) 4/4 events, (2) 4/4 events with paused times of 44, 27, 29, and 8 seconds, and (3) 1/2 events with paused times of 5 and 19 seconds.
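
      One way to gauge how far event counts like those above depart from a 50/50 split is an exact one-sided binomial tail probability. The sketch below is an illustration, not the authors' analysis: it assumes independent events and a null hypothesis of no directional preference (p = 0.5), and the function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """Probability of observing at least `successes` out of `trials`
    when each event independently favors the model with probability p."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Counts quoted in the review and the response.
    for label, k, n in [("RSC, original", 6, 9),
                        ("RSC, revised", 20, 25),
                        ("ISW2, revised", 19, 21)]:
        print(f"{label}: {k}/{n}, tail probability = {binomial_tail(k, n):.4g}")
```

      A tail probability near 0.25, as for 6/9, is compatible with no preference at all, which is the reviewer's point. The larger revised counts give much smaller tail probabilities, although pooling event categories and the observed direction switching by RSC mean the independence assumption should be treated with caution.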

      (5) At 5 pN of tether tension, the outer wrap of nucleosomes is destabilized, which could impact nucleosome translocation dynamics. Additionally, a low buffer flow was kept on during data acquisition, which could bias remodeler diffusion behavior. The authors should rule out or at a minimum discuss these possibilities.

      We thank the reviewer for raising the important point regarding outer wrap destabilization of the nucleosome occurring at 5pN of tension. We have added an additional section to the discussion that reviews the literature on tension effects on nucleosome stability as well as what is currently known of the effects of tension on remodeler translocation on DNA (Discussion Paragraph # 3). While we cannot exclude the possibility that the 5pN of tension used in this study is a causative factor of the observed fast speed or high processivity nucleosome translocation that we report, we believe that, with the modifications made to the text to alert the reader to these possibilities, the reader can draw informed conclusions on the significance of our findings. The topic of force effects on remodeling outcomes is an interesting subject for the future.

      We apologize that the experimental details on buffer flow used during imaging were unclear in our initial submission. We do not have buffer flowing during imaging; rather, the buffer containing protein is flowed over the DNA at low pressure just prior to imaging. The flow is completely stopped before the DNA or nucleosome array is stretched to 5pN of tension for imaging (See Methods section: Single Molecule Tracking and Analysis).

      Reviewer 1 (Recommendations For The Authors):

      (1) The figure panels could be better arranged to focus on the main messages of the paper.

      (i) Figure 3C-E should go to a supplemental figure.

      We thank the reviewer for this helpful suggestion. As recommended, we moved Figure 3C to the supplemental figure as this panel did not pertain to the main message of the paper.

      (ii) Figure 4 could be split into two figures, one characterizing processive nucleosome translocation (4C, D, G, H, I, J, K, and relevant panels in S4), and the other showing the differential directionality of each remodeler (4E, F, L, and relevant panels in S4).

      We thank the reviewer for their suggestions that help better organize our presentation of the data. As the reviewer suggests, we split figure 4 into two figures: figure 4 which now focuses on translocation characterization and figure 5 which now focuses on the differential directionality of each remodeler.

      (iii) The nucleotide condition should be clearly indicated in the figures or legends. For example, it is unclear if the data in Figure 2 were generated with or without ATP.

      We thank the reviewer for taking note of this. We have added clear indications of the nucleotide condition to figures where this is relevant, including in Figure 2 as indicated.

      (iv) There are many cartoon panels, and some are redundant (e.g., Figure 1A and 1B, Figure 3A and 3B).

      We thank the reviewer for bringing up this point. We agree that some cartoons are redundant. We have eliminated Figure panel 1B and Figure panel 3A of the original figures from the new figures.

      (2) The last paragraph of the Results section should be moved to Discussion. This paper did not directly address the effects of RSC/ISW2 on NDR length.

      We thank the reviewer for this suggestion. We agree and have moved the last paragraph of the Results section to the Discussion.

      (3) There are some typos in the text. For example, "Of the two main types of 1D diffusion, hopping and sliding" is not a complete sentence.

      We thank the reviewer for catching this typo and bringing our attention to others. Upon a more careful proofreading of the text and figures we have caught and amended this and other typos.

      (4) What are the green lines in Figure S1F?

      We thank the reviewer for asking this question. The green lines were meant to emphasize how the percentage of traces in the majority high diffusion category increases for RSC but not for ISW2 in response to increases in the KCl concentration. Since this was confusing, we removed these green lines.

      Reviewer # 2 (Public Review):

      Summary:

      The authors use a dual optical trap instrument combined with 2-color fluorescence imaging to analyze the diffusion of RSC and ISW2 on DNA, both in the presence and absence of nucleosomes, as well as long-range nucleosome sliding by these remodelers. This allowed them to demonstrate that both enzymes can participate in 1D diffusion along DNA for rather long ranges, with ISW2 predominantly tracking the DNA strand, while RSC diffusion involves hopping. In an elegant two-color assay, the authors were able to analyze interactions of diffusing remodeler molecules, both of the same or different types, observing their collisions, co-diffusion, and bypassing. The authors demonstrate that nucleosomes act as barriers for remodeler diffusion, either repelling or sequestering them upon collision. In the presence of ATP, they observed surprisingly processive unidirectional nucleosome sliding with a strong bias in the direction opposite to where the remodeler approached the nucleosome from for ISW2. These results have fundamentally important implications for the mechanism of nucleosome positioning at promoters in vivo, will be of great interest to the scientific community, and will undoubtedly spark exciting future research.

      Strengths:

      The mechanism of target search for chromatin-interacting protein machines is a 'hot' topic, and this manuscript provides extremely important and timely new information about how RSC and ISW2 find the nucleosomes they slide. Intriguingly, although both remodelers analyzed in this study can diffuse along DNA, the diffusion mechanisms are substantially different, with extremely interesting mechanistic implications.

      The strong directional preference in nucleosome sliding by ISW2 dictated by the direction it approaches the nucleosomes from during 1D sliding on DNA is a very intriguing result with interesting implications for the regulation of nucleosome organization around promoters. It will be of great interest to the scientific community and will undoubtedly inspire future research.

      Relatively little is known about nucleosome sliding at longer ranges (>100bp), and this manuscript provides a unique view into such sliding and also establishes a versatile methodology for future studies.

      Weaknesses:

      All measurements were conducted at 5pN tension, which induces unwrapping of the outer DNA gyre from nucleosomes. This could potentially represent a limitation for experiments involving nucleosomes, since partial nucleosome unwrapping could affect the behavior of remodelers, especially their sliding of nucleosomes.

      We thank the reviewer for succinctly summarizing the strengths and weaknesses of our study. We have changed the Discussion to better review the literature on the effects of 5pN of tension on nucleosome wrapping and have more clearly presented the limitations of our study owing to our conducting measurements at 5pN of tension. In doing so, we have tried to emphasize the strengths of our study identified by the reviewer and better inform the reader of the weaknesses.

      Reviewer #2 (Recommendations For The Authors):

      Although not required, nucleosome sliding data under lower tensions (e.g., <=2pN) could be a valuable addition to the manuscript. Indeed, to my knowledge, there is no data on force-dependent rates of nucleosome sliding, so a conclusive demonstration of changes in remodeling rate with tension would be an exciting new result and might be discussed in the context of a potential tension in chromatin. If such experiments cannot readily be added, the authors could alternatively discuss this potential limitation in more detail.

      We thank the reviewer for this suggestion. We agree that adding data at lower tensions (<= 2pN) would have been valuable. Due to time constraints, this will be the subject for the future. We agree that knowledge of the effects of tension would be especially interesting in light of the possibility that tension on chromatin in cells may be affecting remodeler function. We have added a discussion of this potential significance of future work to the discussion (Discussion Section; Paragraph # 3). We have also elaborated on the potential limitation of only conducting measurements at 5pN to the discussion (Discussion Section; Paragraph # 3), as the reviewer recommends.

      The quantitative implications of the proposed mechanism for targeting ISW2 and RSC towards +1 and -1 nucleosomes are highly interesting. To further strengthen the mechanistic implications, the authors could consider quantitatively analyzing how the observed 1D diffusion would affect the probabilities of binding to +1 and -1 versus to other nucleosomes.

      We thank the reviewer for their thoughtful suggestion. While we would have liked to present a final quantitative model that integrates the experimental parameters on 1D diffusion that we present in this study with the parameters extracted from live cell single particle tracking studies, there are key parameters for model building that are missing from our study, due to technical limitations. Namely, we were not able to quantify the fraction of 1D vs 3D nucleosome encounters by remodelers, because the majority of the protein that we image has been bound before the start of imaging; very few proteins bind the nucleosome arrays after the start of imaging as the protein concentration in the imaging chamber is very low. This makes observing binding directly to a nucleosome a very rare event, especially due to the sparse density of nucleosomes (~10) on the array (~50,000 kb).

      The low-diffusion state is intriguing - could the authors speculate about the nature of this state?

      We thank the reviewer for the question. We have added some speculation about the nature of the low-diffusion state to the results section (Results Section # 1; Paragraph 2). One thought that we have is that this may be due to more stable interactions made between remodelers and free DNA when they become trapped in a conformational state that binds more tightly to DNA. Conformational changes may result in different scanning speeds for chromatin remodelers; e.g. SWR1 was shown to scan DNA quicker when bound to ATP (Carcamo, C. et al. eLife 2022). Another possibility is that certain sequences, owing for instance to their intrinsic curvature or their AT-content, may trap the remodeler, which may make more contacts with the DNA at these sites.

      Minor points:

      Information on the labeling efficiencies for the remodelers would be helpful.

      We thank the reviewer for pointing this out. We assessed labeling saturation by running gels of remodeler labeling with increasing molar ratios of dye to protein and did not observe increased labeling efficiency above the molar ratio used for proteins imaged in our study (see added Figure 1 – figure supplement 1, panel A). From this, we assessed that we have high protein labeling efficiency. We could not assess the labeling efficiency using the standard absorbance method as the extinction coefficient for JFX650 was measured with 1% v/v TFA (PMCID: PMC8154212) which is not compatible for use in assessing our protein labeling efficiency in an aqueous buffer.

      How were the experimental conditions adjusted for two-color diffusion experiments in order to optimize the probability of observing two remodeler molecules with different labels at the same time?

      We thank the reviewer for this clarifying question. To image both remodelers on the same DNA, we combined the remodelers using the same concentrations that produced single molecule densities when the remodelers were imaged separately. We have clarified this point in the Methods section: “Bimolecular Remodeler-Remodeler Imaging and Interaction Analysis”.

      The authors should check the figures for consistency of labeling and provide definitions for abbreviations used in them (e.g. CDF and PDF).

      We thank the reviewer for catching inconsistencies in labeling in our figures. We have updated the figures such that there is consistent labeling throughout. We have also provided definitions for abbreviations such as Cumulative Distribution Function (CDF) and Probability Distribution Function (PDF) in the figure legends where applicable.

      In the section "Remodeler-remodeler collisions during 1D search" (4th line from the end) reference to Fig3D seems to be out of place.

      We thank the reviewer for catching this typo. We have reworded this section such that each figure panel can be discussed sequentially, eliminating this out of place reference to Fig 3D.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thorough reading and helpful comments, which have allowed us to further improve the manuscript. Following the suggestions of the reviewers, we have run a number of new simulations, including mutations of the PIP binding residues and simulations with an elastic network allowing more mobility of the linker. Together, these excellent ideas have allowed us to strengthen the conclusions of the study. Below, we provide point-by-point responses to their suggestions.

      Reviewer #1 (Public Review):

      Summary:

      Here, the authors were attempting to use molecular simulation to probe the nature of how lipids, especially PIP lipids, bind to a medically-important ion channel. In particular, they look at how this binding impacts the function of the channel.

      Strengths:

      The study is very well written and composed. The techniques are used appropriately, with plenty of sampling and analysis. The findings are compelling and provide clear insights into the biology of the system.

      Weaknesses:

      A few of the analyses are hard to understand/follow, and rely on "in house" scripts. This is particularly the case for the lipid binding events, which can be difficult to compute accurately. Additionally, a lack of experimental validation, or coupling to existing experimental data, limits the study.

      Our analysis scripts have now been made publicly accessible as a Jupyter notebook on GitHub: https://github.com/etaoster/etaoster.github.io/tree/main/nav_pip_project

      It is my view that the authors have achieved their aims, and their findings are compelling and believable. Their findings should have impacts on how researchers understand the functioning of the Nav1.4 channel, as well as on the study of other ion channels and how they interact with membrane lipids.

      Reviewer #2 (Public Review):

      Summary:

      Lin Y., Tao E., et al. used multiscale MD simulations to show that PI(4,5)P2 binds stably to an inactivated state of Nav channels at a conserved site within the DIV S4-S5 linker, which couples the voltage sensing domain (VSD) to the pore. The authors hypothesized that PI(4,5)P2 prolongs inactivation by binding to the same site where the C-terminal tail is proposed to bind during recovery from inactivation. They convincingly showed that PI(4,5)P2 reduces the mobility of both the DIV S4-S5 linker and the DIII-IV linker, thus slowing the conformational changes required for the channel to recover to the resting state. They also conducted MD simulations to show that phosphoinositides bind to VSD gating charges in the resting state of Nav channels. These interactions may anchor the VSD at the resting state and impede its activation. Their results provide a mechanism by which phosphoinositides alter the voltage dependence of activation and the recovery rate from inactivation, an important step for developing novel therapies to treat Nav-related diseases. However, the study is incomplete and lacks the expected confirmatory studies which are relevant to such proposals.

      Strengths:

      The authors identified a novel binding between phosphoinositides and the VSD of Nav and showed that the strength of this interaction is state-dependent. Based on their work, the affinity of PIPs to the inactivated state is higher than the resting state. This work will help pave the way for designing novel therapeutics that may help relieve pain or treat diseases like arrhythmia, which may result from a leftward shift of the channel's activation.

      Weaknesses:

      However, the study lacks the expected confirmatory studies which are relevant to such proposals. For example, one would expect that the authors would mutate the positive residues that they claim to make interactions with phosphoinositides to show that there are much fewer interactions once they make these mutations. Another point is that the authors found that the main interaction site of PIPs with Nav1.4 is the VSD-DIV and DIII-DIV linker, an interaction that is expected to delay fast inactivation if it happens at the resting state. The authors should make a resting state model of the Nav1.4 channel to explain the recent experimental data showing that PIP2 delays the activation of Nav1.4, with almost no effect on the voltage dependence of fast inactivation.

      Following the reviewer’s suggestion, we have conducted new simulations demonstrating that there are many fewer protein-PIP interactions after mutating the positive residues, as shown in the new Supplementary Fig S6.

      The reviewer mentions that if PIPs interact with the VSD-DIV and DIII-DIV linker in the resting state, this could delay fast inactivation. However, as described in the original manuscript and depicted in the schematic (Fig 7), the C-terminal domain impedes PIP binding at this position in the resting state (but not the inactivated state), meaning that PIP does not bind in the resting state to delay fast inactivation. We have clarified this statement in the text on page 14 lines 1-2.

      Following the reviewer’s suggestion, we have examined PIP binding to a model of the resting state of Nav1.4 (in addition to the resting state of Nav1.7 described in the original manuscript) as described on page 12 lines 28-30 (and in Fig S12). Similar to what we saw for Nav1.7, PIP binding to VSDI-III can impair activation of the channel.

      Major concern:

      (1) Lack of confirmatory experiments, e.g., mutating the positive residues that show a high affinity towards PIPs to a neutral and negative residue and assessing the effect of mutagenesis on binding.

      Done as described above

      (2) Nav1.4 is the only channel that has been studied in terms of the effect of PIPs on it, therefore the authors should build a resting state model of Nav1.4 and study the effect of PIPs on it.

      Done as described above

      Minor points:

      There are a lot of wrong statements in many areas, e.g., "These diseases 335 are associated with accelerated rates of channel recovery from inactivation, consistent with our observations that an interaction between PI(4,5)P2 and the residue corresponding to R1469 in other Nav 337 subtypes could be important for prolonging the fast-inactivated state." Prolonging the fast inactivated state would actually reduce recovery from inactivation and not accelerate it.

      We disagree with this statement from the reviewer, which may have come from a misreading of the mentioned sentence. Our statement in the original manuscript is consistent with the original experiments that show that the presence of PIP prolongs the time spent in the fast inactivated state. Mutations at the PIP binding site are likely to reduce PIP binding, and with less PIP bound the channel is expected to recover from inactivation more quickly. We have reworded this sentence for clarity on page 13 lines 27-30.

      Reviewer #3 (Public Review):

      Summary:

      This work uses multiscale molecular dynamics simulations to demonstrate molecular mechanism(s) for phosphatidylinositol regulation of voltage gated sodium channel (Nav1.4) gating. Recent experimental work by Gada et al. JGP 2023 showed altered Nav1.4 gating when Nav1.4 current was recorded with simultaneous application of PI(4,5)P2 dephosphorylate. Here the authors revealed a probable molecular mechanism that can explain PI(4,5)P2 modulation of Nav1.4 gating. They found PIP lipids interacting with the gating charges, potentially making it harder to move the voltage sensor domain and altering the channel's voltage sensitivity. They also found a stable PIP binding site that reaches the D_IV S4-S5 linker, reducing the mobility of the linker and potentially competing with the C-terminal domain.

      Strengths:

      Using multiscale simulations with coarse-grained simulations to capture lipid-protein interactions and the overall protein lipid fingerprint and then all-atom simulations to verify atomistic details for specific lipid-protein interactions is extremely appropriate for the question at hand. Overall, the types of simulation and their length are suitable for the questions the authors pose and a thorough set of analyses was done which illustrates the observed PIP-protein interactions.

      Weaknesses:

      Although the set of current simulations and analysis supports the conclusions drawn nicely, there are some limitations imposed by the authors on the coarse-grained simulations. If those were not imposed, it would have allowed for an even richer set and more thorough exploration of the protein-lipid interactions. The Martini 2 force field indeed cannot change secondary structure, but if run with a properly tuned elastic network instead of backbone restraints, the change in protein configuration can be sampled and/or some adaptation of the protein to the specific protein environment can be observed. Additionally, with the 4-to-1 mapping of heavy atoms to a bead some detailed chemical specificity is averaged out, but parameters for different PIP family members do exist, including specific PI(4,5)P2 vs PI(3,4)P2, and could have been explored.

      We thank the reviewer for their excellent suggestions and have run new simulations with an elastic network instead of backbone restraints which have generated new insights. Indeed, as shown in the new panel Fig 4E, the new data allows us to demonstrate that the presence of PIP in the proposed binding site stabilises binding of the DIII-DIV linker to the inactivation receptor site, strengthening the conclusions of the paper.

      We thank the reviewer for pointing out that there do exist parameters for different PIP sub-species and have corrected our statement on page 14 line 16 to reflect this. We have not run additional CG simulations with each of these parameters but use the all-atom simulations to examine the interactions of phosphates at specific positions.

      In our atomistic simulations, we backmapped both PI(4,5)P2 and PI(4)P in the binding site to study their specific interactions. We chose to focus on PI(4,5)P2 given its physiological significance. However, we agree that differences in binding with PI(3,4)P2 would be interesting and warrants future investigation. We also note that the newer Martini3 forcefield would be useful in further work to differentiate between PIP subspecies interactions.

      Detailed Comments

      We thank the reviewers for their thorough reading and helpful comments, which have allowed us to further strengthen the manuscript. Below, we provide point-by-point responses to their suggestions.

      Reviewer #1 (Recommendations For The Authors):

      I don't have many suggestions for the manuscript, just a few text edits. Of course, experimental analysis would bolster the claims made in the text, but I don't believe that this is necessary, given the quality of the data.

      I understand the focus on the PIP lipids, but it's a shame that the high binding likelihood of glycosphingolipid isn't considered or analysed in any way. This is an especially interesting lipid from the point-of-view of raft-like membrane domains. Given the potential role of raft-like domains in sodium channel function, I feel this would be worth a paragraph or two in the discussion.

      We thank the reviewer for bringing our attention to this interesting point. Glycolipids accumulate around Nav1.4 in our complex membrane simulations, however, given reports that carbohydrates tend to interact too strongly in the Martini2.2 forcefield (Grünewald et al. 2022, Schmalhorst et al. 2017) and there are no specific residues on Nav1.4 that interact preferentially with glycolipid species, we chose not to focus on this. However, we have noted that interactions with other lipids deserve further attention in our revised discussion.

      The analyses have been run using Martini 2. I don't suggest the authors repeat using the Martini 3 force field, but some mention of this in the discussion would be good.

      We have added the following statement to the discussion: “Our coarse grain simulations were carried out using the Martini2.2 forcefield, for which lipid parameters for many plasma membrane lipids have been developed. We expect that future investigations of lipid-protein interactions will benefit from use of the newer, refined Martini 3 forcefield (Souza et al. 2021) as parameters become available for more lipid types.”

      This might just be an oversight, but no mention is made of an elastic network applied to the backbone beads.

      Lack of a network has been known to cause the protein to collapse, so if this is missing, I'd like to see an RMSD to show that the protein dynamics are not compromised.

      While no elastic network was used in our original CG simulations, the weak protein backbone restraints (10 kJ mol-1 nm-2) used in our simulations allowed us to maintain the structure while allowing some protein movement. However, following the suggestion of reviewer 3, we conducted additional simulations with an elastic network instead of backbone restraints, as described in the results on page 9 lines 30-37 (and in Fig 4E) of the revised manuscript.
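
      To illustrate the distinction between the two restraint schemes, the following minimal sketch assumes simple harmonic forms for both. The function names, the three-bead toy geometry, and the elastic-network force constant and cutoff are hypothetical choices made for illustration and are not taken from the manuscript; only the 10 kJ mol-1 nm-2 position-restraint force constant comes from the text.

```python
import math

def position_restraint_energy(coords, ref, k=10.0):
    """Harmonic restraint of each bead to its starting coordinates.

    Energy in kJ/mol, with k in kJ mol^-1 nm^-2 and coordinates in nm.
    """
    return sum(
        0.5 * k * sum((a - b) ** 2 for a, b in zip(r, r0))
        for r, r0 in zip(coords, ref)
    )

def elastic_network_energy(coords, ref, k=500.0, cutoff=0.9):
    """Harmonic springs between bead pairs closer than `cutoff` in the reference.

    k and cutoff are illustrative assumptions, not values from the manuscript.
    """
    energy = 0.0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d0 = math.dist(ref[i], ref[j])
            if d0 <= cutoff:
                d = math.dist(coords[i], coords[j])
                energy += 0.5 * k * (d - d0) ** 2
    return energy

# Toy reference: three backbone beads on a line (nm).
ref = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.76, 0.0, 0.0)]
# Rigid translation of the whole segment by 0.5 nm along x.
shifted = [(x + 0.5, y, z) for x, y, z in ref]

e_posre = position_restraint_energy(shifted, ref)
e_enm = elastic_network_energy(shifted, ref)
```

      In this toy case the rigid 0.5 nm translation is penalized by the position restraints but costs essentially nothing under the elastic network, which only penalizes changes in internal distances. This is one way to see why an elastic network can let a segment move away from its starting coordinates while preserving its shape.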

      Minor

      •In Fig 3B, are these lipids binding to the channel at the same time? And therefore do the authors see cooperativity?

      The Fig 3B caption has been amended in the revised manuscript to read “Representative snapshots from the five longest binding events from different replicates, showing the three different PIP species (PIP1 in blue, PIP2 in purple and PIP3 in pink) binding to VSD-IV and the DIII-IV linker.” We cannot comment on PIP cooperativity based on these simulations shown in Fig 3, due to the artificially high concentrations used here; however, in model complex membrane simulations we see co-binding of PIPs at the binding site. This is likely due to PIP’s ability to accumulate together and the high density of positively charged residues in the region, attracting and supporting multiple PIP bindings.

      •What charges were used for the atomistic PIP lipids? Does this match the CG lipids?

      We used the CHARMM-GUI PIP parameters for the atomistic simulations. SAPI24 (PIP2) has a headgroup charge of –4e, which is one less negative charge than the CG PIP2, whereas SAPI14 (PIP1) has a charge of –3e, which is the same as the CG PIP1. We have explicitly included this charge information in the updated Methods of the manuscript (on pages 15-16).

      •Line 259-260: "we performed embedded three structures"

      Corrected in the revised manuscript.

      •Line 272: "us" should be "µs"

      Corrected in the revised manuscript.

      •Line 434: kJ/mol should probably also have 'nm-2' included

      Corrected in the revised manuscript.

      •What charge state titratable residues were set to, and were pKa analyses done to decide this?

      Charge states were assigned to default values at neutral pH. We appreciate that future studies could examine this more carefully using constant pH simulations or similar.

      •It's stated that anisotropic scaling is used the AT sims - is this correct? If so, is there a reason this was chosen over semi-isotropic scaling?

      Anisotropic scaling was used for the atomistic simulations allowing all box dimensions to change independently.

      •I would recommend in-house analysis scripts are made available on GitHub or similar, just so the details can be seen.

      Per the reviewer’s request, the Jupyter notebooks used for analysis have been made available on GitHub (https://github.com/etaoster/etaoster.github.io/tree/main/nav_pip_project).

      • One coarse-grained notebook:

      • Lipid DE

      • Contact occupancy + outlier plots

      • Binding duration plots

      • Minimum distance plots

      • Number of ARG/LYS plots

      • PIP Occupancy, binding duration, gating charge residues

      • One atomistic notebook:

      • RMSD, RMSF and distance between IFM and its binding pocket (using MDAnalysis)

      • Atomistic PIP headgroup interaction analyses and plots (using ProLIF)
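
      As a rough illustration of the kind of binding-duration analysis listed above, a minimal sketch is given below. It is not the code in the notebooks: the function name, the per-frame boolean contact trace, and the frame spacing are all hypothetical, and a binding event is simply assumed to be an uninterrupted run of frames in contact.

```python
def binding_durations(in_contact, dt_us=1.0):
    """Return the duration of each uninterrupted run of contact frames.

    in_contact: sequence of booleans, one per frame (True = lipid in contact).
    dt_us: assumed time between frames, in microseconds.
    """
    durations = []
    run = 0
    for bound in in_contact:
        if bound:
            run += 1
        elif run:
            # A run of contact frames has just ended.
            durations.append(run * dt_us)
            run = 0
    if run:
        # The lipid was still bound at the end of the trace.
        durations.append(run * dt_us)
    return durations

# Toy contact trace for one lipid at one site.
trace = [False, True, True, True, False, False, True, True, False, True]
events = binding_durations(trace, dt_us=0.5)
```

      Note that an event still in progress at the final frame is reported with its observed length, so it is a lower bound on the true residence time.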

      As a final note, I am NOT saying this needs to be done for the current study, but I recommend the authors try the PyLipID package (https://github.com/wlsong/PyLipID) if they haven't yet, as it might be useful for similar projects they run in the future (i.e. for binding site identification, accurate binding kinetics calculations, lipid pose generation etc.).

      We thank the reviewer for this suggestion and will keep this in mind for future projects.

      Reviewer #2 (Recommendations For The Authors):

      Lin Y., Tao E., et al. used multiscale MD simulations to show that PI(4,5)P2 binds stably to an inactivated state of Nav channels at a conserved site within the DIV S4-S5 linker, which couples the voltage sensing domain (VSD) to the pore. The authors hypothesized that PI(4,5)P2 prolongs inactivation by binding to the same site where the C-terminal tail is proposed to bind during recovery from inactivation. They convincingly showed that PI(4,5)P2 reduces the mobility of both the DIV S4-S5 linker and the DIII-IV linker, thus slowing the conformational changes required for the channel to recover to the resting state. They also conducted MD simulations to show that phosphoinositides bind to VSD gating charges in the resting state of Nav channels. These interactions may anchor the VSD at the resting state and impede its activation. Their results provide a mechanism by which phosphoinositides alter the voltage dependence of activation and the recovery rate from inactivation, an important step for developing novel therapies to treat Nav-related diseases. However, the study is incomplete and lacks the expected confirmatory studies which are relevant to such proposals.

      The authors identified a novel binding between phosphoinositides and the VSD of Nav and showed that the strength of this interaction is state-dependent. Based on their work, the affinity of PIPs to the inactivated state is higher than the resting state. This work will help pave the way for designing novel therapeutics that may help relieve pain or treat diseases like arrhythmia, which may result from a leftward shift of the channel's activation. However, the study lacks the expected confirmatory studies which are relevant to such proposals. For example, one would expect that the authors would mutate the positive residues that they claim to make interactions with phosphoinositides to show that there are much fewer interactions once they make these mutations. Another point is that the authors found that the main interaction site of PIPs with Nav1.4 is the VSD-DIV and DIII-DIV linker, an interaction that is expected to delay fast inactivation if it happens at the resting state. The authors should make a resting state model of the Nav1.4 channel to explain the recent experimental data showing that PIP2 delays the activation of Nav1.4, with almost no effect on the voltage dependence of fast inactivation.

      Major concern:

      (1) Lack of confirmatory experiments, e.g., mutating the positive residues that show a high affinity towards PIPs to a neutral and negative residue and assessing the effect of mutagenesis on binding.

      (2) Nav1.4 is the only channel that has been studied in terms of the effect of PIPs on it, therefore the authors should build a resting state model of Nav1.4 and study the effect of PIPs on it.

      Minor points:

      Following the reviewer’s suggestion, we have conducted new simulations demonstrating that there are notably fewer protein-PIP interactions after performing charge-neutralizing and charge-reversal mutations to the positive residues, as shown in the new Fig S6.

      The reviewer mentions that if PIPs interact with the VSD-DIV and DIII-DIV linker in the resting state, this could delay fast inactivation. However, as described in the original manuscript and depicted in the schematic (Fig 7), the C-terminal domain impedes PIP binding at this position in the resting state (but not the inactivated state), meaning that PIP does not bind in the resting state to delay fast inactivation. We have clarified this statement in the text on page 14 lines 1-2.

      Following the reviewer’s suggestion, we have examined PIP binding to a model of the resting state of Nav1.4 (in addition to the resting state of Nav1.7 described in the original manuscript) as described on page 12 lines 28-30 (and in Fig S12). Similar to what we saw for Nav1.7, PIP binding to VSDI-III can impair activation of the channel.

      There are a lot of wrong statements in many areas, e.g., "These diseases 335 are associated with accelerated rates of channel recovery from inactivation, consistent with our observations that an interaction between PI(4,5)P2 and the residue corresponding to R1469 in other Nav 337 subtypes could be important for prolonging the fast-inactivated state." Prolonging the fast inactivated state would actually reduce recovery from inactivation and not accelerate it.

      We disagree with this statement from the reviewer, which may have come from a misreading of the mentioned sentence. Our statement in the original manuscript is consistent with the original experiments that show that the presence of PIP prolongs the time spent in the fast inactivated state. Mutations at the PIP binding site are likely to reduce PIP binding, and with less PIP present the channel will recover from inactivation more quickly. We have reworded this sentence for clarity on page 13 lines 27-30.

      Reviewer #3 (Recommendations For The Authors):

      As mentioned in the public review, overall, I am impressed with the manuscript and do think the conclusions are supported. There are, however, quite a few mistakes, mostly minor (listed below). Additionally, I do have a few questions and several extensions that could be done and I mention a few but fully realize many of those could be outside of the scope of the current manuscript.

      We greatly appreciate the time taken by Reviewer 3 to carefully review our manuscript and provide detailed comments. We believe their suggestions have helped to improve our manuscript.

      First comments are in general about the PIP subtype.

      • In the paper you claim:

      L196, "However, this loss of resolution prevents distinction between phosphate positions on the inositol group and does not permit analysis of protein conformational changes induced by PIP binding"

      L367, "it does not distinguish between phosphate positions within each charge state (e.g. PI(3,4)P2 vs PI(4,5)P2)."

      This is not true the PIP2 most commonly used in Martini 2 is from dx.doi.org/10.1021/ct3009655 and is a PI(3,4)P2 subtype. Also other extensions and alternative parameters exist for PIPs in Martini 2 e.g. http://cgmartini.nl/index.php/tools2/other-tools - Martini lipid .itp generator has all three main variants of both PIP1 and PIP2.

      As described in the response to the public review, we are grateful to the reviewer for pointing out that there do exist parameters for different PIP sub-species and have corrected our statement on page 14 to reflect this, and clarified the parameters chosen in the methods section (page 16 lines 2-3). We have not run additional CG simulations with each of these parameters in the current work but use the all-atom simulations to examine the interactions of phosphates at specific positions.

      • One detail that is missing in the manuscript is some mention of the charge state of the PIPs e.g. Fig.1D does not specify and Fig.4D PIP2 looks like -2 on position 5 and -1 on position 4. Which I think fits the used SAPI24, please specify. Also, what if you use SAPI25 with the flipped charges would that significantly alter the results?

      The charge state of PIP2 is -2e on the 5’ phosphate and -1e on the 4’ phosphate, using the SAPI24 CHARMM lipid parameters. We have ensured that this charge information is stated clearly in the revised manuscript in the methods section on page 16 (line 21). We considered looking at SAPI25, however we expected that it would behave quite similarly, given that the PIP headgroup can adopt slightly different poses and orientations within the binding site across replicates and does fluctuate over simulations (Fig S8). We have noted this in the revised discussion on page 14 line 15-17.

      • I was very intrigued and puzzled by the lower binding of PIP3 vs PIP2 in the Martini simulations. Could it be that PIP3 has a harder time fully entering the binding site, or maybe just sampling? i.e. and its lower number of binding events is a sampling issue.

      We agree with the reviewer that PIP3 is less able to access the binding site than PIP2, likely because of its larger size. This might also be why we see PIP1 binding at the location via a more buried route (since it has the smallest headgroup size). However, PIP1 does not have enough negative charge to keep it in the binding site. It seems to be a Goldilocks-like situation where PIP2 has the optimal size and charge to allow access and stable binding at the site. We also see that when PIP3 enters the binding site it leaves before the end of the simulations. While it is hard to prove statistical significance given the number of binding and dissociation events even with the high and equal concentrations of all three PIP species in the enriched PIP membrane CG simulations, the data strongly suggests preferential binding of PIP2 over PIP3.

      Also the same L196 sentence as above "However, this loss of resolution prevents distinction between phosphate positions on the inositol group and does not permit analysis of protein conformational changes induced by PIP binding". The latter part is also wrong: there are no conformational changes due to the restraints on the protein backbone, from methods "backbone beads were weakly restrained to their starting coordinates using a force constant of 10 kJ mol−1nm−2". Martini in general might have a hard time with some conformational changes and definitely cannot sample changes in secondary structure, but conformational changes can, and have on many occasions, been successfully sampled (even full ion channel opening and closing).

      On a similar note, in L179 you mention "owing to the flexibility of the linker." How does this fit with simulation with position restraints on all backbone atoms?

      We applied fairly weak restraints to the backbone only – therefore we still observe some flexibility in the highly flexible loop portion of the linker, where sidechains are able to flip between membrane-facing and cytosol-facing orientations.

      However, after reading the comments from the reviewer, we have run additional simulations with an elastic network rather than backbone restraints on the DIII-DIV linker, which have given further insight. As seen in Fig 4E and described in the results paragraph on page 9 lines 30-37 of the revised manuscript, we can see that the presence of PIP does stabilise the linker in its receptor site. To accentuate this effect, we also ran simulations of the ‘IQM’ mutant, known to have a less stable fast inactivated state due to weaker binding to the receptor. Without backbone restraints we can see partial dissociation of the DIII-DIV linker from the receptor that is partially rescued by the presence of PIP.

      I know the paper focuses on PIPs, also very nicely in Fig.2B and Fig. S1-2 the lipid enrichment is shown for other lipids, but why show all lipid classes except cholesterol? And, for the left-hand panels in Fig. S1-2 those really should be leaflet specific - as both the membrane and protein are asymmetric.

      The depletion/enrichment of cholesterol is shown in Fig 2B, and its lipid Z-density maps and contact occupancy structures are shown in row 5 of Fig S2 (labeled as CL in yellow). The Z-density maps are meant to provide an overall summary of lipid distribution. The contact occupancy structures showing the transverse views and intracellular/extracellular views provide a better indication of the occupancy across the different leaflets.

      In L237 for the comparison of Cav2.2 and Kv7.1 bound to PI(4,5)P2 structures: They do agree well with the PIP1 simulations but not as much for the main PIP2 binding site. If you look in the CG simulations, is there another (not the main) PIP2 binding site at that same location (which might also be stable in AA simulations)?

      In some replicates of the CG simulations, we identify stable PIP1 binding via the other orientation (i.e. the one that overlaps with the Cav2.2 and Kv7.1 structures). Since we did not directly observe any PIP2 binding events from the other orientation, we did not run any backmapped atomistic simulations with PIP2 at this position. However, the binding site residues that the PIP1/2 headgroup binds to are the same regardless of which side PIP1/2 approaches from. We would expect that PIP2 bound from the alternative position is also stable.

      Two references I want to put for consideration to the authors, for potential inclusion if the authors find their inclusion would strengthen the manuscript. This one gives a good demonstration of using the same PM mixture to define lipid protein fingerprints with Martini:

      https://pubs.acs.org/doi/10.1021/acscentsci.8b00143.

      And this one https://pubmed.ncbi.nlm.nih.gov/33836525/ shows how Nav1.4 function could also be affected by general changes in bilayer properties (in addition to the specific lipid interactions explored here).

      We thank the reviewer for bringing to our attention these two relevant references, which will help to substantiate, respectively, the use of Martini to study membrane protein-lipid interactions and why Nav channels are interesting to study in the context of their membrane environment (and also the potential implications for drugs that can bind from within the membrane). We have added these citations to the introduction and discussion.

      Minor comments and fixes:

      L2, Title: A binding site for phosphoinositide modulation of voltage-gated sodium channels described by multiscale simulations

      The title reads very strangely to me, should it be "A binding site for phosphoinositide" ; "modulation".

      We thank the reviewer for this comment. The title has been updated to: A binding site for phosphoinositides described by multiscale simulations explains their modulation of voltage gated sodium channels.

      L25, Abstract, "The phosphoinositide PI(4,5)P2 decreases Nav1.4 activity by increasing the difficulty of channel opening, accelerating fast activation and slowing recovery from fast inactivation." Assuming this is referring to results from Gada et al JGP, 2023 should this not be "accelerating fast inactivation"?

      Corrected in the revised manuscript.

      L71 maybe good to write the longer version of IFM on first use e.g. Ile-Phe-Met (IFM), as to not mistake it for some random three letter acronym.

      Corrected in the revised manuscript.

      L109, Fig.2. Maybe change the upper and lower leaflet to intracellular and cytoplasmic leaflets (or outer / inner). In D "(D) Distribution of PIP binding occupancies (left)" something missing can I assume, for/over all lipids exposed residues. Also, for D I am a little confused how occupancy is defined as the total occupancy per residue does not add up to 100.

      The figure has been updated with intracellular and cytoplasmic leaflet labels. The binding occupancy distribution boxplot shows binding occupancies for all lipid exposed residues. In our analysis, we define contact occupancy as the proportion of simulation time in which a lipid type is within 0.7 nm of a given residue. It is possible for more than one lipid to be within this cutoff in any given frame; that is, both a PIP and PE can be simultaneously bound.
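
      The occupancy definition above can be illustrated with a short example showing why per-residue occupancies of different lipid types need not sum to 100%. This is a minimal sketch, not the authors' analysis code: the function name `contact_occupancy`, the input layout (one list of minimum residue-lipid distances per frame, in nm) and the toy distances are all assumptions made for illustration. Only the 0.7 nm cutoff is taken from the response.

```python
CUTOFF_NM = 0.7  # contact cutoff stated in the response


def contact_occupancy(min_dist_nm, cutoff_nm=CUTOFF_NM):
    """Fraction of frames in which at least one lipid of a type is in contact.

    min_dist_nm is a hypothetical input: one entry per frame, each entry a
    list with the minimum distance (nm) from the residue to every lipid of
    the given type in that frame.
    """
    frames_in_contact = sum(
        1 for frame in min_dist_nm if any(d < cutoff_nm for d in frame)
    )
    return frames_in_contact / len(min_dist_nm)


# Toy trajectory of 4 frames for a single residue (made-up distances).
pip_dist = [[0.5], [0.6], [0.9], [0.4]]                          # one PIP lipid
pe_dist = [[0.6, 1.2], [1.5, 0.65], [0.5, 0.5], [1.1, 1.3]]      # two PE lipids

occ_pip = contact_occupancy(pip_dist)
occ_pe = contact_occupancy(pe_dist)

# Each type is scored independently, so frames where PIP and PE are both
# within the cutoff count toward both occupancies.
print(f"PIP: {occ_pip:.0%}, PE: {occ_pe:.0%}, sum: {occ_pip + occ_pe:.0%}")
```

      Because each lipid type is scored against the same set of frames, the two occupancies in this toy case add up to more than 100%, which matches the behaviour described for Fig. 2D.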

      L160 "occurring the identified site" in the

      Corrected in the revised manuscript.

      L170 "PIP3 (headgroup charge: -7e) has interacts similarly to PIP1," - remove has

      Corrected in the revised manuscript.

      L194, "reducing system size" the size does not change, I am assuming you want to say reducing the number of particles?

      Corrected in the revised manuscript.

      L252, Fig.6 "(B) Occupancy of all PIPs (PIP1, PIP2, PIP3) at binding site residues in the three systems" A little confusing, initially was expecting 3x3 data points per residue, maybe change to, Combined occupancy of all PIPs...

      Corrected in the revised manuscript.

      L253, Fig.6 D, I don't really have a good suggestion for improvement here, so this is just a FYI that this panel was very confusing for me and took some time to figure out what is shown.

      We have added to the caption of Fig. 6D to try to clarify this panel.

      L257, Fig.6 (F) not in bold

      Corrected in the revised manuscript.

      L259 "PIP binding, we performed embedded three structures of Nav1.7" something missing?

      Corrected in the revised manuscript.

      L272, "In triplicate 50 us coarse-grained simulations" us instead of (micro_greek)s

      Corrected in the revised manuscript.

      L272, that paragraph how long/many simulations only reported for the inactivated Nav1.7 system not the Nav1.7-NavPas chimera, which I am assuming is the same?

      Corrected in the revised manuscript.

      L297, "marked by both shortened inactivation times", can I assume this is: shortened times to inactivation (i.e. to get inactivated not times in the inactivated states)?

      Corrected in the revised manuscript.

      L331, "are conserved in Nav1.1-1.9 (Fig. 5D)," Fig.5C

      Corrected in the revised manuscript.

      L353, "channel opening []" [] maybe a missing reference?

      Thank you for pointing out this oversight - Goldschen-Ohm et al. has been cited here.

      L394, "The composition of the complex mammalian membrane is as reported in Ingólfsson, et al. (38)." Ref 38 is the "Computational lipidomics of the neuronal plasma membrane" which indeed uses the 63 component PM but the original reference for the average 63 lipid mixture PM is dx.doi.org/10.1021/ja507832e.

      Corrected in the revised manuscript.

      L404, "Additionally, a model Nav1.7 with all four VSDs in the deactivated state using Modeller (40)." Something missing, e.g. was also built and simulated for ...

      Corrected in the revised manuscript.

      Table S1 "Disease information", I am guessing this should be Disease information; mechanism? Of the x5 entries two have mechanism, one has "; unknown significance ", one has "; unknown" maybe clarify in title and make same if unknown.

      Corrected in the revised manuscript.

      Table S1 and S2 have different styles.

      The tables have been amended to have the same style.

      Fig. S3 "for all 12 lipid types in the mammalian membrane " there are many more lipid types in a typical PM (hundreds) and 63 in the PM mixture simulated here, so maybe write: 12 lipid classes?

      Corrected in the revised manuscript.

      Fig.S6 PIP headgroup, can I assume that is for the bound PIP only, please specify.

      Only a single PIP at the identified binding site was backmapped into all cases of atomistic simulations. We have now clarified this point in the methods, results and the FigS6 caption.

      Writing of PI(4,5)P2 and PI(4)P1 most of the time use 1 and 2 as subscripts but not always (at least not in SI), also the same with Nav vs Na_v (v subscript) and even NAV (in Table S1).

      Subscripts have been implemented in the updated Supplementary Information (as well as within various figures and throughout the manuscript).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This landmark study sheds light on a long-standing puzzle of Protein kinase A activation in Trypanosoma. Extensive experimental work provides compelling evidence for the conclusions of the manuscript. It represents a significant advancement in our understanding of the molecular mechanism of Cyclic Nucleotide Binding domains and will be of interest to researchers with interest in kinases and mechanistic studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cyclic Nucleotide Binding (CNB) domains are pervasive structural components involved in signaling pathways across eukaryotes and prokaryotes. Despite their similar structures, CNB domains exhibit distinct ligand-sensing capabilities. The manuscript offers a thorough and convincing investigation that clarifies numerous puzzling aspects of nucleotide binding in Trypanosoma.

      Strengths:

      One of the strengths of this study is its multifaceted methodology, which includes a range of techniques including crystallography, ITC (Isothermal Titration Calorimetry), fluorimetry, CD (Circular Dichroism) spectroscopy, mass spectrometry, and computational analysis. This interdisciplinary approach not only enhances the depth of the investigation but also offers a robust cross-validation of the results.

      Weaknesses:

      None noticed.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript clearly shows that Trypanosoma PKA is controlled by nucleoside analogues rather than cyclic nucleotides, which are the primary allosteric effectors of human PKA and PKG. The authors demonstrate that the inosine, guanosine, and adenosine nucleosides bind with high affinity and activate PKA in the tropical pathogens T. brucei, T. cruzi and Leishmania. The underlying determinants of nucleoside binding and selectivity are dissected by solving the crystal structure of T. cruzi PKAR(200-503) and T. brucei PKAR(199-499) bound to inosine at 1.4 Å and 2.1 Å resolution and through comparative mutational analyses. Of particular interest is the identification of a minimal subset of 2-3 residues that controls nucleoside vs. cyclic nucleotide specificity.

      Strengths:

      The significance of this study lies not only in the structure-activity relationships revealed for important targets in several parasite pathogens but also in the understanding of CNB's evolutionary role.

      Weaknesses:

      The main missing piece is the model for activation of the kinetoplastid PKA which remains speculative in the absence of a structure for the trypanosomatid PKA holoenzyme complex. However, this appears to be beyond the scope of this manuscript, which is already quite dense.

      We fully agree that insight into the activation mechanism and its possible deviation from the mammalian paradigm requires a holoenzyme structure revealing the details of R-C interaction. We have attempted Cryo-EM from LEXSY-produced holoenzyme, yet upscaling the purification procedures described in this manuscript has repeatedly failed in spite of numerous protocol changes and optimizations. Much more work is required to achieve this.

      Reviewer #2 (Recommendations For The Authors):

      Some minor points to consider for enhancing the impact of this interesting manuscript:

      (1) The nucleoside affinities measured are mainly for the regulatory subunits unbound to the kinase domain. How would nucleoside affinities change when the regulatory subunits are bound to the kinase domain, which is presumably the case under resting conditions? An estimation of this change in affinity is important because it more closely relates to the variations in cellular nucleoside concentrations needed for activation.

      This is an important question, and we have given an indirect, though not very explicit, answer in the manuscript. The EC50 values for kinase activation of the purified holoenzyme complexes are very similar or almost identical to the kD values measured by ITC with free regulatory subunits. By inference, the binding kD for the holoenzyme and for the free R-subunit cannot be very different. In addition, we have recently determined the EC50 for PKA activation in vivo in trypanosomes using a bioluminescence complementation reporter assay. The values fit perfectly to the values obtained with purified holoenzyme (Wu et al. in preparation). A sentence in Results (lines 201-203) has been added.

      (2) The authors should point out that a major implication of nucleoside vs. cyclic nucleotide activation is in terms of signal termination. If phosphodiesterases (PDEs) are responsible for cAMP/cGMP signal termination, what terminates nucleoside-dependent signaling? Although the answer to this question may not be known at this stage, it is important to highlight this critical implication of the authors' study.

      The mechanism of signal termination is indeed unknown so far. We speculate that some enzymes of the purine salvage pathways are differentially localized in subcellular compartments and thereby able to establish microdomains that enable nucleoside signaling. In addition, PKA subunit phosphorylations/dephosphorylations and/or protein turnover may also regulate signal termination. As an example, free PKAC1 is rapidly degraded upon depletion of the PKAR subunit by RNAi. We have now mentioned signal termination in Discussion and have revised the last part of Discussion (lines 567-602). A possible approach to monitor compartmentalized signaling would be using the FluoSTEPs technology (Tenner et al., Sci. Adv. 2021; 7: eabe4091), but adapting this to the trypanosome system will not be a short-term task.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank the editor and reviewers for their time and insightful comments and suggestions. We have revised the manuscript by performing additional experiments and analyses, and have clarified the items raised in the suggestions.

      Reviewer #1 (Public Review):

      Summary of Author's Objectives:

      The authors aimed to explore JMJD6's role in MYC-driven neuroblastoma, particularly in the interplay between pre-mRNA splicing and cancer metabolism, and to investigate the potential for targeting this pathway.

      Strengths:

      (1) The study employs a diverse range of experimental techniques, including molecular biology assays, next-generation sequencing, interactome profiling, and metabolic analysis. Moreover, the authors specifically focused on gained chromosome 17q in neuroblastoma, combining this with an analysis of cancer dependency genes screened with a CRISPR/Cas9 library and an analysis of the association of gene expression with prognosis in a large clinical cohort of neuroblastoma patients. This comprehensive approach strengthens the credibility of the findings. The identification of the link between JMJD6-mediated pre-mRNA splicing and metabolic reprogramming in MYC-driven cancer cells is innovative.

      (2) The authors effectively integrate data from multiple sources, such as gene expression analysis, RNA splicing analysis, JMJD6 interactome assay, and metabolic profiling. This holistic approach provides a more complete understanding of JMJD6's role.

      (3) The identification of JMJD6 as a potential therapeutic target and its correlation with the response to indisulam have significant clinical implications, addressing an unmet need in cancer treatment.

      Weaknesses:

      (1) The manuscript contains complex technical details and terminology that may pose challenges for readers without a deep background in molecular biology and cancer research. Providing simplified explanations or additional context would enhance accessibility.

      We have provided simplified explanations for some terminology.

      (2) It would be beneficial to explore whether treatment with JMJD6 inhibitors, both in vitro and in vivo, can effectively target the enhanced pre-mRNA splicing of metabolic genes in MYC-driven cancer cells.

      Unfortunately, there are no potent and selective JMJD6 inhibitors available.

      Reviewer #3 (Public Review):

      Summary:

      Jablonowski and colleagues studied key characteristics of MYC-driven cancers: dysregulated pre-mRNA splicing and altered metabolism. This is an important field of study as it remains largely unclear as to how these processes are coordinated in response to malignant transformation and how they are exploitable for future treatments. In the present study, the authors attempt to show that Jumonji Domain Containing 6, Arginine Demethylase And Lysine Hydroxylase (JMJD6) plays a central role in connecting pre-mRNA splicing and metabolism in MYC-driven neuroblastoma. JMJD6 collaborates with the MYC protein in driving cellular transformation by physically interacting with RNA-binding proteins involved in pre-mRNA splicing and protein regulation. In cell line experiments, JMJD6 affected the alternative splicing of two forms of glutaminase (GLS), an essential enzyme in the glutaminolysis process within the central carbon metabolism of neuroblastoma cells. Additionally, the study provides in vitro (and in silico) evidence for JMJD6 being associated with the anti-proliferation effects of a compound called indisulam, which degrades the splicing factor RBM39, known to interact with JMJD6.

      Overall, the findings presented by Jablonowski et al. begin to illuminate a cancer-promoting metabolic, and potentially, a protein synthesis suppression program that may be linked to alternative pre-mRNA splicing through the action of JMJD6 - downstream of MYC. This discovery can provide further evidence for considering JMJD6 as a potential therapeutic target for the treatment of MYC-driven cancers.

      Strengths:

      Alternative Splicing Induced by JMJD6 Knockdown: the study presents evidence for the role of JMJD6 in alternative splicing in neuroblastoma cells. Specifically, the RNA immunoprecipitation experiments demonstrated a significant shift from the GAC to the KGA GLS isoform upon JMJD6 knockdown. Moreover, a significant correlation between JMJD6 levels and GAC/KGA isoform expression was identified in two distinct neuroblastoma cohorts. This suggests a causative link between JMJD6 activity and isoform prevalence.

      Physical Interaction of JMJD6 in Neuroblastoma Cells: The paper provides preliminary insight into the physical interactome of JMJD6 in neuroblastoma cells. This offers a potential mechanistic avenue for the observed effects on metabolism and protein synthesis and could be exploited for a deeper investigation into the exact nature, and implications of neuroblastoma-specific JMJD6 protein-protein interactions.

      Weaknesses:

      There are several areas that would benefit from improvements with regard to the current data supporting the claims of the paper (i.e., the conclusion presented in Figure 8).

      Neuroblastoma Modelling Strategy: The study heavily relies on cell lines without incorporating patient derived cells/biomaterials. Using databases to fill gaps in the experimental design can only fortify the observations to a certain extent. A critical oversight is the absence of non-cancerous control cells in many figures, and the rationale for selecting specific cell lines for assays/approaches remains somewhat unclear. A foundational control for such experiments should involve the non-transformed neural crest cell line, which the authors have readily available. Are the observed splicing and metabolic effects of JMJD6 specific to neuroblastoma? Is there a neuroblastoma-specific JMJD6 interactome? Is MYC function essential?

      In Vivo Modelling: The inclusion of a genetic mouse model combined with an inducible JMJD6 knockdown, would enhance the study by allowing examination of JMJD6's role during both tumor initiation and growth in vivo. For instance, the TH-MYCN mice overexpressing MYCN in neural crest cells, could be a promising choice.

      Dependence on Colony Formation Assay: The study leans on 2D and semi-quantitative colony formation assays to assess malignant growth. To validate the link between the mechanistic insights discussed (e.g., reduced protein synthesis) and JMJD6-mediated malignant growth as a potential therapeutic target, evidence from in vivo or representative 3D models would be crucial.

      Data Presentation and Rigor: The presented data is predominantly qualitative and necessitates quantification. For instance, Western blots should be quantified. The RNAseq, metabolism, and pulldown data should be transparently and numerically presented. The figure legends seem elusive and their lack of transparency (often with regards to biological repeats, error bars, cell line used etc.) is concerning. Adequate citation and identification of all data sources, including online resources, are imperative. The manuscript would also benefit from a more rigorous depiction and quantification of RNA interference of both stable and transient knockdowns with quantitative validation at mRNA and protein levels.

      Novelty Concerns: The emphasis on JMJD6 as a novel neuroblastoma target is contingent on the new mechanistic revelations about the JMJD6-centered link between splicing, metabolism, and protein synthesis. Given that JMJD6 has been previously linked to neuroblastoma biology, the rationale (particularly in Figure 1) for concentrating on JMJD6 may stem more from bias rather than data-driven reasoning.

      Depth of Mechanistic Investigation: Current evidence lacks depth in key areas such as JMJD6-RNA binding. A more thorough approach would involve pinpointing specific JMJD6 binding sites on endogenous RNAs using techniques such as cross-linking and immunoprecipitation, paired with complementary proximity-based methodologies. Regarding the presented metabolism data, diving deeper into metabolic flux via isotope labeling experiments could shed light on dynamic processes like TCA and glutaminolysis. As it stands, the 'pathway cartoon' in Figure 6d appears overly qualitative.

      Response: We agree with this reviewer that more in-depth studies are needed to understand the biological functions of JMJD6 in neuroblastoma. We have included one paragraph “limitation of the study” to point out that additional work needs to be done to address the comments from this reviewer.

      We have also added details in figure legend to increase rigor.

      Reviewer #1 (Recommendations For The Authors):

      In this study, Jablonowski and colleagues identify the link between JMJD6-mediated pre-mRNA splicing and metabolic reprogramming in cancer cells, with implications for therapeutic response to splicing inhibitors. I have reviewed your manuscript and found it quite promising. However, there are some specific points that require further clarification and additional experiments. Please consider the following comments:

      Major concerns:

      (1) Regarding Figure 1d and e: to enhance the robustness of your findings, it would be beneficial to include additional datasets, such as the Kocak-649 dataset. It is important to narrow down the analysis to high-risk patient groups when examining survival rates, specifically to investigate whether the elevated expression of the 114 gene signature correlates with poor survival within this subgroup. Additionally, please consider conducting a more detailed breakdown of the subsets depicted in Fig. 1b to explore the association between their expression levels and patient survival rates.

      Response: We have included the Kocak-649 datasets as Supplemental Figure 1. We have further analyzed the 114 gene signature in low-risk and high-risk patients, respectively, as Supplemental Figure 2.

      (2) Fig. 2b: Similar to the previous comment, it would strengthen your findings to include survival rate analysis in more datasets, particularly in high-risk patient groups.

      Response: We have further analyzed the association of JMJD6 with survival in low-risk and high-risk patients, respectively, as Supplemental Figure 3. Regardless of the risk factors, high expression of JMJD6 was associated with a poor outcome.

      (3) In reference to Fig. S1D, please clarify the time point under investigation. It looks like siRNAs were utilized in this study. Ensure consistency between the siRNA # mentioned in the methods section and what is presented in Fig. S1d.

      Response: We have clarified the time point under investigation in Fig. S1D (now Fig. S4D). We have corrected the siRNA # in the Methods section.

      Additionally, it would be beneficial to include data on knockdown efficacy and consider incorporating western blot results, similar to those presented in Fig. 2c.

      Response: These experiments were performed as shown in Figure 4C. We assumed the knockdown efficiency was comparable.

      Furthermore, I recommend analyzing the RNA-seq data from JMJD6-depleted BE(2)C cells to identify any alterations in the expression of neuronal differentiation signature genes, with the aim of exploring potential associations with changes in cell morphology showed in Fig. S1D.

      Response: We have analyzed the data and, as the reviewer expected, we do see upregulation of neuronal differentiation pathways. We have included the data as Fig. S7B.

      (4) Fig. 4g: Confirm whether the data is related to GAC, and if so, where is the data for KGA?

      Response: We apologize for this. The KGA data were omitted when we assembled the figure. We have added them back as Figure 4H.

      (5) In relation to Fig. 4, I suggest conducting experiments to individually silence GAC and KGA, if feasible (for instance, by targeting their 3'-UTRs). This would allow for a more in-depth investigation into whether GAC and KGA play essential roles in NB cell proliferation.

      Response: As this reviewer suggested, we have performed the experiments to knock down GAC and KGA in BE2C cells, and we found that both isoforms seemed to be important for cell survival. We have included the data as Figure 5G-I. Additionally, we have also performed RNA-seq to understand the differential functions of GAC and KGA in neuroblastoma cells when they were overexpressed separately. We have included the data as Figure 5E,F, and Supplemental Figure 9.

      (6) Fig. 5c: Could this protein synthesis reduction be attributed to an artificial overexpression of JMJD6? It would be interesting to investigate whether the genetic silencing of JMJD6 has an impact on total protein synthesis.

      Response: This is a great question, but it could be very challenging to give a definitive answer. Since cells are not happy with knockdown of JMJD6, we may have a secondary effect resulting from activation of cell death. While we have successfully generated single cell JMJD6 CRISPR KO clones, the cells are not happy either. In the future, we may generate a dTAG knock-in cell line, which will allow us to induce acute protein degradation, and then we can assess whether JMJD6 loss consequently impacts total protein synthesis.

      (7) Fig. S7: the authors have shown that knocking down of JMJD6 in NB cells reduced cell proliferation (Fig. 2c-e). Please clarify how you obtained sufficient cells after CRISPR knockout of JMJD6 clones and whether the cells remained healthy. It would be helpful to provide cell images.

      Response: We harvested cells at different time points in Fig 2C-E, and we have added the information in Figure legends. Cells were not happy after JMJD6 KD or KO. We therefore harvested cells for Western blot at an early time point and stained cells for the survival effect at a late time point.

      (8) Fig. 7f: Address the paradox where JMJD-knockdown cells grow slower (Fig. 2c-e), but these JMJD-KO4E5 cells grow at a similar rate compared to SKNAS-WT in the DMSO treatment group. Clarify whether this aligns with the results observed with shRNA results shown in Fig. 2c-e.

      Response: The JMJD6 KO cells grew much slower than the wild-type cells. In these experiments, we intentionally seeded many more cells for the JMJD6 KO clone so that the DMSO-treated cells would be comparable between the two lines.

      Minor concerns:

      (1) Fig. 2c: Please specify the time point for Fig. 2c to provide a clearer context for readers.

      We have added the information.

      (2) In Line 204, it is stated that 'Supplementary Table 3,' which describes the 'Correlation of JMJD6 KO and its co-dependency genes,' can actually be found in 'Supplementary Table 4.' Please clarify this discrepancy.

      We apologize for this. We probably accidentally uploaded the duplicates. We have uploaded the new table in our revision.

      (3) Line 207: The order of figures should be clarified. Fig. 3c should be mentioned before Fig. 3b in the text.

      Yes, we did.

      (4) In Line 216, it is mentioned that 'Supplementary Table 4,' which describes 'Differentially expressed genes by JMJD6 KD,' can actually be found in 'Supplementary Table 3.' Please provide clarification for this discrepancy.

      We have corrected this.

      (5) Line 244-247: Please provide clarification of this section to ensure readers can fully understand your point.

      We have rephrased the sentence.

      (6) Line 1048: Confirm whether Fig. 2c represents siRNA or shRNA, as the label in the graph does not match the figure legends.

      Sorry for this. We have corrected.

      (7) Line 1161: Provide clarification regarding the use of Image J from k, and in Line 1162, specify the source of Image J from l.

      We apologize for the confusion in our description. We meant the “Image J” software. We have corrected this in the figure legend.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions to authors:

      Line 39 - suggest introducing JMJD6.

      Response: We have added the full name of JMJD6.

      Line 47 - suggest slightly rephrasing 'metabolic program that is coupled with...'.

      We have made a slight change by changing “coupled” to “associated”.

      Line 85 - please delete/replace 'exceptional'; proofread for inadequate use of ambiguous wording.

      We have changed it to “significant”.

      Line 141 - please concisely define 'high risk'.

      We have defined it with a citation (line 142-146).

      Line 143 - please concisely define 'event free'.

      We have defined the event free and overall survival precisely (line 149, 150).

      Line 153 - provide an adequate citation for 'cBioportal'.

      We have added the citation (line 166).

      Line 161 - please state the utilized cell lines.

      We have referenced to Materials and Methods (line 175).

      Line 166 - please note that 'morphological changes' of a cell do not suffice to determine 'stemness', please rephrase.

      We agreed and changed it to “regulate cellular differentiation” (line 181).

      Line 182 - provide a quantifiable measure for color change and or remove observation from the narrative.

      We have removed “indicative of acidic pH change” (line 198).

      Line 185 - the statement commencing with 'It is believed...' requires referencing.

      We have added references (line 200).

      Line 187 - please provide an adequate citation for the 'JoMa1' neural crest-derived cells (J. Maurer and colleagues?).

      We have added the reference (line 201).

      Line 203 - please provide an adequate citation for 'DepMap'.

      There is no citation specifically for DepMap and that’s why we can only provide the DepMap link.

      Line 234 - please provide an adequate citation for 'two algorithms'.

      We have provided the reference (line 265).

      Line 265 - please provide a rationale for the choice of the three tested cell lines.

      We have added definition by saying C-MYC overexpressed SKNAS, BE2C and SIMA with MYCN amplification (line 302, 303).

      Line 279 - suggest rephrasing 'gaining more ATPs'.

      We have removed these words as we do not have direct evidence to show ATP production (line 320).

      Line 342 - suggest rephrasing 'are in the only gene signature'.

      We have rephrased by saying “lysine demethylase (HDM) genes, including JMJD6, are present in the most significantly enriched gene signature in indisulam-sensitive cells” (line 416-416).

      Line 424 - please state the source of all cell lines (commercial provider?).

      We have added the source of cell lines.

      Lines 438 to 442 - are STR and mycoplasma profiling data adequately presented in the manuscript?

      We routinely test STR and mycoplasma for all cell lines cultured in hood in our Department every month.

      Lines 520 onwards - is the JMJD6 knockout generation data (e.g., cell viability upon knockout) adequately presented in the manuscript? Why does the study depend on transient transfection of siRNAs for obtaining mechanistic results?

      We created stable JMJD6 KO clones by selecting single cells with complete knockout. Cells are not happy after KO. siRNA knockdown is a method for relatively acute depletion of JMJD6, which is easy and fast, and may be more reliable to assess the direct effect of JMJD6.

      Figures: please provide adequate axis-labeling for all graphs (e.g., Fig. 2b and e).

      We have added the axis labeling.

      Discussion line 370 - what is meant by 'too harsh' - please use unambiguous phrasing to highlight limitations.

      We have changed to “stringent”.

      Please provide a study limitation paragraph.

      We have added one limitation paragraph.

      Limitation of the study

      Our study focused on the understanding of JMJD6 function in neuroblastoma cell lines. In the future, we will consolidate our study by expanding our models to patient-derived xenografts, organoids, and neuroblastoma genetic models, in comparison with non-cancerous cells. Although we have identified a conserved interactome of JMJD6 in neuroblastoma cells, it remains to be determined whether it is neuroblastoma-specific and essential to MYC-driven cancers. The genome-wide RNA binding by JMJD6 in cancer cells and normal cells coupled with isotope labeling to dissect the metabolic effect of JMJD6 will enhance our understanding of the biological functions of JMJD6, awaiting future studies. Inability to target the enhanced pre-mRNA splicing of metabolic genes in MYC-driven cancer cells by pharmacologic inhibition of JMJD6 is another limitation, due to lack of selective and potent JMJD6 inhibitors.

      Additional editing and proof-reading of the manuscript's narrative, figures, legends, and methods is highly recommended.

      We have proofread the whole manuscript.

    1. Author Response

      We are grateful for the reviewers' appreciation of our work and for their constructive feedback. We will address their comments through a revised version of the manuscript.

      Reviewer #1 (Public Review):

      This study by Paoli et al. used a resonant scanning multiphoton microscope to examine olfactory representation in the projection neurons (PNs) of the honeybee with improved temporal resolution. PNs were classified into 9 groups based on their response patterns. The authors found that excitatory responses in the PNs precede the inhibitory responses by ~40 ms, and ~50% of PN responses contain inhibitory components. They built a neural circuit model of the mushroom body (MB) with evolutionarily conserved features such as sparse representation, global inhibition, and a plasticity rule. This MB model, fed with the experimental data, could reproduce a number of phenomena observed in experiments using bees and other insects, including dynamical representations of odor onset and offset by different populations of Kenyon cells, prolonged representations of after-smell, different levels of odor-specificity for early/delay conditioning, and shift of behavioral timing in delay conditioning. The trace conditioning was not modeled and tested experimentally. Also, the experimental result itself is largely confirmatory to preceding studies using other organisms. Nonetheless, the experimental data and the model provide a solid basis for future studies.

      We thank the reviewer for summarizing the value of our study and recognizing its generality and significance. As suggested, in a revised version of the manuscript, we will discuss the implication of our approach for the context of trace conditioning. The model we presented hinges on the learning-induced plasticity of KC-to-MBON synapses recruited during the learning window (i.e., the simulated US arrival). In the case of trace conditioning, the model predicts that the behavioral response time should match the expected US arrival. Contrary to this prediction, preliminary analyses on empirical measurements of PER latency upon trace conditioning indicate this is not the case. In a revised version of the manuscript, we will discuss the differences between the predictions of the model and the experimental observations in a trace conditioning paradigm.

      Reviewer #2 (Public Review):

      The study presented by Paoli et al. explores temporal aspects of neuronal encoding of odors and their perception, using bees as a general model for insects. The neuronal encoding of the presence of an odor is not a static representation; rather, its neuronal representation is partly encoded by the temporal order in which parallel olfactory pathways participate and are combined. This aspect is not novel, and its relevance in odor encoding and recognition has been discussed for more than the past 20 years.

      The temporal richness of the olfactory code and its significance have traditionally been driven by results obtained based on electrophysiological methods with temporal resolution, allowing the identification and timing of the action potentials in the different populations of neurons whose combination encodes the identity of an odor. On the other hand, optophysiological methods that enable spatial resolution and cell identification in odor coding lack the temporal resolution to appreciate the intricacies of olfactory code dynamics.

      (1) In this context, the main merit of Paoli et al.'s work is achieving an optical recording that allows for spatial registration of olfactory codes with greater temporal detail than the classical method and, at the same time, with greater sensitivity to measure inhibitions as part of the olfactory code.

      The work clearly demonstrates how the onset and offset of odor stimulation triggers a dynamic code at the level of the first interneurons of the olfactory system that changes at every moment as a natural consequence of the local inhibitory interactions within the first olfactory neuropil, the antennal lobe. This gives rise to the interesting theory that each combination of activated neurons along this temporal sequence corresponds to the perception of a different odor. The extent to which the corresponding postsynaptic layers integrate this temporal information to drive the perception of an odor, or whether this sequence is, in a sense, a journey through different perceptions, is challenging to address experimentally.

      In their work, the authors propose a computational approach and olfactory learning experiments in bees to address these questions and evaluate whether the sequence of combinations drives a sequence of different perceptions. In my view, it is a highly inspiring piece of work that still leaves several questions unanswered.

      We thank the reviewer for considering that our work has an inspiring nature. Below we have tried to answer the questions raised by the following comments, and we will include part of these answers in the revised version of our manuscript.

      (2) In my opinion, the detailed temporal profile of the response of projection neurons and their respective probabilities of occurrence provide valuable information for understanding odor coding at the level of neurons transferring information from the antennal lobes to the mushroom bodies. An analysis of these probabilities in each animal, rather than in the population of animals that were measured, would aid in better comprehending the encoding function of such temporal profiles. Being able to identify the involved glomeruli and understanding the extent to which the sequence of patterns and inhibitions is conserved for each odor across different animals, as it is well known for the initial excitatory burst of activity observed in previous studies without the fine temporal detail, would also be highly significant.

      We thank the reviewer for recognizing the relevance of the findings in understanding the logic of olfactory coding. We agree about the importance of establishing if the different glomerular response profiles are evenly distributed across individuals or have individual biases. In the revised version of the manuscript, we will provide data on the distribution of response profiles for each animal and for different olfactory stimuli. Also, we fully agree on the importance of assessing to what extent such response profiles - largely determined by the local network of AL interneurons - are glomerulus-specific and conserved across individuals.

      In my view, the computational approach serves as a useful tool to inspire future experiments; however, it appears somewhat simplistic in tackling the complexity of the subject. One question that I believe the researchers do not address is to what extent the inhibitions recorded in the projection neurons are integrated by the Kenyon cells and are functional for generating odor-specific patterns at that level.

      The model we proposed represents, indeed, a simplification of olfactory signal processing throughout the honey bee olfactory circuit. Still, it shows that simple but realistic rules can be sufficient to grasp some fundamental aspects of olfactory coding. However, we agree with the reviewer and believe that such a minimalistic model can provide a basis for designing future experiments in which complexity can be increased by adding relevant features, such as the learning-induced plasticity of PN-to-KC synapses or the divergence of multiple PNs from the same glomerulus to different KCs.

      Concerning the reviewer's question on the involvement of inhibitory inputs in generating odor-specific patterns at the level of the KCs, the short answer is yes, they contribute to the summed input of a target KC, thus to the odor representation. In designing the model, we considered that a given glomerulus provides maximal input at maximal excitation and minimal input (=0 input) at maximal inhibition. For this reason, an inhibited glomerulus contributes less (to KC action potential probability) than a glomerulus showing baseline activity. This, in turn, contributes less than an excited glomerulus. From the modeling point of view, normalizing the signal between 0 and 1 (i.e., setting maximal inhibition to 0 and maximal excitation to 1) would yield a similar result as with the current approach, where values range from -25% to +30% ΔF/F. We will expand the model's description to clarify this point.
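
      The normalization mentioned above can be sketched as a simple linear rescaling. This is only an illustration under stated assumptions: the function name `normalize_input`, the clipping to [0, 1], and the use of -25% and +30% as fixed bounds are hypothetical choices for the example, not the model's actual implementation.

```python
def normalize_input(dff, dff_min=-25.0, dff_max=30.0):
    """Map a glomerular signal (in % change) linearly onto [0, 1].

    dff_min and dff_max are assumed bounds (maximal inhibition and
    maximal excitation); values outside them are clipped.
    """
    scaled = (dff - dff_min) / (dff_max - dff_min)
    return min(1.0, max(0.0, scaled))

# An inhibited glomerulus contributes less than one at baseline,
# which in turn contributes less than an excited one.
inhibited = normalize_input(-25.0)
baseline = normalize_input(0.0)
excited = normalize_input(30.0)
```

      With this mapping the ordering inhibited < baseline < excited is preserved, which is the property the response relies on.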

      Lastly, the behavioral result indicating a difference in conditioned response latency after early or delayed learning protocol is interesting. However, it does not align with the expected time for the neuronal representation that was theoretically rewarded in the delayed protocol. This final result does not support the authors' interpretation regarding the existence of a smell and an after-smell as separate percepts that can serve as conditioned stimuli.

      Considering that our odor stimulus lasted 5 seconds, glomerular activity is highly variable at odor onset (i.e., within the first 1s) because of short excitatory response profiles and the delayed and slower onset of inhibitory responses. After the initial phase, the neural representation of the stimulus becomes more stable. Consequently, a neural signature learned in the case of delay conditioning, i.e., with the US appearing towards the end of the olfactory stimulation (t = 4 - 5s), may present itself much earlier (t = 1.5s), triggering a behavioral response that largely anticipates the expected US arrival time.

      In the model, we observe an early decrease in action potential probability even in the case of delay conditioning. This occurs because the synapses recruited during the last second of olfactory stimulation (within the learning window during which CS and US overlap) become inactive. Because odorant-induced activity recruits highly overlapping synaptic populations between 1.5 and 5 s from the onset, a learning-induced inactivation of part of these synapses will result in a reduced action-potential probability in the modeled MBON. Importantly, this event will not be governed by time but by the appearance of the learned synaptic configuration.
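
      A toy sketch may help make this argument concrete. Everything in it is hypothetical: the five KCs, the time bins, and the all-or-none inactivation are invented for illustration and are far simpler than the model described in the manuscript.

```python
# Toy illustration only: KC identities and time bins are made up.
# Each entry maps a time (s from odor onset) to the set of active KCs.
kc_activity = {
    0.5: {0, 1},      # onset transient, before the code stabilizes
    1.5: {2, 3, 4},   # stable representation begins
    3.0: {2, 3, 4},
    4.5: {2, 3, 4},   # learning window: CS and US overlap
}

def mbon_drive(active_kcs, weights):
    """Summed KC-to-MBON input for one time bin."""
    return sum(weights[kc] for kc in active_kcs)

def inactivate(weights, activity, learning_times):
    """Return a copy of weights with synapses used in the learning window set to 0."""
    learned = dict(weights)
    for t in learning_times:
        for kc in activity[t]:
            learned[kc] = 0.0
    return learned

naive = {kc: 1.0 for kc in range(5)}
trained = inactivate(naive, kc_activity, learning_times=[4.5])

drive_before = {t: mbon_drive(a, naive) for t, a in kc_activity.items()}
drive_after = {t: mbon_drive(a, trained) for t, a in kc_activity.items()}

# Earliest time bin at which learning reduces the MBON drive.
first_drop = min(t for t in kc_activity if drive_after[t] < drive_before[t])
```

      In this sketch the drive drops as soon as the learned KC set reappears (the 1.5 s bin), not at the time of the learning window, because the reduction is tied to the synaptic configuration rather than to elapsed time.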

      We will add a new section to the revised version of the manuscript to clarify this concept and perform further analyses to characterize the contribution of different response types to the modeled response latency.

    1. Author Response

      Reviewer #1 (Public Review):

      Strengths:

      • The paper is clearly written, and all the conclusions stem from a set of 3 principles: circular topology, rotational symmetry, and noise minimization. The derivations are sound and such rigor by itself is commendable.

      • The authors provide a compelling argument on why evolution might have picked an eight-column circuit for path-integration, which is a great example of how theory can inform our thinking about the organization of neural systems for a specific purpose.

      • The authors provide a self-consistency argument on how cosine-like activity supports cosine-like connectivity with a simple Hebbian rule. However, their framework doesn't answer the question of how this system integrates angular velocity with the correct gain in the absence of allothetic cues to produce a heading estimate (more on that on point 3 below).

      Weaknesses:

      • The authors make simplifying assumptions to arrive at the cosine activity/cosine connectivity circuit. Among those are the linear activation function, and cosine driving activity u. The authors provide justification for the linearization in methods 3.1, however, this ignores the well-established fact that bump amplitude is modulated by angular velocity in the fly head direction system (Turner-Evans et al 2017). In such a case, nonlinearities in the activation function cannot be ignored and would introduce harmonics in the activity.

      We thank the reviewer for pointing out this omission. We added a paragraph at the end of section 4.1 clarifying that transient non-linearity, for instance when the circuit is actively receiving external input, is compatible with our work because we only need linearity in the line attractor, but not outside (lines 407-419).

      “In more intuitive terms, the neurons have a saturating nonlinear activation function where they modulate their gain based on the total activity in the network. If the activity in the network is above the desired level, r, the gain is reduced and the activity decreases, and when the activity of the network is less than desired level, both the gain and the activity increase. Note that in this scenario transient deviations from the line attractor, which would induce nonlinear behaviour in the circuit dynamics, are tolerable. External inputs, u(t), could transiently modify the shape of the activity, producing activity shapes deviating from what the linear model can accommodate. For example, the shape of the bump attractor could be modified through nonlinearities while the insect attains high angular velocity (Turner-Evans et al., 2017).

      Such nonlinear dynamics do not conflict with the theory developed here, which only requires linearity when the activity is projected onto the circular line attractor. In our framework, the linearity of integration at the circular line attractor is not a computational assumption, but rather it emerges from the principle of symmetry.”

      Furthermore, even though activity has been reported to be cosine-like, in fact in the fruit fly it takes the form of a somewhat concentrated activity bump (~80-100 degrees, Seelig & Jayaraman 2015; Turner-Evans et al 2017), and one has to take into account the smoothing effect of calcium dynamics too which might make the bump appear more cosine-like. So in general, it would be nice to see how the conclusions extend if the driving activity is more square-like, which would also introduce further harmonics.

      We added a cautionary comment on the sinusoidal activity (lines 222-226).

      “We note, however, that data from the fruit fly shows a more concentrated activity bump than what would be expected from a perfect sinusoidal profile (Seelig and Jayaraman, 2015; Turner-Evans et al., 2017), and that calcium imaging (which was used to measure the activity) can introduce biases in the activity measurements (Siegle et al., 2021; Huang et al., 2021). Thus the sinusoidal activity we model is an approximation of the true biological process rather than a perfect description.”

      Overall, it would be interesting to see whether, despite the harmonics introduced by these two factors interacting in the learning rule, Oja's rule can still pick up the "base" frequency and produce sinusoidal weights (as mentioned in methods 3.8). At this point, the examples shown in Figure 5 (tabula rasa and slightly perturbed weights) are quite simple. Such a demonstration would greatly enhance the generality of the results.

      We also extended the self-consistency framework from Oja’s rule to the non-linear case, and found that while Oja’s rule with non-linear neurons would not give pure harmonics, the secondary harmonics will remain small. We added a sentence explaining this in the main text (section 2.4, lines 309-312) and a methods section to develop the self-consistency framework for the case of non-linear activations (section 4.7.2).

      “For neurons with a nonlinear activation function, secondary harmonics would emerge, but would remain small under mild assumptions, as shown in Section 4.7.2. Oja’s rule will still cause the weights to converge to approximately sinusoidal connectivity.”
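
      For the simpler linear case, the self-consistency between sinusoidal activity and sinusoidal weights can be illustrated with a deliberately reduced sketch: a single linear neuron trained with Oja's rule on cosine-shaped inputs of random phase. The eight input columns, the learning rate, the number of steps, and the single-neuron setup are all assumptions made for this example; it is not the recurrent circuit or the nonlinear analysis of Section 4.7.2.

```python
import math
import random

N = 8                                    # number of columns (assumed)
theta = [2 * math.pi * i / N for i in range(N)]
rng = random.Random(0)                   # fixed seed for reproducibility

w = [rng.uniform(-0.1, 0.1) for _ in range(N)]   # small random initial weights
eta = 0.01                               # learning rate (assumed)

for _ in range(5000):
    phi = rng.uniform(0, 2 * math.pi)    # random heading
    x = [math.cos(t - phi) for t in theta]           # cosine-shaped input
    y = sum(wi * xi for wi, xi in zip(w, x))         # linear neuron output
    w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]  # Oja's rule

# Project the learned weights onto the first harmonic (cos and sin).
a = (2 / N) * sum(wi * math.cos(t) for wi, t in zip(w, theta))
b = (2 / N) * sum(wi * math.sin(t) for wi, t in zip(w, theta))
fit = [a * math.cos(t) + b * math.sin(t) for t in theta]

residual = sum((wi - fi) ** 2 for wi, fi in zip(w, fit))  # non-sinusoidal part
norm_sq = sum(wi * wi for wi in w)                        # Oja keeps this near 1
```

      Because the inputs only span the first harmonic, any weight component outside it decays, and the learned profile ends up as a sinusoid of arbitrary phase with close to unit norm.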

      • The match of the theoretical prediction of cosine-like connectivity profiles with the connectivity data is somewhat lacking. In the locust the fit is almost perfect, however, the low net path count combined with the lack of knowledge about synaptic strengths makes this a motivating example in my opinion. In the fruit fly, the fit is not as good, and the function-fitting comparison (Methods Figure 6) is not as convincing. First, some function choices clearly are not a good fit (f1+2, f2). Second, the profile seems to be better fit by a Gaussian or other localized function, however the extra parameter of the Gaussian results in the worst AIC and AICc. To better get at the question of whether the shape of the connectivity profile matches a cosine or a Gaussian, the authors could try for example to fix the width of the Gaussian (e.g. to the variance of the best-fit cosine, which seems to match the data very well even though it wasn't itself fit), and then fit the two other parameters to the data. In that case, no AIC or AICc is needed. And then do the same for a circular distribution, e.g. von Mises.

      We also included the fit with von Mises and Gaussian with the width parameters fixed to match the cosine as the reviewer suggested. We found that even though these two distributions fit the data better, the difference is very small (2%), probably due to the high variability of the fruit fly connectome data. We also changed the wording and state that the theory is compatible with experimental data.

      In the Methods 4.6 (lines 568-585), we wrote

      “As a complementary approach to evaluate the shape of the distribution, we first fit the Gaussian and von Mises distributions to the best fit f = 1 curve. We then freeze the width parameters of the distributions (σ_g for the Gaussian and κ_v for the von Mises) and only optimise the amplitude and vertical offset parameters (β and γ) to fit the data. This approach limits the number of free parameters for the Gaussian and von Mises distributions to two, to match the sinusoid. The results are shown in Methods Fig. 6 and Table 5. Both the fixed-width Gaussian and von Mises distributions are a slightly better fit to the data than the sinusoid, but the differences between the three curves are very small.

      In simplifying the fruit fly connectome data, we assumed all synapses of different types were of equal weight, as no data to the contrary were available. Different synapse types having different strengths could introduce nonlinear distortions between our net synaptic path count and the true synaptic strength, which could in turn make the data a better or worse fit for a sinusoidal compared to a Gaussian profile. As such, we don’t consider the only 2% relative differences between the f = 1 sinusoid and fixed-width Gaussian and von Mises distributions to be conclusive.

      Overall, we find that the cosine weights that emerge from our derivations are a very close match for the locust, but less precise for the fly, where other functions fit slightly better. Given the limitations in using the currently available data to provide an exact estimate of synaptic strength (for the locust), and due to the high variability of the synaptic count (for the fruit fly), we consider that our theory is compatible with the observed data.”
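
      The two-step procedure quoted above (match the width to the best-fit cosine, then freeze it and fit only amplitude and offset) can be sketched as follows. This is a hedged illustration: the helper names, the grid search over the width, the eight angular offsets, and the placeholder `data` values are assumptions for the example and are not the connectome data or the authors' fitting code.

```python
import math

def fit_amp_offset(basis, y):
    """Closed-form least squares for y ≈ beta * basis + gamma."""
    n = len(y)
    mb, my = sum(basis) / n, sum(y) / n
    beta = (sum((b - mb) * (v - my) for b, v in zip(basis, y))
            / sum((b - mb) ** 2 for b in basis))
    return beta, my - beta * mb

def sse(basis, y, beta, gamma):
    """Sum of squared errors of the fitted curve."""
    return sum((beta * b + gamma - v) ** 2 for b, v in zip(basis, y))

def gaussian(offsets, sigma):
    """Unit-height Gaussian profile evaluated at the given offsets."""
    return [math.exp(-d * d / (2 * sigma ** 2)) for d in offsets]

# Angular offsets between columns (assumed: 8 evenly spaced columns).
offsets = [2 * math.pi * k / 8 for k in range(-4, 4)]
cosine = [math.cos(d) for d in offsets]

# Step 1: choose the Gaussian width that best matches the cosine profile.
def width_error(sigma):
    g = gaussian(offsets, sigma)
    return sse(g, cosine, *fit_amp_offset(g, cosine))

sigma_fixed = min((0.3 + 0.01 * i for i in range(271)), key=width_error)

# Step 2: freeze the width; fit only amplitude (beta) and offset (gamma).
g_fixed = gaussian(offsets, sigma_fixed)
data = [2.0 * g + 1.0 for g in g_fixed]   # synthetic placeholder, not real data
beta, gamma = fit_amp_offset(g_fixed, data)
```

      With the width frozen, the Gaussian has the same two free parameters as the sinusoid, so the fits can be compared directly without an information criterion.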

      In addition, the theoretical prediction of cosine-like connectivity is not clearly stated in the abstract, introduction, or discussion. As a prediction, I believe it should be front and center, as it might be revisited again in the future in light of e.g. new experimental data.

      We added the explicit prediction in the abstract and the introduction (lines 52-53).

      • I find the authors' claim that Oja's rule suffices to learn the insect head direction circuit (l. 273-5) somewhat misleading/vague. The authors seem to not be learning angular integration here at all. First, it is unclear to me what is the form of u(t). Is it the desired activity in the network at time t given angular velocity? This is different than modelling a population of PEN neurons jointly tuned to head direction and angular velocity, and learning weights so as to integrate angular velocity with the correct gain (Vafidis et al 2022). The learning rule here establishes a self-consistency between sinusoidal weights and activity, however, it does not learn the weights from PEN to EPG neurons so as to perform angular integration. Similar simple Hebbian rules have been used before to learn angular integration (Stringer et al 2002), however, they failed to learn the correct gain. Therefore, the authors should limit the statement that their simpler learning rule is enough to learn the circuit (l. 273-5), making sure to outline differences with the current literature (Vafidis et al 2022).

      We agree and we clarified that we focus only on the self-sustained activity condition. We appended the following text to the first and last paragraphs of section 2.4.

      For the first (lines 279-284): “Our approach follows from previous research which has shown that simple Hebbian learning rules can lead to the emergence of circular line attractors in large neural populations (Stringer et al., 2002), and that a head direction circuit can emerge from a predictive rule (Vafidis et al., 2022). In contrast to this work, we focus only on the self-sustaining nature of the heading integration circuit in insects and show that our proposed sinusoidal connectivity profile can emerge naturally.”

      For the last (lines 317-321): “However, this learning rule only applies to the weights that ensure stable, self-sustaining activity in the network. The network connectivity responsible for correctly integrating angular velocity inputs (given by the PEN to EPG connections in the fly) might require more elements than a purely Hebbian rule (Stringer et al., 2002), such as the addition of a predictive component (Vafidis et al., 2022).”

    1. Author Response

      Public reviews:

      Reviewer 1:

      Weaknesses:

      While I generally agree with the authors' interpretations, the idea of Saccorhytida as a divergent, simplified offshoot is slightly contradictory with a probably non-vermiform ecdysozoan ancestor. The authors' analyses do not discard the possibility of a vermiform ecdysozoan ancestor (importantly, Supplementary Table 4 does not reconstruct that character),

      Reply: Thanks for the comments. Saccorhytids are only known from the early Cambrian and their unique morphology has no equivalent among any extinct or extant ecdysozoan groups. This prompted us to consider them as a possible dead-end evolutionary offshoot. The nature of the last common ancestor of ecdysozoans (i.e. a vermiform or non-vermiform animal with capacities to renew its cuticle by molting) remains hypothetical. At present, palaeontological data do not allow us to resolve this question. The animal in Fig. 4b at the base of the tree is supposed to represent an ancestral soft-bodied form with no cuticle from which ecdysozoans evolved via major innovations (cuticular secretion and ecdysis). Its shape is hypothetical as indicated by a question mark. Our evolutionary model is clearly intended to be tested by further studies and hopefully new fossil discoveries.

      and outgroup comparison with Spiralia (and even Deuterostomia for Protostomia as a whole) indicates that a more or less anteroposteriorly elongated (i.e., vermiform) body is likely common and ancestral to all major bilaterian groups, including Ecdysozoa. Indeed, Figure 4b depicts the potential ancestor as a "worm". The authors argue that the simplification of Saccorhytida from a vermiform ancestor is unlikely "because it would involve considerable anatomical transformations such as the loss of vermiform organization, introvert, and pharynx in addition to that of the digestive system". However, their data support the introvert as a specialisation of Scalidophora (Figure 4a and Supplementary Table 4), and a pharyngeal structure cannot be ruled out in Saccorhytida. Likewise, loss of an anus is not uncommon in Bilateria. Moreover, this can easily become a semantics discussion (to what extent can an animal be defined as "vermiform"? Where is the limit?).

      Reply: We agree with you that “vermiform” is an ill-defined term that should be avoided. “Elongated” might be a better term to designate the elongation of the body along the antero-posterior axis. Changes have been made in the text to solve this semantic problem. Priapulid worms or annelids are examples of extremely elongated, tubular animals. In saccorhytids, the antero-posterior elongation is present (as it is in the vast majority of bilaterians) but extremely reduced, Saccorhytus and Beretella having a sac-like and a beret-like shape, respectively. That such forms may have derived from elongated, tubular ancestors (e.g. comparable with scalidophoran worms) would require major anatomical transformations that have no equivalent among modern animals. We agree that further speculation about the nature of these transformations is unnecessary and should be deleted simply because the nature of these ancestors is purely hypothetical. We also agree that the loss of the anus and the extreme simplification of the digestive system are common among extant bilaterians. The single opening seen in Saccorhytus and possibly Beretella may result from a comparable simplification process. In Figure 4b, the hypothetical pre-ecdysozoan animal is slightly elongated (antero-posterior axis and polarity) but in no way comparable with a very elongated and cylindrical ecdysozoan worm (e.g. extant or extinct priapulid).

      Therefore, I suggest leaving the evolutionary scenario more open. Supporting Saccorhytida as a true group at the early steps of Ecdysozoa evolution is important and demonstrates that animal body plans are more plastic than previously appreciated. However, with the current data, it is unlikely that Saccorhytida represents the ancestral state for Ecdysozoa (as the authors admit), and a vermiform nature is not ruled out (and even likely) in this animal group. Suggesting that the ancestral ecdysozoan might have been small and meiobenthic is perhaps more interesting and supported by the current data (phylogeny and outgroup comparison with Spiralia).

      Reply: We agree the evolutionary scenario should be more open, especially the evolutionary process that gave rise to Saccorhytida. Again, we know nothing about the morphology of the ancestral ecdysozoan (typically the degree of body elongation, whether it had a differentiated introvert or not, whether it had a through gut or not). Simplification is one possible option, but it assumes that the ancestral ecdysozoan was an elongated animal with a through gut. Changes will be made in Fig. 4a accordingly. Alternatively, the ancestral ecdysozoan might have been small and meiobenthic.

      Reviewer 2:

      Weaknesses:

      The preservation of the specimens, in particular on the putative ventral side, is not good, and the interpretation of the anatomical features needs to be tested with additional specimens in the future. The monophyly of Cycloneuralia (Nematoida + Scalidophora) was not necessarily well-supported by cladistic analyses, and the evolutionary scenario (Figure 4) also needs to be tested in future works.

      Reply: Yes, we agree that our MS is the first report on an enigmatic ecdysozoan. Whereas the dorsal side of the animal is well documented (sclerites), uncertainties remain concerning its ventral anatomy (typically the mouth location and shape). Additional better-preserved specimens will hopefully provide the missing information. Concerning Cycloneuralia, their monophyly is generally better supported by analyses based on morphological characters than in molecular phylogenies.

      Reviewer 3:

      Weaknesses: I, as a paleontology non-expert, experienced several difficulties in reading the manuscript. This should be taken into consideration when assuming a wide range of readers including non-experts.

      Reply: We have ensured that the text is comprehensible to biologists. Our main results are summarized in relatively simple diagrams (e.g. Fig. 4). We are aware that technical descriptive terms may appear obscure to non-specialists. However, we think that our text-figures help the reader to understand the morphology of these ancient animals.

    1. Author Response

      eLife assessment

      This study presents a useful comparison of the dynamic properties of two RNA-binding domains. The data collection and analysis are solid, making excellent use of a suite of NMR methods. However, evidence to support the proposed model linking dynamic behavior to RNA recognition and binding by the tandem domains remains incomplete. The work will be of interest to biophysicists working on RNA-binding proteins.

      Response: We thank eLife for taking the time and effort to review our manuscript. Evidence from the literature and our study shows a close correspondence between the dynamic behavior of dsRBDs and their dsRNA recognition and binding, which led us to propose a reasonable model. As mentioned in the manuscript, we have been working on the suggested experiments to further support our proposed model.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Differential conformational dynamics in two type-A RNA-binding domains drive the double-stranded RNA recognition and binding," Chugh and co-workers utilize a suite of NMR relaxation methods to probe the dynamic landscape of the TAR RNA binding protein (TRBP) double-stranded RNA-binding domain 2 (dsRBD2) and compare these to their previously published results on TRBP dsRBD1. The authors show that, unlike dsRBD1, dsRBD2 is a rigid protein with minimal ps-ns or us-ms time scale dynamics in the absence of RNA. They then show that dsRBD2 binds to canonical A-form dsRNA with a higher affinity compared to dsRBD1 and does so without much alteration in protein dynamics. Using their previously published data, the authors propose a model whereby dsRBD2 recognizes dsRNA first and brings dsRBD1 into proximity to search for RNA bulge and internal loop structures.

      Response: We thank the Reviewer for sending us an encouraging review. We have combined the findings reported in the literature with new ones, which led us to propose the dsRNA-binding model by tandem A-form dsRBDs.

      We propose that dsRBD1 can first recognize a variety of sequentially and structurally different dsRNAs. dsRBD2 assists the interaction with a higher affinity, thus fortifying the interaction between TRBP and a possible substrate. This may enable the other associated proteins like Dicer and Ago2 to perform critical biological functions.

      However, the following statements made in the comment above are factually incorrect.

      (1) They then show that dsRBD2 binds to canonical A-form dsRNA with a higher affinity compared to dsRBD1 and does so without much alteration in protein dynamics.

      However, we have explicitly shown the perturbation in dsRBD2 dynamics upon RNA binding.

      (2) Using their previously published data, the authors propose a model whereby dsRBD2 recognizes dsRNA first and brings dsRBD1 into proximity to search for RNA bulge and internal loop structures.

      Our previously published data suggest that dsRBD1, owing to its high conformational dynamics in solution, is able to recognize a variety of structurally and sequentially different dsRNAs (PMID: 35134335). dsRBDs preferably bind to the double-stranded region (minor-major-minor-groove) of an A-form RNA (PMID: 24801449; PMID: 27332119) and do not search for bulge and internal loop structures as a part of the binding event. Even though dsRBDs preferably bind to the double-stranded region, they can still accommodate perturbations in the A-form helix due to mismatches and bulges, with decreased binding affinity (PMID: 25608000). However, it is a matter of future research to identify how much of a deviation from the A-form structure can be accommodated by the dsRBDs. The diffusion event observed in the literature (PMID: 23251028) also does not directly imply a search for bulge and internal loop structures.

      Strengths:

      The authors expertly use a variety of NMR techniques to probe protein motions over six orders of magnitude in time. Other NMR titration experiments and ITC data support the RNA-binding model.

      Weaknesses:

      The data collection and analysis are sound. The only weakness in the manuscript is the lack of context with the much broader field of RNA-binding proteins. For example, many studies have shown that RNA recognition motif (RRM) domains have similar dynamic characteristics when binding diverse RNA substrates. Furthermore, there was no discussion about the entropy of binding derived from ITC. It might be interesting to compare with dynamics from NMR.

      Response: We understand the reviewer’s point that this study is focused on a dsRNA-binding mechanism rather than addressing the much broader field of RNA-binding. There are multiple challenges in finding a single mechanism that works for all RNA-binding proteins. For instance, RRM is a single-stranded RNA binding domain that is able to read out the substrate base sequence. RRM behaves entirely differently than the dsRBD in terms of sequence specificity. Besides, several other RNA-binding domains like the KH-domain, Puf domains, Zinc finger domains, etc., showcase a unique RNA-binding behavior. Thus, it would be really difficult to draw a single rule of thumb for RNA-recognition behavior for all these diverse domains.

      Thank you for pointing out the entropy of binding from ITC. We shall include the discussion about the entropy of binding in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Proteins that bind to double-stranded RNA regulate various cellular processes, including gene expression and viral recognition. Such proteins often contain multiple double-stranded RNA-binding domains (dsRBDs) that play an important role in target search and recognition. In this work, Chug and colleagues have characterized the backbone dynamics of one of the dsRBDs of a protein called TRBP2, which carries two tandem dsRBDs. Using solution NMR spectroscopy, the authors characterize the backbone motions of dsRBD2 in the absence and presence of dsRNA and compare these with their previously published results on dsRBD1. The authors show that dsRBD2 is comparatively more rigid than dsRBD1 and claim that these differences in backbone motions are important for target recognition.

      Strengths:

      The strengths of this study are multiple solution NMR measurements to characterize the backbone motions of dsRBD2. These include 15N-R1, R2, and HetNOE experiments in the absence and presence of RNA and the analysis of these data using an extended-model-free approach; HARD-15N-experiments and their analysis to characterize the kex. The authors also report differences in binding affinities of dsRBD1 and dsRBD2 using ITC and have performed MD simulations to probe the differential flexibility of these two domains.

      Weaknesses:

      While it may be true that dsRBD2 is more rigid than dsRBD1, the manuscript lacks conclusive and decisive proof that such changes in backbone dynamics are responsible for target search and recognition and the diffusion of TRBP2 along the RNA molecule. To conclusively prove the central claim of this manuscript, the authors could have considered a larger construct that carries both RBDs. With such a construct, authors can probe the characteristics of these two tandem domains (e.g., semi-independent tumbling) and their interactions with the RNA. Additionally, mutational experiments may be carried out where specific residues are altered to change the conformational dynamics of these two domains. The corresponding changes in interactions with RNA will provide additional evidence for the model presented in Figure 8 of the manuscript. Finally, there are inconsistencies in the reported data between different figures and tables.

      Response: We thank the reviewer for the comprehensive and insightful review. A larger construct carrying both RBDs was not used because of the multiple challenges pertaining to dynamics studies by NMR spectroscopy (the intrinsic R2 rates of the dsRBD1-dsRBD2 construct would be high, resulting in broadened peaks), as per our previous experience (PMID: 35134335). There would be additional dynamics in that construct coming from domain-domain relative motions, making it difficult to deconvolute the dynamics information. Further, the dsRNA needed to bind to this construct would be longer, thereby causing further line broadening in NMR.

      Coming to mutational studies, careful design of domain mutants remains a challenge because the conformational dynamics in both domains are distributed throughout the backbone rather than only in the RNA-binding residues. The mutational studies would need an exhaustive number of mutations in the protein as well as the RNA to draw a parallel between binding and dynamics. Having said that, we are working on making such mutations in the protein (at several locations, to freeze the dynamics site-specifically) and the RNA (to change the shape of the dsRNA) to systematically study this mechanism, which is beyond the scope of this manuscript.

      The reviewer has rightly pointed out some subtle superficial differences. These differences arise from the context in which we describe the data. For example, in Figure S4 we report the average relaxation rates and nOe values for only the residues common to the two magnetic field strengths, 600 and 800 MHz, that we were able to analyze. In Figure 6, by contrast, we compare the averages of the core dsRBD residues at 600 MHz, in the presence and absence of D12RNA. The differences, however, are minute and fall well within the error range.

    1. Author Response

      eLife assessment

      The manuscript explores the ways in which the genetic code evolves, specifically how stop codons are reassigned to become sense codons. The authors present phylogenetic data showing that mutations at position 67 of the termination factor are present in organisms that nevertheless use the UGA codon as a stop codon, thereby questioning the importance of this position in the reassignment of stop codons. Alternative models on the role of eRF1 would reflect a more balanced view of the data. Overall, the data are solid and these findings will be valuable to the genomic/evolution fields.

      Public Reviews:

      Reviewer #1 (Public Review):

      The issue:

      The ciliates are a zoo of genetic codes, where there have been many reassignments of stop codons, sometimes with conditional meanings which include retention of termination function, and thus > 1 meaning. Thus ciliate coding provides a hotspot for the study of genetic code reassignments.

      The particular issue here is the suggestion that translation of a stop codon (UGA) in Blastocrithidia results from a joint change in the protein release factor that reads UGA and the breaking of a base pair at the top of the anticodon stem of tRNATrp (Nature 613, 751, 2023).

      The work:

      However, Swart, et al have looked into this suggestion, and find that the recently suggested mechanism is overly complicated.

      The broken pairing at the top of the anticodon stem of tRNATrp indeed accompanies the reading of UGA as Trp as previously suggested. It changes the codon translated even though the anticodon remains CCA, complementary to UGG. A compelling point is that this misreading matches previous mutational studies of E. coli tRNAs, in which breaking the same base pair in a mutant tRNATrp suppressor tRNA stimulated the same kind of miscoding.

      This is a fair characterization, and we would also note the additional positive aspect: that we observed there is consistency in the presence of 4 bp tRNA-Trp anticodon stems in those ciliates which translate UGA as tryptophan, and generally 5 bp anticodon stems in those that do not (including Euplotes with UGA=Cys).

      But the amino acid change in release factor eRF1, the protein that catalyzes termination of protein biosynthesis at UGA is broadly distributed. There are about 9 organisms where this mutation can be compared with the meaning of UGA, and the changes are not highly correlated with a change in the meaning of the codon. Therefore, because UGA can be translated as Trp with or without the eRF1 mutation, Swart et al suggest that the tRNA anticodon stem change is the principal cause of the coding change.

      We do think multiple lines of evidence support the shorter tRNA anticodon stem promoting UGA translation, but also think other changes in the translation system may be important. For instance, structural studies suggest interaction of ribosomal RNA with extended stop codons (particularly the base downstream of the triplet) during translation termination (Brown et al. 2015, Nature). As we noted, previous studies have sought to correlate individual eRF1 substitutions with genetic code changes, but the proposed correlations have invariably disappeared once new tranches of eRF1 sequences and alternative genetic codes for different species became available. This is why we concluded that there needs to be more focus on obtaining and understanding molecular structures during translation termination, particularly in the organisms with alternative codes.

      The review:

      Swart et al have a good argument. I would only add that eRF1 participation is not ruled out, because finding that UGA encodes Trp does not distinguish between encoding Trp 90% of the time and encoding it 99% of the time. The release factor could still play a measurable quantitative role, but the major inference here seems convincing.

      We agree that eRF1 may participate and compete with the tRNA, but we question the hypothesis that the particular amino acid position/substitution proposed by Kachale et al. 2023 is the key. There is experimental evidence in the form of Ribo-seq for the ciliate Condylostoma magnum (A67), which does appear to efficiently translate UGA sense codons (Swart et al. 2016, Figure S3: https://doi.org/10.1016/j.cell.2016.06.020): we observed no dip in ribosome footprints downstream of these codons, as there would be in the case of classical translational readthrough in standard genetic code organisms (which is usually relatively inefficient - certainly well below 50% of upstream translation from our reading of the literature). Ribo-seq also supports efficient termination at those Condylostoma UGA codons that are stops.

      Of course, the entire translation system may have evolved to be as efficient as what we currently observe, and it is not unreasonable to consider that it may have been less efficient in the past. However, not so inefficient that the error rate incurred would have been strongly deleterious. Importantly also, we believe the role of multiple eRF1 paralogs in translation termination in the ciliates really needs to be investigated, given that translation is inherently probabilistic with any of these proteins potentially being incorporated into the ribosome.

      Reviewer #2 (Public Review):

      The manuscript raises interesting observations about the potential evolution of release factors and tRNA to readdress the meaning of stop codons. The manuscript is divided into two parts: The first consists of revealing that the presence of a trp tRNA with an AS of 5bp in Condylostoma magnum is probably linked to contamination in the databases by sequences from bacteria. This is an interesting point which seems to be well supported by the data provided. It highlights the difficulty of identifying active tRNA genes from poorly annotated or incompletely assembled genomes.

      We will consider adding subheadings in revising the manuscript to make the structure more explicit, as it really has three parts to it, with the third largely in the supplement. The “good” was that there is a range of support for the 4 bp AS stem, with new evidence we supplied from ciliates and older studies with E. coli tRNAs. The “bad” is that scrutiny of eRF1 sequences, with the addition of ones we provided, contradicts the hypothesis by Kachale et al. that a S67A/G substitution is necessary for genetic code evolution in Blastocrithidia and certain ciliates. The “ugly” is that a tRNA shown in a main figure in Kachale et al. 2023, and which was investigated in a number of subsequent experiments, is almost certainly a bacterial contaminant.

      Proper scrutiny of the bacterial tRNA should have led to its immediate recognition and rejection, as one of us did years ago in searches of tRNAs in a preliminary Condylostoma genome assembly (only predicted 4 bp AS tRNA secondary structures were shown in Swart et al. 2016, Fig S4B and C). Evidence for the bacterial nature of this tRNA was placed in the supplement of the present manuscript, as the meat of the critique was the consideration of the evidence for and against its good and bad aspects. The bacterial tRNA secondary structure has been removed from the main figure by Kachale et al. 2023, and downstream experiments based on synthetic constructs for this tRNA have also been revised (https://www.nature.com/articles/s41586-024-07065-0).

      Much of the rest of the supplement served to correct multiple errors in genetic codes in public sequence databases that led to additional errors and difficulties in interpreting the eRF1 substitutions in Kachale et al. 2023. It is important that these codes get corrected. If not, they create multiple headaches for users besides those investigating genetic codes, as we found out in communications with authors and a colleague of Kachale et al. 2023 (in particular, leading to thousands of missing genes in the macronuclear genome of the standard code ciliate Stentor coeruleus that were removed in automated GenBank processing due to incorrectly having an alternative genetic code specified).

      Recently the NCBI Genetic Codes curators reinstated a genetic code incorrectly attributed to the ciliate Blepharisma (“Blepharisma nuclear genetic code”) (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG15), despite us requesting a reasonable fix years ago. This would be very confusing for those that are not in the know. We have explained this confusion in our supplement too. Thus we also hope that this paper will aid in communication with the genetic code database curators and in correcting such issues.

      The second part criticises the fact that a mutation at position S67 of eRF1 is required to allow the UGA codon to be reassigned as a sense codon. As supporting evidence, they provide a phylogenetic study of the eRF1 factor showing that there are numerous ciliates in which this position is mutated, whereas the organism shows no trace of the reassignment of the UGA codon into a sense codon. While this criticism seems valid at first glance, it suffers from the lack of information on the level of translation of UGA codons in the organisms considered.

      Firstly, we not only showed that there are organisms with the S67 substitution but no UGA reassignment, but also provided evidence for the converse: organisms with a UGA=Trp reassignment but without the S67 substitution (both ciliates and a non-ciliate). So, two related lines of substitutions were not consistent with the eRF1 substitution hypothesis proposed.

      Secondly, we disagree that there is a “lack of information about UGA translation in the organisms considered”. Evolution has already supplied information as to whether UGA codons are translated at an appreciable level in the organisms of interest, in the form of codon frequencies within their protein-coding sequences and those ending them. If UGA was translated at appreciable levels, it would be found at a corresponding frequency in coding sequences. In genomes with thousands of genes, if not predicted as amino acids, they likely primarily serve as stops. Low levels of potential readthrough of actual stops would not change the arguments. With the exception of selenocysteine translation (which is restricted to a limited number of genes by the condition of requiring a specific mRNA secondary structure) there is no expectation of meaningful levels of UGA translation when this codon is missing from the bulk of coding sequences (CDSs).

      This is well illustrated by the heterotrichs, a clade of ciliates that use a variety of genetic codes. In heterotrichs that use the standard code, UGA is virtually absent from coding sequences, only appearing at the 3’ end of transcripts in the predicted stop codon and 3’-UTR (Seah et al. 2022, Figure 5). This contrasts notably with other genera like Blepharisma where appreciable levels of UGA codons occur throughout coding sequences, upstream of the predicted UAA and UAG stops (Seah et al. 2022, Figure 5: https://www.biorxiv.org/content/biorxiv/early/2022/07/12/2022.04.12.488043/F5.large.jpg). The difference in the UGA, UAG and UAA codon frequencies in 3’ UTRs compared to the upstream frequencies in CDSs of standard genetic code heterotrichs is stark. Frequencies of all three codons are elevated in the 3’ UTRs of all heterotrich ciliates, irrespective of their genetic codes (Seah et al. 2022, Figure 5), according with these codons not being deleterious in this region and strongly selected against upstream, within CDSs.
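      The codon-frequency argument above can be illustrated with a minimal sketch. It is not the pipeline used in Seah et al. 2022 or in our manuscript; the function name `internal_stop_frequencies` and the toy sequences are hypothetical, and the code simply assumes that each input is an in-frame coding sequence whose last codon is the annotated stop.

```python
from collections import Counter

# The three canonical stop codons, written in DNA alphabet.
STOPS = ("TAA", "TAG", "TGA")


def internal_stop_frequencies(cds_list):
    """Frequency (per 1000 codons) of TAA/TAG/TGA inside CDSs.

    Assumption: every sequence is in frame and its final codon is the
    annotated stop, so that codon is excluded from the count.
    """
    counts = Counter()
    total = 0
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        codons = [seq[i:i + 3] for i in range(0, usable, 3)]
        internal = codons[:-1]  # drop the terminal (stop) codon
        total += len(internal)
        counts.update(c for c in internal if c in STOPS)
    if total == 0:
        return {s: 0.0 for s in STOPS}
    return {s: 1000.0 * counts[s] / total for s in STOPS}


# Toy, hypothetical sequences: the first has an in-frame internal TGA.
freqs = internal_stop_frequencies(["ATGTGGTGAAAATAA", "ATGAAATGA"])
```

      In a genome where UGA is a stop, the internal TGA frequency computed this way would be expected to be close to zero, whereas an appreciable value would be consistent with UGA being read as a sense codon.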

      The reviewer raises the possibility that UGA may appear to be a stop codon but still have biologically significant translational readthrough. We think that this is unlikely in the heterotrich ciliate species discussed here, which have extremely short (median 21-26 bp) and AU-rich 3’-UTRs compared to yeast and animals (Seah et al. 2022). Therefore, in heterotrichs where UGA is predicted to be a stop, translational readthrough would lead to extensions of only a few amino acids and be relatively inconsequential, as there are plenty of secondary UAA, UAG and UGA codons downstream of the typical stop.

      If one were to consistently pursue the reviewer’s line of argumentation, one would also have to argue against the very reasoning used in Kachale et al. 2023 about all the stop codon predictions/reassignments in protists for which experiments were not conducted in S. cerevisiae or other translation systems, as well as decades of prior work using sequence conservation in multiple sequence alignments to infer alternative genetic codes.

      Furthermore, experimental information for UGA translation levels is available for the ciliate Condylostoma magnum, predominantly in the form of Ribo-seq (Swart et al. 2016). Similarly to Condylostoma’s UAA and UAG codons, Ribo-seq shows that the UGA codons are generally either efficiently translated when present in the bodies of CDSs or terminate translation as actual stops close to mRNA 3’ termini/poly(A) tails (Swart et al. 2016). Thus, irrespective of the presence of the hypothesized eRF1 substitution there is an example of relatively discrete reading of UGA codons in ciliates as either stops or amino acids. This contrasts with Kachale et al. 2023’s experiments in yeast with yeast eRF1 S67G or Blastocrithidia eRF1 (which also has glycine at the equivalent position), both of which appear to lead to modest readthrough. In addition, efficient reading of codons in either of two ways also occurs in the ciliate genus Euplotes in which “stop” codons can either serve as frameshift sites during translation within coding sequences or be actual stops when they are close to 3’ mRNA termini (Lobanov et al. 2017), as verified by Ribo-seq and protein mass spectrometry.

      It has been clearly shown that S67G or S67A mutations allow a strong increase in the reading of UGA codons by tRNAs, so this point is not in doubt. However, this has been demonstrated in model organisms, and we now need to determine whether other changes in the translational apparatus could accompany this mutation by modifying its impact on the UGA codon. This is a point partly raised at the end of the manuscript.

      There is no doubt that S67G or S67A mutations lead to increased translational readthrough, but this is restricted to experiments with or in baker’s yeast or other standard genetic code surrogate model organisms. Experiments introducing eRF1 sequences from alternative genetic code eukaryotes into translation systems of such standard genetic code eukaryotes are not compelling because the rest of the associated translation system has also evolved tremendously. As far as we are aware, no in vivo experiments with ciliate eRF1s have been conducted to determine if position 67 or other substitutions have any effect. These considerations are critical given the vast evolutionary distances between yeasts, Blastocrithidia, the ciliates and Amoebophrya sp. ex Karlodinium veneficum. On the other hand, the evolutionary information presented contradicts the importance of this substitution in the Amoebophrya species and ciliates. We will consider how to incorporate these ideas in the revised version of the manuscript.

      Indeed, it is quite possible that in these organisms the UGA codon is both used to complete translation and is subject to a high level of readthrough. Actually, in the presence of a mutation at position 67 (or elsewhere), the reading of the UGA can be tolerated under specific stress conditions (nutrient deficiency, oxidative stress, etc.), so the presence of this mutation could allow translational control of the expression of certain genes.

      As explained a couple of replies above, it is not constructive to invoke the additional complexity of conditional translation or any other kinds of factors that lead to enhanced readthrough, because the translation of UGA sense codons in the ciliate Condylostoma, where we have supporting experimental evidence, does not resemble translational readthrough. These codons occur in constitutively expressed single-copy genes, like a tryptophan tRNA synthetase and an eRF1 protein (Swart et al. 2016), not ones that might be expected to be conditionally translated.

      On the other hand, it seems obvious to me that there are other ways of reading through a stop codon without mutating eRF1 at position S67. So the absence of a mutation at this position is not really indicative of a level of reading of the UGA codon.

      It may seem obvious to the reviewer, but that is neither what Kachale et al. originally proposed nor what we questioned. Kachale et al. hypothesized that mutation of S67 to A or G is necessary for UGA=Trp translation, but we provided evidence that it is not: multiple organisms with S67 or C67 translate UGA as tryptophan. Kachale et al. also originally suggested that the S67 to A/G substitution is necessary in Condylostoma for UGA translation as tryptophan, by weakening eRF1’s recognition of this codon as a stop (from their abstract: “Virtually the same strategy has been adopted by the ciliate Condylostoma magnum.”). However, as we have stated, Condylostoma (A67) is both able to efficiently terminate at UGA stop codons and to efficiently translate (other) UGA sense codons, which does not fit this hypothesis.

      Before writing such a strong assertion as that found on page 3, experiments should be carried out. The authors should therefore moderate their assertion.

      Experiments should be carried out in the organisms in which stop codon reassignments have readily occurred and their close relatives in which they have not, not in distantly related ones where they rarely, if ever, occur, like yeasts. We made this point in the conclusion. There is too much emphasis on investigating genetic code evolution via stop codon reassignments in questionable models and too little investigation in the really good ones, particularly the ciliates. This clade has genera that are amenable to molecular experiments, including Paramecium, Tetrahymena and Oxytricha. We plan to add some text about these considerations in revision.

      To make a definitive conclusion, we would need to be able to measure the level of termination and readthrough in these organisms. So, from my point of view, all the arguments seem rather weak.

      We reiterate: there is experimental information about translation and termination in two ciliate species worth considering, including one that translates UGA codons depending on their context. If one chooses to ignore the evolutionary information presented, this not only ignores all prior approaches to infer genetic codes, but also the fact that there is experimental verification and other lines of evidence supporting these approaches.

      Moreover, the authors themselves indicate that the conjunction between a Trp tRNA that is efficient at reading the UGA codon and an eRF1 factor that is not efficient at recognising this stop codon could be the key to reassignment.

      This does not convey well what we wrote, since the main consideration was overall eRF1 structure, rather than individual amino acid substitutions. Here are the key sentences:

      “Instead, in a transitional evolutionary phase, codons may be interpreted in two ways, with potential eRF1-tRNA competition. With time, beneficial mutations or modifications in either the tRNA or eRF1 (or other components of translation) that reduce competition may be selected.

      Instead of focusing on individual eRF1 substitutions, we propose future investigations should more generally explore the structure of non-standard genetic code eRF1’s captured in translation termination in the context of their own ribosomes.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the distinct subpopulation of adipocytes during brown-to-white conversion in perirenal adipose tissue (PRAT) at different ages. The evidence supporting the claims of the authors is convincing, although specific lineage tracing of this subpopulation of cells and mechanistic studies would expand the work. The work will be of interest to scientists working on adipose and kidney biology.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors performed single nucleus RNA-seq for perirenal adipose tissue (PRAT) at different ages. They concluded a distinct subpopulation of adipocytes arises through brown-to-white conversion and can convert to a thermogenic phenotype upon cold exposure.

      Strengths:

      PRAT adipose tissue has been reported as an adipose tissue that undergoes browning. This study confirms that brown-to-white and white-to-beige conversions also exist in PRAT, as previously reported in the subcutaneous adipose tissue.

      Response: We thank the reviewer for summarizing the strengths of our manuscript. However, we would like to clarify two points here. First, PRAT has been reported as a visceral adipose depot that contains brown adipocytes, and a process of continuous replacement of brown adipocytes by white adipocytes has been previously suggested based on histological assessment. There is no evidence that PRAT undergoes browning, unless cold exposure is involved. Second, unlike the brown-to-white conversion, white-to-beige conversion in PRAT was not observed under normal conditions. The adipocyte population that arises from brown-to-white conversion (mPRAT-ad2) can respond to cold and restore its UCP1 expression. However, the adipocytes that arise from the mPRAT-ad2 subpopulation after cold exposure have a transcriptome distinct from that of cold-induced beige adipocytes in iWAT (Figure S7K) and are more closely related to iBAT brown adipocytes (Figure 6E). Therefore, it is more of a white-to-brown conversion in PRAT upon cold exposure rather than a white-to-beige conversion, and the underlying mechanism is likely different from the white-to-beige conversion in the subcutaneous adipose tissue.

      Weaknesses:

      (1) There is overall a disconnection between single nucleus RNA-seq data and the lineage chasing data. No specific markers of this population have been validated by staining.

      Response: We are not sure what “this population” refers to. We assume that it is the Ucp1-&Cidea+ mPRAT-ad2 adipocyte subpopulation. If so, we did not identify specific markers for these adipocytes, as shown in Figure 1H and stated in the Discussion section. mPRAT-ad2 is negative for Ucp1 and Cyp2e1, which are markers for mPRAT-ad1 and mPRAT-ad3&4, respectively. To visualize the mPRAT-ad2 adipocytes on tissue sections, we collected pvPRAT and puPRAT of Ucp1CreERT2;Ai14 mice one day after tamoxifen injection and stained with CYP2E1 antibody and BODIPY. The Tomato-&CYP2E1-&BODIPY+ cells represent the mPRAT-ad2 adipocytes. Using this strategy, we found a significantly higher percentage of mPRAT-ad2 cells in puPRAT than in pvPRAT (presented as Figure S3E in the revised manuscript).

      (2) It would be nice to provide more evidence to support the conclusion shown in lines 243 to 245 "These results indicated that new BAs induced by cold exposure were mainly derived from UCP1- adipocytes rather than de novo ASPC differentiation in puPRAT". Pdgfra-negative progenitor cells may also contribute to these new beige adipocytes.

      Response: We stained pvPRAT and puPRAT of the PdgfraCre;Ai14 mice with the adipocyte marker Plin1 and observed a 100% overlap between the tdTomato signal and the Plin1 staining, after examining a total of 832 and 628 adipocytes in pvPRAT and puPRAT of two animals (Figure S4). Plin1 stains all adipocytes, while the endogenous tdTomato labels both the adipocytes and blood vessels. This result suggests that all adipocytes in mPRAT are derived from Pdgfra-expressing cells, which is in line with a previous study that integrated several single-cell RNA sequencing data sets and showed that Pdgfra is expressed by virtually all ASPCs (Ferrero et al., 2020).

      Also, we would like to point out that the cold-induced adipocytes in mPRAT more closely resemble the brown adipocytes of iBAT than the beige adipocytes of iWAT (Figure 6E and S7K).

      Ferrero, R., Rainer, P., and Deplancke, B. (2020). Toward a Consensus View of Mammalian Adipocyte Stem and Progenitor Cell Heterogeneity. Trends Cell Biol 30, 937-950.

      (3) The UCP1Cre-ERT2; Ai14 system should be validated by showing Tomato and UCP1 co-staining right after the Tamoxifen treatment.

      Response: We collected pvPRAT and puPRAT of 1- and 6-month-old Ucp1CreERT2;Ai14 mice one day after the last tamoxifen injection and stained with UCP1 antibody to check the overlap between the Tomato and UCP1 signals. All Tomato+ cells were UCP1+, indicating 100% specificity of the Ucp1CreERT2, and the labelling efficiency was over 93% at both time points for both regions (Figure S3C-D).
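      For readers unfamiliar with how such reporter validation is usually quantified, the two metrics can be sketched as follows. This is an illustration only: the function name `labelling_metrics` and the cell counts are hypothetical, and the definitions (specificity as the fraction of Tomato+ cells that are UCP1+, efficiency as the fraction of UCP1+ cells that are Tomato+) are the conventional ones, assumed here rather than quoted from the manuscript.

```python
def labelling_metrics(tomato_and_ucp1, tomato_only, ucp1_only):
    """Return (specificity, efficiency) of a reporter line from cell counts.

    specificity = double-positive cells / all Tomato+ cells
    efficiency  = double-positive cells / all UCP1+ cells
    These definitions are an assumption for illustration.
    """
    tomato_total = tomato_and_ucp1 + tomato_only
    ucp1_total = tomato_and_ucp1 + ucp1_only
    specificity = tomato_and_ucp1 / tomato_total if tomato_total else float("nan")
    efficiency = tomato_and_ucp1 / ucp1_total if ucp1_total else float("nan")
    return specificity, efficiency


# Hypothetical counts, not data from the study:
# 95 double-positive cells, 0 Tomato-only cells, 5 UCP1-only cells.
specificity, efficiency = labelling_metrics(95, 0, 5)
```

      With no Tomato-only cells the specificity is 1.0 by construction, which corresponds to the situation described above in which every Tomato+ cell is also UCP1+.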

      Reviewer #2 (Public Review):

      Summary:

      In the present manuscript, Zhang et al utilize single-nuclei RNA-Seq to investigate the heterogeneity of perirenal adipose tissue. The perirenal depot is interesting because it contains both brown and white adipocytes, a subset of which undergo functional "whitening" during early development. While adipocyte thermogenic transdifferentiation has been previously reported, there remain many unanswered questions regarding this phenomenon and the mechanisms by which it is regulated.

      Strengths:

      The combination of UCP1-lineage tracing with the single nuclei analysis allowed the authors to identify four populations of adipocytes with differing thermogenic potential, including a "whitened" adipocyte (mPRAT-ad2) that retains the capacity to rapidly revert to a brown phenotype upon cold exposure. They also identify two populations of white adipocytes that do not undergo browning with acute cold exposure.

      Anatomically distinct adipose depots display interesting functional differences, and this work contributes to our understanding of one of the few brown depots present in humans.

      Weaknesses:

      The most interesting aspect of this work is the identification of a highly plastic mature adipocyte population with the capacity to switch between a white and brown phenotype. The authors attempt to identify the transcriptional signature of this ad2 subpopulation, however, the limited sequencing depth of single nuclei somewhat lessens the impact of these findings. Furthermore, the lack of any form of mechanistic investigation into the regulation of mPRAT whitening limits the utility of this manuscript. However, the combination of well-executed lineage tracing with comprehensive cross-depot single-nuclei presented in this manuscript could still serve as a useful reference for the field.

      Response: The sequencing depth of our data is comparable to, if not better than, that of previously published snRNA-seq studies on adipose tissue (Burl et al., 2022; Sarvari et al., 2021; Sun et al., 2020). Therefore, the depth of our data has reached the limit of the 3’ sequencing methods. Unfortunately, due to the size limitation of the adipocytes, it is challenging to sort them for Smart-seq. We suspect that the lack of specific markers for mPRAT-ad2 is partly due to its intermediate and plastic phenotype. Regarding the mechanistic regulation of mPRAT whitening, we believe that it is more suitable to leave such investigations for a separate, more in-depth follow-up study.

      Burl, R.B., Rondini, E.A., Wei, H., Pique-Regi, R., and Granneman, J.G. (2022). Deconstructing cold-induced brown adipocyte neogenesis in mice. Elife 11. 10.7554/eLife.80167.

      Sarvari, A.K., Van Hauwaert, E.L., Markussen, L.K., Gammelmark, E., Marcher, A.B., Ebbesen, M.F., Nielsen, R., Brewer, J.R., Madsen, J.G.S., and Mandrup, S. (2021). Plasticity of Epididymal Adipose Tissue in Response to Diet-Induced Obesity at Single-Nucleus Resolution. Cell Metab 33, 437-453 e435. 10.1016/j.cmet.2020.12.004.

      Sun, W., Dong, H., Balaz, M., Slyper, M., Drokhlyansky, E., Colleluori, G., Giordano, A., Kovanicova, Z., Stefanicka, P., Balazova, L., et al. (2020). snRNA-seq reveals a subpopulation of adipocytes that regulates thermogenesis. Nature 587, 98-102. 10.1038/s41586-020-2856-x.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) There is overall a disconnection between single nucleus RNA-seq data and the lineage chasing data. No specific markers of this population have been validated by staining.

      (2) It would be nice to provide more evidence to support the conclusion shown in lines 243 to 245: "These results indicated that new BAs induced by cold exposure were mainly derived from UCP1- adipocytes rather than de novo ASPC differentiation in puPRAT". Pdgfra-negative progenitor cells may also contribute to these new beige adipocytes.

      (3) The UCP1Cre-ERT2; Ai14 system should be validated by showing Tomato and UCP1 co-staining right after the Tamoxifen treatment.

      Please see above for the responses.

      Reviewer #2 (Recommendations For The Authors):

      • Without specific lineage tracing it is not possible to conclude that the mPRAT-ad2 population converted to beige with CE. The authors should change this wording from "likely" to "possible".

      Response: We have changed the word “likely” to “possible” in the text. Also, we would like to point out that the cold-induced adipocytes in mPRAT more closely resemble the brown adipocytes of iBAT than the beige adipocytes of iWAT (Figure 6E and S7K).

      • The sentence "precursor cells may be less sensitive to environmental temperature and have a limited contribution to mature adipocyte phenotypes through de novo adipogenesis after cold exposure." and others like it should be changed to indicate the acute timeframe of this experiment. It has been shown that the precursors make a more significant contribution to de novo beige adipogenesis with chronic cold exposure.

      Response: We have modified the sentence as follows: “precursor cells may be less sensitive to acute environmental temperature drop and have a limited contribution to mature adipocyte phenotypes through de novo adipogenesis after cold exposure”. As mentioned above, the cold-induced adipocytes in mPRAT more closely resemble the brown adipocytes of iBAT and may therefore arise through a different mechanism than de novo beige adipogenesis under chronic cold exposure.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed the specific comments made upon the initial submission. In particular, they have now provided an explanation, why their GSDM tree looks different than previously published trees. The authors have also followed my initial suggestion to consider the highly-conserved residue following the cleavage site in bird GSDMA forms. Some of the more general weaknesses remain, since they cannot easily be addressed. I agree with the suggestions made by reviewer #2 to further improve the manuscript.

      We thank the reviewer for their insight which we think has improved our manuscript. We have additionally made the changes requested by this reviewer and reviewer #2 in the next section.

      Reviewer #2 (Recommendations For The Authors):

      The authors responded sincerely to our reviewers' questions in the revised manuscript and I sufficiently understand. After re-reading it, however, I found two issues that need to be revised, so please consider doing them.

      (1) The new sentences that the authors have added (Page 5, lines 209-212) would be better placed, after some modification, in the subsection "Bird GSDMA is activated ..", because they feel undeniably abrupt in their present position.

      We agree with this evaluation and have moved these sentences to a more natural position in the following section.

      (2) Regarding the chromosomal location of the GSDMA gene, the authors describe that the genes of mammals, birds, and reptiles localize to the same genetic locus, but no data are presented. To support their claim, it should also be presented as a supplementary figure.

      We agree with this evaluation and have generated Figure 1 – Supplemental 4 to show the synteny of the GSDMA locus from humans to GSDMEc in sharks.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewing Editor:

      Comment: Bladder dysfunction following spinal cord injury (SCI) represents a severe and disabling complication and we lack effective therapies. Following evidence that AMPA receptors play a key role in bladder function the authors show convincingly that AMPA allosteric activators can ameliorate many of the subacute defects in bladder and sphincter function following SCI, including prolonged voiding intervals and high bladder pressure thresholds for voiding. These valuable results in rodents may help in the development of these agents as therapeutics for humans with SCI-induced bladder dysfunction.

      Response: We thank the reviewing editor for their assessment of this manuscript and positive comments. We also appreciate the opportunity to revise this manuscript for publication in eLife. We have addressed the excellent comments of the three reviewers. We have included detailed response-to-reviewer comments below to address each specific point. Based on the reviewers’ critiques, we feel our re-working of the manuscript has made for a greatly improved study.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Spinal cord injury (SCI) causes immediate and prolonged bladder dysfunction, for which there are poor treatments. Following up on evidence that AMPA glutamatergic receptors play a key role in bladder function, the authors induced spinal cord injury and its attendant bladder dysfunction and examined the effects of graded doses of allosteric AMPA receptor activators (ampakines). They show that ampakines ameliorate several prominent derangements in bladder function resulting from SCI, improving voiding intervals and pressure thresholds for voiding and sphincter function.

      Strengths:

      Well-performed studies on a relevant model system. The authors induced SCI reproducibly and showed that they had achieved their model. The drugs revealed clear and striking effects. Notably, in some mice that had such bad SCI that they could not void, the drug appeared to restore voiding function.

      Weaknesses:

      The studies are well conducted, but it would be helpful to include information on the kinetics of the drugs used, their half-life, and how long they are present in rats after administration. What blood levels of the drugs are achieved after infusion? How do these compare with blood levels achieved when these drugs are used in humans?

      Response: We thank Reviewer #1 for the positive comments and their helpful critique. We address each of the specific comments below (in the “Recommendations for the Authors” section of this Response to Reviewer Comments document), and have made changes to the manuscript based on these excellent points.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Rana and colleagues present interesting findings demonstrating the potential beneficial effects of AMPA receptor modulators with ampakines in the context of the neurogenic bladder following acute spinal cord injury. Neurogenic bladder dysfunction is characterized by urinary retention and/or incontinence, with limited treatments available. Based on recent observations showing that ampakines improved respiratory function in rats with SCI, the authors explored the use of ampakine CX1739 on bladder and external urethral sphincter (EUS) function and coordination early after mid-thoracic contusion injury. Using continuous flow cystometry and EUS myography the authors showed that ampakine treatment led to decreased peak pressures, threshold pressure, intercontraction interval, and voided volume in SCI rats versus vehicle-treated controls. Although CX1739 did not alter EUS EMG burst duration, treatment did lead to EUS EMG bursting at lower bladder pressure compared to baseline. In a subset of rats that did not show regular cystometric voiding, CX1739 treatment diminished non-voiding contractions and improved coordinated EUS EMG bursting. Based on these findings the authors conclude that ampakines may have utility in recovery of bladder function following SCI.

      Strengths:

      The experimental design is thoughtful and rigorous, providing an evaluation of both the bladder and external urethral sphincter function in the absence and presence of ampakine treatment. The data in support of a role for CX1739 treatment in the context of the neurogenic bladder are presented clearly, and the conclusions are adequately supported by the findings.

      Weaknesses:

      Since CX1739 was administered in the context of cystometry and urethral sphincter EMG, a brief discussion of how ampakines could be used in a therapeutic context in humans would help to understand the translational significance of the work. The study lacks information on the half-life of CX1739 and how this might impact the implementation of CX1739 for clinical use. In addition, the study was limited to female rats. Lastly, given the male bias of traumatic SCI in humans, a brief discussion of this limitation is warranted.

      Response: We thank Reviewer #2 for their positive comments and their helpful critique. We address each of the specific comments below (in the “Recommendations for the Authors” section of this Response to Reviewer Comments document). We have also made changes to the manuscript based on the three excellent discussion points brought up by the reviewer.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rana and colleagues examined the effect of a "low impact" ampakine, an AMPA receptor allosteric modulator, on the voiding function of rats subjected to midline T9 spinal cord contusion injury. Previous studies have shown that the micturition reflex fully depends on AMPA glutamatergic signaling and that the glutamatergic circuits are reorganized after spinal cord injury. In chronic paraplegic rats, other (non-glutamatergic) circuits become engaged in the spinal reflex mechanism controlling micturition. The authors employed continuous flow cystometry and external urethral sphincter electromyography to assess bladder function and bladder-urethral sphincter coordination in naïve rats (control) and rats subjected to spinal cord injury (SCI). In the acute phase after SCI, rats exhibit larger voids with lower frequency than naïve rats. This study shows that CX1739 improves, in a dose-dependent manner, bladder function in rats with SCI. The interval between voids and the voided volume was reduced in rats with SCI when compared to controls. In summary, this is an interesting study that describes a potential treatment for patients with SCI.

      Strengths:

      The findings described in this manuscript are significant because neurogenic bladder predisposes patients with SCI to urinary tract infections, hydronephrosis, and kidney failure. The manuscript is clearly written. The study is technically outstanding, and the conclusions are well justified by the data.

      Weaknesses:

      The study was conducted 5 days after spinal cord contusion when the bladder is underactive. In rats with chronic SCI, the bladder is overactive. Therefore, the therapeutic approach described here is expected to be effective only in the underactive bladder phase of SCI. The mechanism and site of action of CX1739 is not defined.

      Response: We thank Reviewer #3 for the positive comments and their helpful critique. We address each of the specific comments below (in the “Recommendations for the Authors” section of this Response to Reviewer Comments document), and have made changes to the manuscript based on the excellent point mentioned in the weakness section.

      Comment: Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Response: We have addressed all comments of the reviewers. We detail our responses in this Response to Reviewer Comments document and have made the associated modifications to the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Comment: These are well-performed studies.

      Response: We thank the reviewer for their positive comment.

      Comment: It would be useful to know the blood levels of the drug that are achieved by the infusions, and how long the drugs remain after infusion. Is the 45-minute interval between doses appropriate for the drug's kinetics?

      Response: While blood levels of ampakine were not tested in this study, pharmacokinetic parameters for CX1739 in Sprague Dawley rats have previously been determined following intravenous administration. The mean plasma half-life of CX1739 was 1.25 ± 0.03 hrs, with a Tmax of 30 minutes (information provided through personal communication with RespireRx). Although the first CX1739 dose would not be fully cleared from the system within the 45-minute interval between doses, plasma levels would be considerably lower by 45 minutes post administration. A limitation of terminal cystometry preparations is the length of time a single animal can be maintained, and this was also part of our rationale for dosing every 45 minutes. In our experience, longer recordings can increase variability. A 45-minute window allowed the anesthetized procedure to remain under ~6 hours. Further, in our studies investigating the impact of ampakines in rats following SCI, acute effects of intravenous ampakine administration were observed for up to 30 minutes (Rana et al., 2021). The half-life, together with these data from the respiratory system, informed our decision here. We have added this rationale to the methods section and, in part, to the discussion section (Page 11, 29-30).
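
      As a rough illustration of how the 45-minute dosing interval relates to the reported 1.25 hr half-life, the sketch below computes the fraction of a given plasma level that would remain after one interval. It assumes simple first-order (single-compartment) elimination, which the response does not state, and it ignores the reported Tmax; the function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(elapsed_min: float, half_life_min: float) -> float:
    """Fraction of a plasma level left after elapsed_min.

    Assumes first-order elimination (an assumption, not a reported model).
    """
    return 0.5 ** (elapsed_min / half_life_min)


HALF_LIFE_MIN = 1.25 * 60   # reported mean plasma half-life, in minutes
DOSE_INTERVAL_MIN = 45      # interval between doses used in the study

frac = fraction_remaining(DOSE_INTERVAL_MIN, HALF_LIFE_MIN)
print(f"{frac:.2f}")  # prints 0.66
```

      Under these assumptions, about two thirds of a given plasma level would still be present at the next dose, so successive doses would be expected to add to residual drug rather than start from zero.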

      Comment: Since a major plus of these studies is their potential applicability to humans with SCI, it would be helpful to know whether the drug levels achieved here resemble those that were achieved in human trials to date.

      Response: Since blood/plasma levels were not tested in the current study, we cannot comment on the comparison of blood plasma levels achieved in human trials. However, we have expanded upon this point in the discussion section (page 29-30).

      Comment: The authors could also provide us with a bit more description of the different classes of ampakines, and why they chose the one they used.

      Response: Thank you for this suggestion. We would like to highlight a section of our discussion (Page 28-29) that provides an in-depth description of the two classes of ampakines and the rationale for selecting the low-impact CX1739 drug.

      Comment: Lastly, the first reference is cited twice in the bibliography.

      Response: The duplicate reference has been removed.

      Reviewer #2 (Recommendations For The Authors):

      Comment: Overall, the findings support the potential for ampakine administration in the setting of neurogenic bladder dysfunction following SCI. The manuscript was well written, the experimental design was rigorous, the data were of excellent quality, and the conclusions were adequately supported by the findings. Weaknesses are considered minor and can be addressed mostly by clarification as noted below.

      Response: We thank the reviewer for their positive comments.

      Comment: Since CX1739 was provided in the context of cystometry and EUS EMG, a brief discussion of how ampakines could be used in a therapeutic context in humans would help to understand the translational significance of the work.

      Response: Thank you for this important comment about discussing the translational significance of CX1739. We have included a discussion of the translational significance of this work in the last paragraph of the discussion section (Page 34).

      Comment: No information is provided on the half-life of CX1739 and how this might impact the implementation of CX1739 for clinical use. The inclusion of this information would help the reader to appreciate the potential for and limitations of clinical implementation.

      Response: Although pharmacokinetic analyses were not conducted as part of this study, we have included details of CX1739 plasma pharmacokinetics examined in Sprague-Dawley rats (Page 11, 29-30). This information has been provided through personal communication with RespireRx.

      Comment: The study was limited to female rats. Would the authors anticipate different efficacy of CX1739 in male rats? A comment on the choice of animal sex and implications for interpretation of the findings would strengthen the discussion and potential clinical implementation given the male bias of traumatic SCI in humans.

      Response: Thank you for your important comment. In this study, females were chosen primarily because they have better recovery outcomes from spinal cord injury. During initial preliminary data gathering, we used both male and female rats and found that the male rats often did not recover cystometric voiding at this time point, so we chose to continue only with female rats in the current study. It is well established that female rats have better urogenic recovery from SCI effects, perhaps due to the easier postoperative care. It is critical that we complete future studies in both male and female rats; however, we will have to change our experimental paradigm (time after injury and/or severity of injury) to make comparisons between SCI and intact male rats. We have now included this important topic of our sex selection in the methods section (Page 6) of the manuscript and have also expanded this point in the discussion section (page 30).

      Reviewer #3 (Recommendations For The Authors):

      Comment: The impact of ampakine treatment on EUS EMG activity is not obvious from the data presented in Fig. 5C-F. I do see in the magnified area of the SCI rat tracing some clear EUS activity with 15 mg/kg of CX1739. However, statistically, there is not a significant improvement in bladder-urethral sphincter coordination in rats treated with ampakine. Authors should discuss how or why ampakine treatment improves bladder function without affecting bladder-urethral sphincter coordination. The background noise of the EUS EMG in Fig. 5B changes dramatically between conditions. Are these tracings from the same experiment? If yes, please explain why the background noise changes during the course of the experiment. Was this change in background noise observed only in SCI rats?

      Response: Thank you for such an interesting comment. Our data analysis shows no statistically significant difference in the duration or amplitude of EUS EMG bursting when comparing vehicle to ampakine treatment. However, we did see a difference in the threshold at which bursting occurred (Fig 5C-F). In rats that completely lost coordination due to injury (Figure 6), ampakine treatment provides further confirmation by producing EUS EMG bursting and coordinated voiding.

      Therefore, these results suggest that ampakines have some positive modulatory effects on EUS EMG bursting events. Overall, we did not see any significant differences in the background noise of the EUS EMG between conditions during experiments in either spinal intact or SCI rats. The background noise of the EUS EMG in Fig. 5B decreases after baseline and HPCD due to a change in experimental conditions (slightly more urethane was needed because the animal showed signs of regaining consciousness). We would also like to confirm that these tracings are from the same experiment. Accordingly, we have made further clarifications in the manuscript.

      Comment: Tables 1 and 2 show the same data as figures 3 and 4. I suggest removing the tables. In addition, table 2 includes letters (A, B, C, D) to indicate statistical significance. However, no indication of the meaning of these letters is provided. What does "levels not connected by same letter are significantly different" mean? Please clarify. I suggest including the statistical comparisons in Fig. 4

      Response: While we did consider adding statistical bars to the graphs themselves, the number of comparisons being conducted reduced the readability of the graphs. Thus, we would like to preserve the current format of the table and provide readers with all statistical comparisons being made. The statement “levels not connected by the same letter are significantly different” indicates that, for a given outcome, two treatment groups differ significantly only if they share no letter. For example, for threshold pressure in the SCI rats, the baseline (A) and HPCD (A) values are different from the 5 mg/kg (B,C,D), 10 mg/kg (C,D) and 15 mg/kg (D) groups. Further, threshold pressures in the 5 mg/kg, 10 mg/kg and 15 mg/kg groups are not significantly different from each other. These results have also been described in detail in the results section. Lastly, we acknowledge the redundancy of data presented in Tables 1 and 2. These two tables have been moved to the supplemental section.
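
      The connecting-letters convention described above can be read mechanically: two levels are significantly different only when their letter sets do not overlap. The following minimal sketch applies that rule to the threshold-pressure letters quoted in this response for SCI rats; the dictionary layout and the function name `significantly_different` are illustrative assumptions, not part of the authors' analysis.

```python
from itertools import combinations

# Letter assignments for threshold pressure in SCI rats, as quoted in the response.
letters = {
    "baseline": "A",
    "HPCD": "A",
    "5 mg/kg": "BCD",
    "10 mg/kg": "CD",
    "15 mg/kg": "D",
}


def significantly_different(a: str, b: str) -> bool:
    """Two levels differ significantly only if they share no letter."""
    return not (set(letters[a]) & set(letters[b]))


# Enumerate every pair of levels and keep those with no letter in common.
different_pairs = [
    (a, b) for a, b in combinations(letters, 2) if significantly_different(a, b)
]
for a, b in different_pairs:
    print(f"{a} vs {b}")
```

      With these letters, the only pairs flagged are baseline or HPCD against each of the three ampakine doses, which matches the description in the response.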

      Comment: A study by Yoshiyama and colleagues previously showed that the AMPA antagonist LY215490 completely abolished the reflex bladder contractions and EMG activity of the EUS muscle during continuous filling in naïve rats (JPET 1997). Surprisingly, CX1739, a low-impact AMPA receptor activator, does not affect bladder contractions or EMG activity in naïve rats. Authors should discuss the reason for this discrepancy.

      Response: Thank you for this comment. We believe the different pharmacokinetics of the drugs can explain these effects. We have included this critical point in the discussion (page 31-32).

      Comment: The conclusion that CX1739 is acting on sensory pathways is highly speculative and needs additional support. The functional status of the afferent pathways is uncertain following SCI. Please revise.

      Response: Thank you for this comment. We agree, in retrospect, that this speculative comment is an overassumption, and we have removed it from the discussion. We have modified the discussion to remove focus from the sensory nervous system and, more generally, discuss the location of AMPA receptors in the voiding neurocircuitry (page 31).

      Comment: Figure 3. It's difficult to see the asterisks that indicate statistical significance. Please use a line or a bigger symbol to indicate statistical differences between groups.

      Response: Thank you for the suggestion. We have modified the figure to make the asterisks bigger and have added a line.

      Comment: Data for peak pressure should be included in Figures 3 and 4.

      Response: Thank you for pointing out peak pressure, one of the important parameters of cystometry. As we did not see significant changes in bladder peak contraction pressure between spinal intact and SCI rats, we prefer not to show a graph of peak pressure in Fig 3, so as to highlight the parameters that showed significant injury effects, such as baseline pressure, ICI, threshold, and voided volume. However, peak pressure was reduced similarly in both spinal intact and SCI rats, suggesting that ampakine has some treatment effect on peak pressure, which we prefer to include in Fig 4. We have modified the results section to include a description of peak pressures.

      Comment: The peak pressure was reduced in both naïve and SCI rats treated with ampakine. Therefore, the peak pressure is not one of the parameters that improves by ampakine in SCI rats.

      Response: Yes, we agree that peak pressures in spinal intact and SCI rats were comparable. Some treatment effects of ampakine on peak pressure were observed in both spinal intact and SCI rats. We have amended the manuscript to make this clearer.

      Comment: The reference from Yoshiyama et al (1999) is duplicated.

      Response: Thank you for catching this error. The references have been combined in the revised version.

      Comment: Page 15, the authors state that "Coordinated bladder contractions and associated EUS EMG activity were readily demonstrated in all 7 naïve animals". In other sections, they referred to 8 naïve rats. What is the actual number of naïve rats?

      Response: Thanks for pointing out this error. The actual number of naïve rats is 8. We have rectified this error.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the Editors and the Reviewers for their comments on the importance of our work “showing a new role of caveolin-1 as an individual protein instead of the main molecular component of caveolae” in contributing to membrane bending rigidity and for constructive and thoughtful remarks that have allowed us to improve the manuscript.

      Indeed, we here establish the contributing role of caveolin-1 to membrane mechanics by a molecular mechanism that needs to be further addressed. To that respect, we thank the reviewers for suggesting avenues to improve the presentation and discussion of our hypotheses based on results of theoretical model and independent biophysical measurements of membrane mechanics in tube pulling from plasma membrane spheres, which concur to support the key role of caveolin-1 in building membrane bending rigidity.

      To fulfill the recommendations of the reviewers we have modified the manuscript, as discussed below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Because membrane tension plays a role in the process and caveolae regulate membrane tension, the authors looked at the formation of TEMs in cells depleted of Caveolin1 or Cavin1 (PTRF). They found a higher propensity to form TEMs, spontaneously (a rare event) and after toxin treatment, in both Caveolin1- and Cavin1-depleted cells. They show that in both siRNA-Caveolin1 and siRNA-Cavin1 cells, the cytoplasm is thinner. They show that in siCaveolin1 only, the dynamics of opening are different, with notably much larger TEMs. From the dynamic model of opening, they predict that this should be due to a lower bending rigidity of the membrane. They measure the bending rigidity from Cell-generated Giant liposomes and find that the bending rigidity is reduced by approx. 50%.

      Strengths:

      They also nicely show that caveolin1 KO mice are more susceptible to death from infections with pathogens that create TEMs.

      Overall, the paper is well-conducted and nicely written. There are however a few details that should be addressed.

      See below modifications brought to the manuscript in response to the Reviewer’s remarks.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Morel et al. aims to identify some potential mechano-regulators of transendothelial cell macro-aperture (TEM). Guided by the recognized role of caveolar invaginations in buffering the membrane tension of cells, the authors focused on caveolin-1 and associated regulator PTRF. They report a comprehensive in vitro work based on siRNA knockdown and optical imaging approach complemented with an in vivo work on mice, a biophysical assay allowing measurement of the mechanical properties of membranes, and a theoretical analysis inspired by soft matter physics.

      Strengths:

      The authors should be complimented for this multi-faceted and rigorous work. The accumulation of pieces of evidence collected from each type of approach makes the conclusion drawn by the authors very convincing, regarding the new role of caveolin-1 as an individual protein instead of the main molecular component of caveolae. On a personal note, I was very impressed by the quality of STORM images (Fig. 2) which are very illuminating and useful, in particular for validating some hypotheses of the theoretical analysis.

      Weaknesses:

      While this work pins down the key role of caveolin-1, its mechanism remains to be further investigated. The hypotheses proposed by the authors in the discussions about the link between caveolin and lipids/cholesterol are very plausible though challenging. Even though we may feel slightly frustrated by the absence of data in this direction, the quality and merit of this paper remain.

      We thank the reviewer for mentioning the merit of our work, which lays the foundations for more molecular mechanistic work on a possible role of lipids/cholesterol in the building of membrane bending rigidity by caveolin-1. This work is currently being carried out by some of the authors and shows that the question is indeed challenging, as indicated by the reviewer. This is now stated in the results section, as suggested (Page 12):

      "To test these predictions, we have treated cells with methyl-beta-cyclodextrin to deplete cholesterol from the plasma membrane and reduce its bending rigidity (47); unfortunately, this treatment affected the cell morphology, which precluded further analysis."

      The analogy with dewetting processes drawn to derive the theoretical model is very attractive. However, although part of the model has already been published several times by the same group of authors, the definition of the effective membrane rigidity of a plasma membrane including the underlying actin cortex, was very vague and confusing.

      We thank the reviewer for mentioning the importance of defining the terms “membrane bending rigidity” and “effective membrane bending rigidity”. The latter is now used and defined in the Physical modelling description of the materials and methods section (see considerations below). For the sake of simplicity, we use the term “membrane bending rigidity” in the main text, where it is now defined in the introduction section: “membrane bending rigidity, i.e. the energy required to locally bend the membrane surface”.

      Indeed, in a liposome, a rigorous derivation leads to a relationship between the membrane tension and the variation of the projected area that involves the bending rigidity; this relationship is known as Helfrich’s law. This statistical physics approach is only rigorously valid for a liposome, whereas its application to a cell is questionable due to the presence of cytoskeletal forces acting on the membrane. Nevertheless, application of Helfrich’s law to cell membranes may be granted on short time scales, before active cell tension regulation takes place (Sens P and Plastino J, 2015 J Phys Condens Matter), especially in cases where cytoskeletal forces play a modest role, such as red blood cells (Helfrich W 1973 Z Naturforsch C). The fact that the cytoskeletal structure and actomyosin contraction are significantly disrupted upon cell intoxication-driven inhibition of the small GTPase RhoA, as shown here for the first time by STORM analysis, supports the applicability of Helfrich’s law to describe TEM opening. Because of the presence of proteins, carbohydrates, and the adhesion of the remaining actin meshwork after toxin treatment, we expect the Helfrich relationship to somewhat differ from the case of a pure lipidic membrane. We account for these effects via an “effective bending rigidity”, a term used in the detailed discussion of the model hypotheses, which corresponds to an effective value describing the relationship between membrane tension and projected area variation in our cells.
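
      For orientation, the tension-area relationship referred to here is commonly written, in the low-tension (entropic) regime of a fluctuating membrane, in the form sketched below. This is the standard textbook expression and not a reproduction of the manuscript's Eq. 2, which is not quoted in this response; the symbols $\sigma_0$ (reference tension), $\Delta A / A$ (relative change in projected area with respect to that reference) and $k_B T$ (thermal energy) are notation assumed for this illustration, while $\kappa$ is the bending rigidity discussed above.

```latex
\frac{\Delta A}{A} \;\simeq\; \frac{k_B T}{8\pi\kappa}\,
\ln\!\left(\frac{\sigma}{\sigma_0}\right)
\qquad\Longleftrightarrow\qquad
\sigma \;\simeq\; \sigma_0 \exp\!\left(\frac{8\pi\kappa}{k_B T}\,
\frac{\Delta A}{A}\right)
```

      In this form the bending rigidity sets how steeply tension responds to a change in projected area: for a smaller $\kappa$, the same relative area change produces a smaller change in tension, which is the sense in which an "effective" $\kappa$ summarises the tension-area coupling of the cell membrane.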

      The following discussion has been extended and improved in the Physical modeling part of the materials & methods section (Pages 23-24): “κ is the effective bending rigidity of the cell membrane, which quantifies the energy required to bend the membrane. (…). While rigorously derived for a pure lipid membrane, we assumed that Helfrich’s law is applicable to describe the relationship between the effective membrane tension acting on TEMs and the observed projected surface in our cells. We expect Helfrich’s law to be applicable on short time scales, before active cell tension regulation takes place (73), especially in cases where cytoskeletal forces play a modest role, such as for red blood cells (74) or for the highly disrupted cytoskeletal structure of our intoxicated cells. Thus, the parameter κ in Eq. 2 is an effective bending rigidity, whose value may somewhat differ from that of a pure lipid membrane to account for the role played by protein inclusions and the mechanical contribution of the remaining cytoskeletal elements after cell treatment with the toxin”

      Here, for the first time, thanks to the STORM analysis, the authors show that HUVECs intoxicated by ExoC3 exhibit a loose and defective cortex with a significantly increased mesh size. This argues in favor of the validity of Helfrich formalism in this context. Nonetheless, there remains a puzzle. Experimentally, several TEMs are visible within one cell. Theoretically, the authors consider a simultaneous opening of several pores and treat them in an additive manner. However, when one pore opens, the tension relaxes and should prevent the opening of subsequent pores. Yet, experimentally, as seen from the beautiful supplementary videos, several pores open one after the other. This would suggest that the tension is not homogeneous within an intoxicated cell or that equilibration times are long. One possibility is that some undegraded actin pieces of the actin cortex may form a barrier that somehow isolates one TEM from a neighboring one.

      As pointed out by the Reviewer, we expect that membrane tension is neither a purely global nor a purely local parameter. Opening of a TEM will relax membrane tension over a certain distance, not over the whole cell. Moreover, once the TEM closes back, membrane tension will increase again. This spatial and temporal localization of membrane tension relaxation explains that the opening of a first TEM does not preclude the opening of a second one or enlargement of the TEM when the actin cable is cut by laser ablation (20). On the other hand, membrane tension is not a purely local property. Indeed, we observe that when two TEMs enlarge next to each other, their shape becomes anisotropic, as their enlargement is mutually hampered in the region separating them. We account for this interaction by treating TEM membrane relaxation in an additive fashion. We emphasize that this simplified description is used to predict maximum TEM size, corresponding to the time at which TEM interaction is strongest. As the reviewer points out, it would be more questionable to use this additive treatment to predict the likelihood of nucleation of a new TEM, which is not done here.

      Accordingly, the Physical modelling part of the Materials and Methods has been modified to read: “Eq. 2 treats the effect of several simultaneous TEMs in an additive manner. This approximation is used here to predict TEM size, because at maximum opening of simultaneous TEMs their respective membrane relaxation is felt by each other, as it can be inferred from the shape that neighboring TEMs adopt in experiments. This additive treatment would appear less appropriate to describe the likelihood of nucleating a second TEM in the presence of a first one (a calculation that is not performed here), since membrane relaxation by a TEM may not be felt at membrane regions distant from it.”

      Could the authors look back at their STORM data and check whether intoxicated cells do not exhibit a bimodal population of mesh sizes and possibly provide a mapping of mesh size at the scale of a cell?

      To address the question raised by the Reviewer, we decided to plot the whole distribution of mesh sizes in addition to the average value per cell. We did not observe a bimodal distribution but rather a very heterogeneous distribution of mesh size, reaching a few square microns in all siRNA treatment conditions. Moreover, we did not observe a specific pattern in the distribution of mesh size at the scale of the cell, with very large mesh sizes being surrounded by small ones. We also did not observe any specific pattern for the localization of TEM opening, as described in the paper, making the correlation between mesh size and TEM opening difficult.

      The following sentence has been added in the results section (Pages 8-9): “Indeed, we observed in cells treated with ExoC3 no specific cellular pattern or bimodal distribution of mesh size between the different siRNA conditions but a rather very heterogeneous distribution of mesh size values that could reach a few square microns in all conditions.”

      In particular, it is quite striking that while bending rigidity of the lipid membrane is expected to set the maximal size of the aperture, most TEMs are well delimited with actin rings before closing. Is it because the surrounding loose actin is pushed back by the rim of the aperture? Could the authors better explain why they do not consider actin as a player in TEM opening?

      Actin ring assembly and stiffening is indeed a player in TEM opening, as investigated in the work by Stefani et al., 2017 Nat Commun. Interference of actin ring assembly and stiffening is included in our differential equation describing TEM opening dynamics (second term on the left-hand side of Eq. 3). In some cases, actin ring assembly is the dominant player, such as in TEM opening after laser ablation (ex novo TEM opening/widening). In contrast, here we investigate de novo TEM opening, for which we expect that bending rigidity can be estimated without accounting for actin assembly, as we previously reported (19). Such a bending rigidity estimate (Eq. 5) is obtained by considering two different time scales: the time scale of membrane tension relaxation, governed by bending rigidity, and the time scale of cable assembly, governed by actin dynamics. We expect the first time scale to be shorter, and thus the maximum size of de novo TEMs to be mainly constrained by membrane tension relaxation. Two paragraphs discussing the different time scales have been added to 1) the discussion section, and 2) the Physical modelling part of the Materials and Methods section of the revised manuscript (see below).

      The following paragraph has been added in the discussion (Pages 14-15): “Our study shows that membrane rigidity sets the maximal size of TEM aperture, although an actin ring appears before TEM closure (20). Actin ring assembly and stiffening is indeed a player in TEM opening, and it is included in our differential equation describing TEM opening dynamics (Eq. 3). In some configurations, actin ring assembly is the dominant player, such as in TEM opening after laser ablation (ex novo TEM opening), as we previously reported (20). In contrast, here we investigate de novo TEM opening, for which we expect that bending rigidity can be estimated without accounting for actin assembly (19). Such a bending rigidity estimate (Eq. 5) is obtained by considering two different time scales: the time scale of membrane tension relaxation, governed by bending rigidity, and the time scale of cable assembly, governed by actin dynamics. We expect the first time scale to be shorter, and thus the maximum size of de novo TEMs to be mainly constrained by membrane tension relaxation. However, we cannot rule out that the formation of an actin cable around the TEM before it reaches its maximum size may limit the correct estimation of the bending rigidity.”

      The following paragraphs have been added in the Physical modelling part of the Materials and Methods section (Pages 24-25): “A limitation of our theoretical description arises from the use of spatially uniform changes in parameter values to describe differences between experimental conditions, thus assuming spatially uniform effects. However, we cannot exclude the existence of non-uniform effects, such as changes in the size and organization of the remaining actin mesh, which could set local, non-uniform barriers to TEM enlargement in a manner not accounted for by our model.” And “We note that the estimate of κ provided by Eq. 5 is independent of α and thus of actin cable assembly. This simplification arises from membrane tension relaxing over a shorter time scale than actin assembly. Thus, we expect the maximum size of de novo TEMs to be mainly constrained by membrane tension relaxation (19), unlike ex novo TEM enlargement upon laser ablation, for which the dynamics of actin cable assembly control TEM opening (20)”

      Instead of delegating to the discussion the possible link between caveolin and lipids as a mechanism for the enhanced bending rigidity provided by caveolin-1, it could be of interest for the readership to insert the attempted (and failed) experiments in the result section. For instance, did the authors try treatment with methyl-beta-cyclodextrin that extracts cholesterol (and disrupts caveolar and clathrin pits) but supposedly keeps the majority of the pool of individual caveolins at the membrane?

      As recommended by the reviewer we have added the following sentence (Page 12): “We have treated cells with methyl-beta-cyclodextrin to deplete cholesterol from the plasma membrane and reduce its bending rigidity (47); unfortunately, this treatment affected the cell morphology, which precluded further analysis”

      Tether pulling experiments on Plasma membrane spheres (PMS) are real tours de force and the results are quite convincing: a clear difference in bending rigidity is observed in controlled and caveolin knock-out PMS. However, one recurrent concern in these tether-pulling experiments is to be sure that the membrane pulled in the tether has the same composition as the one in the PMS body. The presence of the highly curved neck may impede or slow down membrane proteins from reaching the tether by convective or diffusive motion.

      We thank the Reviewer for acknowledging the work accomplished with tether pulling experiments on PMS and for noting that the results are convincing. These results align well with the hypotheses drawn from the theoretical model, allowing us to propose a direct or indirect role of caveolin-1 in building membrane rigidity. As pointed out by the reviewer, a concern with tube pulling experiments is the dynamics of equilibration of membrane composition between the nanotube and the rest of the membrane. In our experiments, we waited about 30 seconds after tube pulling and after changing membrane tension. We checked that after this time the force remained constant, implying that the tube pulling experiments were performed at equilibrium, with enough time for lipids and membrane proteins to reach the tether by convective or diffusive motion.

      The revised version of the manuscript now includes the following sentence and a representative example of force vs time plot (Page 12): “We waited about 30 seconds after tube pulling and changing membrane tension and checked that we reached a steady state (Fig. S5), where lipids and membrane proteins had enough time to equilibrate.”
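
      As an illustration of how a bending rigidity can be extracted from such steady-state tether forces, the sketch below uses the standard relation for a membrane nanotube, f = 2π·sqrt(2κσ), so that f² is linear in σ with slope 8π²κ. This is a minimal sketch under that assumption and not necessarily the analysis pipeline used in the manuscript; the function name and the numerical values are hypothetical.

```python
import math

def bending_rigidity_from_tether(tensions, forces):
    """Estimate kappa (J) from steady-state tether forces (N) at several tensions (N/m).

    Assumes f^2 = 8 * pi^2 * kappa * sigma and fits a line through the origin
    by least squares (slope = sum(sigma * f^2) / sum(sigma^2)).
    """
    num = sum(s * f ** 2 for s, f in zip(tensions, forces))
    den = sum(s ** 2 for s in tensions)
    slope = num / den
    return slope / (8 * math.pi ** 2)

# Synthetic data generated from a hypothetical kappa (not a measured value).
kappa_true = 1.0e-19  # J
tensions = [1e-5, 2e-5, 5e-5, 1e-4]  # N/m
forces = [2 * math.pi * math.sqrt(2 * kappa_true * s) for s in tensions]

kappa_est = bending_rigidity_from_tether(tensions, forces)
print(f"estimated kappa = {kappa_est:.3e} J")
```

      Only force values recorded after the plateau is reached should enter such a fit, which is why the 30-second equilibration check matters.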

      Could the authors propose an experiment to demonstrate that caveolin-1 proteins are not restricted to the body of the PMS and can access to the nanometric tether?

      In principle, this could be further checked using cells expressing GFP-caveolin-1 to generate PMS as done in Sinha et al., 2011 and by analyzing a steady protein signal in the tube. This would confirm the equilibration, provided that caveolin-1 is recruited in the nanotube due to mechanical reasons that are now discussed in the discussion section (Pages 13-14) : “Our tube pulling experiments can be discussed along 2 lines. Indeed, since caveolin-1 is inserted in the cytosolic leaflet of the plasma membrane, when a nanotube is pulled towards the exterior of the PMS, we can expect 2 situations depending on the ability of caveolin-1 to deform membranes, which remains to be addressed (24). i) If Cav1 does not bend membranes, it could be recruited in the nanotube at a density similar to the PMS and our force measurement would reflect the bending rigidity of the PMS membrane. Cav1 could then stiffen membrane either as a stiff inclusion at high density or/and by affecting lipid composition. ii) If Cav1 bends the membrane, it is expected from caveolae geometry that the curvature in the tube would favor Cav1 exclusion. The force would then reflect the bending rigidity of the membrane depleted of Cav1, which should be the same in both types of experiments (WT and Cav1-depleted conditions) if the lipid composition remains unchanged upon Cav1 depletion. Note that the presence of a very reduced concentration of Cav1 as compared to the plasma membrane has been reported in tunneling nanotubes (TNT) connecting two neighboring cells (51). These TNTs have typical diameters of similar scale than diameters of tubes pulled from PMS. At this stage, we cannot decipher between both properties for Cav1. Considering a direct mechanical role of Cav1, previous studies showed that inclusion of integral proteins in membranes had no impact on bending rigidity, as shown in the bacteriorhodopsin experiment (52), or even decreased membrane rigidity as reported for the Ca2+-ATPase SERCA (53). 
Previous simulations have also confirmed the softening effect of protein inclusions (54). Nevertheless, our observations could be explained by a high density of stiff inclusions in the plasma membrane (>>10%), which is generally not achievable with the reconstituted membranes. Considering an impact on lipid composition, it is well established that caveolae are enriched with cholesterol, sphingomyelin, and glycosphingolipids, including gangliosides (55,56), which are known to rigidify membranes (57,47). Thus, caveolin-1 might contribute to the enrichment of the plasma membrane with these lipid species. We did not establish experimental conditions allowing us to deplete cholesterol without compromising the shape of HUVECs, which prevented a proper analysis of TEM dynamics. Moreover, a previous attempt to increase TEMs width by softening the membrane through the incorporation of poly-unsaturated acyl chains into phospholipids failed, likely due to homeostatic adaptation of the membrane’s mechanical properties (18). Further studies are now required to establish whether and how caveolin-1 oligomers control membrane mechanical parameters through modulation of lipids organization or content. Caveolin-1 expression may also contribute to plasma membrane stiffening by interacting with membrane-associated components of the cortical cytoskeletal or by structuring ordered lipid domains. Nevertheless, it has been reported that the Young’s modulus of the cell cortex dramatically decreases in ExoC3-treated cells (17) suggesting a small additional contribution of caveolin-1 depletion to membrane softening. This is supported by 2D STORM data showing a dramatic reorganization of actin cytoskeleton in ExoC3-treated cells into a loose F-actin meshwork that is not significantly exacerbated by caveolin-1 depletion. Altogether, our results suggest that the presence of Cav1 stiffens plasma membranes, and that the exact origin of this effect must be further investigated.”

      Author recommendations

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improvements:

      (1) Depletion of both Cavin1 and Caveolin1 increases the density of TEMs. Membrane tension is a critical parameter of the initiation phase of TEMs, its nucleation, and initial enlargement. From the TEM dynamics, the authors should be able to measure membrane tension. The expectation is that in both Caveolin1 and Cavin1 depleted cells, tension is higher (because there is no caveolae), explaining why there are more TEMs.

      While we cannot directly measure membrane tension, we can estimate membrane tension variations using our theoretical modeling. As reported in the article, we predict that depleting Caveolin-1 leads to a significant 2-fold increase of membrane tension, which can explain the concomitant increase in the nucleation of TEMs, as the reviewer points out. In contrast, the model predicts no significant increase of membrane tension upon Cavin-1/PTRF depletion, whereas TEM nucleation also increases significantly (but less than upon Caveolin-1 depletion). Altogether, we can explain these results by considering that membrane tension is an important player in TEM nucleation, but not the only one. Notably, we expect cell height to be another important player, as it sets an energy barrier for the basal and apical membranes to meet each other and fuse. Indeed, we report that cell height is reduced upon depletion of Cavin-1, thus explaining the observed increase in TEM nucleation. The importance of reducing cell thickness to increase the likelihood of TEM opening is best supported by previous data showing that pushing forces applied on the apical membrane induced the opening of TEMs (Ng et al., 2017 MBoC).

      An improved discussion of the parameters controlling TEM nucleation has been included in the discussion of the revised manuscript, as follows (Page 15): “Our study points to underlying mechanisms by which caveolae regulate the frequency of TEM nucleation. Nucleation of TEMs requires the apposition of the basal and apical cell membranes, which is hindered by the intermembrane distance, set by the cell height. Meeting of the two membranes may create an initial precursor tunnel, which needs to be sufficiently big to enlarge into an observable TEM, instead of simply closing back. The size of the minimal precursor tunnel required to give rise to a TEM increases with membrane bending rigidity and decreases with membrane tension (19). Silencing cavin-1 or caveolin-1 both lead to a decrease in cell height, thus favoring the likelihood of precursor tunnel nucleation. While silencing cavin-1 has no significant impact on either membrane tension or bending rigidity, silencing caveolin results in both an increase of membrane tension and a decrease of bending rigidity, which results in a decrease in the required minimal radius of the precursor tunnel, thus further favoring TEM nucleation. Overall, our results offer a consistent picture of the physical mechanisms by which caveolae modulate TEM nucleation.”

      (2) In Figure 2B, the authors state that there is no significant difference in the actin mesh size while I see a clear higher average value and distribution in siCAV1+. This seems to correlate with the differences in TEM maximal sizes. How can the authors completely exclude that the actin organisation is not in part responsible for the larger TEMs observed in siCAV1 cells?

      In our theoretical modeling of TEM opening dynamics, all differences between conditions are described by changes in what we consider as “effective” parameter values. Thus, changes in actin organization may induce a change in the "effective bending rigidity" parameter controlling membrane tension relaxation. A limitation of such a description is that all changes are assumed to be spatially uniform. However, it is possible that changes in actin mesh size and organization set local barriers to TEM enlargement in a way that would not be appropriately described by our model. While our current modeling appears to provide a consistent interpretation of our observations, we cannot completely exclude the existence of such local effects.

      This limitation of our current interpretation is now mentioned in the following paragraph, which has been added in the Physical modelling part of the Materials and Methods section (Page 24): “A limitation of our theoretical description arises from the use of spatially uniform changes in parameter values to describe differences between experimental conditions, thus assuming spatially uniform effects. However, we cannot exclude the existence of non-uniform effects, such as changes in the size and organization of the remaining actin mesh, which could set local, non-uniform barriers to TEM enlargement in a manner not accounted for by our model.”

      (3) It would be nice to see the results of Table 1 (in particular the thickness of cells) in a Bar plot.

      The experimental values of cell volumes and areas are reported in the bar plots of Fig. 3C and 3D. In contrast, we chose not to depict values of cell height in bar plots because these values were calculated from the mean values of cell areas and volumes reported in Fig. 3C and 3D, i.e. by a rough division of volumes by areas, with error propagation. Since the volume and area measurements were not performed on the same set of cells, it is not possible to divide the repeats one by one or to provide cell numbers, which are key parameters for performing statistical tests.
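
      The division with error propagation described above can be sketched as follows. This is a minimal illustration assuming the two means are independent, so that relative uncertainties add in quadrature; the function name and the numbers are hypothetical and are not taken from Table 1.

```python
import math

def ratio_with_error(volume_mean, volume_err, area_mean, area_err):
    """Return (height, height_err) for height = volume / area.

    Assumes independent uncertainties, so the relative errors of the
    volume and the area are combined in quadrature.
    """
    height = volume_mean / area_mean
    rel_err = math.sqrt((volume_err / volume_mean) ** 2 + (area_err / area_mean) ** 2)
    return height, height * rel_err

# Hypothetical inputs: volume in um^3, area in um^2 (illustrative values only).
height, height_err = ratio_with_error(3000.0, 150.0, 2000.0, 100.0)
print(f"height = {height:.2f} +/- {height_err:.2f} um")
```

      Because the result is a single derived value per condition rather than a set of per-cell measurements, it carries an uncertainty but no sample size, which is why it does not lend itself to a bar plot with statistical tests.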

      (4) There are two reasons why Caveolin1 could change the bending rigidity. First, because it makes the membrane stiffer, or because the presence of caveolin1 (that binds to cholesterol) in the plasma membrane changes the lipid composition. It would be nice if the authors could provide some lipidomics analysis to see if there is a lipid change in siCAV1 cells.

      We thank the reviewer for pointing out the importance of clarifying the hypotheses regarding a direct or indirect role of caveolin-1 in membrane bending rigidity, which might be related to changes in membrane lipid composition, especially cholesterol and sphingomyelin. We have modified the discussion section to integrate this point. The lipidomic approach is certainly interesting to address the question of the role of caveolin-1 in building membrane bending rigidity. Indeed, some of the authors have addressed the specific questions related to Cav-1 spontaneous curvature and its effect on the lipid composition of the plasma membrane in two separate manuscripts (in preparation). These are comprehensive studies in their own right that will provide mechanistic insights into how caveolin-1 builds membrane bending rigidity, as a follow-up to the present manuscript, which reports the importance of the regulation of membrane rigidity in cell biology and during infectious processes.

      Reviewer #2 (Recommendations For The Authors):

      The paper is nicely written and the results are convincing. The three main comments and questions from the Public Review do not necessarily call for new experiments. However, clarifications are required. This work can be very useful. Better not to leave any difficulty or weakly justified hypothesis under the carpet.

      To address the reviewer's comments, we have improved the discussion of the hypotheses that can be drawn regarding a direct versus indirect mechanistic role of caveolin-1 in the regulation of effective membrane bending rigidity, which might be related to changes in membrane lipid composition or to regulation of the cytoskeleton, which we cannot exclude.

      • Minor correction: in the abstract: replace "the enhanced nucleation" with "the enhanced occurrence of nucleation events".

      The abstract has been changed accordingly : “The enhanced occurrence of TEM nucleation events correlates with a reduction of cell height, …”

    1. Author Response

      The authors' responses to the public reviews can be found here


      The following is the authors’ response to the most recent recommendations.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I appreciate the effort that the authors have put into this revised version of the manuscript. Before going into details, I would suggest that, in the future, the authors include enough information in their response to allow reviewers to follow the changes made. Not simply "Fixed", but instead "we have modified the description of these results and now state on lines XXX to XXX (revised text)".

      We sincerely apologize; we certainly did not wish to cause more work for the reviewer in finding the necessary changes. We list the line numbers and our changes in the following response.

      The authors' response to my comments was confined to the minor points, with no attention to more important questions regarding speculations about mechanism which were (and still are) presented as factual conclusions. I do not consider the responses adequate.

      We responded to each of your comments and where we disagree, we have explained in detail.

      With respect to the meaning of "above" and "below" in the context of an intracellular organelle, I think that referring to up and down in a figure is fine, provided that the cytoplasmic and luminal sides are indicated in that figure. I think that labeling to that effect in each figure would be immensely helpful for the reader.

      We agree with this point and have updated all the figures to include these labels.

      The statement on lines 333-335 about non-competitive inhibition is a bit naïve. The only thing ruled out by this type of inhibition is that substrate and TBZ binding do not share the same binding process, in which case they would compete. It doesn't show that TBZ gets to its binding site from the lumen or from the bilayer, or by any other process that isn't shared with substrate. It also doesn't rule out kinetic effects, such as slow inhibitor dissociation, that result in non-competitive kinetics. Please rewrite this sentence to indicate that one explanation of the non-competitive nature of TBZ inhibition would be that TBZ diffuses into the vesicle and binds from the lumen. It's not the only explanation.

      We have changed this sentence lines 334-336 to be more speculative and not include any statement about non-competitive inhibition. Please see, “Studies have proposed that TBZ first enters VMAT2 from the lumenal side, binding to a lumenal-open conformation.”

      The revised version integrates the MD simulations into a plausible mechanism for luminal release of substrate. A key element in this mechanism is the protonation of D33, E312 and D399, which allows substrate to leave following water entry into the binding site. The acidic interior of synaptic vesicles should facilitate such protonation, but the fate of those protons needs to be considered. Are any of them predicted to dissociate prior to the return to a cytoplasm-facing conformation? If so, are all 3 released in that conformation? Postulating protonation events at one point in the reaction cycle requires some accounting for those protons - or at least recognition of the problem of reconciling their binding with the known stoichiometry of VMAT.

      We completely agree with this point, and while we cannot account for all protons with a single structure and simulation of neurotransmitter release, some discussion of the fate of the protons is warranted. We have included a highly speculative statement in the discussion on this point, see lines 462-465, “Given the known transport stoichiometry of two protons per neurotransmitter, we speculate that two protons may dissociate back into the lumen, perhaps driven by the formation of salt bridges between D33 and K138 or R189 and E312, for example, in a cytosol-facing state.”

      Reviewer #3 (Recommendations For The Authors):

      On page 13, line 238, the statement "The protonation states of titratable residues D33, E312, D399, D426, K138 and R189, which are in close proximity to TBZ, also impact its binding stability (Table 4)" is misleading. Table 4 only shows that D426 is charged and what the pKa values are. This should be rephrased to separate out which residues are in close proximity from what is known about how their protonation states affect TBZ stability.

      We agree with this statement and have rephrased this on lines 290-294 on page 13 to read, “Several titratable residues, including D33, E312, D399, D426, K138, and R189, line the central cavity of VMAT2 and impact TBZ binding stability (Table 4). We found that maintaining an overall neutral charge within the TBZ binding pocket, as observed in system TBZ_1, most effectively preserves the TBZ-bound occluded state of VMAT2. Residues R189 and E312 in particular are within close proximity of TBZ and participate directly in binding.” We note that given the acidic pH of the vesicle lumen (5.5), it is likely all four residues may be protonated to a significant degree in this state.

      Typos:

      • luminal is another name for the drug generically known as phenobarbital, lumenal means in the lumen. (This typo seems to have crept into the published literature now too).

      Thank you for pointing this out. Indeed, we had considered carefully whether to use ‘lumenal’ or ‘luminal’ in our revised text. In fact, both are used interchangeably throughout the scientific literature and luminal is the more commonly used term. Please also see: https://www.merriam-webster.com/medical/luminal. We do agree that there may be confusion because ‘Luminal’ is a trademark of phenobarbital. Therefore, we have changed the text to read ‘lumenal’ throughout.


      The following is the authors’ response to the original recommendations.

      Reviewer #1 (Recommendations For The Authors):

      I congratulate the authors on this study, which I enjoyed reading. Overall, the study reports a novel and exciting new structure for a member of the SLC18 family of vesicular monoamine transporters. Associated MD, binding and transport assays provide support for the hypothesis and firm up the modelled pose for the TBZ drug. The main strengths of the study largely sit with the structure, which, as the authors say, provides additional and essential insights above those available from AF2. The structures also reveal several potentially interesting observations concerning the mechanism of gating and proton-driven transport. The main weakness lies in the limited mutational data and studies into the role of pH in regulating ligand binding. As detailed below, my main comment would be to spend a little extra time expanding the mutational data (perhaps already done during the review?) to enable more evidence-based conclusions to be drawn.

      We thank reviewer #1 for their helpful comments and suggestions. We agree that mutational analysis specifically of neurotransmitter transport would strengthen the mechanistic conclusions of the work. We also agree with reviewer #1 and #3 that the role of pH and the protonation state of charged residues was a weakness in the first version of the manuscript. Therefore, we have expanded our mutational and computational data as detailed below and we believe that this has further solidified our findings.

      Specific comments & suggestions:

      It is an interesting strategy to fuse the mVenus and anti-GFP nanobody to the N-/C-termini. The authors should also include in SI Fig. 1 a full model for the features observed in these maps and deposit this in the PDB.

      Great point, we have made a main text panel describing the construct. Figure S1 includes a full description of the construct. The reviewer will note that the PDB entry contains the entire amino acid sequence of the construct and while the GFP and GFP-Nb cannot be well modeled into the density, we have included all of the relevant information for the reader.

      Difficult to make out the ligand in Fig. 2b, I would suggest changing the color of the carbon atoms.

      Fixed.

      It is difficult to make out the side chains in ED Fig. 5d.

      This is now its own supplemental figure and is presented larger.

      ED Figures are called out of order in the manuscript. For example, in line 143 ED Fig.6 is called before ED Fig. 5d (line 152), and then ED 5d is called before ED 5a. This makes it rather confusing to follow the description, analysis, and data when reading the paper. Although there are other examples. I would suggest trying to order the figure callouts to flow with the narrative of the study.

      Agreed. Fixed.

      It wasn't clear to me what the result was produced by just imaging the ligand-free chimaera protein. It would be useful to say whether this resulted in low-resolution maps and whether the presence of the TBZ compound was essential for high-resolution structure determination.

      The ligand is likely required for structure determination. We have not, however, made such a statement largely because we have yet to determine an apo reconstruction.

      The role of E127 and W318 on EL1 in gating the luminal side of the transporter is very intriguing. As the authors suggest, this may represent an atypical gating mechanism for the MFS (line 182). I did wonder if the authors had considered providing more insight into this potentially novel mechanism. Additional experiments would be further mutations of W318 to F, Y, V, and I to see if they can identify a non-dead variant that could be analysed kinetically. They may have more luck with variants of E127, as they suggest this stabilises W318. If these side chains are important for gating and transport regulation, one might expect to see interesting effects on the transport kinetics.

      This is a fantastic suggestion. We have done this, and we think that the reviewer will find the results to be quite interesting. Some VMAT2 sequences have an R or an H at position 318, while VPAT has an F at the equivalent position. We have made these mutants, as well as the E127A mutant, and analyzed them using TBZ binding and transport experiments. Interestingly, the W318R, H, and F mutants preserve activity to varying degrees, with the R mutant closely resembling wild type. W318A has no transport activity. Only the W318F mutant retains some TBZ binding. The E127A mutant also has little transport activity but nearly wild-type-like TBZ binding, which we believe suggests that this residue also has a role in stabilizing W318.

      The authors identify an interesting polar network, which is described in detail and shown in Fig. 2d. However, the authors present no experimental data to shed further mechanistic insight into how these side chains contribute to monoamine transport or ligand binding. Additional experiments that would be helpful here might include repeating the binding and competition assays shown in Fig. 1c under different pH conditions for the WT and different mutations of this polar network. At present, this section of the manuscript is very descriptive without providing much novel insight into the mechanism of VMAT transport. I did wonder whether a similar analysis of pH effects on DTBZ binding might also provide insight into the role of E312 and the role of protons in the mechanism.

      Thank you; we have addressed this point in several different ways. The first is that many of these residues have already been characterized in several earlier studies (see refs 31, 32, and 42), and we have incorporated this into our discussion where appropriate. With respect to E312, the reviewer’s comments are again very appropriate. We have addressed this using computational experiments exploring the protonation status of E312 and other residues as well as TBZ. Our simulations and PROPKA calculations clearly show that E312 must be protonated and TBZ must be deprotonated to maintain TBZ binding. We have also extended these computational studies toward understanding the protonation status of residues which orchestrate dopamine binding and release.

      The authors then describe the binding pose for TBZ. This section also provides some biochemical characterisation of the binding site, in the form of the binding assay introduced in Fig. 1. However, the insights are again somewhat reduced as the mutants were chosen to show reduced binding. Could the authors return to this assay and try more conservative mutations of the key side chains to illuminate more detail? For example, does an R189K mutant still show binding but not transport? Similarly, what properties does an E312D have? The authors speculate that K138 might play a role in coupling ligand binding/transport to the protonation, possibly through an interaction with D426 and D33 (line 236). Given the presence of D33 in the polar network described previously, I was left wondering how this might occur. I feel that some of the experiments with pH and conservative mutants might shed some light on this important aspect. Please label the data points in Fig. 3d.

      Indeed, alanine mutants at these positions, while valuable, do not provide the level of detailed mechanistic insight that we would have liked to obtain. Thus, we have made more conservative and targeted mutants, such as the R189K mutant and various mutants at N34, and tested them in both transport and binding assays. We have also made a mutant at K138 and found that it is not transport competent or able to bind TBZ to a significant degree. With respect to labels and color codes, we have made the color codes consistent between the bar graphs and the curves. We have also labeled the data points in the figure legends.

      The manuscript currently doesn't present a hypothesis for how TBZ induces the 'dead-end' complex compared to physiological ligands. Does the MD shed any light on this aspect of the study? If the authors place the physiological ligand in the same location as the TBZ and run the simulation for 500ns, what do they observe? 100ns is also a very short time window. I appreciate the comment about N34 in line 303, but is this really the answer? It would be very interesting to provide more evidence on this important aspect of VMAT pharmacology.

      MD with a natural ligand (dopamine) provides substantial insight into why TBZ is a dead-end complex. Since water cannot penetrate into the binding site in the TBZ bound complex, this does not allow for substantial luminal release. In contrast, simulations conducted in the presence of DA bound to the occluded VMAT2 show the propensity of that structure to accommodate an influx of water molecules that promote the release of DA to the lumen. The new results are illustrated in Figure 5 (main text) as well as supplemental figure 8 panels d-h. The new simulations further emphasized the importance of the protonation state of acidic residues near the substrate-binding pocket.

      Reviewer #2 (Recommendations For The Authors):

      Line 68, "both sides of the membrane" -> "alternately to either side of the membrane".

      Fixed. Thanks.

      Transmembrane proteins in intracellular organelles present unique issues of nomenclature. I suggest the authors refer to cytoplasmic and luminal faces of the protein (not intracellular or extracellular (line 124)) and adhere to these names to avoid confusion. This creates problems for loops called IL and EL, but they could be defined on first use.

      We agree with this point and had initially gone with the conventional definitions used in the literature. We have now changed this throughout the text to be luminal and cytosolic.

      Lines 135-6, are these residue numbers correct? The pdb file lists 126 as Asp and 333 as Ala.

      Thank you. This is fixed.

      ED Fig. 6 is not clear. A higher-resolution figure is needed.

      We have updated this figure and hope that the reviewer will find it to be much clearer.

      Lines 158-9, Is there any data to support effects on dynamics or folding? If not, please indicate that this is speculation.

      Fixed.

      Line 174, Should "I315" be "L315"?

      Fixed.

      Line 179, Please indicate what is meant by "inner" and "below" (also lines 183 and 258).

      We have added Figure calls here where needed.

      Line 192, S197 is listed as part of polar network 1, but not discussed further. Is it actually involved, or just in the neighborhood?

      It is part of the network, but we did not discuss it in further detail because we do not have data indicating its precise function, and we have thus left this as a description.

      Line 199, E312, and N388 are fairly distant from each other. Do you want to clarify why they represent a network?

      While they are not within hydrogen bonding distance, we nevertheless include them as part of the same network because they may come into closer proximity in a different conformational state.

      Line 206, Protonation of all 3? VMAT2 doesn't transport 3 protons per cycle. Please clarify.

      We believe that these residues may be protonated, but they may not necessarily all be involved in proton transport.

      Line 219, Do you mean the aspartate unique to DAT, NET, and SERT? This is Gly in all the amino acid transporters in the NSS family. Please be specific.

      Fixed. Thank you.

      Line 224, "mutation of E312 to Q" or "mutation of Glu312 to Gln".

      Fixed. Thank you.

      Fig. 3d, Normally, one would expect full saturation curves for each mutant. How can a reader distinguish between low affinity or a decrease in the number of binding sites? Would full binding curves be prohibitive for the mutants because of the cost or availability of the ligand? These points should be addressed. A couple of the curves are not visible. Would an expanded scale inset show them more clearly? Also, would it be possible to include chemical structures for all ligands discussed?

      Many if not most of these mutants bind TBZ with such low affinity that it is not possible to measure a full saturation curve either because of ligand availability (radioactive ligand concentration is only in µM) or due to technical issues with being able to measure such low affinity binding. We have changed the presentation of the curves and have split the gating and binding site mutants into their own figures. We feel this improves the readability of these curves. We have also included a table with the respective Kd values determined for each of the mutants where possible.

      Line 235, The distances are long for a direct interaction between K138 and the TBZ methoxy groups. The unusual distances should be mentioned if an interaction is being proposed.

      We do not think that K138 is directly involved in TBZ binding; however, this was written in a confusing way and has now been changed.

      Line 243, Please give a quantitative estimate of the affinity difference. "modestly" is vague.

      It is an approximately 2-fold difference. Fixed in the text.

      Line 248, 150 nM is, at best, a Kd, not an affinity.

      Agreed, this is changed.

      Reviewer #3 (Recommendations For The Authors):

      The (3 x ~100ns-long) molecular dynamics simulations provided suggest some instability of the pose identified by cryo-EM. While it is not unreasonable that ligands shift around and adopt multiple conformations within a single binding site (in a reversible manner), the present results do raise questions about the assumptions made when starting the simulations, in particular (1) the protonation states of charged residues in the TBZ binding sites; (2) the parameters used for tetrabenazine; (3) the conformations of acidic side chains that are notoriously difficult to resolve in cryoEM maps; and (4) any contributions of the regions truncated in the simulated structure, namely the cysteine cross-linked loop and the terminal domains. The authors should examine and/or discuss these contributions before attributing mechanistic insights into the newly observed binding orientation.

      To estimate the effects of protonation states on TBZ binding, we have now added three new systems with altered protonation of TBZ and of the residues lining the binding pocket (see Table 3 in the revised version). For each system, we performed multiple MD runs to address the questions and concerns raised by the reviewer.

      Regarding the protonation states: Propka3.0 was used to determine the protonation states, finding that E312 and D399 should be protonated. If I am not mistaken, this version of ProPka cannot account for non-protein ligands (https://github.com/jensengroup/propka). Given their proximity to the binding site, these protonation states will be critical factors for the stability of the simulations. The authors could test their assumption by repeating the calculations with Propka 3.1 or higher, to establish sensitivity to the ligand. Beyond this, showing the resultant hydrogen bond networks will help to reassure the reader that the dynamics in the lumenal gates do not arise from an artifact.

      We thank the reviewer for the suggestion to use a newer version of PROPKA. We used the most recent PROPKA 3.5 and carried out protonation calculations in the presence and absence of TBZ. The new calculations are presented in Table 4 and SI Figure 8c of the revised version.

      It should be possible to assess whether waters penetrate the ligand binding site during the simulations if that is of concern.

      We have now added the number of waters within the ligand binding pocket for all of the MD simulations we performed; these are presented in Table 3 and Table 5 of the revised version.

      Finally, I didn't fully understand the conclusion based on the simulations and the "binding affinity" calculations: do they imply that the pose identified in the EM map is not stable? What is the value of the binding affinity histogram?

      We apologize for this confusion. For each MD snapshot, we calculated the TBZ binding affinity using PRODIGY-LIG (Vangone et al., Bioinformatics 2019), which is a contact-based tool for computing ligand binding affinity. The binding affinity histogram shown in the original submission was the histogram of the binding affinities calculated for the MD snapshots. In the revision, we replaced the binding affinity histogram with the time evolution of the binding affinity (SI Fig 6c in the revision). The simulations confirmed that the pose identified in the EM map is stable, with the binding affinity plateauing at -9.4 ± 0.3 kcal/mol in all three runs.
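
      For readers relating a predicted binding free energy of this kind to a dissociation constant, the standard thermodynamic relation ΔG = RT ln(Kd) can be sketched as below. This is only an illustration and not part of the authors' analysis: the temperature of 298.15 K and the function name are assumptions, and a contact-based prediction should not be read as a measured Kd.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3

def kd_from_delta_g(delta_g_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M).

    Uses delta_G = R * T * ln(Kd), with Kd relative to a 1 M standard state.
    The default temperature is an assumption made for illustration.
    """
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))

# Value quoted in the response above, used purely as an example input.
kd_molar = kd_from_delta_g(-9.4)
print(f"Kd is roughly {kd_molar * 1e9:.0f} nM at 298.15 K")
```

      Because the relation is exponential, an uncertainty of a few tenths of a kcal/mol in ΔG changes the implied Kd by a factor of about 1.5 to 2.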

      Recommendations regarding writing/presentation:

      The authors use active tense terminology in attributing forces to elements of structure (cinching, packing tightly, locking). While appealing and commonplace in structural biology, this style frequently overstates the understanding obtained from a static structure and can give a rather misleading picture, so I encourage rephrasing.

      We appreciate this point; the use of these words is not meant to overstate or provide a misleading picture but rather to aid the reader in mechanistic understanding of the proposed processes.

      I would also recommend replacing the terms "above" and "below" for identifying aspects of the structure; the protein's location in the vesicular membrane makes these terms particularly difficult to follow.

      These terms refer specifically to the Figures themselves which we have always oriented with the luminal side at the top of the page and the cytosolic on the bottom. We have indicated in Figure 1 the orientation of VMAT2. The Figures are the point of reference which we refer to, and the ‘above’ and ‘below’ terms have been used to assist the reader to make the manuscript easier for a more casual or non-expert reader to follow.

      Minor corrections:

      • the legend in Figure 2 lacks details, e.g. how many simulation frames are shown, how were the electrostatic maps calculated?

      We revised Figure 2 and moved simulation frames to SI figure 6e. A total of 503 simulation frames are shown.

      • how were the TBZ RMSDs calculated? using all atoms or just the non-hydrogen atoms?

      For TBZ RMSDs, we used non-hydrogen atoms. This information is presented in the Methods section.
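
      A minimal sketch of an RMSD restricted to non-hydrogen atoms is given below. It is not the authors' script: the function name and the toy coordinates are hypothetical, and no structural superposition is performed, so the frames are assumed to be already aligned to the reference.

```python
import math

def heavy_atom_rmsd(elements, ref_coords, coords):
    """RMSD over non-hydrogen atoms only; no superposition is applied."""
    # Keep only coordinate pairs whose element is not hydrogen.
    pairs = [
        (ref, cur)
        for element, ref, cur in zip(elements, ref_coords, coords)
        if element.upper() != "H"
    ]
    if not pairs:
        raise ValueError("no non-hydrogen atoms to compare")
    squared = sum((ref[i] - cur[i]) ** 2 for ref, cur in pairs for i in range(3))
    return math.sqrt(squared / len(pairs))

# Hypothetical three-atom "ligand": two heavy atoms and one hydrogen.
elements = ["C", "N", "H"]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (5.0, 5.0, 5.0)]

rmsd = heavy_atom_rmsd(elements, reference, frame)
print(rmsd)
```

      In this toy case the hydrogen moves a long way but is ignored, so only the 1 Å shift of the two heavy atoms contributes.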

      MD simulation snapshots and input files can be provided via zenodo or another website.

      We will upload snapshots and input files to Zenodo upon acceptance of the manuscript.

      Reviewing editor specific points:

      Specific points

      L.97: Remove "readily available"

      Fixed.

      L.99: The authors are not measuring competition binding. It is well known that reserpine and substrates inhibit TBZ binding only at concentrations 100 times higher than their respective KD and KM values. It is, therefore, surprising that the authors use this isotherm and refrain from commenting on the significance of the finding. Moreover, the presentation of results as "Normalized Counts" does not provide any information about the fraction of VMAT molecules binding the ligand. At least, the authors should provide the specific activity of the ligand, and the number of moles bound per mole of protein should be calculated.

      The point was not to infer any details about the conformations to which TBZ and reserpine bind, but merely to point out that both constructs behave similarly with respect to their Ki for reserpine. We have added a sentence stating that reserpine binding stabilizes the cytoplasmic-open conformation, so that the reader is aware of the significance of this competition experiment.

      L.102: The characterization of serotonin transport activity needs to be more satisfactory. The Km in rVMAT2 is 100-200 nM, so why are the experiments done at 1 and 10 micromolar? Is the Km of this construct very different? The results provided (counts per minute at the steady state) need to give more information.

      The Km of human VMAT2 varies somewhat according to the source but has generally been reported to be between 0.6 and 1.4 µM for serotonin, according to these references.

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3019297/ https://www.cell.com/cell/pdf/0092-8674(92)90425-C.pdf https://www.pnas.org/doi/abs/10.1073/pnas.93.10.5166

      Fig 1B could be more informative. I suggest adding a cartoon model with TMs labeled, similar to ED Fig6a.

      This panel is to aid the reader in assessing the overall map quality, and thus we do not wish to add additional labels/fits which would distract from that point. Instead, we have added overall views of the model in Figs 2 and 3.

      L.179: The authors claim that the inner gate is located "below" (whatever this could mean) the TBZ ligand. In L.214, they claim that TBZ adopts a pose.....just "below" the location of the luminal gating residues. Please clarify and use appropriate terminology.

      This refers to the position of these residues in the Figures themselves. We have added figure calls where appropriate here.

      Fig. 4: The cartoon could be more informative.

      We have added more information to the mechanism cartoon which is now Figure 6. This incorporates some of our new data and we believe it will be more informative.

      L. 213: The paragraph describes residues involved in TBZ binding. Mutagenesis is used to validate the structural information. However, the results (ED fig. 5B) must be corrected for protein expression levels. In the Methods section, the authors state (L.444), "Mutants were evaluated similarly from cell lysates of transfected cells." Without normalization of protein expression levels, the results are meaningless even if they agree with predictions.

      In fact, we have normalized the concentrations of protein in our binding experiments, as noted in the Methods section. To account for differences in expression, experiments were conducted using 2.5 nM of VMAT2 protein as assessed by FSEC.

      L.220: The referral to ED Fig.7 is not appropriate here. The figure shows docking-predicted poses of dopamine and serotonin.

      Figure call has been changed.

      L.226: The referral to Fig. 3b needs to be corrected. The figure shows TBZ and not the neurotransmitter.

      This has been corrected.

      L. 337: "The neurotransmitter substrate is bound at the central site." What do the authors mean in this cartoon? Do they have evidence for this? Tetrabenazine is not a substrate.

      This cartoon drawing is meant to illustrate the elements of structure. Similar drawings are presented throughout the literature such as here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940252/ Figure 3 and here: https://pubs.acs.org/doi/10.1021/acs.chemrev.0c00983 Figure 2.

      The same compound is mentioned with different names: 3H-dihydrotetrabenazine and 3H-labeled DTBZ.

      Fixed.

      ED fig 1d is illegible.

      The high-resolution figure is completely legible. We will provide this to the journal upon publication.

      Figure 2d: A side view would be more visual.

      We have updated this figure and believe that it is much easier to understand now.

      L. 179: The inner gate is located 'below' the TBZ ligand

      Please see above response, this refers to the figures themselves. The figures are our point of reference.

      L. 213-215: Tetrabenazine binding site just 'below' the location of the luminal gating residues.

      See above.

      Throughout the paper, results are given as cpm or counts. The reader can only estimate the magnitude of the binding/transport by knowing the specific activity of the radiolabel. I recommend switching to nano/picomoles or supplying enough information to understand what the given cpm values could mean.

      Binding experiments were done using scintillation proximity assays, and therefore converting the CPMs to values in pmol of bound ligand is simply not possible. For the transport experiments (now Fig 1d), the point was to show that the wild type was similar in activity to the chimera. In the new transport experiments presented for the mutants, many experiments were combined, and we have therefore normalized the counts to the relative activity of wild type VMAT2.
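
      The normalization described above can be sketched as follows. This is an illustrative assumption about the arithmetic only: the construct names, the count values, and the function are hypothetical, and any background subtraction the authors may have applied is not shown.

```python
def normalize_to_wild_type(counts_by_construct, wild_type="WT"):
    """Express each construct's counts as a fraction of the wild-type counts
    measured in the same experiment."""
    wt_counts = counts_by_construct[wild_type]
    if wt_counts <= 0:
        raise ValueError("wild-type counts must be positive")
    return {name: counts / wt_counts for name, counts in counts_by_construct.items()}

# Hypothetical counts per minute from a single experiment.
experiment = {"WT": 2000.0, "mutant_A": 500.0, "mutant_B": 0.0}

relative = normalize_to_wild_type(experiment)
print(relative)
```

      Normalizing within each experiment before pooling makes results from different days comparable, at the cost of discarding the absolute magnitude of the signal.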

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This manuscript introduces an exciting way to measure SARS-CoV-2 aerosolized shedding using a disposable exhaled breath condensate collection device (EBCD). The paper draws the conclusion that the contagious shedding of the virus via aerosol route persists at a high level 8 days after symptoms.

      Strengths:

      The methodology is potentially of high importance and the paper is clearly written. The study design is clever. If aerosolized viral load kinetics truly differed from those of nasal swabs, then this would be a very important finding.

      Thank you for your encouraging remarks. We agree that a comparison between aerosolized viral load and nasal swabs would strengthen our findings, and we have collected new specimens which will enable this comparison: In each session we collected both nasal swabs and exhaled breath samples, and we are in the process of analyzing these data. These data will be included in our revised manuscript.

      Weaknesses:

      The study conclusions are not entirely supported by the data for several reasons:

      (1) Most data points in the study are relatively late during infection when viral loads from other compartments (nasal and oral swabs) are typically much lower than peak viral loads which often occur in the pre-symptomatic or early symptomatic phase of infection. Moreover, the generation time for SARS-CoV-2 has been estimated to be 3-4 days on average meaning that most infections occur before or very early during symptoms. Therefore, the available epidemiologic data does not support 12 days of infection (day 8 symptoms) as important for most transmissions. Therefore, many of the measurement timepoints in this study may not be relevant for transmission.

      Thank you for your comment. Notably, our new data set includes a small number of specimens that were collected prior to the start of symptoms, and so we may be able to partially address this concern with those data. That said, we agree that a limitation of our original study is that we were unable to collect specimens prior to symptom onset, and that this pre-symptomatic period represents a fruitful area for future work. Significant questions nevertheless remain open regarding the transmission dynamics of SARS-CoV-2, including the extent of transmission after symptom onset, and therefore, despite this limitation of our data, we feel that our method may contribute to further understanding of those dynamics. We will include a more prominent discussion of this limitation in the revised manuscript.

      (2) Fig 1A would be more powerful as a correlation plot between viral load from nasal samples (x-axis) and aerosol (y-axis). One would expect at least a rough correlation (as has been seen between viral loads in oral and nasal samples) and deviations from this correlation would provide crucial information about how and when aerosol shedding is discordant from nasal samples (i.e., early vs late time points, low versus high viral loads, etc.). It is too strong to state correspondence is 100% when viral load is only measured in one compartment and nasal swabs are reduced to the oversimplified "positive or negative".

      Thank you for this suggestion, we agree that the figure would be more powerful as a correlation plot between viral load from nasal samples and aerosol. Unfortunately, at the time these samples were collected, the ER at Northwestern Hospital was diagnosing SARS-CoV-2 patients using the Abbott ID NOW rapid diagnostic platform, which, despite being a PCR-based system, does not provide quantitative information about viral loads, and instead provides a binary positive/negative result. Since we were looking for a direct comparison between the clinical diagnostic test and our test, we considered the binary aspect of our data (detected/undetected), and found 100% correspondence, meaning that when the clinical test detected SARS-CoV-2, our test did too. We have collected additional data which includes quantitative PCR values from nasal swabs collected at the same time as breath samples and we will include these data in the format you suggest, once analyzed, in our revised manuscript.

      (3) Results are reported in RNA copies which is fine but plaque-forming units (pfu, or quantitative culture) are likely a more accurate surrogate of infectivity. It is quite possible that all of these samples would have been negative for pfu given that the ratio of RNA:pfu is often >1000 (though also dynamic over time during infection). This could be another indicator that most samples in the study were collected too late during infection to represent contagious time points.

      We agree that culturing exhaled breath samples would be an important addition to our understanding of the transmission dynamics of SARS-CoV-2 and we consider this to be an important next step for our method. Because we did not perform culturing of our breath samples in this study, we avoided making claims about infectivity of our samples in this manuscript, and instead speculate about the future utility of our method in understanding transmission dynamics, once an appropriate surrogate of infectivity is performed. We will make sure this is clearer in the revised manuscript. That said, other groups have successfully cultured breath samples with corresponding CT values in a range that are well within the range we found in our study, and sufficient for transmission (for example, Alsved et al, 2023, CT range ~33-38). These studies support the idea that a significant portion of the viral RNA measured in our samples may come from viable virus. Therefore, quantifying the ratio of viable to nonviable virus in our samples is an important next step. We appreciate this comment, and we will add a clearer discussion of this point to the revised manuscript.

      (4) Individual kinetic curves should be shown for participants with more than three time points to demonstrate whether there are clear kinetic trends within individuals that would help further validate this approach. The inclusion of single samples from individuals is less informative.

      We will add individual kinetic curves to the revised manuscript.

      (5) The S-shaped model in 2A is somewhat misleading as it is fit to means but there is tremendous variability within the data. Therefore the 8-day threshold should be listed clearly as a mean but not a rule for all individuals. The statement that viral RNA copies do not decrease until 8 days from symptom onset is unlikely to be true for all infected people and can't be made based on the available data in this study given that many people contributed only one datapoint.

      We will clarify the language in the manuscript and make limitations of the 8-day interpretation clearer.

      (6) The incubation period for SARS-CoV-2 is highly variable. Therefore duration of symptoms is a rather poor correlate of the duration of infection. This further diminishes the interpretive value of positive samples from individuals who were only sampled once.

      We will add a discussion of this point to the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Lane and colleagues measured the abundance of SARS-CoV-2 on breath in 60 outpatients after the development of COVID-19 symptoms using a novel breath collection apparatus. They found that, overall, viral abundance remains high for approximately eight days following the development of symptoms, after which viral abundance on breath drops to a low level that may persist for approximately 20 days or more. They did not identify significant differences in viral shedding on breath by vaccination status or viral variant. They also noted substantial variation in the degree and duration of shedding across individuals.

      Strengths:

      The primary strengths of this study are (1) the focus on breath, rather than the more traditional nasal/oropharyngeal swabs, and (2) the fact that the data were collected at multiple time points for each infection. This allows the authors to characterize not only mean viral abundance across individuals but also how that abundance changes over time, allowing for a better understanding of the potential duration of infectiousness of SARS-CoV-2.

      Weaknesses:

      The sample size is moderate (60) and focuses only on outpatients. While these are minor weaknesses (as the authors note, the majority of SARS-CoV-2 transmission likely occurs among those with symptoms below the threshold of hospitalization), it would nevertheless be useful to have a fuller understanding of variation in viral shedding across clinical groups.

      We agree this would be very interesting and feel our method, which is straightforward to perform in clinical settings, lends itself to future studies across clinical groups. We have added discussion of this to the discussion section of the manuscript.

      Furthermore, the study lacks information on viral shedding prior to the development of symptoms, which may be a critical period for transmission. Since the samples were collected at home by study participants using a novel apparatus, it is difficult to assess the degree to which actual variation in viral abundance, user variability, and/or measurement variation is inherent to the apparatus.

      This is a great point, which we will discuss in our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For the Authors):

      (1) While not absolutely necessary - it would be nice to see at least at the in-situ level what happens to the handful of other HC-important transcription factors in the Rbm24 KO (IKZF2, Barhl1, RFX) as the authors did look at Insm1.

      Reply: Thanks for your suggested experiments. We agree that knowing whether genes known to be involved in regulating cell survival are changed will provide insights into the mechanisms underlying the death of Rbm24-/- HCs. Our data showed that Ikzf2 seemed to be upregulated in Rbm24-/- HCs relative to Rbm24+/+ HCs at P5. We also tested Barlh1 and RFX, but we did not obtain data of sufficient confidence to present. Nonetheless, following the reviewer’s logic, we further tested Gata3, another gene involved in HC survival, and found that Gata3 was downregulated in Rbm24-/- HCs compared to Rbm24+/+ HCs. Please refer to the text on lines 12-22 on page 12 and lines 1-10 on page 13, and Figure 3-figure supplement 1.

      (2) Major comments: The nomenclature for mouse gene vs. mouse protein needs to be addressed throughout the manuscript. The nomenclature when referring to a mouse gene: gene symbols are italicized, with only the first letter in upper-case (e.g. Rbm24).

      The nomenclature when referring to a mouse protein: Protein symbols are not italicized, and all letters are in upper-case (e.g. RBM24).

      Reply: Thanks for pointing it out. Throughout the manuscript, we have followed the reviewer’s nomenclature for genes and proteins.

      (3) Supplemental Figure 2D: Individual data points should be displayed on the bar graph via dots. SEM is not appropriate for this graph as SEM precision with only 3 samples is low. Furthermore, readers are more interested in knowing the variability within samples and not proximity of mean to the population mean, therefore standard deviation (SD) should be used instead.

      Reply: We have edited Figure 1-figure supplement 2D as suggested. The Figure 1-figure supplement 2 legend was updated, too. Please refer to lines 21-22 on page 32.
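
      The distinction the reviewer draws between SD and SEM in comment (3) can be illustrated with a minimal sketch. The replicate values below are hypothetical and are not taken from the manuscript; the function name `sd_and_sem` is likewise only illustrative.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(n)        # SEM is the SD divided by sqrt(n)
    return sd, sem


# Hypothetical replicate values (n = 3), for illustration only.
replicates = [10.0, 12.0, 14.0]
sd, sem = sd_and_sem(replicates)
print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

      With only three samples, the SEM is always smaller than the SD by a factor of sqrt(3), so error bars drawn as SEM understate the spread among the individual data points that the reviewer asks to see.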

      (4) Red/Green should be avoided, especially when both are on the same image (merged immunofluorescence images that are found throughout the manuscript). I highly recommend changing to a color-blind friendly color scheme (such as cyan/green/magenta, cyan/magenta/yellow, etc.) for inclusivity.

      Reply: Thanks for pointing it out. We have changed the red to magenta in all our Figures and figure supplements.

      (5) Minor comments: As CRISPR-stop is a major method used throughout the paper, a brief explanation is needed for readers to understand what this methodology entails and why it was used. Something along the lines of, "The CRISPR-stop technique allows for the introduction of early stop codons without the induction of DNA damage via Cas9, which can cause deleterious effects".

      Reply: We have further elaborated how CRISPR-stop works and its advantages. Please refer to lines 8-13 on page 5.

      (6) Page 5; line 5 - "Phenotypes occur earlier..." Grammar

      Reply: The grammar error was corrected. Please refer to line 4, page 5.

      (7) Page 5; line 5 - "Given Pou4f3 is the upstream regulator..." Not proven, rephrase

      Reply: We have rephrased this sentence. Please refer to lines 5-6 on page 5.

      (8) Supplemental 1A: Fine, Proof of knockout, I wouldn't mention INSM1 being "irregular"

      Reply: We have rephrased this sentence. Please refer to lines 2-3 on page 6.

      (9) Page 5; line 21 - "Alignment of Insm1+ OHCs was not as regular..." Not a good description

      Reply: We have rephrased this sentence. Please refer to lines 2-3 on page 6.

      (10) Page 6; line 11 - "Rbm24 was completely absent.." Redundancy with line 9

      Reply: Thanks for pointing it out, and we have removed the redundant sentence.

      (11) Page 7 - HA tag should be indicated originally as: Hemagglutinin (HA)

      Reply: We have switched “HA” to “Hemagglutinin (HA)”. Please refer to line 15, page 7.

      (12) Page 9, line 11- "Determine if autonomous/noncell autonomous." Disagree, cells still clustered in supplemental fig 4.

      Reply: We have removed this sentence.

      Reviewer #2 (Recommendations For The Authors):

      The writing of the manuscript is adequate, but it would certainly be improved by professional editing.

      Reply: Thanks for the reviewer’s encouraging comments. The revised version of our manuscript has been edited by a native English speaker.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript from Richter et al. is a very thorough anatomical description of the external sensory organs in Drosophila larvae. It represents an important tool for investigating the relationship between the structure and function of sensory organs. Using improved electron microscopy analysis and digital modeling, the authors provide compelling evidence offering the basis for molecular and functional studies to decipher the sensory strategies of larvae to navigate through their environment.

      Public Reviews:

      Summary

      This is a very meticulous and precise anatomical description of the external sensory organs (sensilla) in Drosophila larvae. Extending their previous study (Rist and Thum 2017), which analyzed the anatomy of the terminal organ, a major external taste organ of the fruit fly larva, the authors examined the anatomy of the remaining head sensory organs (the dorsal organ, the ventral organ, and the labial organ) and also described the sensory organs of the thoracic and abdominal segments. Improved serial electron microscopy and digital modeling are used to the fullest to provide a definitive and clear picture of the sensory organs, the sensilla, and adjacent ganglia, providing an integral and accurate map, which is dearly needed in the field. The authors revise all the data for the abdominal and thoracic segments, describe in detail, for the first time, the head and tail segments, and construct a complete structural and neuronal map of the external larval sensilla.

      Strengths

      It is a very thorough anatomical description of the external sensory organs of the genetically amenable fruitfly. This study represents a very useful tool for the research community that will definitely use it as a reference paper. In addition to the classification and nomenclature of the different types of sensilla throughout the larval body, the wealth of data presented here will be valuable to the scientific community. It will allow for investigating sensory processing in depth. Serial electron microscopy and digital modeling are used to the fullest to provide a comprehensive, definitive, and clear picture of the sensory organs. The discussion places the anatomical data into a functional and developmental frame. The study offers fundamental anatomical insights, which will be helpful for future functional studies and to understand the sensory strategies of Drosophila larvae in response to the external environment. By analyzing different larval stages (L1 and L3), this work offers some insights into the developmental aspects of the larval sense organs and their corresponding sensory cells.

      Weaknesses

      There are no apparent weaknesses, although it is not a completely novel anatomical study. It revisits many data that already existed, adding new information. However, the repetition of some data and prior studies could be reduced to improve readability.

      We would like to thank the reviewers for their respective reviews. The detailed comments and efforts have helped us to improve our manuscript. In the following, we have listed the comments one by one and provide the respective information on how we addressed the concerns.

      Recommendations for the authors:

      We have tried to address every single comment as far as possible. In order to structure our response a little better, we have listed the relevant page number and the original comments once again. Directly following this you will find our response and a description of what we have changed in the manuscript.

      REVIEWER #1 (Recommendations For The Authors):

      I have a few comments that will help the reader navigate this long and detailed paper.

      REVIEWER 1.1. page 4

      The final section of "the Structural organization of Drosophila larvae" needs some reorganization.

      Specifically:

      "The DO and the TO are prominently located on the tip of the head lobes" Can the authors rewrite the sentence in a way that it is clear that there is one DO and one VO on each side of the head? Check at the beginning of each section, please. There is a mention about hemi-segments but it is still confusing.

      Done – replaced with “The largest sense organs of Drosophila larvae are arranged in pairs on the right and left side of the head.”

      REVIEWER 1.2. page 5

      "The sequence of sensilla is always similar for and different between T1, T2-T3, and A1-A7" This sentence is not clear, please break it into two sentences.

      Done – replaced with: “We noticed varying arrangements for T1, T2-T3, and A1-A7, with a consistent sequence of sensilla in each configuration.”

      REVIEWER 1.3. figures page 4

      Double hair can't be found in Figure 1B or C (is it h3, h4?) - please clarify.

      Done - changed to double hair organ in page 11, included double hair sketch in legend in figure 1B. We changed the name of the structure to double hair organ, to clarify that this is a compound sensillum consisting of two individual sensilla.

      REVIEWER 1.4. page 5

      The authors go back and forth in their descriptions of the different sensory organs. Knob sensilla and then papilla sensilla are discussed and then a few lines later a further description is done. Please unify the description of each separately.

      Done – we restructured the whole section.

      REVIEWER 1.5. figures page 6

      "We found three hair sensilla on T1-T3, and "two" on A1-A7" - in the figure there seem to be "four" on A1-A7.

      Done – we included the two hair sensilla of the double hair organ

      REVIEWER 1.6. figures page 6

      DORSAL ORGAN:

      Can the authors explain the colour map meaning in Figure 2A? It is explained in 2C but the image already has colours. Add your sentence "Color code in A applies to all micrographs in this Figure".

      Done – we added a sentence to explain that the color code in A applies to the whole figure.

      REVIEWER 1.7. page 6

      Page 10: which comprises seven olfactory sensilla "composing" three dendrites each: replace this with "with". At the end, we want to think 7 x 3 = 21 ORNs.

      Done – replaced.

      REVIEWER 1.8. page 9

      CHORDOTONAL ORGANS:

      "We find these these DO associated ChO (doChO).. .". Please remove one "these"

      Done – removed.

      REVIEWER 1.9. page 8

      Is the DO associated ChO part of the dorsal ganglion???? It does not look like it. Could you clarify?

      Done – we added a sentence that clarifies that the ChO neuron is not inside the DOG.

      REVIEWER 1.10. page 9 VENTRAL ORGAN: A figures page 12

      Please add to the Figure 8 legend the description of 8c' and 8c'?

      Done – added description in figure legend.

      B page 9

      8H, what are the *, arrows? Please clarify - it is hard to interpret the figure.

      Done – we added parentheses in the figure legend that state which structures the asterisks and arrows indicate.

      C page 9

      "Three of them are innervated by a single neuron () and one by two neurons () (Figure 8F-I). Please add which are innervated by 1 (VO1, VO2-VO4) and which by 2 (VO3).

      Done – we added parentheses that clarify which sensilla are innervated by 1 or 2 neurons.

      REVIEWER 1.11. page 9

      Can you add something (or speculate) about the difference in sensory processing of the different types of sensilla?

      Done – new sentence in discussion:

      ‘Their different size and microtubule organization likely correlate with processing of different stimulus intensities applied to the mechanotransduction apparatus (Bechstedt et al. 2010).’

      REVIEWER 1.12. figures page 16

      PAPILLA AND HAIR SENSILLA:

      FIGURE 10a, please add the name of each sensillum from p1, p2, px py, etc... (if not we have to go back to figure 1 when you describe specific ps.)

      Thanks for the comment, it really makes it a lot easier for the reader.

      REVIEWER 1.13. figures page 18 Figure 11, can you add the name of each hair, please?

      Done – updated figure.

      REVIEWER 1.14. figures pages 16, 18, 20

      In Figures 10, 11, and 12 you clearly draw an area on the internal side that I assume is what you call the "electron-dense sheath". It is wider in papilla sensilla than in hair sensilla, most likely due to the difference in stimuli sensed that you explain in detail in the discussion. Can you say in the figure what this "internal" thing is? Can you add this difference to your list "Apart from the difference in outer appearance and structure of the tubular body"?

      This is the basal septum, but it is not certain that it is wider in the papilla sensilla; at least, we could not observe this in our data sets. The impression could have been created by different scales in the 3D reconstructions and a perspective view. Therefore, we do not want to list this as a difference here, as we are not sure.

      However, we have now specified the socket septum in the figure legends and in Figures 10A, 11A and 12A.

      REVIEWER 1.15. page 11

      KNOB SENSILLA:

      Page 25;" Knob sensilla have been described under "vaious" names such as": add various.

      Done

      REVIEWER 1.16. page 12

      "reveals that the three hair and the two papilla sensilla are associated with a single dendrite." Can you write that "reveals THAT EACH OF the three hair and the two papilla sensilla" if not it seems that there is only one dendrite.

      Done

      REVIEWER 1.17. figures page 25 TERMINAL SENSORY CONES:

      Please name the t1-t7 cones in Figure 15A.

      Done – we updated the figure.

      REVIEWER 1.18. page 13

      The spiracle sense organ deserves a new paragraph. As does the papilla sensillum of the anal plate.

      Done – we added subtitles before the paragraphs.

      Discussion:

      REVIEWER 1.19. page 15

      Page 38: "v'entral" correct typo

      PAGE 15

      Done – we have updated the nomenclature: ventral 1 (v), ventral 2 (v’) and ventral 3 (v’’)

      REVIEWER #2 (Recommendations For The Authors):

      I have only a few comments:

      REVIEWER 2.1. page 5

      p.5, right column, middle: the use of trichoid, campaniform, and basiconical (sensilla) in previous works were based on even older papers and reviews that attempted to link EM architecture to function (e.g., KEIL, T. A. & STEINBRECHT, R. A. (1986). Mechanosensitive and olfactory sensilla of insects. In Insect Ultrastructure, vol. 2. (ed. R. C. King & H. Akai), pp. 477-516. New York/London: Plenum Press). Trichoid sensilla can be mechano-sensitive, olfactory, or gustatory; trichoid simply refers to the shape (hair). The same applies to basiconical sensilla. The use of "campaniform", which Ghysen et al called "papilla sensilla", was the only really problematic case, because these (Drosophila larval) sensilla did not really resemble closely the classical campaniform sensilla (e.g., adult haltere). The only reason we called them campaniform is because they were not more similar to any other type of (previously named) sensillum.

      Thank you for the explanation. The nomenclature of structures is generally always a complex topic with often different approaches and principles. We are aware of this and have therefore tried to be as careful as possible. We were not sure from this comment whether you were suggesting to change the text or whether you wanted to explain how these names were assigned to the sensilla in the past. However, we hope that the current version is in line with your understanding, but could of course make changes if necessary (see also comments of reviewer 1).

      REVIEWER 2.2. page 9

      p.21, Labial Organ: the ventral lip is the labium; the dorsal one is the labrum.

      Done – replaced labrum with labium.

      REVIEWER 2.3. page 9

      p.20/21, Ventral organ and labial organ: here, the projection of the axons could be mentioned as an ordering principle. In the previous literature, for larva and embryo, a labial organ (lbo) was described that most likely corresponds to the labial organ presented here. This (previously mentioned) lbo characteristically projects along the labial nerve to the labial segment (hence the name). It fasciculates with axons of another sensory complex, also generated by the labial segment, namely the ventral pharyngeal sensory organ (VPS). Does the labial organ described here share this axonal path?

      Yes, it has the same axonal pathway and is the same organ as the lbo. We have tried to standardise the nomenclature for all important external head organs (DO, TO, VO, LO) and have therefore used abbreviations with two letters. However, to avoid confusion, we have now added that the LO was also called lbo in the past.

      For the ventral organ, the segmental origin (to my knowledge) was never clarified. The axons of the ventral organ project along the maxillary nerve (which carries axons of the terminal=maxillary organ). This nerve, closely before entering the VNC, splits into a main branch to the maxillary segment (TO axons) and a thinner branch that appears to target the mandibular segment. This branch could contain the axons of the ventral organ (as described previously and in this paper). Could the authors confirm this axonal projection of the VO?

      In this work, we did not focus on the axonal projections into the SEZ. This is also not a simple and fast process, as in the entire larval dataset, the large head nerves unfortunately exhibit a highly variable quality of representation. Therefore, the reconstruction of nerves and individual neurons within it is often challenging and very time-consuming. The research question is, of course, very intriguing, and one could also attempt to match each sensory neuron of the periphery with the existing map of the brain connectome. However, this is a project in itself, exceeding the scope of this work, and is therefore more feasible as a subsequent project.

      REVIEWER #3 (Recommendations For The Authors):

      Minor suggestions that the authors might consider:

      REVIEWER 3.1. figures all

      Recheck the scale bar in figures and figure legends. Missing in a few places.

      Done – we replaced or added some (missing) scale bars in figures and figure legends (see annotated figure document).

      REVIEWER 3.2. figures page 4

      The color schematic in Figure 1 can be improved for readability.

      Done – we changed the color schematic, especially for the head region to improve readability.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript titled "Coevolution due to physical interactions is not a major driving force behind evolutionary rate covariation" by Little et al. explores the potential contribution of physical interaction to correlated evolutionary rates among gene pairs. The authors find that physical interaction is not the main driver of evolutionary rate covariation (ERC). This finding is similar to a previous report by Clark et al. (2012), Genome Research, wherein the authors stated that "direct physical interaction is not required to produce ERC." The previous study used 18 Saccharomycotina yeast species, whereas the present study used 332 Saccharomycotina yeast species and 11 outgroup taxa. As a result, the present study is better positioned to evaluate the interplay between physical interaction and ERC more robustly.

      Strengths & Weaknesses:

      Various analyses nicely support the authors' claims. Accordingly, I have only one significant comment and several minor comments that focus on wordsmithing - e.g., clarifying the interpretation of statistical results and requesting additional citations to support claims in the introduction.

      We are pleased the reviewer found the analyses to support the claims. We have addressed comments related to clarifying interpretations as suggested in the Recommendations to the Authors. For example, we have added discussion and clarification on the other parameters that could affect the strength of ERC correlations.

      Reviewer #2 (Public Review):

      Summary:

      The authors address an important outstanding question: what forces are the primary drivers of evolutionary rate covariation? Exploration of this topic is important because it is currently difficult to interpret the functional/mechanistic implications of evolutionary covariation. These analyses also speak to the predictive power (and limits) of evolutionary rate covariation. This study reinforces the existing paradigm that covariation is driven by a varied/mixed set of interaction types that all fall under the umbrella explanation of 'co-functional interactions'.

      Strengths:

      Very smart experimental design that leverages individual protein domains for increased resolution.

      Weaknesses:

      Nuanced and sometimes inconclusive results that are difficult to capture in a short title/abstract statement.

      We appreciate the reviewer’s acknowledgement of the experimental design. We have addressed the nuance of the results by changing the title and clarifying other statements throughout the manuscript as suggested in the reviewer’s recommendations. We have also addressed reviewer comments asking for further explanation on using Fisher transformations when normalizing the Pearson correlations for branch counts.

      Reviewer #3 (Public Review):

      Summary:

      The paper makes a convincing argument that physical interactions of proteins do not cause substantial evolutionary co-variation.

      Strengths:

      The presented analyses are reasonable and look correct and the conclusions make sense.

      Weaknesses:

      The overall problem of the analysis is that nobody who has followed the literature on evolutionary rate variation over the last 20 years would think that physical interactions are a major cause of evolutionary rate variation. First, there have been probably hundreds of studies showing that gene expression level is the primary driver of evolutionary rate variation (see, for example, [1]). The present study doesn't mention this once. People can argue the causes or the strength of the effect, but entirely ignoring this body of literature is a serious lack of scholarship. Second, interacting proteins will likely be co-expressed, so the obvious null hypothesis would be to ask whether their observed rates are higher or lower than expected given their respective gene expression levels. Third, protein-protein interfaces exert a relatively weak selection pressure so I wouldn't expect them to play much role in the overall evolutionary rate of a protein.

      We thank the reviewer for their comments and suggestions. A point to immediately clarify is that the methods studied in this manuscript deal with rate variation of individual proteins over time, and whether that variation correlates with that of another protein. The numerous studies the reviewer refers to deal with explaining the differences in average rate between proteins. These are different sources of variation. It has not, to our knowledge, been shown that variation in the expression level of a single protein over time is responsible for its variation in evolutionary rate over time, let alone to a degree that allows its variation to correlate with that of a functionally related protein. That question interests us, but it is not the focus of this study.

      In our study, we sought to test for a contribution of physical interaction to the correlation of evolutionary rate changes as they vary over time, i.e. between branches. We made many changes to clarify this distinction in our revisions.

      We agree that the manuscript would be clearer if it defined the forces proposed to lead to differences in rate in general, which include expression levels. We had generally considered expression level as one of the many potential non-physical forces, but failed to make that explicit and instead focused on selection pressure. In our revision we describe expression level as another potential driver of evolutionary rate variation over time. References to previous literature have been added to the introduction. We also added a more explicit explanation of the rate covariation over time that we are measuring, in contrast with the association between expression level and rate differences between proteins that was studied in previous literature.

      On point 3, the authors seem confused though, as they claim a co-evolving interface would evolve faster than the rest of the protein (Figure 1, caption). Instead, the observation is they evolve slower (see, for example, [2]). This makes sense: A binding interface adds additional constraint that reduces the rate at which mutations accumulate. However, the effect is rather weak.

      The values in Fig 1B are a measure of correlation, specifically a Fisher transformed correlation coefficient. They are not evolutionary rates, so they are not reflecting faster or slower evolution, rather more or less covariation of evolutionary rates over time. We are not predicting that physically interacting interfaces evolve faster than the rest of the protein, but rather that if physical interaction drives covariation in evolutionary rates over time, their correlation would be stronger between pairs of physically interacting domains. In response, we have used clearer language in the figure caption and reorganized labels in Figure 1B to clearly show that the values are correlations. Revised Figure 1 Legend:

      “Overview of experimental schema and hypotheses. Proteins that share functional/physical relationships have similar relative rates of evolution across the phylogeny, as shown in (A) with SMC5 and SMC6. The color scale along the bottom indicates the relative evolutionary rate (RER) of the specific protein for that species compared to the genome-wide average. A higher (red) RER indicates that the protein is evolving at a faster rate than the genome average for that branch. Conversely, a lower (blue) RER indicates that protein is evolving at a slower rate than the genome average. The ERC (right) is a Pearson correlation of the RERs for each shared branch of the gene pair. (B) Suppose the correlation in relative evolutionary rates between two proteins is due to compensatory coevolution and physical interactions. In that case, the correlation of their rates (i.e., ERC value) would be higher for just the amino acids in the physically interacting domain. (C) Outline of experimental design. Created with Biorender.com”
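
      As a rough illustration of the quantity described in this legend, the sketch below computes an ERC-style value as the Pearson correlation of two proteins' RERs over their shared branches, followed by a Fisher transformation. The RER values are hypothetical. Scaling the transformed value by sqrt(n - 3), where n is the number of shared branches, is one common way to account for branch count; whether the manuscript uses exactly this scaling is an assumption here, and the function names are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of branch RERs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_scaled(r, n_branches):
    """Fisher z-transform of r, scaled by sqrt(n - 3).

    The sqrt(n - 3) factor is an assumed normalization for branch count.
    Requires -1 < r < 1 and n_branches > 3.
    """
    return math.atanh(r) * math.sqrt(n_branches - 3)


# Hypothetical RERs for two proteins over six shared branches.
rer_a = [0.5, -0.2, 1.1, -0.8, 0.3, -0.4]
rer_b = [0.4, -0.1, 0.9, -0.6, 0.1, -0.5]

r = pearson(rer_a, rer_b)
print("Pearson r:", round(r, 3))
print("Fisher-transformed ERC:", round(fisher_scaled(r, len(rer_a)), 3))
```

      Under this sketch, a larger value means stronger covariation of rates across branches; it says nothing about whether either protein evolves faster or slower than the other.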

      All in all, I'm fine with the analysis the authors perform, and I think the conclusions make sense, but the authors have to put some serious effort into reading the relevant literature and then reassess whether they are actually asking a meaningful question and, if so, whether they're doing the best analysis they could do or whether alternative hypotheses or analyses would make more sense.

      [1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4523088/

      [2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4854464/

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      (1) Numerous parameters influence ERC calculation. The authors note that their use of a large dataset of budding yeast provides sufficient statistical power to calculate ERC. I agree with that. However, a discussion of other parameters needs to be improved, especially when comparing the present study to others like Kann et al., Hakes et al., and Jothi et al. For example, what is the evolutionary breadth and depth used in the Kann, Hakes, Jothi and other studies? How does that compare to the present study? Budding yeast evolve rapidly with gene presence/absence polymorphisms observed in genes otherwise considered universally conserved. Is there any reason to expect different results in a younger, slower-evolving clade such as mammals? There is potential to acknowledge and discuss other parameters that may influence ERC, such as codon optimization and gene/complex "essentiality," among others.

      More discussion of these parameters is a good idea. We have added the number and phylogeny of species used in the previous studies in the discussion paragraph starting with “Previous studies attributed varying degrees of evolutionary rate covariation signal to physical interactions between proteins.” We also like the idea of studying the effect of younger and more slowly evolving clades as opposed to the contrary, but currently we lack the required number of datasets to do this.

      We have also added more discussion and clarification of potential non-physical forces leading to ERC correlations in the introduction.

      Minor comments

      (1) It would be good to add a citation to the second sentence of the first paragraph, which reads, "It has been observed that some genes have rates that covary with those of other genes and that they tend to be functionally related."

      Added citation to Clark et al. 2012

      (2) In the last sentence of the first paragraph of the introduction, ERC is discussed in the context of only amino acid divergence, however, there is no reason that DNA sequences can't be used, especially if ERC is being calculated among species that are less ancient than, for example, Saccharomycotina yeasts. Thus, it may be more accurate to suggest that ERC measures how correlated branch-specific rates of sequence divergence are with those of another gene.

      Nice suggestion to generalize. We have made this change.

      (3) ERC was not calculated in reference #2. For the sentence "Protein pairs that have high ERC values (i.e., high rate covariation) are often found to participate in shared cellular functions, such as in a metabolic pathway2 or meiosis3 or being in a protein complex together," I think more appropriate citations (including inspiring work by the corresponding author) would be

      a) Coevolution of Interacting Fertilization Proteins (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000570)

      b) Evolutionary rate covariation analysis of E-cadherin identifies Raskol as a regulator of cell adhesion and actin dynamics in Drosophila (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007720)

      c) An orthologous gene coevolution network provides insight into eukaryotic cellular and genomic structure and function (https://www.science.org/doi/10.1126/sciadv.abn0105)

      d) PhyKIT: a broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data (https://academic.oup.com/bioinformatics/article/37/16/2325/6131675)

      Thank you for pointing out these works. We agree that there are more appropriate citations and we have referenced your suggested b-d.

      (4) The dataset of 343 yeast species also includes outgroup taxa. Therefore, indicating that 332 species are Saccharomycotina yeast and 11 are closely related outgroup taxa may be more accurate.

      Thank you for the suggestion. The following sentence has been added, citing the Shen et al. 2018 paper that the dataset was derived from:

      “To investigate the discrepancy between contributions to ERC signal from co-function and physical interaction, we used a dataset of 343 evolutionarily distant yeast species. 332 of the species are Saccharomycotina with 11 closely related outgroup species providing as much evolutionary divergence as humans to roundworms3”

      (5) Are there statistics/figures to support the claim that "Almost all complexes and pathways had mean ERC values significantly greater than a null distribution consisting of random protein pairs"?

      This is shown in supplementary figure 1. A reference to this figure was added as well as quantification within the text.

      (6) Similar to the previous comment, can quantitative values be added to the statement "While protein complexes appear to have higher mean ERC scores than the pathways..."?

      The median of the mean ERC scores for protein complexes is 5.366 while the median for the mean ERC score in pathways is 4.597. This quantification has been included in the text: “While protein complexes have higher mean ERC scores (median 5.366) than the pathways (median 4.597), the members of a given complex are also co-functional, making interpretation of the relative contribution of physical interactions to the average ERC score difficult”

      These quantifications were also added to the caption of Figure 2A.

      (7) A semantic point: In the sentence "The lack of significance in the global permutation test shows that the...", I recommend saying that the analysis suggests, not shows, because there is potential for a type II error.

      Good suggestion, we have made this change.

      (8) The authors suggest that shared evolutionary pressures, "and hence shared levels of constraint," drive signatures of coevolution. The manuscript does not delve into selection measures (e.g., dN/dS). Perhaps it would be more representative to remove any implication of selection.

      We have added better language to clarify that discussion of selection is purely a hypothesis and that selection is not probed in our analyses.

      “Previous work finds evidence that relaxation of selective constraint can lead to drastic rate variation and hence covariation6. Rather, the greater and consistent contribution comes from non-physical interaction drivers that could include variation in essentiality, expression level, codon adaptation, and network connectivity. These non-physical forces would be under shared selective pressures and hence shared levels of constraint, the result of which was elevated ERC between non-interacting proteins, as visible in our study of genetic pathways that do not physically interact (Figure 2).”

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      -Title: In my opinion, the title of the manuscript is a somewhat misleading summary of the results of this paper. In the majority of the analyses in this paper, physical interactions do account for a significantly outsized portion of the ERC signature. The current title downplays the consistent (although sometimes small effect-sized) result that physically interacting domains do show higher ERC than non-physically interacting domains by every statistical measure employed in this paper to compare physical vs non-physical interactions. The authors' interpretation of their results within the manuscript body is that the effect of physical interactions is an inconsistent, weak, and non-generalizable driver of ERC. I generally agree with the authors' interpretations, but the nuance of these interpretations is lost in the title of the paper. I would suggest rewording the title to try to capture the nuance or at least be subjectively accurate. For example, stating that "...physical interactions are not the sole driving force.." is inarguably accurate based on these results.

      As an alternative title, I would suggest focusing on an important takeaway from the paper: ERC is a reliable predictor of co-functional interactions but not necessarily physical interactions. I agree with the statement that "there is not a strong enough signal to confidently call an interaction physical or not and would be of little value to an experimentalist wanting to infer interacting domains" and I think that a title that emphasizes this idea would be more accurate and impactful.

      Great suggestion. We agree that the current title downplays the minutiae of the method and the signal we capture with it. We have used your suggested title.

      There are an outsized number of complexes that had ROC-AUCs greater than 0.5, which is why we performed the permutation tests to determine how significant each of the individual ROC-AUCs was given the differing number of protein/domain pairs in each complex. Across the statistical methods used, only 3 of the 17 complexes ranked physical interactions significantly higher than non-physically interacting domains in every analysis. Even among the 3 that were statistically significant, some of the physically interacting domains still fell among the bottom portion of the ERC scores for that complex (Figure 5: MCM and CUL8 complexes). This is why we concluded that physical interactions are not the sole driving force of the signal captured by ERC.
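
      The ROC-AUC and permutation logic described in this response can be illustrated with a minimal sketch. The function names `roc_auc` and `permutation_p_value` are hypothetical, and the label-shuffling scheme is an assumption made for illustration; it is not taken from the authors' pipeline, whose exact permutation procedure is not shown here.

```python
import random


def roc_auc(physical, non_physical):
    """ROC-AUC as the probability that a physically interacting pair
    scores higher than a non-interacting pair (ties count as 0.5)."""
    wins = 0.0
    for p in physical:
        for q in non_physical:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(physical) * len(non_physical))


def permutation_p_value(physical, non_physical, n_perm=2000, seed=0):
    """One-sided permutation test: how often does a random relabeling of
    the pooled ERC scores give an AUC at least as large as the observed one?
    Assumption: labels are shuffled within a single complex."""
    rng = random.Random(seed)
    observed = roc_auc(physical, non_physical)
    pooled = list(physical) + list(non_physical)
    k = len(physical)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if roc_auc(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

      Because the number of domain pairs differs between complexes, the same AUC can be significant in a large complex and not in a small one, which is why a per-complex permutation test is informative alongside the raw AUC.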

      -Abstract: related to my preceding comment, the word "negligible" in the abstract is misleading. If physical interactions were truly entirely negligible, the comparisons of physically interacting vs non-physically interacting domains would yield 0.5. Instead, these comparisons always yielded results greater than 0.5. Consider rewording.

      Thank you for the suggestion. This phrasing has been changed to “Therefore, we conclude that coevolution due to physical interaction is weak, but present in the signal captured by ERC.”

      We agree that “negligible” may be too strong of a word, however, the comparisons do not always yield results greater than 0.5.

      Five of the 17 complexes do not reach the 0.5 threshold in the initial ROC analysis, and even among those that do, only 4 had significantly high ROC-AUCs. You are correct that the signal is not completely negligible, which is why we continued by determining whether physical interaction was driving high ERC only within proteins (Figure 5).

      -Figure 3: I think there may be an error in the domain labeling in Figure 3. The comparison between OKP1_2 and AME1_3 is the highest ERC value in the matrix. From the complex structure, it appears that OKP1_2 and AME1_3 are two helix domains that appear to physically interact. However, in the ERC matrix, they are not shaded to indicate they are a physical interaction pair. Please double-check that the interacting domains are properly annotated, since mis-annotation would have a large impact on the interpretation of this figure with respect to the overall question the paper addresses.

      Thank you for catching this - fixed.

      Minor comments:

      -Methods: "The full ERC pipeline can be found at (Github)." Provide github URL here?

      Thanks for the catch, fixed.

      -Discussion: "Evidence for physical coevolution however was tempered by a global permutation test, which did not reach significance, indicating that this inference is sensitive to approach and further underlines the relatively weak contribution of physical coevolution." The word "relatively" may not be a good choice of words. In comparison to what? As is, the phrasing could be interpreted as implying "in comparison to non-physical interactions". This would not be accurate, because the results show that in general, physical interactions are a stronger contributor to ERC (consistent trend but varied significance, depending on methodology) than non-physical interactions.

      Thank you for your help with clarification. The word relatively was removed.

      However, we do not agree that, in general, physical interactions are a stronger contributor to ERC than non-physical interactions (such as gene expression, codon adaptation, etc.). In all of our statistical tests, a maximum of 5 of the 17 complexes ranked physical interactions significantly higher than non-physical interactions. While the ROC-AUC is greater than 0.5 for 12 of the 17 complexes, only 4 of those were significant.

      -I have not seen Fisher-transformed correlation coefficients used in the context of ERC. I understand that it's helpful in normalizing the results so that they are comparable between ERC comparisons with differing numbers of overlapping branches (i.e. points on a linear correlation plot). A reference of where the authors got this idea or a little more verbiage to describe the rationale would be helpful. On a related note, I would expect that using linear correlation p-value instead of R-squared would account for differences in overlapping branches, eliminating the need to apply fisher-transformation. It would be helpful for the authors to outline their rationale for using a correlation coefficient rather than a P-value.

      We agree that this method could be made clearer. We made a methodological choice to use Fisher transformation over linear correlation p-value. Both methods should achieve the same end result by taking the number of branches into consideration. We have added additional explanation to the results section “Both protein pathways and complexes have elevated ERC”:

      “ERC was calculated for all pairs of the 12,552 genes. For each pair the correlation is Fisher transformed to normalize for the number of shared branches that contribute to the correlation. This normalization is necessary to reduce false positives that have high correlation solely due to a small number of data points. This normalization also allows for direct comparison of ERC between gene pairs that have differing numbers of branches contributing to the score.”

      We also added additional explanation in the methods section, including the formula used to calculate the Fisher transformation.

      -Did the authors use Pearson or Spearman correlation coefficient?

      Pearson. We clarified this in the methods section, “Calculating evolutionary rate covariation” : “Evolutionary rate covariation is calculated by correlating relative evolutionary rates (RERs) between two gene trees using a Pearson correlation.”
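
      As a rough illustration of the two steps discussed above (a Pearson correlation between relative evolutionary rates, followed by a Fisher transformation that accounts for the number of shared branches), a minimal sketch follows. The helper names `pearson` and `fisher_scaled` are hypothetical, and the scaling by sqrt(n - 3) is one common form of the transformation, assumed here rather than taken from the authors' methods section.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of relative
    evolutionary rates (one value per shared branch)."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two equal-length vectors with at least 4 branches")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def fisher_scaled(r, n_branches):
    """Fisher z-transform of r, scaled by sqrt(n - 3) so that scores built
    from different numbers of branches are comparable (assumed form)."""
    if n_branches < 4:
        raise ValueError("need at least 4 shared branches")
    return math.atanh(r) * math.sqrt(n_branches - 3)
```

      Under this form, the same correlation of 0.5 yields a larger score when it is supported by 103 branches than by 12, which is the behavior the response describes: high correlations resting on few data points are down-weighted.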

      -Did the authors explore ERC between domains within a single protein? Do domains within a protein exhibit ERC? I would expect that they do. If they do, this could likely be attributed to linkage/genetic hitchhiking, representing a new angle/factor beyond physical interaction that could lead to ERC. This is just an idea for a future analysis, not necessarily a request within the scope of the present paper.

      We did calculate the ERC between domains of a single protein but did not include them in the analysis since they didn’t address the specific question we posed. As expected they are highly correlated, and past unpublished studies in the lab do find a very weak, but detectable genome-wide, signature of rate covariation between neighboring colinear genes on a chromosome. That signal was however so weak as to be eclipsed by true functional relationships, when present.

      Reviewer #3 (Recommendations For The Authors):

      Please read the literature and revise accordingly.

      We understand the confusion surrounding previous literature on the relationship between expression levels and evolutionary rates when comparing between different proteins. Those studies clearly showed how expression level is highly predictive of a given protein’s average evolutionary rate. However, we are studying the change in evolutionary rate over branches for single proteins. This is inherently different because we’re following rate fluctuations in the same protein over time. To our knowledge, it has not yet been shown that expression level commonly varies enough over time to produce large rate variations in the same protein, or that it is responsible for the rate correlations we observe between co-functional proteins. It is, however, reasonable to expect that what governs between-protein differences in rate could also contribute to between-branch differences (over time for a single protein). In fact, our earlier study approached this (Clark et al. Genome Research 2012). We expect expression level could influence rate over time and lump its effect together with general non-physical forces, such as selection pressures. We recognize we could do better in defining more of the non-physical forces and the past literature. We added the following section to the introduction and many other clarifying statements throughout the manuscript:

      “For the purposes of this study, the forces that contribute to correlated evolutionary rates are grouped into two bins, physical and non-physical. The physical force is coevolution occurring at physical interaction interfaces. Non-physical forces include gene co-expression, codon adaptation, selective pressures, and gene essentiality. There is a well accepted negative relationship between gene expression and rate of protein evolution where genes that are highly expressed generally have slower rates of evolution14,15. However, Cope et al.16 found that there is a weak relationship between both gene expression and the number of interactions a protein has with the coevolution of expression level. Conversely, they found a strong relationship between proteins that physically interact and the coevolution of gene expression. These findings illuminate the difference between the strong relationship of gene expression level on the average evolutionary rate of a protein and the weak contribution of gene expression level to correlated evolutionary rates of proteins across branches. The finding that physically interacting proteins have strong expression level coevolution brings to question how much coevolution of physically interacting proteins contributes to overall covariation in protein evolutionary rates.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings that examine both how Down syndrome (DS)-related physiological, behavioral, and phenotypic traits track across time, as well as how chronic treatment with green tea extracts enriched in epigallocatechin-3-gallate (GTE-EGCG), administered in drinking water spanning prenatal through 5 months of age, impacts these measures in wild-type and Ts65Dn mice. However, the strength of the evidence is incomplete, due to high variability across measures, perhaps attributable to a failure to include sex as a factor for measures known to be sexually dimorphic. This study is of interest to scientists interested in Down Syndrome and its treatment, as well as scientists who study disorders that impact multiple organ systems.

      Public Reviews:

      Using Ts65Dn - the most commonly used mouse model of Down syndrome (DS) - the goal of this study is two-pronged: 1) to conduct a thorough assessment of DS-related genotypic, physiological, behavioral, and phenotypic measures in a longitudinal manner; and 2) to measure the effects of chronic GTE-EGCG on these measures in the Ts65Dn mouse model. Corroborating results from several previous studies on Ts65Dn mice, findings of this study confirm that the Ts65Dn mouse model exhibits the suite of traits associated with DS. The findings also suggest that the mouse model might have experienced drift, given the milder phenotypes than those reported by earlier studies. Results of the GTE-EGCG treatment do not support its therapeutic use and instead show that the treatment exacerbated certain DS-related phenotypes.

      Strengths:

      The authors performed a rigorous assessment of treatment and examined treatment and genotypic alterations at multiple time points during growth and aging. Detailed analysis shows differences in genotype during aging as well as genotype with treatment. This study is solid in the overarching methodological approach (with the exception of RNAseq, described below). The biggest strength of the study is its approach and dataset, which corroborate results from a multitude of past studies on Ts65Dn mice, albeit on adult specimens. It would be beneficial for the dataset to be made available to other researchers using a public data repository.

      We deeply appreciate the reviewers' positive feedback. Their acknowledgment of the solid methodological approach and the rigorous assessment of genotypic and treatment effects over various developmental stages resonates with our motivation. Their suggestion to make the dataset available in a public data repository for other researchers is well-taken. We are committed to data sharing and we are creating a dedicated platform to facilitate the accessibility of our research data to the scientific community. Given its size and complexity, we currently hold the dataset available upon reasonable request to the corresponding authors.

      Weaknesses:

      There are several primary weaknesses, described below:

      Sex was not considered in the analyses.

      The number of experimental animals of each sex are not clearly represented in the paper, but are buried in supplemental tables, and the Ns for the RNAseq are unclear. No analyses were done to examine sex differences in male/female DS or WT animals with or without treatment. Body measurements will greatly vary by sex, but this was not taken into consideration during assessments. As such, there is a high amount of variability within each cohort measured for body assessments (tibia, body weight, skeletal development etc.). Supplemental table 14 had the list of each animal, but not collated by sex, genotype or treatment, making it difficult to assess the strength of each measurement.

      Our study primarily concentrated on providing a holistic understanding of the impact of trisomy and GTE-EGCG treatment on Down syndrome, and was not explicitly designed to investigate sexual dimorphism. However, instead of reporting on only one sex and thereby obviating sex as a source of variation, as in previously published studies, we decided to include both male and female mice within the study design to represent a more realistic portrayal of the nature of Down syndrome in a heterogeneous population. By encompassing both sexes, we aim to better capture the variability in Down syndrome.

      As we acknowledge the significance of sex bias in scientific research, we considered performing post-hoc analyses to test the effect of sexual dimorphism, but found that our dataset was underpowered to obtain reliable results, since our experiments were not designed a priori to investigate this question and the sample sizes for each sex separately were not large enough. Nevertheless, considering the reviewer’s comment, we have taken specific steps to improve the representation of sex-related information and to enhance the clarity of our manuscript.

      First, we have redesigned all figures using empty and full symbols to distinguish male from female mice within each analysis, providing readers with an immediate sense of the sex distribution in each experimental group. Moreover, we have modified Supplementary Table 1 to offer a comprehensive breakdown of the number of male and female mice for each test, along with their respective genotypes and treatment groups. This table aims to make the sample size and sex distribution within our study as transparent as possible for our readers. While we acknowledge that our study lacked the statistical power to perform a detailed sex-based analysis, the visual representation of sex in our data shows which systems are mainly affected by sexual dimorphism. This evidence can guide future investigations directly designed to investigate sexual effects in certain systems or structures.

      Key results are not clearly depicted in the main figures

      Rigorous assessment of each figure and the clarity of the figure to convey the results of the analysis needs to be performed. Many of the figures do not clearly represent the findings, with authors heavily relying on supplemental figures to present details to explain results. Figure legends do not adequately describe figures; rather, they are limited to describing how the analysis is performed. For example, LDA plots in Figure 4 do not clearly convey the results of metabolite analysis.

      Overall, the amount of data presented here is overwhelming, making it difficult to interpret the findings. Some assessments that do not add to the overall paper need to be removed. Clarifying the text, figures and trimming the supplement to represent the data in a manner that is easily understood will improve the readability of the paper. For example, perhaps measures which are not strongly impacted by genotype could be moved to the supplement, because they are not directly relevant to the question of whether GTE-EGCG reverses the impact of trisomy on the measures.

      As rightly pointed out by the reviewers, the vast amount of data generated by our holistic and longitudinal approach is one of the primary strengths, but also an important challenge in our study. Our dataset encompasses a comprehensive assessment of the effects of treatment and genotypic alterations at multiple time points during growth and aging. This multi-dimensional evaluation is pivotal to our research, and relegating data to supplementary material would restrict access to this holistic understanding. Our aim is to provide readers with a complete view of the complex interactions we have explored, and retaining this data in the main text is essential to uphold the integrity of our work.

      Indeed, we specifically chose to submit our manuscript to eLife because this journal allows readers to access supplemental information directly from the text and figures in the main manuscript, which best aligned with our approach to data presentation. The eLife format permits us to offer readers a quick and informative overview of all the data within the main figures by employing multivariate techniques such as Linear Discriminant Analysis or Principal Component Analysis. Subsequently, we supply more detailed analyses in the supplementary figures for readers who wish to delve deeper into specific aspects. Furthermore, while certain figures may be categorized as supplementary, we would like to emphasize that every result is comprehensively described in the main text.

      Acknowledging the concerns raised about the density of our paper and the potential challenges in interpreting the findings, we have conducted a thorough review of the text and figure legends. We have made revisions with the goal to enhance clarity and readability. We have made dedicated efforts to ensure that readers can readily grasp the significance of our results and appreciate the intricacies of our findings. We firmly believe that with these revisions, our chosen approach is the most effective means of presenting the richness of our data and maintaining the integrity of our findings.

      Lack of clarity in the behavioral analyses

      Behavioral assessments are not clearly written in the methods. For example, for the novel object recognition task, it isn't clear how preference was calculated. Is this simply the percent of time spent with the novel object, or is this a relative measure (novel:familiar ratio)? This matters because if it is simply the percent of time, the relevant measure is to compare each group to 50% (the absence of a preference). The key measures for each test need to be readily distinguished from the control measures.

      There are also many dependent behavioral measures. For example, speed and distance are directly related to each other, but these are typically reported as control measures to help interpret the key measure, which is the anxiety-like behavior. Similarly, some behavioral tests were used to represent multiple behavioral dimensions, such as anxiety and arousal. In general, the measurements of arousal seem atypical (speed and distance are typically reported as control measures, not measures of arousal). Similarly, measures of latency during training would not typically be used as a measure of long-term memory but instead reported as a control measure to show learning occurred. LDA analysis requires independence of the measures, as well as normality. It does not appear that all of the measures fed into this analysis would have met these assumptions, but the methods also do not clearly describe which measures were actually used in the LDA.

      We agree with the reviewers’ concerns about the clarity of our behavioral analyses and we have thus added information to the methods section to clarify the procedures. Specifically, for SPSN, social approach was recorded as the time spent close to STR1, and a preference ratio was calculated as Pref = 100 × (Time close to STR1) / (Time close to STR1 + Time close to Empty). Social recognition memory was scored as preference towards STR2 and calculated as Pref = 100 × (Time close to STR2) / (Time close to STR1 + Time close to STR2). For NOR, preference for the novel object was calculated as Pref = 100 × (Time novel object) / (Time familiar object + Time novel object).
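
      The three preference scores above share one form, which can be written as a small helper. This is a minimal sketch; the function name `preference_percent` and its argument names are hypothetical and do not come from the authors' analysis code.

```python
def preference_percent(time_target, time_other):
    """Preference for the target as a percentage of total exploration time.

    time_target: time near the target (STR1, STR2, or the novel object)
    time_other:  time near the comparison (empty cage, STR1, or the
                 familiar object)
    A value of 50 means no preference for either option.
    """
    total = time_target + time_other
    if total <= 0:
        raise ValueError("no exploration time recorded; preference is undefined")
    return 100.0 * time_target / total
```

      For example, `preference_percent(30.0, 10.0)` gives 75.0. Since the score is a percentage of total exploration time, a group's preference can be judged against the 50% no-preference level raised in the reviewers' comment.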

      With regard to the different variables reported for the behavioral protocols, we agree that some measures, such as path length and speed, can be used as control measures. For example, in an open field test, path length is an important control measure to assess whether an animal is engaged in the task. However, if an animal is actively moving, the distance covered can, but does not have to, correlate with the amount of time that a mouse spends in the center of the open field. Using the distance covered as a measure of general arousal and the time spent in the center as a measure of anxiety-related behavior allows a more nuanced evaluation of animal behavior. For instance, two animals spending similar amounts of time in the center may exhibit differences in the distance they cover. In this scenario, we would argue that anxiety-related behavior (defined as exploring the center of an open field) would not reflect a behavioral difference between the two animals well, while arousal clearly is a differentiating factor.

      Regarding the PA task and the use of latency during training, we agree that typically latency during training can be used as control measure to show that learning occurred. However, our study involved testing animals at two distinct time points. Contextual fear conditioning creates very robust memory traces that can persist for weeks or even months, and therefore the starting premise is very different when repeating the test. Initially, the animals were experimentally naïve and had not yet experienced a foot shock, leading to a rapid entry into the dark box. However, after experiencing the first CS-US presentation, a robust and persistent contextual fear memory trace is formed. Therefore, the latency observed in the second training phase of the PA reflects in essence long-term contextual fear memory, that is robustly displayed in WT animals but less in treated WT and TS animals. We have included this clarification in the methods and results sections.

      Finally, we want to thank the reviewer for noticing the error in the LDAs, as the analysis was indeed performed including dependent variables for some systems. We have re-evaluated the LDAs for the behavioral tests and tibia microarchitecture tests, excluding dependent variables. As a result, the text and significance levels have been adjusted accordingly. To enhance transparency and clarity, we have included Supplementary Table S21, which precisely outlines the variables included in each LDA.

      Unclear value of RNAseq

      RNAseq was performed in cerebellum, a relatively spared region in DS pathology at an early time point in disease. Further, the expression of 125 genes triplicated in DS was shown in a PCA plot to highly overlap with WT, indicating that there are minimal differences in gene expression in these genes. If these genes are not critical for cerebellar function, perhaps this could account for the lack of differences between WT and Ts65Dn mice. If the authors are interested in performing RNAseq, it would have made more sense to perform this in hippocampus (to compare with metabolites) and to perform more stringent bioinformatic analysis than assessment by PCA of a limited subset of genes. Supplementary Table S14, which shows the differentially expressed genes, appears to be missing from the manuscript and cannot be evaluated. Additionally, the methods of the RNAseq are not sufficiently described and lack critical details. For example, what was the normalization performed, and which groups were compared to identify differentially expressed genes? It would also be worthwhile to describe how animals were identified for RNAseq-were those animals representative of their groups across other measures?

      We acknowledge the reviewers' comments on the RNAseq analysis and would like to provide additional insights into our rationale and choices for this analysis. The primary aim of our RNAseq analysis was to offer supplementary evidence in support of the broader context of our paper. Rather than focusing on specific genes, our aim was to assess potential alterations in transcription within genes triplicated in the mouse model and explore differentially expressed genes across the entire genome. Therefore, we conducted a global analysis of the triplicated genes using a PCA and analyzed the differentially expressed genes across the entire genome as shown in Supplementary Table S14. The table was originally included as a separate Excel file, but apparently it was not received by the reviewers. We have contacted the eLife editorial office to ensure its inclusion in the current version. Furthermore, we have modified the text to clarify that both the triplicated genes and the entire genome were analyzed.

      Regarding the use of cerebellum instead of hippocampus, we agree with the reviewers that the hippocampus is a major tissue of interest in the study of Down Syndrome since it mostly relates to cognition. Trisomic patients, however, also display other typical features, such as a delay in the acquisition of motor skills. Here we decided to focus on the cerebellum as it is primarily associated with the locomotor system but also plays a role in other cognitive functions such as language processing and memory. Furthermore, at the time of the RNAseq analysis, the mice were 8 months old, equivalent to the adult human stage, and previous studies have shown transcriptomic alterations in this tissue and mouse model (Olmos-Serrano et al., 2016; Saran et al., 2003).

      The lack of observable differences between WT and Ts65Dn mice in our PCA analysis may be attributed to several factors as discussed in our article. First, the high variability within each group, inherent to the complexity of DS, may obscure inter-group differences. Additionally, the subtlety of gene expression differences between WT and trisomic mice in the set of triplicated genes, as suggested by other transcriptomics studies on DS (Aït Yahya-Graison et al., 2007; Lyle et al., 2004; Olmos-Serrano et al., 2016; Saran et al., 2003), may contribute to the limited distinctions observed. Furthermore, regarding treatment effects, the timing of the RNAseq analysis should be considered since it was conducted at the endpoint, three months after treatment cessation. This temporal aspect could imply that the effects of the drug are not persistent, and a molecular memory might not be formed and maintained.

      Nevertheless, we appreciate the reviewers' constructive comments and acknowledge the potential for more stringent bioinformatic analyses. While our intention was to provide an initial, global perspective, we are eager to support further investigations that delve deeper into the complexities of DS-related molecular mechanisms. Consequently, the dataset is available for other researchers to explore more specific questions upon request.

      Finally, we have updated the methods section of the article to offer more detailed information on RNAseq processing and analysis. We have also clarified that all the surviving mice were included in the analysis.

      Recommendations for the authors:

      (1) Please add power calculations for each of the assessments.

      We would like to clarify that we had already conducted power calculations as part of the initial planning and design phase of our study. After data acquisition and analysis, we have utilized appropriate statistical methods to interpret the results based on the data we have collected. Given that we had conducted a priori power calculations prior to data collection and that our analysis is based on the acquired data, we do not see the added value in including post hoc power calculations. Our primary focus has been on performing the correct statistical analyses to accurately interpret the results and draw meaningful conclusions.

      (2) Introduction has some excessive references for each statement, which are not necessary. For instance: lines 67-73 are only references for 1 statement and lines 74-76 are references for a 2nd statement in the same sentence.

      We have removed redundant references.

      (3) Introduction: Lines 136-146 Gene names need to be spelled out, not just the IDs. Were these studies done in human or mouse models of DS?

      We have spelled out the names of the genes.

      (4) Why was brain volume and brain structure size normalized to body weight, not clearly explained?

      The choice to normalize brain volume and brain structure size to body weight was a deliberate decision made to address potential confounding factors in our study. In the case of trisomic (TS) mice, they are generally smaller in size compared to their wild-type (WT) counterparts. The same may hold true for sex-related size differences. Without normalization, assessing brain volume and structure size could be misleading, as it might reflect the differences in overall body size rather than providing insights into the specific aspects of brain structure that we aimed to investigate. We have clarified this in the methods section.

      (5) In cognitive tests, some of the WT data represented in Figure 3 does not match supplemental findings. Again power calculations may indicate a higher number of WT mice are needed to clarify this discrepancy.

      We appreciate the reviewers' observation regarding the disparities between the data presented in Figure 3 and the supplemental figures. We would like to clarify that these variations are a result of the distinct analytical approaches employed in the two sets of data.

      In Figure 3 and all main figures, the data were analyzed using multivariate tests, which consider multiple variables simultaneously and are particularly suited for investigating the collective impact of multiple factors. Conversely, the results shown in the supplementary figures were derived from univariate tests, which focus on individual variables and are well-suited for addressing specific questions related to each variable in isolation. The discrepancies between the data in the main figures and the supplementary figures can be attributed to the differences in the analytical methods chosen.

      As for the suggestion of conducting power calculations to address the observed differences, we believe that the differences in data are inherent to the distinct analytical strategies and the specific research questions each analysis intended to answer. Power calculations may not be the most suitable approach in this context, as they pertain to sample size planning for hypothesis testing and may not reconcile the inherent dissimilarity between multivariate and univariate analyses.

      Aït Yahya-Graison, E., Aubert, J., Dauphinot, L., Rivals, I., Prieur, M., Golfier, G., . . . Potier, M. C. (2007). Classification of human chromosome 21 gene-expression variations in Down syndrome: impact on disease phenotypes. Am J Hum Genet, 81(3), 475-491. https://doi.org/10.1086/520000

      Lyle, R., Gehrig, C., Neergaard-Henrichsen, C., Deutsch, S., & Antonarakis, S. E. (2004). Gene expression from the aneuploid chromosome in a trisomy mouse model of down syndrome. Genome Res, 14(7), 1268-1274. https://doi.org/10.1101/gr.2090904

      Olmos-Serrano, J. L., Kang, H. J., Tyler, W. A., Silbereis, J. C., Cheng, F., Zhu, Y., . . . Sestan, N. (2016). Down Syndrome Developmental Brain Transcriptome Reveals Defective Oligodendrocyte Differentiation and Myelination. Neuron, 89(6), 1208-1222. https://doi.org/10.1016/j.neuron.2016.01.042

      Saran, N. G., Pletcher, M. T., Natale, J. E., Cheng, Y., & Reeves, R. H. (2003). Global disruption of the cerebellar transcriptome in a Down syndrome mouse model. Hum Mol Genet, 12(16), 2013-2019. https://doi.org/10.1093/hmg/ddg217

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      In this study, Yan et al. investigate the molecular bases underlying mating type recognition in Tetrahymena thermophila. This model protist possesses a total of 7 mating types/sexes and mating occurs only between individuals expressing different mating types. The authors aimed to characterize the function of mating type proteins (MTA and MTB) in the process of self- and non-self recognition, using a combination of elegant phenotypic assays, protein studies, and imaging. They showed that the presence of MTA and MTB in the same cell is required for the expression of concavalin-A receptors and for tip transformation - two processes that are characteristic of the costimulation phase that precedes cell fusion. Using protein studies, the authors identify a set of additional proteins of varied functions that interact with MTA and MTB and are likely responsible for the downstream signaling processes required for mating. This is a description of a fascinating self- and non-self-recognition system and, as the authors point out, it is a rare example of a system with numerous mating types/sexes. This work opens the door for the further understanding of the molecular bases and evolution of these complex recognition systems within and outside protists.

      The results shown in this study point to the unequivocal requirement of MTA and MTB proteins for mating. Nevertheless, some of the conclusions regarding the mode of functioning of these proteins are not fully supported and require additional investigation.

      Strengths:

      (1) The authors have established a set of very useful knock-out and reporter lines for MT proteins and extensively used them in sophisticated and well-designed phenotypic assays that allowed them to test the role of these proteins in vivo.

      (2) Despite their apparent low abundance, the authors took advantage of a varied set of protein isolation and characterization techniques to pinpoint the localization of MT proteins to the cell membrane, and their interaction with multiple other proteins that could be downstream effectors. This opens the door for the future characterization of these proteins and further elucidation of the mating type recognition cascade.

      Weaknesses:

      The manuscript is structured and written in a very clear and easy-to-follow manner. However, several conclusions and discussion points fall short of highlighting possible models and mechanisms through which MT proteins control mating type recognition:

      (1) The authors dismiss the possibility of a "simple receptor-ligand system", even though the data does not exclude this possibility. The model presented in Figure 2 S1, and on which the authors based their hypothesis, assumes the independence of MTA and MTB proteins in the generation of the intracellular cascade. However, the results presented in Figure 2 show that both proteins are required to be active in the same cell. Coupled with the fact that MTA and MTB proteins interact, this is compatible with a model where MTA would be a ligand and MTB a receptor (or vice-versa), and could thus form a receptor-ligand complex that could potentially be activated by a non-cognate MTA-MTB receptor-ligand complex, leading to an intracellular cascade mediated by the identified MRC proteins. As it stands, it is not clear what is the proposed working model, and it would be very beneficial for the reader for this to be clarified by having the point of view of the authors on this or other types of models.

      We are very grateful that Reviewer #1 proposed the possibility that MTA and MTB form a receptor-ligand complex in which one acts as the ligand and the other as the receptor. We also considered this hypothesis when asking how the MTRC functions. However, our current results do not support this idea. For instance, if MTA were a ligand and MTB a receptor, we would expect a mating signal upon treatment with MTAxc protein, but not with MTBxc. Contrary to this expectation, our experiments revealed that both MTAxc and MTBxc exhibit very similar effects (Figure 5, green and blue), and their combined treatment produces a stronger effect (Figure 5, teal). This suggests a mixed function for both proteins. (We incorporated this discussion into the revised version [line 120-121, 240-244].) It is a pity that our current knowledge does not provide a detailed molecular mechanism for this intricate system. We are actively investigating the protein structures of MTA, MTB, and the entire MTRC, hoping to gain deeper insights into the molecular functions of MTA and MTB.

      Additionally, we realized that the expression we used in the previous version, “simple receptor-ligand model”, was not clearly defined. As Reviewer #1 pointed out, in this section we examined whether the individual MTA and MTB proteins act as a receptor-ligand pair. We think this is the simplest possibility and serves as a null hypothesis for Tetrahymena mating-type recognition. We have clarified it in the revised version (line 90-91, 104-106). Based on the results in this section, we proposed that MTA and MTB may form a complex that serves as a recognizer (functioning as both ligand and receptor) (line 117-118).

      (2) The presence of MTA/MTB proteins is required for costimulation (Figure 2), and supplementation with non-cognate extracellular fragments of these proteins (MTAxc, or MTBxc) is a positive stimulator of pairing. However, alone, these fragments do not have the ability to induce costimulation (Figure 5). Based on the results in Figures 5 and 6 the authors suggest that MT proteins mediate both self and non-self recognition. Why do MTAxc and MTBxc not induce costimulation alone? Are any other components required? How to reconcile this with the results of Figure 2? A more in-depth interpretation of these results would be very helpful, since these questions remain unanswered, making it difficult for the reader to extract a clear hypothesis on how MT proteins mediate self- and non-self-recognition.

      Several factors could contribute to the inability of MTA/Bxc to induce costimulation. It is highly likely that additional components are necessary, given that MTA/B form a protein complex with other proteins. Moreover, the expression of MTA/Bxc in insect cells, compared with Tetrahymena, might result in differences in post-translational modifications. Additionally, there are variations in protein conditions; on the Tetrahymena membrane, these proteins are arranged regularly and concentrated in a small area, while MTA/Bxc is randomly dispersed in the medium. The former condition could be more efficient. If there is a threshold required to stimulate a costimulation marker, MTA/Bxc may fail to meet this requirement. Many more studies are needed to fully answer this question. We acknowledged this limitation in the revised version (line 244-248).

      Reviewer #2:

      This manuscript reports the discovery and analysis of a large protein complex that controls mating type and sexual reproduction of the model ciliate Tetrahymena thermophila. In contrast to many organisms that have two mating types or two sexes, Tetrahymena is multi-sexual with 7 distinct mating types. Previous studies identified the mating type locus, which encodes two transmembrane proteins called MTA and MTB that determine the specificity of mating type interactions. In this study, mutants are generated in the MTA and MTB genes and mutant isolates are studied for mating properties. Cells missing either MTA or MTB failed to co-stimulate wild-type cells of different mating types. Moreover, a mixture of mutants lacking MTA or MTB also failed to stimulate. These observations support the conclusion that MTA and MTB may form a complex that directs mating-type identity. To address this, the proteins were epitope-tagged and subjected to IP-MS analysis. This revealed that MTA and MTB are in a physical complex, and also revealed a series of 6 other proteins (MRC1-6) that together with MTA/B form the mating type recognition complex (MTRC). All 8 proteins feature predicted transmembrane domains, three feature GFR domains, and two are predicted to function as calcium transporters. The authors went on to demonstrate that components of the MTRC are localized on the cell surface but not in the cilia. They also presented findings that support the conclusion that the mating type-specific region of the MTA and MTB genes can influence both self- and non-self-recognition in mating.

      Taken together, the findings presented are interesting and extend our understanding of how organisms with more than two mating types/sexes may be specified. The identification of the six-protein MRC complex is quite intriguing. It would seem important that the function of at least one of these subunits be analyzed by gene deletion and phenotyping, similar to the findings presented here for the MTA and MTB mutants. A straightforward prediction might be that a deletion of any subunit of the MRC complex would result in a sterile phenotype. The manuscript was very well written and a pleasure to read.

      Thanks for the valuable comments and suggestions. We are currently in the process of constructing deletion strains for these genes. As of now, we have successfully obtained ΔMRC1-3 and MRC4-6 knockdown strains. Our preliminary observations indicate that ΔMRC1-3 strains are unable to undergo mating. However, we prefer not to include these results in the current manuscript, as we believe that more comprehensive studies are still needed.

      Reviewer #3:

      The authors describe the role, location, and function of the MTA and MTB mating type genes in the multi-mating-type species T. thermophila. The ciliate is an important group of organisms to study the evolution of mating types, as it is one of the few groups in which more than two mating types evolved independently. In the study, the authors use deletion strains of the species to show that both mating types genes located in each allele are required in both mating individuals for successful matings to occur. They show that the proteins are localized in the cell membrane, not the cilia, and that they interact in a complex (MTRC) with a set of 6 associated (non-mating type-allelic) genes. This complex is furthermore likely to interact with a cyclin-dependent kinase complex. It is intriguing that T. thermophila has two genes that are allelic and that are both required for successful mating. This coevolved double recognition has to my knowledge not been described for any other mating-type recognition system. I am not familiar with experimental research on ciliates, but as far as I can judge, the experiments appear well performed and mostly support the interpretation of the authors with appropriate controls and statistical analyses.

      The results show clearly that the mating type genes regulate non-self-recognition, however, I am not convinced that self-recognition occurs leading to the suppression of mating. An alternative explanation could be that the MTA and MTB proteins form a complex and that the two extracellular regions together interact with the MTA+MTB proteins from different mating types. This alternative hypothesis fits with the coevolution of MTA and MTB genes observed in the phylogenetic subgroups as described by Yan et al. (2021 iScience). Adding MTAxc and/or MTBxc to the cells can lead to the occupation of the external parts of the full proteins thereby inhibiting the formation of the complex, which in turn reduces non-self interactions. Self-recognition as explained in Figure 2S1 suggests an active response, which should be measurable in expression data for example. This is in my opinion not essential, but a claim of self-recognition through the MTA and MTB should not be made.

      We express our gratitude to Reviewer #3 for proposing the occupation model and have incorporated this possibility into the manuscript. We believe it is possible that occupation may serve as the molecular mechanism through which self-recognition negatively regulates mating. If there is a physical interaction between mating-type proteins of the same type, but this interaction blocks the recognition machinery rather than initiating mating, it can be considered a form of self-recognition. This aligns with the observation that strains expressing MTA/B6 and MTB2 mate normally with WT cells of all mating types except for VI and II (line 203-204). A concise discussion on this topic is included in the manuscript (line 288-293, 659-661). We are actively investigating the downstream aspects of mating-type recognition, and we hope to provide further insights into this question soon.

      The authors discuss that T. thermophila has special mating-type proteins that are large, while those of other groups are generally small (lines 157-160 and discussion). The complex formed is very large and in the discussion, they argue that this might be due to the "highly complex process, given that there are seven mating types in all". There is no argument given why large is more complex, if this is complex, and whether more mating types require more complexity. In basidiomycete fungi, many more mating types than 7 exist, and the homeodomain genes involved in mating types are relatively small but highly diverse (Luo et al. 1994 PMID: 7914671). The mating types associated with GPCR receptors in fungi are arguably larger, but again their function is not that complex, and mating-type specific variations appear to evolve easily (Fowler et al 2004 PMID: 14643262; Seike et al. 2015 PMID: 25831518). The large protein complex formed is reminiscent of the fusion patches that develop in budding or fission yeasts. In these species, the mating type receptors are activated by ligand pheromones from the opposite mating type that induce polarity patch formation (see Sieber et al. 2023 PMID: 35148940 for a recent review). At these patches, growth (shmooing) and fusion occur, which is reminiscent (in a different order) of the tip transformation in T. thermophila. The fusion of two cells is in all taxa a dangerous and complex event that requires the evolution of very strict regulation and the existence of a system like the MTRC and cyclin-dependent complex to regulate this process is therefore not unexpected. The existence of multiple mating types should not greatly complicate the process, as most of the machinery (except for the MTA and MTB) is identical among all mating types.

      We are very grateful that Reviewer #3 provided this insightful view and the relevant papers. In response to the feedback, we removed the sentences regarding “multiple mating types greatly complicate the process” in the revised version. Instead, we have introduced a discussion section comparing the mating systems of yeasts and Tetrahymena (line 279-286).

      The Tetrahymena/ciliate genetics and lifecycle could be better explained. For a general audience, the system is not easy to follow. For example, the ploidy of the somatic nucleus with regards to the mating type is not clear to me. The MAC is generally considered "polyploid", but how does this work for the mating type? I assume only a single copy of the mating type locus is available in the MAC to avoid self-recognition in the cells. Is it known how the diploid origin reduces to a single mating type? This does not become apparent from Cervantes et al. 2013.

      In T. thermophila, the MIC (diploid) contains several mating-type gene pairs (mtGP, i.e., MTA and MTB) organized in a tandem array at the mat locus on each chromosome. In sexual reproduction, the new MAC of the progeny develops from the fertilized MIC through a series of genome editing events, and its ploidy increases to ~90 by endoreduplication. During this process, mtGP loss occurs, resulting in only one mtGP remaining on the MAC chromosome. The mating-type specificity of mtGPs on each chromosome within one nucleus becomes relatively pure through intranuclear coordination. After multiple assortments (possibly caused by MAC amitosis during cell fission), only mtGPs of one mating-type specificity exist in each cell, determining the cell’s mating type.

      It is a pity that the exact mechanisms involved in this complicated process remain a black box. The loss of mating-type gene pairs is hypothesized to involve a series of homologous recombination events, but this has not been completely proven. Furthermore, there is no clear understanding of how intranuclear coordination and assortment are achieved. While we have made observations confirming these events, a breakthrough in understanding the molecular mechanism is yet to be achieved.

      We included more information in the revised version (line 672-683). Given the complexity of these unusual processes, we recommend an excellent review by Prof. Eduardo Orias (PMID: 28715961), which offers detailed explanations of the process and related concepts (line 685-686).

      Also, the explanation of co-stimulation is not completely clear (lines 49-60). Initially, direct cell-cell contact is mentioned, but later it is mentioned that "all cells become fully stimulated", even when unequal ratios are used. Is physical contact necessary? Or is this due to the "secrete mating-essential factors" (line 601)? These details are essential, for interpretation of the results and need to be explained better.

      We apologize that the term “contact” was not precise enough. In Tetrahymena, physical contact is indeed necessary, but it can be temporary. Unlike yeast, Tetrahymena cells move rapidly, swimming randomly in the medium. Occasionally, two cells may come into contact, but they quickly separate instead of sticking together. Even newly formed loose pairs often become separated. As a result, one cell can come into contact with numerous others and stimulate them. We have clarified this aspect in the revised version (line 50-51, 57).

      Abstract and introduction: Sexes are not mating types. In general, mating types refer to systems in which there is no obvious asymmetry between the gametes, beyond the compatibility system. When there is a physiological difference such as size or motility, sexes are used. This distinction is of importance because in many species mating types and sexes can occur together, with each sex being able to have either (when two) or multiple mating types. An example are SI in angiosperms as used as an example by the authors or mating types in filamentous fungi. See Billiard et al. 2011 [PMID: 21489122] for a good explanation and argumentation for the importance of making this distinction.

      We have clarified the expression in the revised version (line 20, 38, 40, 45).

      Recommendations for the authors:

      Reviewer #1:

      I really enjoyed reading this manuscript and I think a few tweaks in the writing/data presentation could greatly improve the experience for the reader:

      (1) The information about your previous work in identifying downstream proteins CDK19, CYC9, and CIP1 (lines 170-173) could be directly presented in the introduction.

      We have moved this information to the introduction in the revised version (line 74-77).

      (2) For a reader who is not familiar with Tetrahymena, a few more details on how reporter and knock-out lines are generated would be beneficial.

      We introduced the knock-out method in Figure 2 – figure supplement 1B, the HA-tag method in Figure 3A, and the MTB2-eGFP construction method in Figure 4E. In addition, we described how co-stimulation markers were observed in the Materials and Methods (line 404-410).

      (3) Figures 5 and 6: clarify the types of pairing and treatments that were done directly in the figure (eg. adding additional labels). As of now, it is necessary to go through the text and legend to try and understand in detail what was done.

      Cell types and treatments were directly introduced in the revised figure (Figure 5 and 6).

      (4) The logical transition in lines 136-142 is hard to follow.

      We rewrote this paragraph in the revised version (lines 143-156). Additionally, we added a figure to illustrate the theoretical mating-type recognition model between WT cells and ΔCDK19, ΔCYC9 cells, MTAxc, MTBxc proteins, and ΔMTA, ΔMTB cells (Figure 2 – figure supplement 1D-G).

      (5) Lines 191-196: the fact that cells expressing multiple mating types can self goes against an active self-rejection system - if this is the case there should be self-rejection among all expressed mating types. Unless non-self recognition is an active process and self-recognition is simply the absence of non-self recognition. The authors briefly mention this in lines 263-265, but it would be interesting to expand and clarify this.

      We appreciate that Reviewer #1 noticed the interesting selfing phenotype of the MTB2-eGFP (MTVI background) strain. We discussed it further in the revised manuscript (line 298-306).

      (6) The authors briefly mention the possibility of different mating types using different recognition mechanisms (lines 255-260), based on the big differences in the size of the mating-specific region of MT proteins. Following this and the weakness nr. 2, I think it would be pertinent to gather and present more information on the properties and structures of the mating-type specific regions of MT proteins. Simple in silico analysis of motifs, structure, etc. could help clarify the role of these regions. It seems more parsimonious that MT proteins would have variable mating type specific regions that account for the recognition of the different mating types, and conserved cytoplasmic functions that could trigger a single downstream signaling cascade. It would be interesting to know the authors' opinion on this.

      We are very grateful for this suggestion. We are currently working on determining the 3D structure of the MTRC. The AlphaFold2 prediction indicates that the MT-specific region is comprised of seven global β-sheets, resembling the structure of immunoglobulins (Ig). Our most recent cryo-EM results have revealed a ~15Å structure, aligning well with the prediction. However, the main challenge lies in the low expression levels, both in Tetrahymena and in insect/mammal cells. We anticipate obtaining more detailed results soon. Therefore, we prefer to present the MT recognition model with robust experimental evidence in the future, and have not discussed this aspect extensively in the current manuscript.

      (7) Adding a figure including a proposed model, as well as expanding the discussion on the points presented as "weaknesses" would help clarify the ideas/hypothesis on how the mating recognition works. I think this would really elevate the paper and help highlight the results.

      We added a figure to introduce the model and the weaknesses in the revised version (Figure 7, line 656-665).

      (8) Line 202-203: It is far-fetched to infer subcellular localization based on the data presented here; counterstaining with other dyes and antibodies specific to certain cell components, as well as negative control images, are required.

      Thanks for the suggestion. We attempted to stain cell components using various dyes and antibodies. Unfortunately, we found that the cell surface and cilia (especially oral cilia) very easily give false-positive signals. We think this issue seriously affects the credibility of the results. It may seem like splitting hairs, but we are trying to be precise.

      Meanwhile, we still believe the mating-type proteins localize to the cell surface because MTA-HA is identified in the isolated cell surface proteins.

      Regarding a negative control, as shown in Fig. 4G, where an MTB2-eGFP cell is pairing with a WT cell, no GFP signal is observed in the WT cell.

      (9) Lines 131: clarify the sentence - expression of Con-A receptors requires both MTA and MTB (MTA to receive the signal).

      We modified the sentence in the revised version (line 139-140).

      Reviewer #2:

      Minor points.

      (1) Line 194-196. Why are these cells able to self?

      These cells may be able to self because the MTRC contains heterotypic mating-type proteins (MTA6 and MTB2), which activate mating when they interact with another heterotypic MTRC (line 207-208).

      (2) Line 232. What do the authors mean by the term synergistic effect here? Definition and statistics?

      Sorry about the confusion. The synergistic effect refers to the effect of MTAxc and MTBxc becoming stronger when they are used together. We clarified it in the revised version (line 232).

      (3) For Figure 4 panel D, are there antibodies that are available as a control for cilia? If so, then blotting this membrane would show that cilia-associated proteins are in the cilia preparation, which is a standard control for sub-cellular fractionation.

      Thanks for the suggestion. Unfortunately, we have not yet found a suitable cilia-specific antibody. Instead, we employed MS analysis to confirm the presence of cilia proteins in this sample (line 195-196, Figure 4–Source data 1). We also observed the sample under the microscope, which directly revealed the presence of cilia (Figure 4C).

      (4) At least one reference cited in the text was not present in the reference list. The authors should go through the references cited to ensure that all have made it into the reference list.

      We have checked all the references.

      Some minor edits:

      (1) MTA and MTB are presented in both roman and italics (e.g. line 209) in the manuscript. Maybe all should be in italics? Or is this a distinction between the gene and the protein?

      The italicized term (MTA) refers to the gene, and the non-italicized term (MTA) refers to the protein.

      (2) Line 251. Change "achieving" to "achieve".

      We have corrected this word (line 266).

      Reviewer #3:

      Line 101. It would help to explain this expectation earlier in this paragraph.

      We explained the expectation in the revised version (line 92-97, 104-106).

      Line 109. How is a co-receptor different from the MTRC complex?

      We have rewritten the relevant sentences to enhance clarity (line 116-119). The molecular function of the MTRC complex could involve acting as a co-receptor or recognizer (functioning as both ligand and receptor). Based on the results presented in this section, we propose that MTA and MTB may function as a complex, but the confirmation of this hypothesis (MTRC) is provided in a later section. Therefore, we did not use the term “MTRC” here. These sentences briefly discuss the molecular function of this complex and explain why MTRC does not appear to function as a co-receptor.

      Line 251: which "dual approach" is referred to?

      The dual approach refers to both self- and non-self recognition. We explained it in the revised version (line 265-266).

      Line 258: what "different mechanisms" do the authors have in mind? Why would a different mechanism be expected? The different sizes could have evolved for (coevolutionary?) selection on the same mechanism.

      Sorry about the confusion. We clarified it in the revised version (line 269-278).

      What we intended to express is that we are uncertain whether the mating-type recognition model we discovered in T. thermophila is applicable to all Tetrahymena species due to significant differences in the length of the mating-type-specific region. We believe it is important to highlight this distinction to avoid potential misinterpretations in future studies involving other Tetrahymena species. At the same time, we look forward to future research that may provide insights into this question.

      Fig 2 C&D. Is it correct that these figures show the strains only after 'preincubation'? This is not apparent from the caption of the text. Additionally, the order of the images is very confusing. Write in the figures (so not just in the caption) what the sub-script means.

      These panels are re-organized in the revised version (Fig. 2C&D). There are three kinds of pictures: “not incubated”, “WT pre-incubated by mutant” and “mutant pre-incubated by WT”.

      The methods used to generate Figure 5 are not clearly described. I understand that the obtained xc proteins were added to the cells, and then washed, after which a test was performed mixing WT-VI and WT-VII cells. Were both cells treated? Or only one of the strains? The explanation for the reused washing medium is not clear and the method is not indicated.

      Both cells are treated. More details are provided in the revised manuscript (line 230-231, 633-634, 637-639, Fig. 5). To prepare the starvation medium containing mating-essential factors, cells were starved in fresh starvation medium for ~16 hours. Subsequently, cells were removed by three rounds of centrifugation (1000 g, 3 min) (line 330-332).

      In general, the figures are difficult to understand without repeated inquiries in the captions. Give more information in the figures themselves.

      More information is introduced in the figure (Fig. 2C, Fig. 3B, Fig. 4A, B, D, Fig. 5 and Fig. 6).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This paper suggests to apply intrinsically-motivated exploration for the discovery of robust goal states in gene regulatory networks.

      Strengths:

      The paper is well written. The biological motivation and the need for such methods are formulated extraordinarily well. The battery of experimental models is impressive.

      We thank the reviewer for sharing interest in the research problem and for recognizing the strengths of our work.

      Weaknesses:

      (1) The proposed method is compared to the random search. That says little about the performance with regard to the true steady-state goal sets. The latter could be calculated at least for a few simple ODE (e.g., BIOMD0000000454, `Metabolic Control Analysis: Rereading Reder'). The experiment with 'oscillator circuits' may not be directly interpolated to the other models.

      The lack of comparison to the ground truth goal set (attractors of ODE) from arbitrary initial conditions makes it hard to evaluate the true performance/contribution of the method. A part of the used models can be analyzed numerically using JAX, while there are models that can be analyzed analytically.

      "...The true versatility of the GRN is unknown and can only be inferred through empirical exploration and proxy metrics....": one could perform a sensitivity analysis of the ODEs, identifying stable equilibria. That could provide a proxy for the ground truth 'versatility'.

      We agree with the reviewer that one primary concern is to properly evaluate the effectiveness of the proposed method. However, as we move toward complex pathways, the “true” steady-state goal sets are often unknown, which is where machine learning methods such as the one we propose are particularly interesting (but challenging to evaluate).

      For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models. While we agree that it is still interesting to evaluate exploration methods on these simple models for checking their behavior, it is not clear how to scale this analysis to the targeted more complex systems.

      For systems whose true steady state distribution cannot be derived analytically or numerically, we believe that random search is a pertinent baseline as it is commonly used in the literature to discover the attractors/trajectories of a biological network. For instance, Venkatachalapathy et al. [1] initialize stochastic simulations at multiple randomly sampled starting conditions (which is called a kinetic Monte Carlo-based method) to capture the steady states of a biological system. Similarly, Donzé et al. [29] use a Monte Carlo approach to compute the reachable set of a biological network «when the number of parameters is large and their uncertain range is not negligible».
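
      The random-search baseline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the paper: the two-gene mutual-repression circuit, its parameters (a = 4, n = 2), the explicit Euler integrator, and the sampling range [0, 5] are all hypothetical choices made for the example.

```python
import random


def toggle_rhs(state, a=4.0, n=2):
    """Right-hand side of a toy two-gene mutual-repression circuit (hypothetical)."""
    x, y = state
    return (a / (1.0 + y ** n) - x, a / (1.0 + x ** n) - y)


def simulate(initial_state, dt=0.05, steps=4000):
    """Integrate the ODE with explicit Euler steps and return the final state."""
    x, y = initial_state
    for _ in range(steps):
        dx, dy = toggle_rhs((x, y))
        x, y = x + dt * dx, y + dt * dy
    return (x, y)


def random_search(n_samples, low=0.0, high=5.0, seed=0):
    """Sample initial conditions uniformly and collect the distinct endpoints reached."""
    rng = random.Random(seed)
    endpoints = set()
    for _ in range(n_samples):
        start = (rng.uniform(low, high), rng.uniform(low, high))
        x, y = simulate(start)
        # Round so that trajectories ending in the same attractor are merged.
        endpoints.add((round(x, 2), round(y, 2)))
    return endpoints


if __name__ == "__main__":
    print(sorted(random_search(20)))
```

      For this toy circuit the stable steady states can also be derived by hand (x = 2 + √3, y = 2 − √3 and its mirror image), so the sampled endpoints can be checked against a known answer. For larger models no such ground truth is generally available.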

      (2) The proposed method is based on `Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning', which assumes state action trajectories [s_{t_0:t}, a_{t_0:t}] (`2.1 Notations and Assumptions' in the IMGEP paper). However, the models used in the current work do not include external control actions, but rather only the initial conditions can be set. It is not clear from the methods whether IMGEP was adapted to this setting, and how the exploration policy was designed w/o actual time-dependent actions. What does "...generates candidate intervention parameters to achieve the current goal....", mean considering that interventions 'Sets the initial state...' as explained in Table 2?

      We thank the reviewer for asking for clarification, as indeed the IMGEP methodology originates from developmental robotics scenarios which generally focus on the problem of robotic sequential decision-making, therefore assuming state action trajectories as presented in Forestier et al. [65]. However, in both cases, note that the IMGEP is responsible for sampling parameters which then govern the exploration of the dynamical system. In Forestier et al. [65], the IMGEP also only sets one vector at the start (denoted θ ∈ Θ) which was specifying parameters of a movement (like the initial state of the GRN), which was then actually produced with dynamic motion primitives which are dynamical system equations similar to GRN ODEs, so the two systems are mathematically equivalent. More generally, while in our case the “intervention” of the IMGEP (denoted i ∈ I) only controls the initial state of the GRN, future work could consider more advanced sequential interventions simply by setting parameters of an action policy π_i at the start which could be called during the GRN’s trajectory to sample control actions π_i(a_{t+1} | s_{t_0:t+1}, a_t) where s_t would be the state of the GRN. In practice this would also require setting only one vector at the start, so it would remain the same exploration algorithm and only the space of parameters would change, which illustrates the generality of the approach.
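
      To make the setting concrete, a goal-exploration loop in which the intervention only sets the initial state can be sketched as below. This is a generic illustration, not the implementation used in the paper: the toy `rollout` system, the uniform goal sampling, the nearest-neighbour selection, and the Gaussian mutation with `sigma` are all assumptions chosen for brevity.

```python
import random


def rollout(initial_state, a=4.0, n=2, dt=0.05, steps=40):
    """Toy stand-in for a GRN simulation: the intervention only sets the initial state."""
    x, y = initial_state
    for _ in range(steps):
        dx = a / (1.0 + y ** n) - x
        dy = a / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return (x, y)  # observed outcome, expressed in the goal space


def squared_distance(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def imgep(n_bootstrap=5, n_iterations=20, low=0.0, high=5.0, sigma=0.3, seed=0):
    """Return the history of (intervention, outcome) pairs gathered during exploration."""
    rng = random.Random(seed)
    history = []
    # Bootstrap phase: purely random interventions.
    for _ in range(n_bootstrap):
        intervention = (rng.uniform(low, high), rng.uniform(low, high))
        history.append((intervention, rollout(intervention)))
    # Goal-directed phase.
    for _ in range(n_iterations):
        # 1) Sample a goal in outcome space (uniformly here, as an assumption).
        goal = (rng.uniform(low, high), rng.uniform(low, high))
        # 2) Reuse the past intervention whose outcome came closest to that goal.
        parent, _parent_outcome = min(
            history, key=lambda pair: squared_distance(pair[1], goal)
        )
        # 3) Mutate it and clip to the allowed range.
        child = tuple(min(high, max(low, p + rng.gauss(0.0, sigma))) for p in parent)
        # 4) Run the system once and store what was observed.
        history.append((child, rollout(child)))
    return history
```

      Note that a single parameter vector is chosen before each rollout and no action is taken during the trajectory. Replacing the initial state with the parameters of a policy would change only the parameter space, not the loop.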

      (3) Fig 2 shows the phase space for (ERK, RKIPP_RP) without mentioning the typical full scale of ERK, RKIPP_RP. It is unclear whether the path from (0, 0) to (~0.575, ~3.75) at t=1000 is significant on the typical scale of this phase space.

      The purpose of Figure 2 is to illustrate an example of a GRN trajectory in transcriptional space, and to illustrate what “interventions” and “perturbations” can be in that context. To that end we have used the fixed initial conditions provided in BIOMD0000000647, replicating Figure 5 of Cho et al. [56]. While we are not sure what the reviewer means by the “typical” scale of this phase space, we would like to point the reviewer toward Figure 8, which shows examples of paths that indeed reach more distant points in the same phase space (up to ~10μM in RKIPP_RP levels and ~300μM in ERK levels). However, while the paths displayed in Figure 8 are possible (and were discovered with the IMGEP), note that they may be “rarer” to occur naturally in the sense that a large portion of the tested initial conditions with random search tend to converge toward smaller (ERK, RKIPP_RP) steady-state values similar to the ones displayed in Figure 2.

      (4) Table 2:

      a) Where is 'effective intervention' used in the method?

      b) in my opinion 'controllability', 'trainability', and 'versatility' are different terms. If their correspondence is important I would suggest to extend/enhance the column "Proposed Isomorphism". otherwise, it may be confusing.

      a) We thank the reviewer for pointing out that “effective intervention” is not explicitly used in the method. The idea here is that as we are exploring a complex dynamical system (here the GRN), some of the sampled interventions will be particularly effective at revealing novel unseen outcomes whereas others will fail to produce a qualitative change to the distribution of discovered outcomes. What we show in this paper, for instance in Figure 3a and Figure 4, is that the IMGEP method is particularly sample-efficient in finding those “effective interventions”, at least more than a random exploration. However we agree that the term “effective intervention” is ambiguous (does not say effective in what) and propose to replace it with “salient intervention” in the revised version.

      b) We thank the reviewer for highlighting some confusing terms in our chosen vocabulary, and we will try to clarify those terms in the revised version. We agree that controllability/trainability and versatility are not exactly equivalent concepts, as controllability/trainability typically refers to the amount to which a system is externally controllable/trainable whereas versatility typically refers to the inherent adaptability or diversity of behaviors that a system can exhibit in response to inputs or conditions. However, they are both measuring the extent of states that can be reached by the system under a distribution of stimuli/conditions, whether natural conditions or engineered ones, which is why we believe that their correspondence is relevant.

      I don't see how this table generalizes "concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective".

      We propose to replace “generalize” with “investigate” in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Etcheverry et al. present two computational frameworks for exploring the functional capabilities of gene regulatory networks (GRNs). The first is a framework based on intrinsically-motivated exploration, here used to reveal the set of steady states achievable by a given gene regulatory network as a function of initial conditions. The second is a behaviorist framework, here used to assess the robustness of steady states to dynamical perturbations experienced along typical trajectories to those steady states. In Figs. 1-5, the authors convincingly show how these frameworks can explore and quantify the diversity of behaviors that can be displayed by GRNs. In Figs. 6-9, the authors present applications of their framework to the analysis and control of GRNs, but the support presented for their case studies is often incomplete.

      Strengths:

      Overall, the paper presents an important development for exploring and understanding GRNs/dynamical systems broadly, with solid evidence supporting the first half of their paper in a narratively clear way.

      The behaviorist point of view for robustness is potentially of interest to a broad community, and to my knowledge introduces novel considerations for defining robustness in the GRN context.

      We thank the reviewer for recognizing the strengths and novelty of the proposed experimental framework for exploring and understanding GRNs, and complex dynamical systems more generally. We agree that the results presented in the section “Possible Reuses of the Behavioral Catalog and Framework” (Fig 6-9) can be seen as incomplete along certain aspects, which we tried to make as explicit as possible throughout the paper, and why we explicitly state that these are “preliminary experiments”. Despite the discussed limitations, we believe that these experiments are still very useful to illustrate the variety of potential use-cases in which the community could benefit from such computational methods and experimental framework, and build on for future work.

      Some specific weaknesses, mostly concerning incomplete analyses in the second half of the paper:

      (1) The analysis presented in Fig. 6 is exciting but preliminary. Are there other appropriate methods for constructing energy landscapes from dynamical trajectories in gene regulatory networks? How do the results in this particular case study compare to other GRNs studied in the paper?

      We are not aware of other methods than the one proposed by Venkatachalapathy et al. [1] for constructing an energy landscape given an input set of recorded dynamical trajectories, although it might indeed be the case. We want to emphasize that any of such methods would anyway depend on the input set of trajectories, and should therefore benefit from a set that is more representative of the diversity of behaviors that can be achieved by the GRN, which is why we believe the results presented in Figure 6 are interesting. As the IMGEP was able to find a higher diversity of reachable goal states (and corresponding trajectories) for many of the studied GRNs, we believe that similar effects should be observable when constructing the energy landscapes for these GRN models, with the discovery of additional or wider “valleys” of reachable steady states. We could indeed add other case studies in the supplementary to support the argument for the revised version.

      Additionally, it is unclear whether the analysis presented in Fig. 6C is appropriate. In particular, if the pseudopotential landscapes are constructed from statistics of visited states along trajectories to the steady state, then the trajectories derived from dynamical perturbations do not only reflect the underlying pseudo-landscape of the GRN. Instead, they also include contributions from the perturbations themselves.

      We agree that the landscape displayed in Fig. 6C integrates contributions from the perturbations on the GRN’s behavior, and that these can shape the landscape in various ways, for instance affecting the paths that are accessible, the shape/depth of certain valleys, etc. But we believe that qualitatively or quantitatively analyzing the effect of these perturbations on the landscape is precisely what is interesting here: it might help 1) understand how a system responds to a range of perturbations and visualize which behaviors are robust to those perturbations, and 2) design better strategies for manipulating those systems to produce certain behaviors.

      (2) In Fig. 7, I'm not sure how much is possible to take away from the results as given here, as they depend sensitively on the cohort of 432 (GRN, Z) pairs used. The comparison against random networks is well-motivated. However, as the authors note, comparison between organismal categories is more difficult due to low sample size; for instance, the "plant" and "slime mold" categories each only have 1 associated GRN. Additionally, the "n/a" category is difficult to interpret.

      We acknowledge that this part is speculative as stated in the paper: “the surveyed database is relatively small with respect to the wealth of available models and biological pathways, so we can hardly claim that these results represent the true distribution of competencies across these organism categories”. However, when further data is available, the same methodology can be reused and we believe that the resulting statistical analyses could be very informative to compare organismal (or other) categories.

      (3) In Fig. 8, it is unclear whether the behavioral catalog generated is important to the intervention design problem of moving a system from one attractor basin to another. The authors note that evolutionary searches or SGD could also be used to solve the problem. Is the analysis somehow enabled by the behavioral catalog in a way that is complementary to those methods? If not, comparison against those methods (or others e.g. optimal control) would strengthen the paper.

      We thank the reviewer for asking to clarify this point, which might not be clearly explained in the paper. Here the behavioral catalog is indeed used in a complementary way to the optimization method, by identifying a representative set of reachable attractors which are then used to define the optimization problem. For instance here, thanks to the catalog, we 1) were able to identify a “disease” region and several possible reachable states in that region and 2) used several of these states as starting points of our optimization problem, where we want to find a single intervention that can successfully and robustly reset all those points, as illustrated in Figure 8. Please note that given this problem formulation, a simple random search was used as an optimization strategy. When we mention more advanced techniques such as EA or SGD, it is to say that they might be more efficient optimizers than random search. However, we agree that in many cases optimizing directly will not work if starting from a random or poor initial guess, even with EA or SGD. In that case the discovered behavioral catalog can be useful to better initialize this local search and make it more efficient/useful, akin to what is done in Figure 9.

      (4) The analysis presented in Fig. 9 also is preliminary. The authors note that there exist many algorithms for choosing/identifying the parameter values of a dynamical system that give rise to a desired time-series. It would be a stronger result to compare their approach to more sophisticated methods, as opposed to random search and SGD. Other options from the recent literature include Bayesian techniques, sparse nonlinear regression techniques (e.g. SINDy), and evolutionary searches. The authors note that some methods require fine-tuning in order to be successful, but even so, it would be good to know the degree of fine-tuning which is necessary compared to their method.

      We agree that the analysis presented in Figure 9 is preliminary, and thank the reviewer for the suggestion. We would first like to refer to other papers from the ML literature that have more thoroughly analyzed this issue, such as Colas et al. [74] and Pugh et al. [34], and shown the interest of diversity-driven strategies as promising alternatives. Additionally, as suggested by the reviewer, we added a comparison to the CMA-ES algorithm in order to complete our analysis. CMA-ES is an evolutionary algorithm which is self-adaptive in the optimization steps and that is known to be better suited than SGD to escape local minima when the number of parameters is not too high (here we only have 15 parameters). However, our results showed that while CMA-ES explores the solution space more than SGD does at the beginning of optimization, it also ultimately converges to a local minimum, similarly to SGD. The best solution converges toward a constant signal (of the target b) but fails to maintain the target oscillations, similar to the solutions discovered by gradient descent. We tried this for a few hyperparameters (init mean and std) but always found similar results. We report the new results at https://developmentalsystems.org/curious-exploration-of-grn-competencies/tuto2.html (bottom cell, Figure 4). We suggest including the updated figure and caption in the revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This is significant work, and you should certainly make the best case you can on the weaknesses discussed.

      We thank the reviewer for this positive comment on the significance of our work. The referee indicates as weaknesses (i) that the force involving the bent or straight αI-helix is not readily apparent, (ii) the residue types were not varied in the helix mutations, and (iii) that the chemical shift perturbations are indirect observations.

      We think we have tried to address a large part of these questions by being very careful in our analysis and by the discussion in the manuscript. The following remarks may help to clarify this further:

      (i) The force emanating from the helix is e.g. visualized in the PC2 loadings in Figure 6E of the PCA carried out on all observed SH3-SH2-KD resonances for all apo forms of the helix mutants. The SH2 residues identified by these loadings are in direct vicinity to the αI-helix. The respective PC2 scores correlate to 98% with the vmax of the catalytic reaction and to 94% with the PC1 scores found for imatinib-induced opening. Importantly, the structure of the KD with the straight αI-helix indicates that mostly residues F516, Q517, S520, and I521 would clash with the SH2 domain in a closed core (Figure 6F). Thus, the expected clashes are in direct vicinity of the SH2 residues identified by the PC2 loadings as correlated to vmax and imatinib-induced opening. These data are completely orthogonal and show that most of the force is coming from residues F516, Q517, S520, and I521 in the αI-αI’ turn.

      (ii) We agree that we mainly used truncations of the αI-helix to study its involvement in activation. Point (i) makes it clear that a larger part of the αI-helix effects is caused by steric clashes of the residues in the αI-αI’ turn. In the latter region, we don’t expect strong amino acid type-specific effects besides excluded volume. Due to expression problems, we could not vary the helix length between residues 519 and 534. However, in this region we introduced the amino acid type mutation E528K. The latter showed a clear specific effect. Further amino acid type-specific effects may be possible in this region. However, we expect that the identified electrostatic E528-R479 interaction is one of the most important interactions in this region.

      (iii) We agree that chemical shift changes of individual resonances are often hard to interpret. However, we want to stress that our conclusions are all drawn from principal component analyses, which in all cases had as input well over 100 if not over 200 1H-15N resonances. The first two principal components of these analyses are robust averages over many residues, which reveal general correlated structural trends.
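
      The way a principal component averages over many resonances can be illustrated with a small sketch. It assumes a matrix with one row per construct or ligand state and one column per resonance. The numbers are invented for the example and are not data from the manuscript, and the power-iteration routine is a simple stand-in for a full PCA implementation (it assumes the start vector is not orthogonal to the leading component).

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (loading, scores) of the first principal component via power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each column (resonance) on its mean across states.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix between resonances.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    loading = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[a][b] * loading[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(p * p for p in product))
        loading = [p / norm for p in product]
    # Project every state onto the leading direction.
    scores = [sum(c[j] * loading[j] for j in range(d)) for c in centered]
    return loading, scores


# Hypothetical shift table: 4 states x 3 resonances. The first two resonances
# move together across states and the third does not respond at all.
shifts = [
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [3.0, 6.0, 0.0],
]
loading, scores = first_principal_component(shifts)
```

      In this toy table the loading is non-zero only for the two resonances that shift together, and the scores order the states along that shared trend. This is the sense in which a component reports a correlated change across many residues and not a single peak.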

      We assume that chemical shift deposition etc will be pursued.

      We are currently depositing a larger collection of our Abl data to the “Biological Magnetic Resonance Data Bank (BMRB)”, which includes the NMR chemical shift data of the present work. A ‘collection’ will be a new feature of the BMRB, and we are in discussion with their staff. We will provide the accession codes as soon as possible (probably within the next month) to be included into the final version of the manuscript. We have amended the Data Availability Section accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1) The overall discussion of the implications of the described allostery on kinase activation is provided through lenses of imatinib binding, which is used as an experimental trigger to disassemble the autoinhibited core. Can the authors elaborate in the Discussion on what event would play this role in the kinase catalytic cycle, communicating to helix I? Would dissociation of the myristate from the active site be hypothesized to be the first step in kinase activation? While I understand that certainty may be challenging to attain, it would be good to introduce some ideas into the Discussion.

      We appreciate the reviewer’s suggestions for the discussion and added the following text to the Conclusion section:

      "We have used here imatinib binding to the ATP-pocket as an experimental tool to disassemble the Abl regulatory core. Our previous analysis (Sonti et al., 2018) of the high-resolution Abl transition-state structure (Levinson et al., 2006) indicated that due to the extremely tight packing of the catalytic pocket, binding and release of the ATP and tyrosine peptide substrates is only possible if the P-loop and thereby the N-lobe move towards the SH3 domain by about 1–2 Å. This motion is of similar size and direction as the motion of the N-lobe observed in complexes with imatinib and other type II inhibitors (Sonti et al., 2018). From this we concluded that substrate binding opens the Abl core in a similar way as imatinib. The present NMR and activity data now clearly establish the essential role of the αI-helix both in the imatinib- and substrate-induced opening of the core, thereby further corroborating the similarity of both disassembly processes.

      Notably, the used regulatory core construct Abl83-534 lacks the myristoylated N-cap. Although we have previously demonstrated that the latter construct is predominantly assembled (Skora et al., 2013), the addition of the myristoyl moiety is expected to further stabilize the assembled conformation in a similar way as asciminib.

      Considering this mechanism, dissociation of myristoyl from the native Abl 1b core may be a first step during activation. However, it should be kept in mind that the Abl 1a isoform lacks the N-terminal myristoylation, and it is presently unclear whether other moieties bind to the myristoyl pocket of Abl 1a during cellular processes."

      2) Can the authors comment more on the differentiation between assembled conformations induced by type I inhibitor binding vs apo forms (or AMP-PNP and allosteric inhibitor) reported in Figure 3B? The differences are clearly identified by PCA but not sufficiently discussed.

      As indicated in the text, we think two structural effects are intermingled within PC2. Due to this admixture, it is hard to draw strong conclusions and we don’t want to expand on this too much. We have slightly modified the respective paragraph (p. 7) as follows:

      "As the affected residues react differently to perturbations by type I inhibitors and truncation of the αI’-helix (Figure 3A, right), we attribute this behavior to two effects intermixed into the PC2 detection: (i) a minor rearrangement of the SH3/KD N-lobe interface caused by filling of the ATP pocket with type I inhibitors, which in contrast to the stronger N-lobe motion induced by type II inhibitors does not yet lead to core disassembly and (ii) a small rearrangement of the SH2/KD C-lobe interface caused by shortening and mutations of the αI-helix."

      3) The allosteric connection between active site inhibitor binding and the myristate/allosteric inhibitor binding has been observed in the past and noted before, in papers such as Zhang et al, Nature 2010. While the authors reference this paper, they do not acknowledge its specific findings or engage in a broader discussion of how their conclusions relate to this work.

      We have modified the beginning of the Conclusion section:

      "The allosteric connection between Abl ATP site and myristate site inhibitor binding has been noted before, albeit specific settings such as construct boundaries and the control of phosphorylation vary in published experiments. Positive and negative binding cooperativity of certain ATP-pocket and allosteric inhibitors has been observed in cellular assays and in vitro (Kim et al., 2023; Zhang et al., 2010). Furthermore, hydrogen exchange mass spectrometry has indicated changes around the unliganded ATP pocket upon binding of the allosteric inhibitor GNF-5 (Zhang et al., 2010). Here, we present a detailed high-resolution explanation of these allosteric effects via a mechanical connection between the kinase domain N- and C-lobes that is mediated by the regulatory SH2 and SH3 domains and involves the αI helix as a crucial element.

      Specifically, we have established a firm correlation between the kinase activity of the Abl regulatory core, the imatinib (type II inhibitor)-induced disassembly of the core, which is caused by a force FKD–N,SH3 between the KD N-lobe and the SH3 domain, and a force FαI,SH2 exerted by the αI-helix towards the SH2 domain. The FαI,SH2 force is mainly caused by a clash of the αI-αI’ loop with the SH2 domain. Both the FKD–N,SH3 and FαI,SH2 force act on the KD/SH2SH3 interface and may lead to the disassembly of the core, which is in a delicate equilibrium between assembled and disassembled forms. As disassembly is required for kinase activity, the modulation of both forces constitutes a very sensitive regulation mechanism. Allosteric inhibitors such as asciminib and also myristoyl, the natural allosteric pocket binder, pull the αI-αI’ loop away from the SH2 interface, and thereby reduce the FαI,SH2 force and activity. Notably, all observations described here were obtained under nonphosphorylated conditions, as phosphorylation will lead to additional strong activating effects."

      4) Figure 6 could do a better job of providing an illustration of steric clashes.

      We have revised Figure 6, panel F, in order to better illustrate the steric clashes, and modified the legend accordingly.

      5) There is a typo in line 5 from the top on page 11 (dash missing from "83534" superscript).

      Thank you. This was fixed.

    1. Author Response

      Many thanks for handling our manuscript (eLife-RP-RA-2023-93968) entitled "Allosteric modulation of the CXCR4:CXCL12 axis by targeting receptor nanoclustering via the TMV-TMVI domain", by García-Cuesta et al. We are delighted to hear of your willingness to consider our manuscript following appropriate revision. We have carefully read the referees' commentaries and have organized new experiments to address their specific queries.

      Reviewer #1 (Public Review)

      The computational methodology will be carefully reviewed, in particular to justify the software and techniques used in this manuscript. We will also describe the method for identifying the pocket on the CXCR4 structure as well as the workflow used to explain the transition from docking evaluation to MD analyses. Additionally, we will conduct experiments to enhance the results and address the specific feedback provided, ultimately improving the overall reliability.

      Reviewer #2 (Public Review)

      Although the paper was initiated by titrating the compounds in migration experiments, we are going to add new kinetics and titration of concentrations in these experiments. In addition, we are going to change the way in which we present the data from the single-molecule tracking experiments. We will add a representative video of each experimental condition, and include some of the mean square displacement curves to support our data on the analysis of the diffusion coefficient (D1-4) to give a more conclusive view of receptor clustering. Regarding the tumorigenesis experiments, we will include the individual data points and we will try to perform kinetics with distinct concentrations of the drug.
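
      For readers unfamiliar with how a mean square displacement (MSD) curve relates to a short-time diffusion coefficient, a minimal sketch follows. It assumes that D1-4 denotes the coefficient obtained from a linear fit of the MSD over the first four time lags, and that motion is two-dimensional so that MSD ≈ 4·D·t plus an offset. The function names and the example track are hypothetical and do not reproduce the analysis pipeline of the manuscript.

```python
def mean_square_displacement(track, lag):
    """Time-averaged MSD of one 2D track (list of (x, y) positions) at a given frame lag."""
    displacements = [
        (track[k + lag][0] - track[k][0]) ** 2
        + (track[k + lag][1] - track[k][1]) ** 2
        for k in range(len(track) - lag)
    ]
    return sum(displacements) / len(displacements)


def short_time_diffusion(track, frame_interval, lags=(1, 2, 3, 4)):
    """Estimate D from the least-squares slope of MSD versus time, assuming MSD = 4*D*t + offset."""
    times = [lag * frame_interval for lag in lags]
    msds = [mean_square_displacement(track, lag) for lag in lags]
    t_mean = sum(times) / len(times)
    m_mean = sum(msds) / len(msds)
    slope = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msds)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    return slope / 4.0


# Hypothetical track: positions in micrometres, one frame per second.
example_track = [(float(k), 0.0) for k in range(10)]
example_msd = [mean_square_displacement(example_track, lag) for lag in (1, 2, 3, 4)]
```

      The example track moves in a straight line, so its MSD grows quadratically and not linearly with the lag. Plotting the curve itself, as proposed above, is what allows directed, free, and confined motion to be told apart before a single coefficient is reported.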

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript from Kavanjoo et al examines the role of macrophages within the fetal liver beyond erythrocyte maturation. Using single-cell sequencing, high-resolution imaging, and inducible genetic deletion of yolk-sac (YS) derived macrophages, the authors demonstrate that heterogeneous fetal liver macrophages regulate erythrocyte enucleation, interact physically with fetal HSCs, and may regulate neutrophil accumulation in the fetal liver. The data as presented do not strongly support the authors’ conclusion that fetal macrophages in the liver regulate the HSC niche or granulopoiesis from HSCs.

      Fetal-derived resident tissue macrophages are increasingly implicated in regulation of adult tissue function and homeostasis, but considerably less is known regarding the function of fetal macrophages during development. Macrophages in the fetal liver have been shown to form erythroblastic islands, where they regulate erythrocyte maturation. Here, the authors performed single-cell sequencing on fetal liver macrophages (Cd11b-lo) to gain insight into heterogeneity and utilized previously published pre-Mac signatures from the YS to focus on YS-derived macrophages. These clusters were then further cross-referenced with surface protein expression as determined by multidimensional flow cytometry to hone in on a very specific subset of three groups of F4/80hi macrophages defined by multiple surface markers. Fate-mapping with three models (Tnfrsf11a-Cre - YS pMAC derived; Ms4a3Cre - FL monocyte derived; CXCR4-Cre-ERT2 - definitive HSC derived) revealed that three major subsets are all derived from YS pMACs.

      We thank the reviewer for the comments and have addressed all points below. If certain points were mentioned twice, we responded at the position where the point was raised the first time.

      However, the relative frequencies of these specific populations are not shown, and because the single sequencing analysis goes through so many iterations of re-clustering that initiates by focusing specifically on pMAC signatures, this result is not surprising.

      Probing gene expression within each of the three clusters revealed ligand expression suggesting cell-cell interactions, and cross-referencing with a fetal LT-HSC gene expression dataset revealed potential receptor-ligand interactions. Microscopic investigation of physical interactions between specific macrophage subsets and HSCs was not particularly convincing. In Figure 3C, for example, Cluster C is very difficult to visualize. It would again be helpful to know what the ratios are within the FL for each cluster. Data in Figure 3F are not well represented by Data in Figure 3E.

      We showed frequencies after CODEX in the original manuscript (Fig. S3A, now Figure 4 - figure supplement 1A) since isolation of cells often induces an artifact, and relative frequencies after scRNA-seq experiments never represent the actual cell numbers present in situ. However, the CODEX analysis also has its weaknesses, especially in dense tissues, as the automated gating method may not catch every macrophage due to its star-shaped structure. Thus, we have now included the absolute numbers of macrophage subpopulations in Figure 7C. We have tried to improve the visualization of the clusters in Figure 3C (now Figure 4C) by zooming into a specific region. The Voronoi diagram is a powerful method that allows for an overall spatial visualization of cell distribution in large tissue pieces. In the high-resolution PDF that we provide, zooming into the PDF file should allow the reader to see each cluster in great detail.

      To improve the data of macrophage-HSC interaction we have performed 3D reconstructions and quantified the distance of CD150+ and Iba1+ cells in 3D (new Figure 3C-E) as the thin cryosectioning used for CODEX is not suitable to reconstruct these interactions properly (see also lines 328-331). Thus, Figure 3E was not able and also not meant to represent data shown in Figure 3F (now Figure 4E and 4F). Figure 3E is just meant to show examples of all clusters sitting in proximity to CD150+ HSCs.

      Furthermore, deletion of YS pMAC-derived macrophages in the Tnfrsf11a-Cre X Spi1fl/fl resulted in broad macrophage depletion - although the authors did not demonstrate this using the carefully refined phenotypes they had defined earlier in the manuscript. Nonetheless, the authors demonstrate that macrophage depletion did affect erythroid enucleation, as expected, and the authors also showed some effect of macrophage deletion on LT-HSC gene expression by bulk transcription analysis. These effects were relatively small, however, and this was clear in the absence of effects on hematopoiesis in vivo or HSC proliferation ex vivo. To further investigate the effects of macrophage deletion on downstream hematopoiesis, the authors re-assessed the myeloid compartment following macrophage deletion, and identified and specifically focused on an observed increase in neutrophils in response to macrophage depletion. Based on this increase, they tested HSC differentiation using a colony-forming assay, which shows a slight increase in GM colonies that is also reflective of a slight but insignificant increase in total colony forming capability. The authors concluded that loss of fetal macrophages causes a reprogramming of HSCs to the granulocytic lineage. However, the colony-forming assay and subtle differences in gene expression are not sufficient to conclude that fetal HSCs have been reprogrammed towards granulocytic lineage by macrophage deletion.

      We thank the reviewer for this comment and have improved the manuscript accordingly: We have performed the colony-forming assay again with n=5 embryos per genotype that were harvested on the same day, which resulted in a similar phenotype as before, with the differences of GM colonies now being significant. Further, we quantified the depletion of all macrophage subpopulations in the Tnfrsf11a-Cre X Spi1fl/fl model (Fig. 7C). To strengthen the point that the transient lack of macrophages when HSCs arrive in the fetal liver leads to their reprogramming, we included flow cytometry data from E16.5 and E18.5 where we still see an increase of neutrophils in the fetal liver, despite the fact that macrophages are repopulating the empty niche (Fig. 7E, F). To show that this is a cell-intrinsic effect, we have performed adoptive transfer experiments supporting our claim that loss of macrophages reprograms HSCs toward the granulocytic lineage (Fig. 7H, I).

      Overall, there are some interesting pieces of data in this manuscript, including the classification of new subsets of macrophages in the liver, their fate-mapping to the YS, and gene expression analysis. However, the data as presented do not strongly support a role for these particular macrophage subsets in regulating HSCs or fetal hematopoiesis within the fetal liver niche. Although there may be specific subsets of fetal liver macrophages that more closely physically interact with HSCs, deletion of what appeared to be a vast majority of macrophages in the FL did not appear to affect cellularity of hematopoietic stem and progenitor cells in vivo, and was not shown to convincingly affect HSC function. The mechanism by which macrophage deletion affected granulopoiesis could be independent from HSCs, and would be interesting to further explore.

      We hope that with the new set of experiments we were able to convince the reviewer of the importance of macrophages in the HSC niche.

      Reviewer #2 (Public Review):

      Using a single-cell omics approach combined with spatial proteomics and genetic fate mapping, Kayvanjoo et al found that fetal liver (FL) macrophages cluster into distinct yolk sac-derived subpopulations and that some of the HSCs in FL preferentially associate with one of the identified macrophage subpopulations. FLs lacking macrophages show a delay in erythropoiesis. The authors also try to identify a role of macrophages for HSCs function in FL, and claim that macrophages affect myeloid differentiation of HSCs. Experimental support for the function of macrophages on HSCs remains weak. Taken together, their data provide a precise map of FL macrophage subpopulations, which is novel and will serve the field well.

      We thank the reviewer for the positive assessment. We have now strengthened the data regarding the impact of granulopoiesis by performing additional CFU assays and adoptive transfers.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank the reviewers for their insightful comments on our manuscript. We have addressed the reviewers’ comments below and in the revised manuscript.

      Reviewer #1:

      Comment #1: The authors found differences in the initial spike doublet of action potentials between cortical neurons in experimental and control conditions (Figure 2e). The action potential firing frequency of the first two APs (instant firing frequency) of recorded neurons shall be quantified to investigate whether there are statistical differences between the action potential firing frequency in cortical neurons in different experimental groups versus control conditions.

      Response: As suggested by the reviewer, we have quantified the first interspike interval (ISI; time between the 1st and 2nd action potential). The data is included in Fig. 2h as well as in Fig. S3e and Table 1. The Results and Methods have also been updated accordingly.
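      As an illustration of the quantity discussed here (a minimal sketch, not the authors' analysis code), the first interspike interval and the corresponding instantaneous firing frequency can be derived from a list of detected action potential times. The function name and the example spike times are hypothetical.

```python
def first_isi_and_instant_freq(spike_times_s):
    """Return (first ISI in seconds, instantaneous frequency in Hz).

    spike_times_s: action potential times in seconds, in increasing order.
    Returns (None, None) when fewer than two spikes were detected.
    """
    if len(spike_times_s) < 2:
        return None, None
    # First ISI: time between the 1st and 2nd action potential
    isi = spike_times_s[1] - spike_times_s[0]
    if isi <= 0:
        raise ValueError("spike times must be strictly increasing")
    # Instantaneous firing frequency is the reciprocal of the interval
    return isi, 1.0 / isi


# Hypothetical spike times (s) for one current step
isi, freq = first_isi_and_instant_freq([0.010, 0.015, 0.040])
```

      A short first ISI corresponds to a high instantaneous frequency, which is how an initial spike doublet would appear in this measure.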

      Comment #2: The mTORS2215Y induced the largest changes in Ih current amplitudes in cortical neurons compared with other experimental conditions. Whether the HCN4 channel expression is regulated by mTOR pathway activation, or could there be possible interactions between the HCN channel and mTORS2215Y mutant protein?

      Response: Our previous findings using the RhebS16H mutation support the idea that increased expression of HCN4 channels is regulated by mTOR pathway activation. This is evidenced by its sensitivity to rapamycin (a mTOR inhibitor) and expression of constitutively active 4E-BP1 (a translational repressor downstream of mTORC1). Since mTORS2215Y directly hyperactivates mTORC1 and there are no known interactions between HCN channels and mTORS2215Y, our data strongly suggests that abnormal HCN4 channel expression occurs via mTORC1 hyperactivation in this condition. We have revised our Discussion to make this point clearer.

      Comment #3: A comparison of the electrophysiological characteristics of cortical neurons in different experimental conditions in the present study and pathological neurons in human FCD reported in previous literature could be interesting. Inducing pathological gene mutations or knocking out key genes in mTOR pathway in the rodent cortex - which approach could better model human FCD?

      Response: We agree with the reviewer and have added a new paragraph in the Discussion to compare our electrophysiology results to those of previous studies done on human FCDII and TSC cytomegalic neurons. With regards to the reviewer’s question about which of the two approaches in the rodent cortex – inducing pathological gene mutations or knocking out key genes in the mTOR pathway – would better model human FCD, our study emphasizes the importance of considering gene-specific mechanisms in FCDII. Thus, modeling the genetically distinct FCDIIs will require using gene-specific manipulations. We have revised our Discussion to include this point. With that said, for some phenotypes that are generalized across FCDII independent of the mTOR pathway genes, using pathogenic mutations of mTOR activators or knockout of negative mTOR regulators would likely both be appropriate models. Of note, as discussed in the manuscript, there are also technical factors to be considered when choosing to use a pathogenic gene mutation versus knocking out a gene (the latter which would depend on the half-life of the proteins).

      Reviewer #2:

      Comment #1: The authors postulate that all the findings are dependent on mTORC1-related effects but don't assess whether some of the differences could be due to effects on mTORC2 signaling. mTORC2 is an important and poorly understood alternative isoform of mTOR (due to rictor binding) that has effects on distinct cell signaling pathways and in particular actin polymerization. This doesn't diminish the effects of the current analysis of mTORC1 but could explain genotypic differences in each variable. A few prior studies have assessed the role of mTORC2 in epileptogenesis and cortical malformations (Chen et al., 2019).

      Response: We agree with the reviewer and have revised our Discussion to include the possibility of mTORC2 contribution to the gene-specific phenotypic differences.

      Comment #2: The slice recordings were performed in the usual recording aCSF buffer conditions but there is no assessment of the role of amino acids or nutrients in the bath. While it is clear that valuable and viable acute slice recordings can be made in aCSF, the role of the mTOR pathway is to modulate cell growth in response to nutrient conditions. Thus, one variable that could be manipulated and assessed currently in this study is the levels of amino acids i.e., leucine and arginine added to the bath since DEPDC5 and TSC1 are responsive to ambient amino acid levels.

      Response: We thank the reviewer for this great suggestion, and we intend to pursue this as part of another study.

      Comment #3: The analysis concedes that the role of somatic mutations in cortical malformations may depend not only on genotypic effects but also on allelic load and cellular subtype affected by the mutation. Thus, it would be interesting to see if electroporation either at E14 or E16, thereby affecting a distinct pool of progenitors, would mitigate or accentuate differences between mTOR pathway genes.

      Response: We agree with the reviewer. This is a crucial experiment that we hope to perform in the future. We have also added a paragraph in our Discussion to address this important point.

      Comment #4: Treatment with rapamycin and zatebradine in each condition would have added to the strength of the findings to determine the mTOR-dependence and reversibility of HCN4 effects.

      Response: We previously demonstrated the mTORC1 dependence of HCN4 expression in the RhebS16H condition using rapamycin and expression of constitutively active 4E-BP1. 4E-BP1 is a translational repressor downstream of mTORC1. In the 4E-BP1 study, we used a conditional system to express 4E-BP1F113A (mutation that resists inactivation by mTORC1) in adolescent mice while RhebS16H (and thus mTORC1 activation) was expressed embryonically. 4E-BP1F113A expression suppressed Ih current and HCN4 expression, suggesting that aberrant HCN4 expression can be reversed by decreasing mTORC1-regulated translation. Based on these data and the findings that rapamycin suppressed abnormal HCN4 expression, we postulate that increased HCN4 expression in the different gene conditions examined in the present study occurs via the mTORC1 pathway. However, we agree with the reviewer that treating each of the conditions with rapamycin would provide direct evidence of their mTORC1 dependence. Additionally, treating each condition with the HCN channel blocker zatebradine would also add strength to the findings. We have added a comment in the Discussion to acknowledge this point.

      Reviewer #1 (Recommendations For The Authors):

      Comment #1: The authors found increased frequency or amplitudes of spontaneous postsynaptic currents in different experimental cohorts. These data may not be sufficient to conclude increased synaptic excitability, because there are no pharmacological experiments to verify whether the recorded inward currents are excitatory or inhibitory postsynaptic currents. An alternative approach could be analyzing the decay time of spontaneous postsynaptic currents, the excitatory postsynaptic currents had relatively faster decay time compared with inhibitory postsynaptic currents.

      Response: Thank you for the comment. We apologize for the lack of clarity and have added the following text in the Results to clarify: “To separate sEPSCs from spontaneous inhibitory postsynaptic currents (sIPSCs), we used an intracellular solution rich in K-gluconate to impose a low intracellular Cl- concentration and recorded at a holding potential of -70 mV, which is near the Cl- reversal potential. The 90%-10% decay time of the measured synaptic currents ranged between 4-8 ms in all conditions (mean ± SD: control: 4.9 ± 1.6; RhebY35L: 5.2 ± 1.4; mTORS2215Y: 7.4 ± 1.4; control: 6.8 ± 0.7; Depdc5KO: 7.4 ± 1.0; PtenKO: 8.1 ± 0.9; Tsc1KO: 7.4 ± 0.9), consistent with the expected decay time for sEPSCs and shorter than the decay time for sIPSCs (Kroon et al, 2019). The recorded synaptic currents were therefore considered to be sEPSCs.”
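      To make the 90%-10% decay measure concrete, the following is a minimal sketch under stated assumptions: the event is a single baseline-subtracted current trace sampled at a fixed interval, and the decay time is read at sample resolution without interpolation. The function name and the synthetic trace are hypothetical and are not taken from the authors' analysis.

```python
import math


def decay_time_90_10(trace, dt_ms):
    """Time (ms) for an event to decay from 90% to 10% of its peak amplitude.

    trace: baseline-subtracted current samples for one isolated event.
    dt_ms: sampling interval in milliseconds.
    Returns None if the event does not decay to 10% within the trace.
    """
    mags = [abs(x) for x in trace]  # inward currents are negative
    peak_idx = max(range(len(mags)), key=mags.__getitem__)
    peak = mags[peak_idx]
    if peak == 0:
        raise ValueError("trace contains no event")
    t90 = t10 = None
    for i in range(peak_idx, len(mags)):
        if t90 is None and mags[i] <= 0.9 * peak:
            t90 = i * dt_ms  # first sample at or below 90% of peak
        if mags[i] <= 0.1 * peak:
            t10 = i * dt_ms  # first sample at or below 10% of peak
            break
    if t90 is None or t10 is None:
        return None
    return t10 - t90


# Synthetic mono-exponential event (hypothetical values): -20 pA peak, tau = 3 ms
dt = 0.01
event = [-20.0 * math.exp(-(i * dt) / 3.0) for i in range(3000)]
decay_ms = decay_time_90_10(event, dt)
```

      For a mono-exponential decay with time constant tau, the 90%-10% decay time equals tau · ln(9), which gives a quick sanity check on the measurement.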

      Comment #2: There are typos of Depdc5 in the text and figure legends.

      Response: Thank you for noticing this error. We have corrected the typos in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Zhu and colleagues aimed to clarify the importance of isoform diversity of PCDHg in establishing cortical synapse specificity. The authors optimized 5' single-cell sequencing to detect cPCDHg isoforms and showed that the pyramidal cells express distinct combinations of PCDHg isoforms. Then, the authors conducted patch-clamp recordings from cortical neurons whose PCDHg diversity was disrupted. In the elegant experiment in Figure 3, the authors demonstrated that the neurons expressing the same sets of cPCDHg isoforms are less likely to form synapses with each other, suggesting that identical cPCDHg isoforms may have a repulsive effect on synapse formation. Importantly, this phenomenon was dependent on the similarity of the isoforms present in neurons but not on the amount of proteins expressed.

      One of the major concerns in an earlier version was whether PCDHg isoforms, which are expressed at a much lower level than C-type isoforms, have true physiological significance. The authors conducted additional experiments to address this point by using PCDHg cKO and provided convincing data supporting their conclusion. The results from PCDHg C4 overexpression, showing no impact on synaptic connectivity, further clarified the importance of isoforms. I have no further concerns, however, I would like to point out that the evidence for the necessity of the PCDHg isoform is still lacking because most experiments were done by overexpression. It would be helpful for the readers if the authors could add this point to the discussion.

      Thank you for the positive feedback on our work. We have now incorporated a discussion of the limitations associated with overexpression.

      Reviewer #2 (Public Review):

      This short manuscript by Zhu et al. describes an investigation into the role of gamma protocadherins in synaptic connectivity in the mouse cerebral cortex. First, the authors conduct a single-cell RNA-seq survey of postnatal day 11 mouse cortical neurons, using an adapted 10X Genomics method to capture the 5' sequences that are necessary to identify individual gamma protocadherin isoforms (all 22 transcripts share the same three 3' "constant" exons, so standard 3'-biased methods can't distinguish them). This method adaptation is an advance for examining individual gamma transcripts, and it is helpful to publish the method, the characterization of which is improved in this revised manuscript. The results largely confirm what was known from other approaches, which is that a few of the 19 A and B subtype gamma protocadherins are expressed in an apparently stochastic and combinatorial fashion in each cortical neuron, while the 3 C subtype genes are expressed ubiquitously. Second, using elegant paired electrophysiological recordings, the authors show that in gamma protocadherin cortical slices, the likelihood of two neurons on layers 2/3 being synaptically connected is increased. That suggests that gamma protocadherins generally inhibit synaptic connectivity in the cortex; again, this has been reported previously using morphological assays, but it is important to see it confirmed here with physiology. Finally, the authors use an impressive sequential in utero electroporation method to provide evidence that the degree of isoform matching between two neurons negatively regulates their reciprocal synaptic connectivity. These are difficult experiments to do, and while some caveats remain, the main result is consistent. Strengths include the impressive methodology and improved demonstration of the previously-reported finding that gamma protocadherins work via homophilic matching to put a brake on synapse formation in the cortex. 
Weaknesses include the writing, which even in the revision fails to completely put the new results in context with prior work, which together has largely shown similar results; a still-incomplete characterization of a new alpha protocadherin KO mouse (a minor point but it should still be addressed); and a lack of demonstration of protein levels in electroporated brains. Because of the unique organization and expression pattern of the gamma protocadherins, it is unlikely that these results will be directly applicable to the broader understanding of the role of cell adhesion molecules in synapse development. However, the methodology, which is now better described, should be applicable more broadly and the improved demonstration of the role of gamma protocadherin's negative role in cortical synaptogenesis is helpful.

      Thank you for the positive comments on our work. We have taken your suggestion into account and expanded our discussion to contextualize our research within the broader field of PCDH. Additionally, we have included more data to further illustrate the decrease in αPCDH expression in Pcdha conditional knockout mice. Your feedback has been invaluable in enhancing our manuscript.

      Reviewer #3 (Public Review):

      In this study, Zhu and authors investigate the expression and function of the clustered Protocadherins (cPcdhs) in synaptic connectivity in the mouse cortex. The cPcdhs encode a large family of cadherin-related transmembrane molecules hypothesized to regulate synaptic specificity through combinatorial expression and homophilic binding between neurons expressing matching cPcdh isoforms. But the evidence for combinatorial expression has been limited to a few cell types, and causal functions between cPcdh diversity and wiring specificity have been difficult to test experimentally. This study addresses two important but technically challenging questions in the mouse cortex: 1) Do single neurons in the cortex express different cPcdh isoform combinations? and 2) Does Pcdh isoform diversity or particular combinations among pyramidal neurons influence their connectivity patterns? Focusing on the Pcdh-gamma subcluster of 22 isoforms, the group performed 5'end-directed single-cell RNA sequencing from dissociated postnatal (P11) cortex. To address the functional role of Pcdhg diversity in cortical connectivity, they asked whether the Pcdhgs and isoform matching influence the likelihood of synaptic pairing between 2 nearby pyramidal neurons. They performed simultaneous whole-cell recordings of 6 pyramidal neurons in cortical slices, and measured paired connections by evoked monosynaptic responses. In these experiments, they measured synaptic connectivity between pyramidal neurons lacking the Pcdhgs, or overexpressing dissimilar or matching sets of Pcdhg isoforms introduced by electroporation of plasmids encoding Pcdhg cDNAs.

      Overall, the study applies elegant methods that demonstrate that single cortical neurons express different combinations of Pcdh-gamma isoforms, including the upper layer Pyramidal cells that are assayed in paired recordings. The electrophysiology data demonstrate that nearby Pyramidal neurons lacking the entire Pcdhg cluster are more likely to be synaptically connected compared to the control neurons, and that overexpression of matching isoforms between pairs decreases the likelihood to be synaptically connected. These are important and compelling findings that advance the idea that the Pcdhgs are important for cortical synaptic connectivity, and that the repertoire of isoforms expressed by neurons influence their connectivity patterns potentially through a self/nonself discrimination mechanism. However, the findings are limited to probability in connectivity and they do not support the authors' conclusions that Pcdhg isoforms regulate synaptic specificity, 'by preventing synapse formation with specific cells' or to 'unwanted partners'. Characterizations of the cellular basis of these defects are needed to determine whether they are secondary to other roles in cell positioning, axon/dendrite branching and synaptic pruning, and overall synaptic formation. Claims that Pcdh-alpha and Pcdhg C-type isoforms are not functionally required are premature, due to limitations of the experiments. Moreover, claims that 'similarity level of γPCDH isoforms between neurons regulate the synaptic formation' are not supported due to weak statistical analyses presented in Fig4. The overstatements should be corrected. There was also missed opportunity to clearly discuss these results in the context of other published work, including recent publications focused on the cortex.

      Thank you for your feedback on the strengths and weaknesses of our work. In terms of the cellular basis of affected synaptic connectivity caused by γ-PCDH isoforms, we have compared the probability of connectivity for neuronal pairs with similar range of distance. Our findings indicate that the manipulation primarily affects pairs within the 50-150 micrometer range, suggesting that cell positioning might be a critical factor for the impact of γ-PCDH on synapse formation. However, we acknowledge that we couldn't definitively determine whether the negative effect on synaptic connectivity stems directly from impaired synapse formation or indirectly from synaptic pruning or the influence of PCDHγ on axon/dendritic branching. We've added these limitations to our discussion to provide a more comprehensive view of our research. Furthermore, we've adjusted our statements to better reflect the significance of our findings. Your feedback has been instrumental in improving the clarity and depth of our manuscript.

      Strengths:

      • The 5' end sequencing with a Pcdhg-amplified library is a technical feat and addresses the pitfall of conventional scRNA-Seq methods due to the identical 3' sequences shared by all Pcdhg isoforms and the low abundance of the variable exons. New figures with annotated cell types confirm that several pyramidal and inhibitory cortical subpopulations were captured.

      Statistical assessment of co-occurrence of isoform expression within clusters is also a strength.

      • By establishing the combinatorial expression of Pcdhgs by maturing pyramidal cells, the study further substantiates the 'single neuron combinatorial code for cPcdhs' model. Although combinatorial expression is not universal (i.e., serotonergic neurons), there was limited evidence. The findings that individual pyramidal neurons express ~1-3 variable Pcdhg transcripts plus the C-type transcripts aligns with single RT-PCR studies of single Purkinje cells (Esumi et al 2005; Toyoda et al 2014). They differ from the findings by Lv et al 2022, where C-type expression was lower among pyramidal neurons. OSNs also do not substantially express C-type isoforms (Mountoufaris et al 2017; Kiefer et al 2023). Differences, and the advantages of the 5'end-directed sequencing (vs. SmartSeq) could be raised in the discussion.

      • Simultaneous whole-cell recordings and pairwise comparisons of pyramidal neurons is a technically outstanding approach. They assess the effects of Pcdhg OE isoform on the probability of paired connections.

      • The connectivity assay between nearby pairs proved to be sensitive to quantify differences in probability in Pcdhg-cKO and overexpression mutants. The comparisons of connectivity across vertical vs lateral arrangement are also strengths. Overexpressing identical Pcdhg isoform (whether 1 or 6) reduces the probability of connectivity, but there are caveats to the interpretations (see below).

      Weaknesses:

      nearby pairs but are not sufficient evidence for synapse specificity. The cPcdhs play multiple roles in neurite arborization, synaptic density, and cell positioning. Kostadinov 2015 also showed that starburst cells lacking the Pcdhgs maintained increased % connectivity at maturity, suggesting a lack of refinement in the absence of Pcdhgs. The known roles raise questions on how these manipulations might have primary effects in these processes and then subsequently impact the probability of connectivity. Investigations of morphological aspects of pyramidal development would strengthen the study and potentially refine the findings. The authors should more clearly relate their findings to the body of cPcdh studies in the discussion.

      Previous studies revealed the adverse effects of γ-PCDHs on dendritic spines, demonstrating that their absence results in increased dendritic spines density, while overexpression leads to a reduction. In our study, we consistently observed that γ-PCDHs exert a negative influence on synaptic connectivity. This consistency strengthens the overall body of evidence in support of the role of γ-PCDHs in synaptic connectivity and dendritic spine regulation. While we have previously mentioned this point in our discussion to highlight the concordance between our findings and prior research, your input is greatly appreciated in reinforcing the scientific context of our work.

      • Pcdhg cKO-dependent effects on connectivity occur between closely spaced soma (50-100um - Figure 2E), highlighting the importance of spatial arrangement to connectivity (also noted by Tarusawa 2016). Was distance considered for the overexpression (OE) assays, and did the authors note changes in cell distribution which might diminish the connectivity? Recent work by Lv et al 2022 reported that manipulating Pcdhgs influences the dispersion of clonally-related pyramidal neurons, which also impacts the likelihood of connections. Overexpression of Pcdhgc3 increased cell dispersion and decreased the rate of connectivity between pairs. Though these papers are mentioned, they should be discussed in more detail and related to this work.

      Our data indicated that variable γ-PCDH isoforms primarily influence synaptic connectivity in neuronal pairs within the 50-150 micrometer range. Notably, as the distance between neurons increases, we observed a corresponding reduction in synaptic connectivity, as illustrated in Figure 2E. We have also included additional discussion regarding potential variances among different C-type isoforms.

      • Though the authors added suggested citations and improved the contextualization of the study, several statements do not accurately represent the cited literature. It is at the expense of crystalizing the novelty and importance of this present work. For instance, Garrett et al 2012 PMID: 22542181 was the first to describe roles for Pcdhgs in cortical pyramidal cells and dendrite arborization, and that pyramidal cell migration and survival are intact. Line 52 cited Wang et al 2002, but this was limited to gross inspection. Garrett et al is the correct citation for: 'The absence of γ-PCDH does not cause general abnormality in the development of the cerebral cortex, such as cell differentiation, migration, and survival (Wang et al., 2002).' Second, single cell cPcdh diversity is introduced very generally, as though all neuron types are expected to show combinatorial variable expression with ubiquitous C-type expression. But those initial studies were limited to Purkinje cells (Esumi 2005 and Toyoda 2014). Profiling of serotonergic neurons and OSN reveals different patterns (citations needed for Chen 2017 PMID: 28450636; Mountoufaris et al PMID: 2845063; Canzio 2023 PMID: 37347873), raising the idea that cPcdh diversity and ubiquitous C-type expression is not universal. Thus, the authors missed the opportunity to emphasize the gap regarding cPcdh diversity in the cortex.

      We would like to extend our gratitude to the reviewer for pointing out the citation related to the roles of γ-PCDHs in the neocortex. After a thorough review of both papers, Wang et al., 2002 and Garrett et al., 2012, we concur that it would be more appropriate to cite both of these papers here. Your suggestion to underscore the diverse expression patterns of γ-PCDHs in neocortical neurons is well-received, and we have integrated this aspect of our findings with previous observations into a new paragraph within the discussion section. Your insights have greatly enriched the depth of our paper, and we genuinely appreciate your contribution.

      • They have not shown rigorously and statistically that the rate of connectivity changes with % isoform matching. In Figure 4D, comparisons of % isoform matching in OE assays show a single statistical comparison between the control and 100% groups, but not between the 0%, 11% and 33% groups. Is there a significant difference between the other groups? Significant differences are claimed in the results section, but statistical tests are not provided. The regression analysis in 4E suggests a correlation between % isoform similarity and connectivity probability, but this is not sound as it is based on a mere 4 data points from 4D. The authors previously explained that they cannot evaluate the variance in these recordings as they must pool data together. However, there should be some treatment of variability, especially given the low baseline rate of connectivity. Or at the very least, they should acknowledge the limitations that prevent them from assessing this relationship. Claims in lines 230+ are not supported: 'Overall, our findings demonstrate a negative correlation between the probability of forming synaptic connections and the similarity level of γPCDH isoforms expressed in neuron pairs (Fig. 4E).'

      We employed a bootstrap method to estimate the potential variance in the analysis presented in Fig. 4E. It's important to note that due to methodological limitations, a comprehensive assessment of variance based solely on recordings from a single animal is challenging. As such, we have adjusted our claims to be more aligned with our observations.

      • Figure 4 provides connectivity probability, but this result might be affected by overall synapse density. Did connection probability change with directionality (e.g between red to green cells, or green to red cells).

      As suggested by the reviewer, we have conducted an analysis to assess the directionality of connections under different conditions. This analysis involved comparing the directionalities of connections following the overexpression of six variable isoforms, as depicted in Fig. 3E. Upon examining 33 connected OE-Ctrl pairs following the electroporation of these 6 isoforms, we observed 3 pairs with bidirectional connections, 19 pairs with connections from OE to Ctrl, and 11 with connections from Ctrl to OE. To assess the statistical significance of these observations, we applied a Chi-square test. The results from this analysis indicated that there was no significant difference in the directionality of connections. These findings offer further support for the idea that overexpressing multiple γ-PCDH isoforms within a single neuron might not be sufficient to alter its connections with other neurons.
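
      As a purely illustrative sketch (not the authors' analysis code), the directionality comparison can be reproduced for the two unidirectional counts reported above. The response does not state exactly which categories entered the Chi-square test, so two assumptions are made here: the 3 bidirectional pairs are set aside, and the null hypothesis is an equal (50:50) split between the two directions. The function name `chi_square_equal_split` is hypothetical.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Chi-square goodness-of-fit test of two counts against a 50:50 split (1 df)."""
    total = count_a + count_b
    expected = total / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the response: 19 OE-to-Ctrl and 11 Ctrl-to-OE connections.
# Assumption: bidirectional pairs (3) are excluded from this comparison.
chi2, p_value = chi_square_equal_split(19, 11)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

      Under these assumptions the p-value stays above 0.05, which agrees with the statement that directionality did not differ significantly; a different choice of categories or null proportions would give a different statistic.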

      • Generally, the statistical approaches were not sufficiently described in the methods nor in the figure legends, making it difficult to assess the findings. They do not report on how they calculated FDR for connectivity data, when this is typically used for larger multivariate datasets.

      We employed the False Discovery Rate (FDR) correction, specifically the Benjamini-Hochberg method, to determine which values remained statistically significant. This method is widely accepted and involves inputting all the p-values and the total number, 'n.' Additional details about this correction are now provided in the Methods section for clarity.
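
      For readers unfamiliar with the procedure, the Benjamini-Hochberg adjustment can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `benjamini_hochberg` and the p-values in `example_p` are hypothetical and are not taken from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the
    # adjusted values: adjusted_(k) = min over j >= k of p_(j) * n / j.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example_p)
significant = [q <= 0.05 for q in adjusted]
print(adjusted, significant)
```

      Each adjusted value is compared with the chosen FDR threshold (0.05 here, also an assumption); the only inputs are the list of p-values and their total number n, as described above.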

      • The possibility that the OE effects are driven by total Pcdhg levels, rather than isoform matching, should be examined. As shown by qRT-PCR in Fig. 3, expression of individual isoforms can vary. It is reasonable that protein levels cannot be measured by IHC, although epitope tags could be considered as C-terminal tagging of cPcdhs preserves the function in mice (see Lefebvre 2008). Quantification of constant Pcdhg RNA levels by qRT-PCR or sc-RT-PCR would directly address the potential caveat that OE levels vary with isoform combinations.

      Through a series of multiple whole-cell recordings, we examined neuronal pairs within the 0% group, where both neurons exhibited overexpression of different combinations of γ-PCDH isoforms. We found that the connectivity level was remarkably similar among pairs in which both neurons overexpressed γ-PCDH isoforms, pairs in which only one neuron overexpressed these isoforms, and pairs of two control neurons (lacking overexpression). However, as we incrementally raised the similarity level between the recorded neurons by increasing the overlap in the combinatorial expression of γ-PCDH isoforms, we observed a gradual decrease in the connectivity probability between these neurons. Notably, the connectivity probability reached its minimum when the recorded cells had the exact same combinatorial expression of γ-PCDH isoforms at the 100% similarity level. These findings suggest that the similarity level between neurons, rather than the absolute expression level of γ-PCDH isoforms, plays a critical role in affecting synapse formation.

      -A caveat for the relative plasmid expression quantifications in Figure 3-S1 is that IHC was used to amplify the RFP-tagged isoform, and thus does not likely preserve the relationship between quantities and detection.

      We attempted to enhance the mNeongreen signal, known for its exceptional signal-to-noise ratio, by utilizing the 32f6-100 antibody from Chromotek. However, our observations did not reveal any additional cells through immunostaining compared to the images obtained solely based on the mNeongreen signal. This indicates that the application of the available antibody did not yield a significant improvement in cell detection.

      It's important to emphasize that if the RFP signal is overvalued, it would result in an increase in both the "red only" and "red in total" categories. However, it's worth noting that the "red only" category is more sensitive to the outcome than the "red in total" category. Therefore, an overvaluation of the RFP signal would lead to an underestimation of the total estimated plasmid content in electroporated neurons. Consequently, this would result in a lower estimate for the proportion of co-expression cells rather than a higher estimate. We have updated the calculation method in the "Estimating the numbers of overexpressed γ-PCDH isoform" section to reflect these considerations.

      • Figure 1 didn't change in response to reviews to improve clarity. New panels relating to the scRNASeq analyses were added to supplementary data but many are central and should be included in Figure 1 (ie. S1-Fig6D). In the Results, the authors state that neuronal subpopulations generally show a combinatorial expression of some variable RNA isoforms and near ubiquitous C-type expression. But they only show data for the Layer 2/3 neuron-specific cluster in S1-Fig-6D, and so it is not clear if this pattern applies to other clusters. Fig. S1-5 show a low number of expressed isoforms per cell, but specific descriptions on whether these include C-type isoforms would be helpful. Figure 1F showing isoform profile in all neurons is not particularly meaningful. There is a lot of interest in neuron-type specific differences in cPcdh diversity, and the authors could highlight their data from S1-5 accordingly.

      In addition to the layer 2/3 cluster, we observed a diverse combinatorial expression of various variable γ-PCDH isoforms alongside nearly ubiquitous C-type expression in all other clusters of cells. We have now explicitly mentioned this observation in the main text. To underscore this point further, we have included a new figure, Fig. 1-S6, which provides information on the similarity analysis for all other clusters. It's important to note that the data in previous Fig. S1-5 (now renumbered as S1-7) were solely related to "variable" isoforms. We apologize for any confusion and have made this clarification by including it in the title of the figure.

      • The concept of co-occurrence and results should be explained within the results section, to more clearly relate this concept to data and interpretations. Explanations are now found in the methods, but this did not improve the clarity of this otherwise very interesting aspect of the study.

      Thanks for your suggestion. We have incorporated some of the explanations from the methods section into the main text, mainly for the concept of "co-occurrence".

      • The claim that C-type Pcdhgs do not functionally influence connectivity is premature. Tests were limited to PcdhgC4, which has unique properties compared to the other 2 C-type isoforms (Garrett et al 2019 PMID: 31877124; Mancia et al PMID: 36778455). The text should be corrected to limit the conclusion to PcdhgC4, and not generally to C-type. The authors should test PcdhgC3 and PcdhgC5 isoforms.

      We have limited the claim to PcdhgC4, rather than extending it to C-type isoforms in general, to better reflect our observations.

      • The group generated a novel conditional Pcdh-alpha mouse allele using CRISPR methods, and state that there were no changes in synaptic connectivity in these Pcdh-alpha mutants. But this claim is premature. The Southern blots validate the targeting of the allele. But further validations are required to establish that this floxed allele can be efficiently recombined, disrupting Pcdha protein levels and function. Pcdha alleles have been validated by western blots and by demonstration of the prominent serotonergic axonal phenotype of Pcdha-KO (ie. Chen 2017 PMID: 28450636; Ing-Esteves 2018 PMID: 29439167).

      We have obtained a new set of qRT-PCR data that confirms the decreased expression of α-PCDH in Pcdha CKO mice. These data have been integrated into Figure 2-S2D.

      • The Discussion would be strengthened by a deeper discussion of the findings to other cPcdh roles and studies, and of the limitations of the study. The idea that the Pcdhgs are influencing the rate of connectivity through a repulsion mechanism or synaptic formation (ie through negative interactions with synaptic organizers such as Nlgn - Molumby 2018, Steffen 2022) could be presented in a model, and supported by other literature.

      I would like to express my sincere appreciation to the reviewer for their invaluable comments and suggestions, which have led to extended discussions within our work. We have incorporated these suggestions into our paper to establish stronger connections between our observations and prior research findings.

      Reviewer #1 (Recommendations For The Authors):

      1) In Figure S6, the authors measured Euclidean distance from the single cell data to take account of the isoform expression levels in explaining diversity. However, it is hard to interpret the data without any control. The authors could measure the same value from a shuffled /randomized dataset for comparison (similarly to Fig 1F).

      We understand the reviewer's concern about the significance of the Euclidean distance analysis without an appropriate control. The inclusion of the Euclidean distance metric was initially a response to suggestions from other reviewers who recommended incorporating diverse methods for analyzing expression patterns among neurons.

      In response to your valuable feedback, we have taken measures to address these concerns. We have introduced shuffled data for comparison, thus enhancing the meaningful context for interpreting the results derived from the Euclidean distance analysis.
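
      The idea of comparing an observed Euclidean distance with a shuffled control can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: the expression matrix `cells` is hypothetical (rows are cells, columns are isoforms), the function names are invented for this example, and shuffling each isoform's values independently across cells is only one of several possible randomization schemes.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(cells):
    """Mean Euclidean distance over all pairs of cells (rows)."""
    pairs = list(combinations(cells, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def shuffle_within_isoform(cells, seed=0):
    """Shuffle each isoform (column) independently across cells.

    This keeps every isoform's overall expression distribution but breaks
    the combinations of isoforms found together in individual cells.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*cells)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


# Hypothetical expression values: 4 cells x 4 isoforms (last column ubiquitous).
cells = [
    [3, 0, 0, 1],
    [0, 2, 0, 1],
    [0, 0, 4, 1],
    [1, 1, 0, 1],
]
observed = mean_pairwise_distance(cells)
shuffled = mean_pairwise_distance(shuffle_within_isoform(cells))
print(f"observed = {observed:.3f}, shuffled control = {shuffled:.3f}")
```

      In practice the shuffle would be repeated many times to build a null distribution against which the observed distance is compared.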

      2) The authors need to clarify which cortical regions were used for electrophysiological experiments.

      Apologies for any confusion. To clarify, all recordings were conducted on neurons located in layer 2/3 of the neocortex without further discrimination. We have reinstated this information in both the main text and the methods section to ensure its clarity.

      Reviewer #2 (Recommendations For The Authors):

      There are still some issues that must be addressed.

      1) The references to gamma protocadherin repulsion are not correct in context. A repulsive role of homophilic interaction has been inferred from certain knockout phenotypes in a subset of neurons (not in cortical neurons). However, repulsion has never been shown to follow gamma protocadherin engagement. The authors present no new evidence that their results are attributable to cellular repulsion at nascent synaptic contacts. The mechanism is unknown. The references to repulsion to explain their results should make it clear that this is one possible explanation, but it is not shown. Also some references in the text are not correct. For example, line 63/64: the results of Molumby and Steffen are not involving homophilic adhesion or repulsion, but rather a cis interaction with neuroligins. Those papers should not be discussed as involving repulsion as in the reference to Lefebvre 2012. Also line 268/269 "Together with previous findings (Molumby,,,Tarusawa), our observations solidify repulsion effect of g-PCDH on synapse formation. . .". This is not the case. Neither Molumby nor Tarusawa demonstrated any such repulsion.

      Thank you to the reviewer for pointing out the errors in our citations. We have made the necessary corrections to the citations and have also refined the descriptions of our observations to improve clarity and accuracy.

      2) The discussion of the results when C4 is overexpressed must also be greatly toned down. C4 is a strange C-type protein--it cannot get to the cell surface alone but relies on other cPCDHs for this, and its primary role is in preventing cell death. It is odd that the authors used this isoform to represent C-types. They should have used C3, which two recent papers showed has specific roles at some synapses (Meltzer et al 2023, Ginty lab) and in dendrite branching (Steffen et al 2023, Weiner lab), or C5. It is entirely possible that just C4 has no role in synaptic matching--but C3 and C5 might. They should not conclude that the C-types have no such role and only A and B types do. That must be toned down (e.g., line 198/199, line 281).

      We acknowledge that using C4 to represent all three C-types (C3, C4, and C5) is not accurate. We have now modified the statement in the main text to rectify this.

      3) For the citation of Pcdhg flox/flox mice (line 126), Prasad et al., Development, 2008, Weiner lab, should also be cited as it fully characterized that line that was also used in Lefebvre et al 2008. They were co-published.

      Thank you for highlighting the missing citation, and we have now included it in the relevant section.

      4) The Pcdh alpha KO mouse characterization is still insufficient. The authors must show that alpha expression is gone following introduction of Cre, either by RT-PCR using alpha constant domain primers, or an alpha antibody on Western blot. The Southern and off-target sequencing do not confirm that all alpha gene expression is gone.

      Thank you for your feedback. We have conducted the qRT-PCR analysis as per your suggestion. The results clearly indicate a substantial reduction in α-PCDH expression within the neocortex of Pcdha cKO mice. We have thoughtfully incorporated this data into the manuscript, and it is visually represented in the new panel of Figure 2-S2D. Your valuable input has contributed to enhancing the quality of our work, and we sincerely appreciate the opportunity to address this important aspect.

      5) I do not understand something in Figure 4-S1A. Why with 0% matching is synaptic connectivity so low? This is not the same as in Figure 3E. This has to be explained because it does suggest that overexpression of ANY isoforms can inhibit synapse formation, which is consistent with Molumby 2017, even though this paper says it is not just the levels but the isoform specificity.

      The panel of Fig.4-S1A illustrates the connection rate between neurons with the same color (icons in upper left), representing cells that express the same combination of γ-PCDHs (100% of similarity). The X-axis (0%, 11%, 33%, and 100%) reflects the similarity level between the 2 populations of cells (GFP and RFP).

      6) There are still issues with the English grammar in the paper. It is not too bad in the main text but someone should re-edit it. However, the figure legends are indeed much worse and truly must be edited professionally before they are acceptable.

      We apologize for the quality of the English in the paper. We have now polished most of the manuscript, especially the figure legends.

      Reviewer #3 (Recommendations For The Authors):

      • This study has many strengths and innovative findings. Most comments above included suggestions to strengthen the paper. The overall message that Pcdhgs influence the rate of synaptic connectivity between nearby cells is compelling. How this Pcdhg-isoform-dependent process could influence synaptic specificity can be explored in a model in the discussion. But this study did not test a role in 'synaptic specificity'; this term should be removed from the title and line 81 in the intro.

      Thank you for your invaluable comments aimed at improving our paper. Regarding the title, we believe that "synaptic connectivity" might be a more suitable choice than "synaptic specificity." However, we're open to considering other alternatives as well.

      • The manuscript and overall quality of the science will be improved by removing those sections that are not adequately investigated (ie.Pcdh-a cKO; PcdhgC4 is assessed but findings can't be extended to other C-type isoforms) and by outlining limitations of the study.

      We have modified the related claim mentioned in the main text.

      • The studies negatively correlating between isoform matching and connectivity are not robust. Additional approaches are needed if the authors want to make this claim.

      In Figure 4E, we have implemented a bootstrapping method. Bootstrapping is a statistical technique falling under the broader category of resampling methods. It involves random sampling from the observed data with replacement, enabling the calculation of standard errors, confidence intervals, and supporting hypothesis testing.
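
      A minimal sketch of a percentile bootstrap for a connection probability is given below. It is not the authors' implementation: the counts (6 connected pairs out of 60 tested) are hypothetical, the function name `bootstrap_ci` and its parameters are assumptions, and the manuscript may have used a different bootstrap variant.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of binary outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        # Resample the tested pairs with replacement, keeping the sample size.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(resample) / n)
    rates.sort()
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical data: 1 = connected pair, 0 = unconnected pair.
outcomes = [1] * 6 + [0] * 54
observed_rate = sum(outcomes) / len(outcomes)
lower, upper = bootstrap_ci(outcomes)
print(f"rate = {observed_rate:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

      Applying the same resampling to each similarity group would give an interval per group, which is one way to convey variability when recordings must be pooled across animals.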

      • Statistical approaches should be described in methods, figure legends.

      More information about statistical approaches has been added in the figure legends.

      • The discussion should elaborate on the limitations of the study, and relate to other studies, including Lv et al 2022.

      We have added more discussion to relate our observations to previous findings.

    1. Author Response

      Note to the editor and reviewers.

      All the authors would like to thank the editorial team and the two anonymous reviewers for their efforts and thoughtfulness in assessing our manuscript. We very much appreciate it and we all believe that the manuscript has been much improved in addressing the comments and suggestions made.

      General considerations on the revised manuscript

      We have applied extensive modifications to the manuscript with our main goal being the improvement of clarity. The Introduction has been changed mainly to introduce precisely our terminology and we have stuck to it in the rest of the manuscript. The Results section has been divided up into more defined sections. The discussion has been extensively re-written to improve clarity, following the suggestion of the reviewers. Main figures 1 and 4 have been modified with clearer schematics. Supplementary figures and legends have been modified and several supplementary schematic figures have been added to clearly present our interpretations for various data. We have added a Supplementary Discussion where the most detailed technical parts of our discussion are presented to avoid unnecessarily weighing down the main discussion, where our main conclusions are outlined. We have presented our mass photometry mixing experiment in a new supplementary figure, with detailed explanation. We have also expanded our discussion of in vivo and general relevance of our study.

      Response to manuscript evaluation

      Our manuscript has been evaluated as a valuable study and presenting solid experimental evidence. We appreciate the recognition of our work.

      Two weaknesses were identified by reviewers: 1) Our experiments do not completely exclude the possibility of an alternative nucleophile. This relates to the evaluation of our experimental evidence. 2) Our study does not address the in vivo relevance of the interface swapping phenomenon, which relates to the value of the study for the community.

      Response to the evaluation of experimental evidence (Weakness #1):

      We argued in the original manuscript that we have excluded completely the presence of an alternative nucleophile. This conclusion is based on a series of experiments which were presented in the originally submitted manuscript. These experiments are not discussed by the reviewers in relation to this main conclusion and therefore we suggest that they have not been properly evaluated. We believe our conclusion to be appropriately supported by these data (see our response to reviewer #1). In addition, the criticism of our gel-filtration data by reviewer #2 was based on a misinterpretation of Supplementary figure 1 b. We accept of course that the way the data were presented could be misleading and we assume responsibility for this. We have attempted to correct this by changing the main text and the figure legends and annotations. In conclusion, we believe that the evaluation of experimental evidence as presented in the revised manuscript could be upgraded to “convincing”.

      Response to our study general relevance evaluation (weakness #2):

      We agree with both reviewers that the in vivo relevance of our observation is an important question that has not been addressed so far. Indeed, the value of our study would be greatly increased by in vivo data and be of interest to a wider audience. However, we would like to argue that our study would interest a wider audience than initially stated for the following reasons: 1) Our study is the first evidence of interface swapping in vitro and will constitute a basis to investigate this phenomenon both in vivo and in vitro. It will therefore interest a wide audience due to the potential involvement of interface swapping in a wide range of processes, such as recombination, evolution, and drug targeting (see also below). 2) DNA cleavage is the central mode of action of antibiotics targeting bacterial type II topoisomerases (i.e. topoisomerase “poisons”). This already established target is one of the few to have produced new scaffolds, and too few new antibacterials are in production to fulfill medical needs. The role of interface stability is also emerging as a modulator of the efficiency of topoisomerase poisons. See for instance (Germe, Voros et al. 2018, Bandak, Blower et al. 2023). By shedding light on interface dynamics, our study will be of interest to scientists interested in the development of these drugs. In addition, the heterodimer system can potentially produce detailed mechanistic information (Gubaev, Weidlich et al. 2016, Hartmann, Gubaev et al. 2017, Stelljes, Weidlich et al. 2018) not only on gyrase but also on other dimeric type II topoisomerases or even other dimeric enzymes in general. We have amended the manuscript to make these points clearer. Therefore, we believe that the evaluation of the revised manuscript’s relevance could be upgraded to “important”.

      Point-by-point response to the reviewer

      Reviewer #1 (Public Review):

      Germe and colleagues have investigated the mode of action of bacterial DNA gyrase, a tetrameric GyrA2GyrB2 complex that catalyses ATP-dependent DNA supercoiling. The accepted mechanism is that the enzyme passes a DNA segment through a reversible double-stranded DNA break formed by two catalytic Tyr residues, one from each GyrA subunit. The present study sought to understand an intriguing earlier observation that gyrase with a single catalytic tyrosine that cleaves a single strand of DNA, nonetheless has DNA supercoiling activity, a finding that led to the suggestion that gyrase acts via a nicking-closing mechanism. Germe et al used bacterial co-expression to make the wild-type and mutant heterodimeric BA(fused).A complexes with only one catalytic tyrosine. Whether the Tyr mutation was on the A side or BA fusion side, both complexes plus GyrB reconstituted fluoroquinolone-stabilized double-stranded DNA cleavage and DNA supercoiling. This indicates that the preparations of these complexes sustain double strand DNA passage. Of possible explanations, contamination of heterodimeric complexes or GyrB with GyrA dimers was ruled out by the meticulous prior analysis of the proteins on native PAGE gels, by analytical gel filtration and by mass photometry. Involvement of an alternative nucleophile on the Tyr-mutated protein was ruled unlikely by mutagenesis studies focused on the catalytic ArgTyrThr triad of residues. Instead, results of the present study favour a third explanation wherein double-strand DNA breakage arises as a consequence of subunit (or interface/domain) exchange. The authors showed that although subunits in the GyrA dimer were thought to be tightly associated, addition of GyrB to heterodimers with one catalytic tyrosine stimulates rapid DNA-dependent subunit or interface exchange to generate complexes with two catalytic tyrosines capable of double-stranded DNA breakage. Subunit exchange between complexes is facilitated by DNA bending and wrapping by gyrase, by the ability of both GyrA and GyrB to form higher order aggregates and by dense packing of gyrase complexes on DNA. By addressing a puzzling paradox, this study provides support for the accepted double strand break (strand passage) mechanism of gyrase and opens new insights on subunit exchange that may have biological significance in promoting DNA recombination and genome evolution.

      The conclusions of the work are mostly well supported by the experimental data.

      Strengths:

      The study examines a fundamental biological question, namely the mechanism of DNA gyrase, an essential and ubiquitous enzyme in bacteria, and the target of fluoroquinolone antimicrobial agents.

      The experiments have been carefully done and the analysis of their outcomes is comprehensive, thoughtful and considered.

      The work uses an array of complementary techniques to characterize preparations of GyrA, GyrB and various gyrase complexes. In this regard, mass photometry seems particularly useful. Analysis reveals that purified GyrA and GyrB can each form multimeric complexes and highlights the complexities involved in investigating the gyrase system.

      The various possible explanations for the double-strand DNA breakage by gyrase heterodimers with a single catalytic tyrosine are considered and addressed by appropriate experiments.

      The study highlights the potential biological importance of interactions between gyrase complexes through domain- or subunit-exchange.

      We thank the reviewer for their support, effort, and comments. The above is a great summary.

      Weaknesses:

      The mutagenesis experiments described do not fully eliminate the perhaps unlikely participation of an alternative nucleophile.

      We agree that the mutagenesis experiment on its own does not fully eliminate the possibility of an alternative nucleophile. The number of residues mutated is limited, and therefore it is possible we have missed a putative alternative nucleophile.

      However, we have other data and experiments supporting the conclusion that no alternative nucleophile exists. Therefore, we want to stress that our conclusion that no such alternative exists is based on these extra data. These data and experiments are not discussed by either reviewer despite being present in the original manuscript. This puzzled us and we have modified the manuscript and the figures in the hope that they, and their significance, would not be missed.

      Briefly:

      1) We have performed cleavage-based labeling of the nucleophile responsible for cleavage. This experiment is depicted in Figure 4. The nucleophilic activity of the residue involved results in a covalent link between the polypeptide (that includes the residue) and radiolabeled DNA. Therefore, a polypeptide that includes an active nucleophile will be radiolabeled and visible, whereas a polypeptide that is missing an active nucleophile will remain unlabeled and invisible. We can distinguish the BA and the A polypeptides by their size. In the case of the BA.A complex, both the BA polypeptide and the A polypeptide are radiolabeled and therefore both have an active nucleophile. In the case of the BAF.A complex, the unmutated A polypeptide is labeled, meaning that a nucleophile is still active. In contrast, the BAF polypeptide shows no detectable labeling. This result means that removing the hydroxyl group from the catalytic tyrosine abolishes any protein-DNA covalent link, suggesting that no other nucleophile from the BA polypeptidic chain can substitute for the catalytic tyrosine hydroxyl group. This experiment excludes the possibility of an alternative nucleophile coming from the polypeptidic chain of either GyrA or GyrB. This experiment, described in Figure 4, is not discussed by the reviewer. This experiment is similar in principle to early experiments identifying the catalytic tyrosine in topoisomerases. See for instance, (Shuman, Kane et al. 1989).

      2) The experiment above does not exclude a nucleophile coming from the solvent. To exclude this possibility, we have used T5 exonuclease (which needs a free 5’ DNA end to digest) and ExoIII (which needs a free 3’ DNA end to digest). We have shown that the reconstituted cleavage is not sensitive to T5 but is sensitive to ExoIII. This shows that the 5’ ends of the cleaved sites are protected by a bulky polypeptide impairing T5 activity, which is active in our reaction as shown by the digestion of a control DNA fragment. This experiment shows that the reconstituted cleavage is very unlikely to come from a small nucleophile potentially provided by the solvent. This experiment is described in the main text and the results are shown in supplementary figure 5. It is not mentioned by either reviewer.

      3) Finally, we would like to emphasize our experiment comparing the BAF.A59 to BALLL.A59. The BALLL.A59 complex displays increased cleavage compared to BAF.A59. If this increased cleavage was due to an alternative nucleophile on the BALLL side, we would expect an accompanying increase in supercoiling activity since the BALLL.A59 possesses one CTD, which is sufficient for supercoiling. The fact that no increased supercoiling activity is observed strongly suggests subunit exchange reconstituting an A59 dimer, inactive for supercoiling but active for cleavage. We believe this somewhat complex observation to be quite significant and we have attempted to clarify the manuscript and discuss its full significance in several places.

      Reviewer #1 (Recommendations For The Authors):

      An interesting paper on DNA gyrase that explains a puzzling paradox in terms of the double-strand break mechanism.

      Major points

      1) The authors consider several mechanisms that could potentially explain their data. On page 15, the authors present the evidence against the nicking closing mechanism proposed by Gubaev et al. Throughout the manuscript, they indicate where their experimental results agree with this earlier work but should also indicate and account for differences. For example, Gubaev et al describe cross linking experiments that they claim rule out subunit exchange. These aspects should be clearly explained.

      Thank you for the suggestion. We have rewritten the discussion to address this point. We now discuss the experiments from (Gubaev, Weidlich et al. 2016) extensively and offer our interpretation of the apparently conflicting results. We suggest that their experiments are basically consistent with our data when correctly interpreted. To keep the main manuscript clear, we have added a supplementary discussion where experiments from (Gubaev, Weidlich et al. 2016) are discussed further in relation to our data.

      2) Page 9. The experiments done to rule out the perhaps unlikely alternative nucleophile hypothesis relate to the possible role of the Arg and Threonine of the RYT triad. These residues are close to the DNA and therefore are prime candidates and attractive targets for mutagenesis. However, strictly speaking, the mutant enzyme data presented do not rule all possibilities. For example, Serine is often the nucleophile used by resolvases to effect DNA recombination via subunit exchange. The ideal experiment to rule out/rule in other nucleophiles would be to identify the residue(s) that become attached to DNA in the cleavage reaction.

      Please see above. We have effectively ruled out an alternative nucleophile with our cleavage-based labeling experiment and with other experiments that were present and discussed in the original manuscript but were missed. We have modified the manuscript and figures to make this point clearer than before.

      3) p17. The readout for subunit exchange used by the authors is double-stranded DNA cleavage. Attempts to directly detect the formation of the DNA cleaving complexes GyrA2B2 and (GyrBA)2 (arising from subunit exchange between heterodimers) by mass photometry were not successful. Perhaps FRET would have been another approach to try as it could also detect interface and domain interchanges.

      Detecting interface exchange directly by a proximity experiment would be extremely useful. FRET would have to be done in the BAF.A + GyrB configuration, where the amount of interface exchange is substantial. At present, we do not have the tools to do that, and developing them would be outside the scope of the study. We propose that cross-linking experiments be done in the future, and we argue that the manuscript is convincing without them for now. This point and other possible future experiments are now discussed in the Discussion section.

      4) The underlying canvas of this paper is the strand passage mechanism of gyrase. It would seem appropriate to include the papers first proposing it - Brown P.O and Cozzarelli N.R. (1979) and Mizuuchi K et al (1980).

      We very much agree. These papers have now been added in the introduction as appropriate, highlighting the relationship between double-strand cleavage and the strand-passage mechanism.

      5) Figure 1. The quality of the insets is poor. It is difficult to pick out the key catalytic residues and their disposition vis-a-vis DNA.

      We agree, Figure 1 has been re-done and the schematic theme has been harmonized throughout the whole manuscript. We very much hope that clarity has improved. Thank you for the suggestion.

      6) The experimental work is a very detailed analysis of a specific feature of engineered gyrase heterodimers. Making the work accessible to the general reader will be important. Using shorter paragraphs each with a specific theme might help. In particular, the second paragraph of the Results on p7, the section on p9 and bottom of p11, p13 and the first paragraph of the Discussion on p14 are each a page or more long. A shorter manuscript that avoids overinterpretation of the smaller details would also help.

      We agree. We have now split long paragraphs into individual sections, with titles, in the Results. This structure is recapitulated at the beginning of the discussion, and we have split the discussion into shorter paragraphs, each with a unique point being made.

      7) The impact of the Gubaev et al (2016) paper for the field in general, and as the catalyst for the present work should be better documented. Mention of this earlier paper and its significance at the beginning of the Abstract and elsewhere e.g in the Introduction might also help with a more logical organization of the current findings and result in a shorter paper (which would be easier to read).

      We have added a reference to (Gubaev, Weidlich et al. 2016) in the abstract and have expanded our introduction.

      Minor points

      1) Legends for Figs 2 and 6; Supplementary Figs 1 and 8. The designation of subfigures as a, b, c, d , e etc appears to be incorrect. Check throughout and in the text.

      The manuscript has been checked for such errors.

      2) Figure 2, and first paragraph p8. Peaks in Fig 2c should be labelled to facilitate discussion on p8.

      Agreed, this has been done.

      3) Supplementary Fig 4 and elsewhere in the manuscript. A variety of notations are used to denote phenylalanine mutants e.g. AsubscriptF, AsuperscriptF and AF. Check and use one format throughout.

      Done

      4) Figures showing gels include the label '+EtBr, +cipro'. This is somewhat confusing because EtBr was contained in the gel (not the samples) whereas cipro was included in the reaction. Modify or describe in the legend.

      We have re-written the figure legend.

      5) Supplementary Fig 4b describes a small effect on the ratio of linear to nicked DNA for the triple LLL mutant. Is this significant? How many times was the measurement made?

      This has been addressed in the original manuscript in the supplementary data. In terms of quantification, the experiment has been done 3 times for each prep, with the same GyrB prep and concentration. The standard error is displayed on the figure. This result is very reproducible and has been reproduced more than 3 times. No LLL cleavage assay showed more single-strand than double-strand cleavage. For the phenylalanine mutant, no cleavage assay showed more double-strand than single-strand cleavage.

      6) Supplementary Fig 5 legend. Should 'L' read 'size markers' (and give their sizes)?

      Yes indeed, we have modified the figure to clarify.

      7) p11 line 5. Is this statement correct?

      Yes, it is correct, assuming we are referring to the same line. When the tyrosine is mutated on one side only of the heterodimer, both single- and double-strand cleavage are protected from T5 exonuclease digestion.

      8) p12, last line should read "...and supercoiling activity (not shown)...were"

      Thank you, done.

      There are a number of typos throughout the text, for example:

      Page 3 line..Difficult to conclude...what?

      Page 3 para 3...Lopez....and Blazquez

      We have corrected these typos and checked the whole manuscript.

      Reviewer #2 (Public Review):

      DNA gyrase is an essential enzyme in bacteria that regulates DNA topology and has the unique property of introducing negative supercoils into DNA. This enzyme contains 2 subunits, GyrA and GyrB, which form an A2B2 heterotetramer that associates with DNA and hydrolyzes ATP. The molecular structure of the A2B2 assembly is composed of 3 dimeric interfaces, called gates, which allow the cleavage and transport of double-stranded DNA molecules through the gates, in order to perform DNA topology simplification. The article by Germe et al. questions the existence and possible mechanism for subunit exchange in the bacterial DNA gyrase complex.

      The complexes are purified as a dimer of GyrA and a fusion of GyrB and GyrA (GyrBA), encoded by different plasmids, to allow the introduction of targeted mutations on one side only of the complex. The conclusion drawn by the authors is that subunit exchange does happen, favored by DNA binding and wrapping. They propose that the accumulation of gyrase in higher-order oligomers can favor rapid subunit exchange between two active gyrase complexes brought into proximity.

      The authors are also debating the conclusions of a previous article by Gubaev, Weidlich et al 2016 (https://doi.org/10.1093/nar/gkw740). Gubaev et al. originally used this strategy of complex reconstitution to propose a nicking-closing mechanism for the introduction of negative supercoils by DNA gyrase, an alternative mechanism that precludes DNA strand passage, previously established in the field. Germe et al. incriminate in this earlier study the potential subunit swapping of the recombinant protein with the endogenous enzyme, that would be responsible for the detected negative supercoiling activity.

      Accordingly, the authors also conclude that they cannot completely exclude the presence of endogenous subunits in their samples as well.

      Strengths

      The mix of gyrase subunits is plausible; this mechanism has been suggested by Ikeda et al, 2004 and also for the human Top2 isoforms, with the formation of Top2a/Top2b hybrids being identified in HeLa cells (doi: 10.1073/pnas.93.16.8288).

      Germe et al have used extensive and solid biochemical experiments, together with thorough experimental controls, involving:

      • the purification of gyrase subunits including mutants with domain deletion, subunit fusion or point mutations.

      • DNA relaxation, cleavage and supercoiling assays

      • biophysical characterization in solution (size exclusion chromatography, mass photometry, mass spectrometry)

      Together the combination of experimental approaches provides solid evidence for subunit swapping in gyrase in vitro, despite the technical limitations of standard biochemistry applied to such a complex macromolecule.

      We thank the reviewer for their supportive and considered comments.

      Weaknesses

      The conclusions of this study could be strengthened by in vivo data to identify subunit swapping in the bacteria, as proposed by Ikeda et al, 2004. Indeed, if shown in vivo, together with this biochemical evidence, this mechanism could have a substantial impact on our understanding of bacterial physiology and resistance to drugs.

      Thank you for this comment. Indeed, whether this interface exchange can happen in vivo and lead to recombination is a very important question. However, we believe that this is outside the scope of this study simply because of the amount of work one can fit into one paper. Proving that interface exchange can happen in vitro has already necessitated a number of non-trivial experiments and likewise investigating interface exchange in vivo will require a careful, long-term study (see our reply to reviewer #2 comment, who also raised this point). We can’t address it with one additional experiment with the tools we have. However, we very much hope to do it in the future.

      Reviewer #2 (Recommendations For The Authors):

      Specific questions and comments for the authors:

      1) Complex identification during purification

      The statement line 236-237 that "Our heterodimer preparation showed a single-peak on a gel-filtration column, distinct from the GyrA dimer peak" is not entirely clear. In Fig supp 1 b, how can the authors conclude from the superose 6 that GyrBA is separated from the GyrA dimer? Since they seem close in size (160/180 kDa), they are unlikely to be well separated in a superose 6 gel filtration column. The SDS-PAGE seems to show both species in the same fractions #15-17, therefore it would not be possible to distinguish GyrBA.A from A2.

      There appears to be some confusion about what Supp Fig. 1b shows. First, in all our gel filtration conditions, neither GyrBA nor GyrA can exist as a monomer at a significant concentration. Therefore, we can never observe the GyrBA monomer on a gel filtration column. Supp Fig. 1b shows the gel filtration profile of the BA.A heterodimer only. This is the output of the last, polishing step in the reaction. We analyze these results using SDS-PAGE. Therefore, the BA.A heterodimer will be denatured and separated into 2 polypeptides, GyrBA and GyrA, which migrate according to their size in an SDS-PAGE and form two bands. These two bands do not represent two separate species in solution. They represent the separation of one species only, the BA.A heterodimer, into its two denatured subunits: GyrA and GyrBA. We do not conclude from Supp Fig. 1 as a whole that GyrBA and the GyrA dimer are well separated, and this is not stated in the manuscript. We conclude that the BA.A dimer is fairly well separated from the GyrA dimer. They have significantly different sizes (~260 kDa and ~180 kDa respectively) and form different peaks on a gel filtration column. The BA.A heterodimer has a GyrA subunit and therefore will show a GyrA band on an SDS-PAGE, like the GyrA dimers, but the two are obviously distinct in their quaternary structure. We hope that our new schematics and the rewrite of some of the results and figure legends will clarify this.

      Panel 6 shows a different elution volume for the 2 species BA.A and A2 on an analytical S200 column, which appears better at separating the complexes in this size range.

      Did the authors consider using a S200 column instead of superose 6 for the sample preparation, to optimize the separation of GyrBA.A from A2?

      This statement is not necessarily true (see above). We have not run the GyrA dimer on a Superose 6 column. The analysis was done on an S200 column because extensive data for the GyrA dimer were already available with this already calibrated column. We do not expect the Superose 6 to be worse in this size range. In fact, it might even be better. The Superose 6 profile in Supp. Fig. 1b shows BA.A only and no GyrA dimer. We have clarified the annotations in the figure to make this clearer.

      Regarding the analytical gel filtration experiment, there is however an overlap in the elution volume in the analytical column, therefore how can the authors ensure there is no excess free A2 complex in the GyrBA.A sample?

      Indeed, there is an overlap, but we argue that it is overstated. The important part of the overlap is where the maximum height of the GyrA peak is positioned compared to the BA.A trace, not where the traces intersect. This overlap is minimal. If a contaminating GyrA peak were hidden in the BA.A peak, it would have to be at least 10 times less intense than the BA.A peak. Since the BA.A and GyrA dimers have roughly the same extinction coefficient, this means that a contamination would be detectable at 10% or even less. Our mass photometry further excludes such contamination.

      Alternatively, the addition of a larger (cleavable) tag at the C-terminal end of the BA construct (therefore not disturbing dimer association) could allow to better distinguish the 2 populations already at the size exclusion step.

      This is true and could allow cleaner purification. There are also other ways to achieve cleaner purification, such as adding a secondary tag. However, as we argue in the manuscript, our contaminations are already minimal. It is questionable what benefits could be gained by changing the protocol. We also argue that the tandem tag method does not completely exclude contamination (Supplementary Discussion), and therefore we are not sure that this would be worth the time and expenditure.

      2) GyrA and GyrB Oligomers:

      In the mass photometry experiment, the authors explain that the low concentration of the proteins promotes dissociation of GyrA dimers, hence the detection of GyrA monomers instead of GyrA dimers, which are also detected in the GyrBA.A sample.

      However, it cannot be concluded that the GyrA dimer is not formed in the condition of the gel filtration chromatography, at higher concentration.

      In our mass photometry experiment, the BA.A sample is not as diluted as the GyrA dimer and is much closer to our experimental conditions. Since we have calculated the dissociation constant, we can calculate the expected level of dissociation (or reassociation). The level of dissociation is minimal in these conditions. If some dissociation is expected from the BA.A heterodimers, a very low amount of GyrBA monomer should also be present, and yet it is not observed. We presume that this is because mass photometry is much more sensitive to GyrA (see the mixing mass photometry experiment that we have added). If GyrA were to reassociate at higher concentration, it would do so either with itself (forming a GyrA dimer) or with the GyrBA monomer, reforming the heterodimer. Assuming both the GyrA dimer and the heterodimer have the same dissociation constant, roughly one third of the GyrA monomers would reassociate with themselves. Even assuming complete reassociation into GyrA dimers, this would leave GyrA dimers accounting for only 2% of the prep.

      Another interpretation would be to assume that GyrBA monomers are not present at all and that GyrA monomers are reassociating only with themselves. This is not valid for the following thermodynamic reason:

      Since the profiles for the GyrA dimer are collected at equilibrium, we should expect a ratio between GyrA monomers and dimers that follows the dissociation constant. In other words, if the GyrA monomers were in equilibrium with GyrA dimers, we should expect a much higher dimer concentration already, as the GyrA monomers are not as dilute. We do not observe a GyrA dimer peak in the BA.A profile, even though we can detect a low amount of GyrA dimer mixed with BA.A. Therefore, we conclude that the observed GyrA monomer must be in equilibrium with another dimerization partner, which is most probably the GyrBA monomer (see above). Therefore, only a minimal amount of GyrA dimer is expected to be formed at higher concentration by direct reassociation. This could probably increase if we let this solution-based exchange carry on for a long time at dissociation equilibrium. We have actually shown that this solution-based exchange is very slow and takes several days because of the low dissociation at equilibrium.
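      The thermodynamic argument above can be illustrated with a minimal sketch of a simple monomer-dimer equilibrium (2M <-> D). This is an illustration only, not the calculation used in the manuscript: the function name, the dissociation constant and the stock concentration are all hypothetical placeholders in arbitrary units. It shows how the expected monomer fraction is fixed by the dissociation constant and the total concentration, and how it rises on dilution.

```python
import math


def monomer_fraction(total_subunit_conc: float, kd: float) -> float:
    """Fraction of subunits present as free monomer at equilibrium.

    Model: 2M <-> D with Kd = [M]^2 / [D].
    Mass balance (in subunit units): total = [M] + 2[D].
    Substituting [D] = [M]^2 / Kd gives 2[M]^2 + Kd[M] - Kd*total = 0,
    whose positive root is used below.
    """
    m = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_subunit_conc)) / 4.0
    return m / total_subunit_conc


# Hypothetical values in arbitrary concentration units (NOT measured data).
KD = 1.0
STOCK = 1000.0

for dilution in (1, 10, 400):
    total = STOCK / dilution
    print(f"dilution x{dilution}: monomer fraction = {monomer_fraction(total, KD):.3f}")
```

      With a tight interface (Kd much smaller than the working concentration), the monomer fraction stays small until the sample is strongly diluted, which is the qualitative behaviour the argument relies on.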

      The mass spectrometry analysis in Fig 2 confirms the presence of (monomeric) GyrA in the sample, despite different experimental conditions.

      The concentration of heterodimer in the mass spectrometry experiment is actually higher than in the mass photometry experiment. This shows that self-reassociation of the GyrA monomer as suggested above is undetectable with mass spectrometry at higher concentration.

      We considered that the “GyrA monomer” peak could be a contaminating GyrB monomer, which is ~90 kDa, which would explain the lack of reassociation. However, the mass spectrometry peak shows precisely the expected molecular weight of GyrA so we interpret this peak as arising from very limited dissociation of the BA.A heterodimer. The reassociation is limited at high concentration due simply to the fact that the difference in concentration between the mass photometry and our other experimental conditions is not that high. The GyrA dimer had to be diluted 400 times to see significant dissociation and yet even at this very low concentration the dissociation is far from complete.

      Our general conclusion on the points above is that we cannot completely exclude the presence of GyrA dimers, although they are undetectable in our working conditions by mass photometry (lower concentration), mass spectrometry (higher concentration) or even gel filtration (even higher concentration, see above). For the mass photometry, we have established that our detection threshold for a contamination is very low (see our mixing experiment).

      Figure 2A: the authors state in the introduction that GyrB is a monomer in solution and then explain that the upper bands in the native gel are multimer of GyrB. Could the authors comment and provide the size exclusion profile of the Gyr B purification?

      We have expanded our discussion of this. However, we have not been successful in collecting a gel filtration profile for GyrB. This is likely due to excessive oligomerization at the concentration we are using for gel filtration. We suggest that our mass photometry and Blue-Native PAGE experiments show clearly that GyrB can be detected as a monomer in solution at the appropriate dilution. However, GyrB tends to oligomerize in a regular fashion (consider especially Supp Fig. 8a), which suggests that it could align heterodimers on DNA in a linear, regular orientation. We have added a discussion of this.

      Together the relevance of the oligomeric state of purified GyrA or GyrB should be clarified, relative to their role in subunit swapping.

      We have added an explanation in our discussion, while also trying not to be too speculative. Basically, we believe that GyrB oligomerization is likely to be involved. It is difficult to conclude for GyrA since no experiment has allowed us to test it. Therefore, the role of GyrA oligomerization, if any, is unclear. The GyrA tetramer is very prominent, though, and forms very easily. GyrB, on the contrary, forms longer oligomers more readily than GyrA, and we surmise that this would help interface exchange. However, the structure of these GyrA and GyrB oligomers is not clear, which makes it difficult to go beyond speculation on this. It would be a very interesting experiment if we were able to suppress GyrB oligomerization whilst conserving its ability to promote strand-passage and cleavage. The same goes for GyrA. Unfortunately, we are unable to do that at this time.

      4) Subunit exchange

      Line 320: the concept of subunit exchange in this context should be clearly explained. If one understands correctly, the authors mean that the BAF polypeptide, part of the BAF.A complex, could be replaced by a combination of B+A therefore forming a fully functional WT A2B2 gyrase complex.

      Thank you for the suggestion. We have harmonized and clearly defined our terminology for interface swapping and subunit exchange in the introduction and attempted to be much more rigorous when referring to it.

      A great effort has been done in this study to explain all the pros and cons of the experimental design but the length of the explanations may prevent readers outside of the field to fully appreciate the conclusions. This article would benefit from the addition of a few schematics to summarize the working hypothesis.

      Thanks for the suggestion. We have added a series of schematics to illustrate our interpretation for each construct. As mentioned above the terminology has been more rigorously defined and updated throughout the manuscript.

      5) Presence of endogenous GyrA

      Line 419-425: it is quite difficult to follow the explanations regarding the possible contamination of the sample by endogenous GyrA.

      Maybe these points should rather be addressed in the discussion, when debating the conclusions of Gubaev et al.

      We agree. We have re-organized the Discussion doing just that. We added a Supplementary Discussion in which we further discuss the contamination problem in relation to (Gubaev, Weidlich et al. 2016).

      Production of the subunits in another (non bacterial) expression system or a cell free system may prevent the association of endogenous protein.

      Absolutely. We are planning on addressing this in the future, using the yeast expression system.

      6) Mechanism for subunit swapping

      Lines 588-595: As described by the authors the BA fusion shows decreased activity when compared with the WT probably due to limited conformational flexibility in absence of an additional linker sequence between the fused subunits.

      The affinity of BA for A may possibly be reduced compared to the free A2B2 complex, due to a relative stiffness of the fusion upon full association with a free B subunit, as rightfully pointed by the authors.

      If subunit exchange does happen in vitro, at least in the conditions of this study, the authors could assess the affinity of BA for A, when compared to the association of free B and A subunits.

      Experiments using analytical ultracentrifugation or surface plasmon resonance (SPR) may allow to determine the relative affinity of the BA +(A+B) compared to the A2B2 complex. This could be done also for the BALLL mutant and association with A59.

      It would be extremely useful to measure the affinity of BA for A. However, this is difficult because of the high affinity of the interface. To measure a dissociation constant, one has to be able to measure the concentration of the monomer and the dimer at equilibrium. Because of this, the complex must be diluted enough to see any dissociation, making detection difficult. In practice, this also means that we cannot purify monomeric versions of these subunits. We therefore can’t perform an “on-rate” study on an SPR surface, which would require flowing monomers over a partner subunit tethered to the SPR surface. We could, however, perform “off-rate” studies, but the dissociation time is likely to be very long, making the measurement difficult. We have not tried it, though, and it could turn out to be informative. Analyses of antibody off-rates done in the past could provide a guideline for us to perform this experiment. Analytical ultracentrifugation is an excellent technique and could in theory provide information. In practice, however, it would still be necessary to dilute the complex enough to obtain significant dissociation at equilibrium, making detection difficult. As far as we are aware, analytical ultracentrifugation relies on UV absorbance for protein detection, and therefore we probably would not detect our material at the necessary dilution. We are, however, open-minded about techniques with very sensitive detection methods that could be used.

      9) In vivo relevance

      The study does not conclude on subunit exchange in vivo, which has been suggested by earlier studies by Ikeda et al. To elaborate further on the relevance of such a mechanism in bacteria, experiments involving the fluorescent labeling of endogenous / exogenous mutant subunits may be required to provide further information on this phenomenon.

      We completely agree that the in vivo relevance of such phenomena is the central question. Addressing this directly is not trivial, though. Expressing both BA and A in vivo will result in random partnering and lead to a mix of dimers: A2 (1/4), BA2 (1/4) and BA.A (1/2), assuming equal interface affinity. Therefore, to see subunit exchange in the same way as in vitro, one would have to get rid of the BA2 and A2 dimers together, or of the BA.A dimer only. Our initial strategy to do that would be to engineer a specific dimer as being uniquely targeted for degradation. This could allow us to “get rid” of, for instance, the BA.A dimer. Subsequently, we would turn off the degradation and translation together and observe the rate of subunit exchange. This is not trivial, though, and would be the subject of a further study.
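      The expected 1/4 : 1/4 : 1/2 mix follows from simple binomial pairing. A minimal sketch, assuming random partnering with equal interface affinity and treating the fraction of BA among all expressed subunits as a free parameter (the function name and the parameter are hypothetical, introduced only for illustration):

```python
from fractions import Fraction


def dimer_fractions(p_ba: Fraction) -> dict:
    """Expected dimer populations under random partnering.

    p_ba is the (assumed) fraction of BA polypeptides among all
    dimerizing subunits; the remainder is free GyrA. Equal interface
    affinity for every pairing is assumed.
    """
    p_a = 1 - p_ba
    return {
        "A2": p_a * p_a,         # both partners are GyrA
        "BA2": p_ba * p_ba,      # both partners are the BA fusion
        "BA.A": 2 * p_ba * p_a,  # heterodimer, two equivalent orderings
    }


# Equal expression of BA and A, as assumed in the text above.
for species, fraction in dimer_fractions(Fraction(1, 2)).items():
    print(species, fraction)
```

      Unequal expression levels shift the mix away from these proportions, which is one reason a targeted degradation strategy would need careful calibration.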

      10) Figure 3: I guess the "intact" label refers to the supercoiled DNA (SC)? It also appears as "uncleaved" in supp Figure 6. The same label for this topoisomer should be used throughout.

      Thank you for pointing that out. It has now been corrected.

      Bandak, A. F., T. R. Blower, K. C. Nitiss, R. Gupta, A. Y. Lau, R. Guha, J. L. Nitiss and J. M. Berger (2023). "Naturally mutagenic sequence diversity in a human type II topoisomerase." Proceedings of the National Academy of Sciences 120(28).

      Germe, T., J. Voros, F. Jeannot, T. Taillier, R. A. Stavenger, E. Bacque, A. Maxwell and B. D. Bax (2018). "A new class of antibacterials, the imidazopyrazinones, reveal structural transitions involved in DNA gyrase poisoning and mechanisms of resistance." Nucleic Acids Res.

      Gubaev, A., D. Weidlich and D. Klostermeier (2016). "DNA gyrase with a single catalytic tyrosine can catalyze DNA supercoiling by a nicking-closing mechanism." Nucleic Acids Res 44(21): 10354-10366.

      Hartmann, S., A. Gubaev and D. Klostermeier (2017). "Binding and Hydrolysis of a Single ATP Is Sufficient for N-Gate Closure and DNA Supercoiling by Gyrase." J Mol Biol 429(23): 3717-3729.

      Shuman, S., E. M. Kane and S. G. Morham (1989). "Mapping the active-site tyrosine of vaccinia virus DNA topoisomerase I." Proc Natl Acad Sci U S A 86(24): 9793-9797.

      Stelljes, J. T., D. Weidlich, A. Gubaev and D. Klostermeier (2018). "Gyrase containing a single C-terminal domain catalyzes negative supercoiling of DNA by decreasing the linking number in steps of two." Nucleic Acids Res.

    1. Author Response

      Reviewer #3 (Public Review):

      Strengths:

      NanoPDLIM2, nanotechnologies that efficiently deliver lentivirus overcomes resistance to chemotherapy and anti-PD-1 immunotherapy. This is a new strategy for enhancing the efficiency of immune checkpoint inhibitors.

      This finding is important from a clinical translation perspective, but I have several minor concerns.

      Weaknesses:

      1) Please describe the mechanism of increased MHC class I and PD-L1 by PDLIM2.

      Our previous studies showed that PDLIM2 induces MHC-I expression by decreasing STAT3, whereas it is dispensable for PD-L1 expression (Sun et al, 2019, PMID: 31757943). In line with these studies, PD-L1 is induced by chemotherapeutic drugs, but not by NanoPDLIM2 (Figure 6A). Together with the roles of PDLIM2 in repressing RelA-dependent MDR1 induction by chemotherapy and in preventing expression of cell survival and proliferation genes by targeting both RelA and STAT3 (Sun et al, 2019, PMID: 31757943), these findings provide the mechanistic basis for the combination and synergistic effect of NanoPDLIM2, anti-PD-1 and chemo drugs. This improvement has now been incorporated into the manuscript.

      2) Please describe the mechanism of decreased MDR1, nuclear RelA and STAT3 by PDLIM2.

      Our previous studies demonstrated that PDLIM2 reduces MDR1 expression by degrading nuclear RelA (Sun et al, 2019, PMID: 31757943).

      3) Please determine whether PDLIM2 expression directly impacts immune cells (function and number)?

      As shown in Figure 5, NanoPDLIM2 increased the number and activation of tumor-infiltrating lymphocytes (TILs); and in a prior study, PDLIM2 knockout repressed the numbers of TILs and inhibited the activation of CD4+ and CD8+ T cells, while its re-expression in lung tumors led to T cell activation (Sun et al. 2019, PMID: 31757943). On the other hand, selective deletion of PDLIM2 in immune cells, and in particular myeloid cells, repressed the numbers and activation of TILs (Li et al, 2021, PMID: 33539325; PMCID: PMC8021114). Thus, PDLIM2 may impact immune cells both directly and indirectly, particularly when nanoparticles can deliver PDLIM2 into both tumor cells and tumor-associated immune cells (although PDLIM2 is delivered into far fewer immune cells than tumor cells).

      4) What is the efficiency of PDLIM2 delivery? Does delivery efficiency determine anti-tumor effect?

      As shown in the manuscript, the dose of PDLIM2 used already shows high delivery (20-30 copies per tumor cell in Figure 3B) and therapeutic efficacy in the mouse model of refractory lung cancer, particularly when combined with anti-PD-1 and chemo drugs. It is of interest to test different doses in the model for the best delivery and efficacy, which is actively being pursued in the lab.

      5) Authors used a non-immunogenic tumor model. Can you demonstrate the combination effect with PDLIM2 in immunogenic lung cancer models to determine whether the combination of PDLIM2 with anti-PD-1 Ab confers a synergistic effect without chemotherapy?

      Yes, it is of interest to demonstrate the combination of PDLIM2 and anti-PD-1 in immunogenic lung cancer models with chemotherapy, although a synergy is highly expected. The greatest challenge in the lung cancer field is the low response of non-immunogenic tumors, which is the focus of the current manuscript.

      6) On page 11, % change can make one over-interpret data.

      The % change has been removed from the manuscript.

      7) In Figure 5, what is the difference between 5A and 5D?

      Figure 5A shows the increase of TILs by nanoPDLIM2 in animals that did not receive PD-1 blockade immunotherapy, whereas Figure 5D shows the increase of TILs by nanoPDLIM2 in animals that received PD-1 blockade immunotherapy.

      8) It is unclear whether PDLIM2 confers an additive or a synergistic effect with anti-PD-1/chemo.

      PDLIM2 nanotherapy confers a synergistic effect with chemotherapy on increasing apoptosis in tumors (Figure 4B) and on tumor reduction (Figure 4A and 6E, left panel, tumor number); a synergistic effect with anti-PD-1 on increasing CD4+ and CD8+ TILs (Figure 5A and 5D) and apoptosis in tumors (Figure 5F), and an additive effect on tumor reduction (Figure 5C and 6E); and a synergistic effect with chemotherapy plus anti-PD-1 on increasing CD4+ and CD8+ TILs (Figure 5A and 6F) and on tumor reduction (Figure 6E, left panel, tumor number).

      9) Have the authors tested any toxicity in normal lungs?

      As in tumor-bearing lungs, no obvious toxicity has been observed in normal lungs.

      Reviewer #1 (Recommendations For The Authors):

      The paper is clear and well-written, although some minor edits are needed. For example, the title could be changed to reflect both human and mouse studies in the manuscript for more general readers. Moreover, 'lung cancer' should be used instead of 'lung cancers'. The manuscript could be further improved by validating their findings in a different model and particularly the syngeneic model of metastatic lung cancer for a better overall survival time by the new combination therapy, given the fact that clinical trial studies usually start in patients with metastatic tumors. But this is optional because the therapeutic effect on primary lung cancer is already significant.

      Thanks for the correction and wonderful suggestions. The “lung cancers” were replaced with “lung cancer”, and the title was changed to “Improving PD-1 blockade plus chemotherapy for complete remission of lung cancer by nanoPDLIM2”.

      Reviewer #2 (Recommendations For The Authors):

      1) What is the rationale for i.v. injection of nanoparticles containing PDLIM2 plasmid? Intranasal administration of nanoparticles may potentially target nanoPDLIM2 specifically to the lungs. Another potential option is intranasal infection of mice with adenovirus expressing PDLIM2.

      The rationale for i.v. injection of nanoPDLIM2 is that i.v.-injected nanoPDLIM2 first reaches the lung and, more importantly, the tumor tissues, together with the convenience and high efficacy of mouse i.v. injection, particularly when multiple injections are needed. Mice are also much less stressed than with intranasal or even intratracheal administration. Adenovirus can be used only once, because it initiates an anti-viral immune response in mice.

      2) The authors examine PDLIM2 expression in lung tumors 1 week after i.v. administration of nanoparticles (Fig. 3A). Do all tumor cells express PDLIM2 after nanoPDLIM2 treatment? How long does PDLIM2 persist in the tumors? The kinetics of PDLIM2 expression may be informative to help interpret the results from the various combination treatments given to the mice. Multiple rounds of nanoPDLIM2 treatment could potentially improve the efficacy of the treatment.

      For all the sections examined (n=6), PDLIM2 was re-expressed in most but not all lung cancer cells 1 week after i.v. administration. Accordingly, nanoPDLIM2 was injected weekly. We are examining whether PDLIM2 re-expression can last longer. We are also testing which dose gives the best efficacy.

      3) Does the plasmid DNA from nanoparticles trigger an innate immune response in the lung that contributes to anti-tumor responses?

      In line with previous studies showing no effect on immune responses (Bonnet et al. 2008. PMID: 18709489), the dose used in the current study does not significantly affect immune cells in the lung, suggesting no obvious effect of nanoparticles with empty plasmid on the innate immune response.

      4) In Fig. 4, does the combination of nanoPDLIM2 and chemotherapy diminish STAT3 nuclear staining?

      NanoPDLIM2 alone decreased nuclear STAT3 in tumor cells (Figure 2C). It also diminished nuclear STAT3 in tumor cells when combined with chemotherapy.

    1. Author Response

      We thank you for your careful review of our manuscript and helpful comments and suggestions. We have carefully considered each point and have addressed them by making changes to the manuscript and figures. The text below details our responses and edits.

      Reviewer #1 (Public Review):

      Summary:

      Liao et al leveraged two powerful genomics techniques, CUT&RUN and RNA sequencing, to identify genomic regions bound by and activated or inactivated by SMAD1, SMAD5, and the progesterone receptor during endometrial stromal cell decidualization.

      Strengths:

      The authors utilized powerful next generation sequencing and identified important transcriptional mechanisms of SMAD1/5 and PGR during decidualization in vivo.

      Weaknesses:

      Overall, the manuscript and study are well structured and provide critical mechanistic updates on the roles of SMAD1/5 in decidualization and preparation of the maternal endometrium for pregnancy. Please consider the following to improve the manuscript:

      • Figure 4: A and C show bar graphs, not histograms. Please alter this phrasing.

      Figure legends were adjusted as suggested.

      • What post hoc test was performed on qPCR analyses? (Figure 6). It is evident that any assumptions of equal variance need to be negated due to the wide dispersion in experimental response invalidating the assumptions of a one-way ANOVA.

      Yes, a Tukey’s post hoc test was performed on the qPCR analyses. To address the reviewer’s question regarding equal variance, normality of the dataset was examined by the D’Agostino & Pearson test in GraphPad Prism. The data demonstrated a normal distribution pattern, thus justifying the one-way ANOVA test.

      • Figure 6: what data points are plotted? Are these technical replicates from individual wells or qPCR technical replicates?

      The dataset represents three technical and three biological data points.

      • Figure 6: Consider changing graph colors to increase visibility of error bars and data points.

      Thank you for this suggestion. The colors of the error bars in Figure 6 have been changed to increase visibility. Additionally, different shapes have been utilized to distinguish between different groups.

      • Figure 6 legend: no histograms are shown in this figure. Refer to all gene names utilizing proper nomenclature and conventions (gene names should be italicized).

      The legend was adjusted as suggested with the correct nomenclature implemented.

      • qPCR analyses: qPCR normalization should be done to at least two internal control genes, preferably three according to the MIQE guidelines (PMID: 19246619).

      As suggested, we have performed additional qPCR analysis with normalization done to three internal controls.

      • Supplement figure 2: graphs are bar graphs, not histograms.

      The legends have been changed as suggested.
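
      As an illustration of the three-reference-gene qPCR normalization mentioned in the responses above, the following is a minimal sketch and not the authors' actual analysis. It assumes the ΔΔCt method with roughly 100% amplification efficiency, in which case averaging the reference-gene Ct values is equivalent to taking the geometric mean of their relative quantities. The function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Ct of the target gene minus the mean Ct of the reference genes.

    Averaging Ct values corresponds to the geometric mean of the
    reference-gene quantities when efficiency is ~100% (an assumption).
    """
    return target_ct - mean(reference_cts)


def relative_expression(target_ct_sample, ref_cts_sample,
                        target_ct_control, ref_cts_control):
    """Fold change of the sample relative to the control (2^-ddCt)."""
    ddct = (delta_ct(target_ct_sample, ref_cts_sample)
            - delta_ct(target_ct_control, ref_cts_control))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: one target gene, three reference genes.
fold_change = relative_expression(
    target_ct_sample=26.0, ref_cts_sample=[18.0, 20.0, 22.0],
    target_ct_control=25.0, ref_cts_control=[18.0, 20.0, 22.0],
)
print(fold_change)
```

      With these made-up values the target gene amplifies one cycle later in the sample than in the control while the reference genes are unchanged, so the sketch reports a two-fold reduction.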

      Reviewer #2 (Public Review):

      Summary:

      Liao and colleagues generated tagged SMAD1 and SMAD5 mouse models and identified genome occupancy of these two factors in the uterus of these mice using the CUT&RUN assay. The authors used integrative bioinformatic approaches to identify putative SMAD1/5 direct downstream target genes and to catalog the SMAD1/5 and PGR genome co-localization pattern. The role of SMAD1/5 on stromal decidualization was assayed in vitro on primary human endometrial stromal cells. The new mouse models offer opportunities to further dissect SMAD1 and SMAD5 functions without the limitation from SMAD antibodies, which is significant. The CUT&RUN data further support the usefulness of these mouse models for this purpose.

      Strengths:

      The strength of this study is the novelty of new mouse models and the valuable cistromic data derived from these mice.

      Weaknesses:

      The weakness of the present version of the manuscript includes the self-limited data analysis approaches such as the proximal promoter based bioinformatic filter and a missed opportunity to investigate the role of SMAD1/5 on determining the genome occupancy of major uterine transcription regulators.

      Thank you for the comments. We addressed the limitation of the promoter-based analysis in the discussion and pointed out the possibility of analyzing additional genomics features (Lines 548-551). Based on the suggestions, we also included an analysis in which we compared SMAD1/5 binding activities in this study to known major uterine transcription regulators’ binding activities (namely, SOX17 and NR2F2) using published ChIP-seq data in the mouse uterus. Results from this analysis are discussed in Lines 426-436. Content from the adjusted manuscript is copied below.

      Lines 548-551:

      “From pathway enrichment analysis, we demonstrate that genes with SMAD1/5 and PR bound at the promoter regions are enriched for key pathways in directing the decidualization process, such as WNT and relaxin signaling pathways. Future studies can benefit from analyzing binding events beyond the promoter regions.”

      Lines 426-436:

      “To further evaluate the key roles of SMAD1/5 as major uterine transcription regulators, we cross-compared the genomic binding sites of SMAD1/5 with known key transcription factors, namely aforementioned SOX17 (Supplement Figure 1E), as well as NR2F2 (Supplement Figure 1F), an essential regulator of hormonal response, using our CUT&RUN data sets and published mouse uterine SOX17 and NR2F2 ChIP-seq data sets (GSE118328, GSE232583). Among the annotated genes, 5402 genes are shared between SMAD1/5 and SOX17, and 1922 genes are shared between SMAD1/5 and NR2F2. Such observations indicate a potential co-regulatory mechanism between SMAD1/5 and other key uterine transcription factors in maintaining appropriate uterine functions. Overall, our analyses demonstrate that the transcriptional activity of SMAD1, SMAD5, and PR coordinate the expression of key genes required for endometrial receptivity and decidualization.”
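
      The gene-level comparison described in the quoted passage amounts to intersecting lists of annotated genes. The sketch below is only an illustration of that idea, not the authors' pipeline; the function name and the toy gene lists are hypothetical and do not reproduce the 5402 or 1922 shared genes reported.

```python
def shared_genes(genes_a, genes_b):
    """Return the sorted gene symbols present in both annotated gene lists."""
    set_a = {g.strip() for g in genes_a if g.strip()}
    set_b = {g.strip() for g in genes_b if g.strip()}
    return sorted(set_a & set_b)


# Toy lists for illustration only; real input would be the genes annotated
# to CUT&RUN or ChIP-seq peaks for each factor.
smad_genes = ["Id2", "Runx1", "Foxo1", "Tgfbr2"]
sox17_genes = ["Foxo1", "Sox17", "Id2"]

overlap = shared_genes(smad_genes, sox17_genes)
print(len(overlap), overlap)
```

      Because this comparison is made on gene symbols, two factors can share a gene without their peaks overlapping at the base-pair level.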

      Reviewer #3 (Public Review):

      Summary:

      As SMAD1/5 activities have previously been indistinguishable, these studies provide a new mouse model to finally understand unique downstream activation of SMAD1/5 target genes, a model useful for many scientific fields. Using CUT&RUN analyses with gene overlap comparisons and signaling pathway analyses, specific targets for SMAD1 versus SMAD5 were compared, identified, and interpreted. These data validate previous findings showing strong evidence that SMADs directly govern critical genes required for endometrial receptivity and decidualization, including cell adhesion and vascular development. Further, SMAD targets were overlapped with progesterone receptor binding sites to identify regions of potential synergistic regulation of implantation. The authors report strong correlations between progesterone receptor and SMAD1/5 direct targets to cooperatively promote embryo implantation. Finally, the authors validated SMAD1/5 gene regulation in primary human endometrial stromal cells. These studies provide a data-rich survey of SMAD family transcription, defining its role as a governor of early pregnancy.

      Strengths:

      This manuscript provides a valuable survey of SMAD1/5 direct transcriptional events at the time of receptivity. As embryo implantation is controlled by extensive epithelial to stromal molecular crosstalk and hormonal regulation in space and time, the authors state a strong, descriptive narrative defining how SMAD1/5 plays a central role at the site of this molecular orchestration. The implementation of cutting-edge techniques and models and simple comparative analyses provide a straightforward, yet elegant manuscript.

      Although the progesterone receptor exists as a major regulator of early pregnancy, the authors have demonstrated clear evidence that progesterone receptor with SMAD1/5 work in concert to molecularly regulate targets such as Sox17, Id2, Tgfbr2, Runx1, Foxo1 and more at embryo implantation. Additionally, the authors pinpoint other critical transcription factor motifs that work with SMADs and the progesterone receptor to promote early pregnancy transcriptional paradigms.

      Weaknesses:

      Although a wonderful new tool to ascertain SMAD1 versus SMAD5 downstream signaling, the importance of these factors in governing early pregnancy is not novel. Furthermore, functional validation studies are needed to confirm interactions at promoter regions. Additionally, the authors presume that all overlapped genes are shared between progesterone receptor and SMAD1/5, yet some peak representations do not overlap. Although transcriptional activation can occur at the same time, it may not occur in the same complex. Thus, further confirmation of these transcriptional events is warranted.

      Thank you for the review; we appreciate these valuable comments. Although we used an overlap approach to investigate the gene regulatory networks between SMAD1/5 and PR at the gene level, we functionally validated the regulatory effect in an in vitro decidualization model using a qPCR approach. We acknowledge that gene activation may not occur within the exact same complex, but functional validation screenings at the promoter level are beyond the scope of the study. However, we added a discussion of these proposed investigations in Lines 553-558. Our current dataset and validation studies support our conclusions with robust evidence. Content from Lines 553-558 is copied below.

      Lines 553-558: “In this study, we determined the overlapped transcriptional control between SMAD1/5 and PR at the gene level, and functionally validated the regulatory effect at the transcript level in a human stromal cell decidualization model. While we observe a subset of peak representations that do not overlap at the base pair level in the promoter regions, future functional screenings at the promoter level, such as luciferase reporter assays to assess transcriptional co-activation by SMAD1/5 and PR, will advance this study.”

      • Since whole murine uterus was used for these studies, the specific functions of SMAD1/5 in the stroma versus the epithelium (versus the myometrium) remain unknown. Specific roles for SMAD1/5 in the uterine stroma and epithelial compartments still need to be examined. Also, further work is needed to delineate binding and transcriptional activation of SMAD1/5 and the progesterone receptor in stromal versus epithelial uterine compartments.

      Thank you for the comments. Indeed, our study was performed in the whole mouse uterus, which includes stroma, epithelium and myometrium. Our previous data show that nuclear SMAD1/5 are localized to both the stroma and epithelium in the decidual zone during the decidualization process at 4.5 dpc (PMID:34099644). Published in vivo studies also demonstrate the essential role of SMAD1/5 in the uterine epithelium and stroma compartments, respectively (PMIDs:35383354/27335065/17967875). Although we believe the binding/transcriptional activation of SMAD1/5 and PR occurs in both compartments based on the mouse phenotypic data, we acknowledge the opportunities for further compartment-specific analysis and added discussion of such investigations (Lines 501-513). Content from Lines 501-513 is copied below.

      Lines 501-513:

      “Published studies have shown that nuclear SMAD1/5 localize to the stroma and epithelium during the decidualization process at 4.5dpc during the window of implantation. Conditional deletion of SMAD1/5 exclusively in the uterine epithelium using lactoferrin-icre (Ltf-icre) results in severe subfertility due to impaired implantation and decidual development. Conditional deletion of SMAD1/5/4 exclusively in the cells from mesenchymal lineage (including uterine stroma) using anti-Mullerian hormone type 2 receptor cre (Amhr2-cre) results in infertility with defective decidualization. Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that transcriptional co-regulation by SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out the potential coregulation of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

      • There are asynchronous gene responses in the SMAD1/5 ablated mouse model compared to the siRNA-treated human endometrial stromal cells. These differences can be confounding, and more clarity is required in understanding the meaning of these differences and as they relate to the entire SMAD transcriptome.

      Thank you for the comments. From the mouse models with SMAD1/5 conditional deletions, we observed phenotypic defects at 4.5 dpc, which is the beginning of decidualization in the mouse. Our study used human endometrial stromal cells as a model to validate our findings functionally, aiming to mimic the specific time point during decidualization. Differences between the two models may arise from the strategy used to perturb SMAD1/5; in the mouse, a complete knockout of SMAD1/5 was used, resulting in failed decidualization, while the human endometrial stromal cells used an siRNA knockdown approach, which decreased the potential for decidualization. As such, this information needs to be considered when evaluating genome-wide effects on the transcriptome. We added a discussion of this point to Lines 564-572. Content from Lines 564-572 is copied below.

      Lines 564-572:

      “Since mice only undergo decidualization upon embryo implantation whilst human stromal cells undergo cyclic decidualization in each menstrual cycle in response to rising levels of progesterone, asynchronous gene responses may occur in comparison between mouse models and human cells. However, cellular transformation during decidualization is conserved between mice and humans, which makes findings in the mouse models a valuable and transferable resource to be evaluated in human tissues. Accordingly, our functional validation studies were performed using human endometrial stromal cells induced to decidualize in vitro for four days, which models the early phases of decidualization. Additional transcriptomic studies of the SMAD1/5 perturbations in human endometrial stromal cells will be of great resource in understanding the entire SMAD1/5 regulomes in humans.”

      Reviewer #1 (Recommendations For The Authors):

      • Minor grammatical errors requiring attention such as inserting punctuation at the end of sentences and including figure legends prior to the end of sentence punctuation.

      Thanks for the comments. Additional proofreading was conducted for the revision.

      Reviewer #2 (Recommendations For The Authors):

      1) Between SMAD1 and SMAD5, does losing one SMAD affect the other SMAD's genome occupancy?

      Thanks for the comments. Based on the mouse phenotypic data showing that conditional deletion of SMAD1 in the uterus does not affect female fertility, while conditional deletion of SMAD5 leads to subfertility and conditional deletion of both SMAD1 and SMAD5 leads to complete infertility, we believe that losing one SMAD will affect the other SMAD's genome occupancy. This point is discussed in Lines 514-517, with contents copied below.

      Lines 514-517: “Although our studies herein confirm that SMAD1 and SMAD5 proteins have distinct transcriptional regulatory activities, our previous studies demonstrated that while SMAD5 can functionally replace SMAD1, SMAD1 cannot replace SMAD5 in the uterus. How this epistatic relationship is established in a tissue-specific manner still needs to be determined by further biochemical investigations.”

      2) In light of SMAD1/5 and PGR co-occupied cis-acting elements and coregulating uterine transcriptome, does loss of SMAD1/5 alter the PGR and ESR1 genome occupancy?

      Thanks for the comments. In the SMAD1/5 double conditional knockout mice, we observe hyposensitivity to progesterone and unopposed estrogen responses. We hypothesize that loss of SMAD1/5 alters PR genome occupancy and that ER genome occupancy is subsequently altered as a secondary effect. To functionally address this question, genomic profiling studies need to be performed in the SMAD1/5 knockout mice and, ideally, also in the PR knockout mice. However, such large-scale studies are beyond the scope of the current study and will not affect our conclusions under physiological conditions. We did include additional discussion regarding this comment in Lines 551-553, with the contents copied below.

      Lines 551-553: “Profiling the PR genome occupancy in the SMAD1/5 deficient mice would provide an interesting perspective to reevaluate the major regulatory roles of SMAD1/5 in mediating uterine transcriptomes.”

      3) In terms of investigating the impact of SMAD1/5 on cell type composition, perhaps the digital cytometry approach (e.g., PMID: 31061481) could provide unbiased inferences.

      Thank you for the comments. We included expression analysis of a subset of SMAD1/5 direct target genes over different uterine compartments (Figure 4E). We also added the discussion of the opportunities for further compartment-specific analysis, including but not limited to the digital cytometry approach in Lines 506-513, with the contents copied below.

      Lines 506-513:

      “Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that the transcriptional co-regulatory roles of SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out potential co-regulatory roles of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

      4) The limitation of focusing on the promoter occupied SMADs should be discussed.

      Additional discussion of the limitation of focusing on the promoter regions was added in Lines 548-551, with contents copied below.

      Lines 548-551:

      “From pathway enrichment analysis, we demonstrate that genes with SMAD1/5 and PR bound at the promoter regions are enriched for key pathways in directing the decidualization process, such as WNT and relaxin signaling pathways. Future studies can benefit from analyzing binding events beyond the promoter regions.”

      5) Methods: The reagent and the condition for PGR CUT&RUN is missing.

      Information added in Line 153.

      6) Line 260: Please clarify the statement of "suggesting the transcriptional of PR depends on BMP/SMAD1/5 signaling".

      Thanks for the suggestion. The sentence was rephrased to (Lines 258-261) “Our previous studies revealed that conditional ablation of SMAD1 and SMAD5 in the uterus decreased P4 response during the peri-implantation period, suggesting that the transcriptional activities of PR depend on BMP/SMAD1/5 signaling.”

      7) Line 280-289: This statement belongs to the discussion section.

      The statement was moved as suggested.

      8) Figure 4E is not cited in the result section.

      Figure 4E was cited in the results section in the revised version. (Line 386)

      9) Figures 3C, 3D, 3E, 3F, 5B and 5D: please include the full lists in the supplemental data so that labs with limited bioinformatic capabilities could use these findings to facilitate scientific discovery.

      Data regarding the aforementioned figures were included in Supplement Tables 3-8 and Supplement Files 1-2.

      10) Figure 2B and Figure 5A: the heatmaps without further grouping on common and distinct genome occupancy among assayed factors provided minimum useful information. Please reconsider the presentation format in order to deliver more meaningful results.

      Figure 2B and Figure 5A were replotted with clustering using the k-means algorithm. Methods and legends were updated accordingly.
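
      For readers unfamiliar with how k-means groups heatmap rows, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is an illustration under stated assumptions rather than the authors' implementation, which is not specified in the response. The function name, the toy signal matrix, and the choice of initial centroids are all hypothetical.

```python
def kmeans(rows, initial_centroids, max_iter=100):
    """Cluster rows (lists of floats) with Lloyd's k-means algorithm.

    Returns (labels, centroids). Initial centroids are supplied by the
    caller so that the result is deterministic.
    """
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2
                                  for x, c in zip(row, centroids[k])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Toy matrix: each row is one peak, each column the signal for one factor.
signal = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels, centers = kmeans(signal, initial_centroids=[signal[0], signal[2]])
print(labels)
```

      Sorting the heatmap rows by the returned labels places peaks with similar occupancy patterns next to each other, which is what separates common from factor-specific binding in a clustered heatmap.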

      Reviewer #3 (Recommendations For The Authors):

      To delineate specific roles for SMAD1/5 in the uterine stroma and epithelial compartments, methods such as single cell sequencing or spatial transcriptomic analysis may be warranted.

      The manuscript now includes the discussion of future opportunities in investigating the roles of SMAD1/5 in different uterine compartments using single-cell sequencing and/or spatial transcriptomic analysis (Lines 498-513), with contents copied below.

      Lines 498-513:

      “Our studies also examined the role of SMAD1/5 in mediating progesterone responses at the genomic and transcription levels. Similarly, our analysis was based on data sets generated from the whole mouse uterus, which contains multiple compartments of the uterine structures, including but not limited to epithelium and stroma. Published studies have shown that nuclear SMAD1/5 localize to the stroma and epithelium during the decidualization process at 4.5 dpc, during the window of implantation. Conditional deletion of SMAD1/5 exclusively in the uterine epithelium using lactoferrin-icre (Ltf-icre) results in severe subfertility due to impaired implantation and decidual development. Conditional deletion of SMAD1/5/4 exclusively in the cells from mesenchymal lineage (including uterine stroma) using anti-Mullerian hormone type 2 receptor cre (Amhr2-cre) results in infertility with defective decidualization. Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that the transcriptional co-regulatory roles of SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out potential co-regulatory roles of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future single-cell sequencing (i.e., digital cytometry) and/or spatial transcriptomic analysis.”

    1. Author Response

      We would like to thank the editor and the reviewers for their constructive comments and the chance to revise the manuscript. The suggestions have allowed us to improve our manuscript. We have been able to address all reviewer comments and added new statistical analyses to examine associations for subsets of data. Although suggested by a reviewer, we did not perform large-scale experiments to confirm the viability of low sporozoite densities at different time-points post salivary gland colonization. For these assays there are currently no satisfactory in vitro models for sporozoites harvested from single mosquitoes, and setting up and validating such experiments could be a PhD project in itself. We do consider this suggestion very relevant but beyond the scope of the current work.

      Relevantly, during the time the manuscript was under review at eLife, we have been able to examine the multiplicity of infection in our field experiments. This was, as written in the original manuscript, a key reason to also perform experiments in the field where there is a greater diversity of parasite lines. We have successfully performed AMA-1 amplicon deep sequencing on infected mosquito salivary glands and infected skins. Although this does not change the key messages of the manuscript and is secondary to our main hypothesis, we do consider it a relevant addition since we were able to demonstrate that for some infected mosquitoes from the Burkina Faso study, multiple clones were expelled by mosquitoes during probing on a single piece of artificial skin. We have added a short paragraph to our revised manuscript and updated the acknowledgement section to include the supporting researcher who conducted those experiments.

      Reviewer #1 (Public Review):

      Summary: There is a long-believed dogma in the malaria field; a mosquito infected with a single oocyst is equally infectious to humans as another mosquito with many oocysts. This belief has been used for goal setting (and modelling) of malaria transmission-blocking interventions. While recent studies using rodent malaria suggest that the dogma may not be true, there was no such study with human P. falciparum parasites. In this study, the numbers of oocysts and sporozoites in the mosquitoes and the number of sporozoites expelled into artificial skin by each infected mosquito were quantified individually. There was a significant correlation between sporozoite burden in the mosquitoes and expelled sporozoites. In addition, this study showed that highly infected mosquitoes expelled sporozoites sooner.

      Strengths:

      • The study was conducted using two different parasite-mosquito combinations; one was lab-adapted parasites with Anopheles stephensi and the other was parasites, which were circulated in infected patients, with An. coluzzii. Both combinations showed statistically significant correlations between sporozoite burden in mosquitoes and the number of expelled sporozoites.

      • Usually, this type of study has been done on a group basis (e.g., counting oocysts and sporozoites at different time points using different mosquitoes from the same group). However, this study determined the numbers on an individual basis after multiple rounds of optimization and validation of the approach. This individual approach significantly increases the power of correlation analysis.

      Weaknesses:

      • In a natural setting, most mosquitoes have less than 5 oocysts. Thus, the conclusion is more convincing if the authors perform additional analysis for the key correlations (Fig 3C and 4D) excluding mosquitoes with very high total sporozoite load (e.g., more than 5-oocyst equivalent load).

      In the revised manuscript, we have also performed our analysis including only the subset of mosquitoes with low oocyst burden. In our Burkina Faso experiments, where we could not control oocyst density, 48% (15/31) of skins were from mosquitoes with <5 oocyst sheets. Whilst low oocyst densities were thus not uncommon, we acknowledge that this may have rendered some comparisons underpowered. At the same time, we observe a strong positive trend between oocyst density and sporozoite density and between salivary gland sporozoite density and mosquito inoculum. This makes it very likely that this trend is also present at lower oocyst densities. An association where sporozoite inoculation saturates at high densities is plausible and has been observed before for rodent malaria (DOI: 10.1371/journal.ppat.1008181), whilst we consider it less likely that sporozoite expelling would be more efficient at low (unmeasured) sporozoite densities.

      • As written as the second limitation of the study, this study did not investigate whether all expelled sporozoites were equally infectious. For example, Day 9 expelled sporozoites may be less infectious than Day 11 sporozoites, or expelled sporozoites from high-burden mosquitoes may be less infectious because they experience low nutrient conditions in a mosquito. Ideally, it is nice to test the infectivity by ex vivo assays, such as hepatocyte invasion assay, and gliding assay at least for salivary sporozoites. But are there any preceding studies where the infectivity of sporozoites from different conditions was evaluated? Citing such studies would strengthen the argument.

      We appreciate this thought and can see the value of these experiments. We are not aware of any studies that examined sporozoite viability in relation to the day of salivary gland colonization or sporozoite density.

      One previous study assessed the NF54 sporozoite infectivity on different days post infection (days 12-13-14-15-16-18) and observed no clear differences in ‘per sporozoite hepatocyte invasion capacity’ over this period (DOI: 10.1111/cmi.12745). We nevertheless agree that it is conceivable that sporozoites require maturation in the salivary glands and might not all be equally infectious. While hepatocyte invasion experiments are conducted with bulk harvesting of all the sporozoites that are present in the salivary glands, it would be even more interesting to assess the invasion capacity of the smaller population of sporozoites that migrate to the proboscis to be expelled. This would, as the reviewer will appreciate, be a major endeavour. To do this well, the expelled sporozoites would need to be harvested from the salivary glands/proboscis and used in the best and most natural environment for invasion. The suggested work would thus depend on the availability of primary hepatocytes since conventional cell-lines like HC-04 are likely to underestimate sporozoite invasion. Importantly, there are currently no opportunities to include the barrier of the skin environment in invasion assays whilst this may be highly important in determining the likelihood that sporozoites manage to achieve invasion and give rise to secondary infections. In short, we agree with the reviewer that these experiments are of interest but consider these well beyond the scope of the current work. We have added a section to the Discussion section to highlight these future avenues for research. ‘Of note, our assessments of EIP and of sporozoite expelling did not confirm the viability of sporozoites. Whilst the infectivity of sporozoites at different time-points post infection has been examined previously (https://doi.org/10.1111/cmi.12745), these experiments have never been conducted with individual mosquito salivary glands. To add to this complexity, such experiments would ideally retain the skin barrier that may be a relevant determinant for invasion capacity and primary hepatocytes.’

      • Since correlation analyses are the main points of this paper, it is important to show 95% CI of Spearman rank coefficient (not only p-value). By doing so, readers will understand the strengths/weaknesses of the correlations. The p-value only shows whether the observed correlation is significantly different from no correlation or not. In other words, if there are many data points, the p-value could be very small even if the correlation is weak.

      We appreciate this comment and agree that this is indeed insightful. We have added the 95% confidence intervals to all figure legends and main text. We also provide them below.

      Fig 3b: 95% CI: 0.74, 0.85

      Fig 3c: 95% CI: 0.17, 0.50

      Fig 4c: 95% CI: 0.80, 0.95

      Fig 4d: 95% CI: 0.52, 0.82

      Supp Fig 5a: 95% CI: 0.74, 0.85

      Supp Fig 5b: 95% CI: 0.73, 0.93

      Supp Fig 6: 95% CI: 0.11, 0.48

      Supp Fig 7: 95% CI: -0.12, 0.16
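
      The reviewer's point about reporting an interval alongside the p-value can be illustrated with a short sketch. The response does not state how the intervals above were computed, so the method below is an assumption: it uses Spearman's rank coefficient with a Fisher z-transform and the commonly used approximate variance 1.06/(n-3). The function names and the counts are hypothetical and are not data from the study.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_with_ci(x, y, z_crit=1.96):
    """Spearman's rho with an approximate 95% CI (assumed method, see lead-in)."""
    rho = pearson(average_ranks(x), average_ranks(y))
    if abs(rho) >= 1.0:
        return rho, rho, rho  # the z-transform is undefined at exactly +/-1
    z = math.atanh(rho)
    se = math.sqrt(1.06 / (len(x) - 3))
    return rho, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical, made-up counts purely for illustration.
gland_load = [120, 450, 800, 1500, 3000, 7000, 12000, 20000, 45000, 90000]
expelled = [0, 5, 2, 40, 25, 150, 90, 400, 300, 1200]

rho, lower, upper = spearman_with_ci(gland_load, expelled)
print(f"rho = {rho:.2f}, 95% CI: {lower:.2f}, {upper:.2f}")
```

      Because the standard error shrinks with sample size, the same coefficient yields a narrower interval in a larger data set, which is why the interval conveys the strength of an association more directly than the p-value does.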

      Reviewer #2 (Public Review):

      Summary: The malaria parasite Plasmodium develops into oocysts and sporozoites inside Anopheles mosquitoes, in a process called sporogony. Sporozoites invade the insect salivary glands in order to be transmitted during a blood meal. An important question regarding malaria transmission is whether all mosquitoes harbouring Plasmodium parasites are equally infectious. In this paper, the authors investigated the progression of P. falciparum sporozoite development in Anopheles mosquitoes, using a sensitive qPCR method to quantify sporozoites and an artificial skin system to probe for parasite expelling. They assessed the association between oocyst burden, salivary gland infection intensity, and sporozoites expelled.

      The data show that higher sporozoite loads are associated with earlier colonization of salivary glands and a higher prevalence of sporozoite-positive salivary glands and that higher salivary gland sporozoite burdens are associated with higher numbers of expelled sporozoites. Intriguingly, there is no clear association between salivary gland burdens and the prevalence of expelling, suggesting that most infections reach a sufficient threshold to allow parasite expelling during a mosquito bite. This important observation suggests that low-density gametocyte carriers, although less likely to infect mosquitoes, could nevertheless contribute to malaria transmission.

      Strengths: The paper is well written and the work is well conducted. The authors used two experimental models, one using cultured P. falciparum gametocytes and An. stephensi mosquitoes, and the other one using natural gametocyte infections in a field setup with An. coluzzii mosquitoes. Both studies gave similar results, reinforcing the validity of the observations. Parasite quantification relies on a robust and sensitive qPCR method, and parasite expelling was assessed using an innovative experimental setup based on artificial skin.

      Weaknesses: There is no clear association between the prevalence of sporozoite expelling and the parasite burden. However, high total sporozoite burdens are associated with earlier and more efficient colonization of the salivary glands, and higher salivary gland burdens are associated with higher numbers of expelled sporozoites. While these observations suggest that highly infected mosquitoes could transmit/expel parasites earlier, this is not directly addressed in the study. In addition, whether all expelled sporozoites are equally infectious is unknown. The central question, i.e. whether all infected mosquitoes are equally infectious, therefore remains open.

      We agree that the manuscript provides important steps forward in our understanding of what makes an infectious mosquito but does not conclusively demonstrate that highly infected mosquitoes are more likely to initiate a secondary infection. We consider this to be beyond the scope of the current work although the current work lays the foundation for these important future studies. For human Plasmodium infections the most satisfactory answer on the infectiousness of low versus high infected mosquitoes comes from controlled human infection models. In response to reviewer comments, we have extended our Discussion section to highlight this importance. To accommodate the (very fair) reviewer comments, we have avoided any phrasings that suggest that our findings demonstrate differences in transmission.

      Reviewer #3 (Public Review):

      Summary: This study uses a state-of-the-art artificial skin assay to determine the quantity of P. falciparum sporozoites expelled during feeding using mosquito infection (by standardised membrane feeding assay SMFA) using both cultured gametocytes and natural infection. Sporozoite densities in salivary glands and expelled into the skin are quantified using a well-validated molecular assay. These studies show clear positive correlations between mosquito infection levels (as determined by oocyst numbers), sporozoite numbers in salivary glands, and sporozoites expelled during feeding. This indicates potentially significant heterogeneity in infectiousness between mosquitoes with different infection loads and thus challenges the often-made assumption that all infected mosquitoes are equally infectious.

      Strengths: Very rigorously designed studies using very well validated, state-of-the-art methods for studying malaria infections in the mosquito and quantifying load of expelled sporozoites. This resulted in very high-quality data that was well-analyzed and presented. Both sources of gametocytes (cultures vs. natural infection) show consistent results further strengthening the quality of the results obtained.

      Weaknesses: As is generally the case when using SMFAs, the mosquito infection levels are often relatively high compared to wild-caught mosquitoes (e.g. Bombard et al 2020 IJP: median 3-4), and the strength of the observed correlations between oocyst sheet and salivary gland sporozoite load, and even more so between salivary gland sporozoite load and expelled sporozoite number, may be dominated by results from mosquitoes with infection levels rarely observed in wild-caught mosquitoes. This could result in an overestimation of the importance of these well-observed positive relationships under natural transmission conditions. The results obtained from these excellently designed and executed studies support their conclusion very well, with a slight caveat regarding their application to natural transmission scenarios.

      For efficiency and financial reasons, we have worked with an approach to enhance mosquito infection rates. If we had worked with gametocytes at physiological concentrations and a small number of donors, we would probably have had considerably lower mosquito infection rates. Whilst this would indeed result in lower infection burdens in the sparse infected mosquitoes, addressing the reviewer concern, it would have made the experiments highly inefficient and expensive. The skin mimic was initially provided free of charge when the matrix was close to the expiry date but for the experiments in Burkina Faso we had to purchase the product at market value. Whilst we consider the biological question sufficiently important to justify this investment – and think our findings prove us right – it remained important to avoid using skins for uninfected mosquitoes. Since oocyst prevalence and density are strongly correlated (doi: 10.1016/j.ijpara.2012.09.002; doi: 10.7554/eLife.34463), a low oocyst density in natural infections typically coincides with a high proportion of negative mosquitoes.

      Of note, our approach did result in the inclusion of 15 skins from infected mosquitoes with 1-4 oocysts. This number may be modest but we did include observations from this low oocyst range which is, we agree, highly important for better understanding malaria epidemiology.

      This work very convincingly highlights the potential for significant heterogeneity in the infectiousness between individual P. falciparum-infected mosquitoes. Such heterogeneity needs to be further investigated and if again confirmed taken into account both when modelling malaria transmission and when evaluating the importance of low-density infections in sustaining malaria transmission.

      Reviewer #4 (Public Review):

      Summary: The study compares the number of sporozoites expelled by mosquitoes with different Plasmodium infection burden. To my knowledge this is the first report comparing the number of expelled P. falciparum sporozoites and their relation to oocyst burden (intact and ruptured) and residual sporozoites in salivary glands. The study provides important evidence on malaria transmission biology although conclusions cannot be drawn on direct impact on transmission.

      Strengths: Although there is some evidence from malaria challenge studies that the burden of sporozoites injected into a host is directly correlated with the likelihood of infection, this has been done using experimental infection models which administer sporozoites intravenously. It is unclear whether the same correlation occurs with natural infections and what the actual threshold for infection may be. Host immunity and other host related factors also play a critical role in transmission and need to be taken into consideration; these have not been mentioned by the authors. This is of particular importance as host immunity is decreasing with reduction in transmission intensity.

      Weaknesses: The natural infections reported in the study were not natural as the authors described. Gametocyte enrichment was done to attain high oocyst infection numbers. Studying natural infections would have been better without the enrichment step. The infected mosquitoes have much larger infection burden than what occurs in the wild.

      Nevertheless, the findings support the same results as in the experiments conducted in the Netherlands and therefore are of interest. I suggest the authors change the wording. Rather than calling these "natural" infections, they could be called, for example, "experimental infections with wild parasite strains".

      We have addressed these concerns and, in the process, also changed our manuscript title. The following sentences have been changed:

      “It is currently unknown whether all Plasmodium falciparum infected mosquitoes are equally infectious. We assessed sporogonic development using cultured gametocytes in the Netherlands and natural infections in Burkina Faso”.

      Now reads: “It is currently unknown whether all Plasmodium falciparum infected mosquitoes are equally infectious. We assessed sporogonic development using cultured gametocytes in the Netherlands and experimental infections with naturally circulating parasite strains in Burkina Faso”. Lines 226-228: “Experimental infections with naturally circulating parasite strains show comparable correlation between oocyst density, salivary gland density and sporozoite inoculum”.

      Has now replaced the original phrasing: “Natural infected mosquitoes by gametocyte carriers in Burkina Faso show comparable correlation between oocyst density, salivary gland density and sporozoite inoculum”.

      I do not believe the study results generate sufficient evidence to conclude that lower infection burden in mosquitoes is likely to result in changes to transmission potential in the field. In the study limitations section, the authors say "In addition, our quantification of sporozoite inoculum size is informative for comparisons between groups of high and low-infected mosquitoes but does not provide conclusive evidence on the likelihood of achieving secondary infections. Given striking differences in sporozoite burden between different Plasmodium species - low sporozoite densities appear considerably more common in mosquitoes infected with P. yoelii and P. berghei - the association between sporozoite inoculum and the likelihood of achieving secondary infections may be best examined in controlled human infection studies." However, in the abstract conclusion the authors state "Whilst sporozoite expelling was regularly observed from mosquitoes with low infection burdens, our findings indicate that mosquito infection burden is associated with the number of expelled sporozoites and may need to be considered in estimations of transmission potential." Kindly consider ending the sentence at "expelled sporozoites." Future studies on CHMI can be recommended as a conclusion if authors feel fit.

      We agree that we need to be very cautious with conclusions on the impact of our findings for the infectious reservoir. We have rephrased parts of our abstract and have updated the Discussion section following the reviewer suggestions. We agree with the reviewer that CHMI studies are recommended and have expanded the Discussion section to make this clearer. The sentence in the abstract now ends as:

      "Whilst sporozoite expelling was regularly observed from mosquitoes with low infection burdens, our findings indicate that mosquito infection burden is associated with the number of expelled sporozoites. Future work is required to determine the direct implications of these findings for transmission potential."

      Reviewer #1 (Recommendations For The Authors):

      • Prevalence data shown in Fig 2A and Table S1 are different. For example, >50K at Day 11, Fig 2A shows ~85% prevalence, but Table S1 says 100%. If the prevalence in Table S1 shows a proportion of observations with positive expelled sporozoites (instead of a proportion of positive mosquitoes shown in Fig 2A), then the prevalence for <1K at Day 11 cannot be 6.7% (either 0 or 20% as there were a total of 5 observations). So in either case, it is not clear why the numbers shown in Fig 2A and Table S1 are different.

      Figure 2A and Table S2 show estimated prevalences and odds ratios from an additive logistic regression model (i.e. excluding the interaction between day and sporozoite categories). Table S1 includes this interaction when estimating prevalences and odds ratios. Because some categories in the interaction were extremely small, this resulted in inflated confidence intervals, especially on day 11. Table S1 and Fig 2A are thus the results from two different models. Whilst our results are thus correct, we can understand the confusion and have added a sentence to explain the model used in the figure/table legends.

      Figure 2. Extrinsic Incubation Period in high versus low infected mosquitoes. A. Total sporozoites (SPZ) per mosquito in body plus salivary glands (x-axis) were binned by infection load <1k; 1k-10k; 10k-50k; >50k and plotted against the proportion of mosquitoes (%) that were sporozoite positive (y-axis) as estimated from an additive logistic regression model with factors day and SPZ categories.

      Supplementary Table S1. The extrinsic incubation period of P. falciparum in An. stephensi estimated by quantification of sporozoites on day 9, 10, 11 by qPCR. Based on infection intensity, which was assessed by combining sporozoite densities in the mosquito body and salivary gland, mosquitoes were binned into four categories (<1k, 1k-10k, 10k-50k, >50k). Prevalences and odds ratios were estimated from a logistic regression model with factors day, SPZ category and their interaction.
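
      Why very small day-by-category cells lead to inflated intervals can be sketched as follows. The sketch rests on an assumption about the model: when the full interaction between two categorical factors is included, each cell is effectively estimated from its own observations, so the log-odds of a cell with a positives and b negatives has a Wald standard error of sqrt(1/a + 1/b). The function name and the counts are hypothetical and are not taken from Table S1.

```python
import math

def cell_prevalence_ci(positive, negative, z_crit=1.96):
    """Prevalence and Wald 95% CI (built on the log-odds scale) for one cell."""
    if positive == 0 or negative == 0:
        # The log-odds are infinite, so the Wald interval is undefined here.
        raise ValueError("Wald interval undefined for a cell with 0 positives or 0 negatives")
    log_odds = math.log(positive / negative)
    se = math.sqrt(1 / positive + 1 / negative)

    def expit(v):
        return 1 / (1 + math.exp(-v))

    return (expit(log_odds),
            expit(log_odds - z_crit * se),
            expit(log_odds + z_crit * se))

# Hypothetical cells with the same observed prevalence (80%) but different sizes.
small_cell = cell_prevalence_ci(4, 1)    # 4 of 5 mosquitoes positive
large_cell = cell_prevalence_ci(40, 10)  # 40 of 50 mosquitoes positive

for label, (p, lo, hi) in (("n=5", small_cell), ("n=50", large_cell)):
    print(f"{label}: prevalence {p:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

      An additive model borrows information across days and categories, which is one reason its estimates for sparse cells can differ from the raw cell proportions.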

      There are 3 typos in the paper. Please fix them.

      Line 464; ...were counted using a using an incident....

      Line 473; Supplementary Figure 7 should be Fig S8.

      Line 508: ...between days 9 and 10 using a (t=-2.0467)....

      We appreciate the rigour in reviewing our text and have corrected all typos.

      Reviewer #2 (Recommendations For The Authors):

      High infection burdens may result in earlier expelling capacity in mosquitoes, which would reflect the EIP more accurately. The earlier colonization of SG and the correlation between SG burden and numbers expelled suggest this could be the case, but it would be interesting to directly measure the prevalence of expelling over time to directly assess the effect of the sporozoite burden (not just at day 15 but before). This could reveal how the parasite burden in mosquitoes is a determinant of transmission.

      We appreciate this suggestion and will consider this for future experiments. It adds another variable that is highly relevant but will also complicate comparisons where sporozoite expelling is related to both time since infectious blood meal and salivary gland sporozoite density (that is also dependent on time since infectious bloodmeal). Moreover, we then consider it important to measure this over the entire duration of sporozoite expelling, including late time-points post infectious bloodmeal. This may form part of a follow-up study.

      Another question is whether all sporozoites (among expelled parasites) are equally infective, i.e. susceptible to induce secondary infection. If not, this could reconcile the data of this study and previous results in the rodent model where high burdens were associated with an increased probability to transmit.

      As also indicated above, we are aware of a single study that assessed NF54 sporozoite infectivity on different days post infection (days 12-13-14-15-16-18) and observed no clear differences in ‘per sporozoite hepatocyte invasion capacity’ over this period (DOI: 10.1111/cmi.12745). We nevertheless agree that it is conceivable that sporozoites require maturation in the salivary glands and might not all be equally infectious. While hepatocyte invasion experiments are conducted with bulk harvesting of all the sporozoites that are present in the salivary glands, it would be even more interesting to assess the invasion capacity of the smaller population of sporozoites that migrate to the proboscis to be expelled. This would, as the reviewer will appreciate, be a major endeavour. To do this well, the expelled sporozoites would need to be harvested from the salivary glands/proboscis and used in the best and most natural environment for invasion. The suggested work would thus depend on the availability of primary hepatocytes since conventional cell-lines like HC-04 are likely to underestimate sporozoite invasion. Importantly, there are currently no opportunities to include the barrier of the skin environment in invasion assays whilst this may be highly important in determining the likelihood that sporozoites manage to achieve invasion and give rise to secondary infections. In short, we agree with the reviewer that these experiments are of interest but consider these well beyond the scope of the current work. We have added a section to the Discussion section to highlight these future avenues for research. ‘Of note, our assessments of EIP and of sporozoite expelling did not confirm the viability of sporozoites. Whilst the infectivity of sporozoites at different time-points post infection has been examined previously (ref), these experiments have never been conducted with individual mosquito salivary glands. To add to this complexity, such experiments would ideally retain the skin barrier that may be a relevant determinant for invasion capacity and primary hepatocytes.’

      The authors evaluated oocyst rupture at day 18, i.e. 3 days after feeding experiments (performed at day 15). Did they check in control experiments that the prevalence of ruptured oocysts does not vary between day 15 and day 18?

      We did not do this and consider it very unlikely that there is a noticeable increase in the number of ruptured oocysts between days 15 and 18. We observe that salivary gland invasion plateaus around day 12 and the provision of a second bloodmeal that is known to accelerate oocyst maturation and rupture (doi: 10.1371/journal.ppat.1009131) makes it even less likely that a relevant fraction of oocysts ruptures very late. Perhaps most compellingly, the time of oocyst rupture will depend on nutrient availability and rupture could thus occur later for oocysts from a heavily infected gut compared to oocysts from mosquitoes with a low infection burden. We observe a very strong association between salivary gland sporozoite density (day 15) and oocyst density (assessed at day 18) without any evidence for change in the number of sporozoites per oocyst for different oocyst densities. In our revised manuscript we have also assessed correlations for different ranges of oocyst intensities and see highly consistent correlation coefficients and find no evidence for a change in ‘slope’. If oocyst rupture would regularly happen between days 15 and 18 and this late rupture would be more common in heavily infected mosquitoes, we would expect this to affect the associations presented in figures 3B and 4C. This is not the case.

      The authors report higher sporozoite numbers per oocyst and a higher proportion of SG invasion as compared to previous studies (30-50% rather than 20%). How do they explain these differences? Is it due to the detection method and/or second blood meal? Or parasite species?

      We were also intrigued by these findings in light of existing literature. To address potential discrepancies, it is indeed possible that the 2nd bloodmeal made a difference. In addition, NF54 is known to be a highly efficient parasite in terms of gametocyte formation and transmission. And there are marked differences in these performances between NF54 isolates and definitely between NF54 and its clone 3D7 that is regularly used. We also used a molecular assay to detect and quantify sporozoites but consider it less likely that this is a major factor in terms of explaining SG invasion since sporozoite densities were typically within the range that would be detected by microscopy. We can only hypothesize that the 2nd bloodmeal may have contributed to these findings and acknowledge this in the revised Discussion section.

      The median numbers of expelled sporozoites seem to be higher in the natural gametocyte infection experiments as compared to the cultures. Is it due to the mosquito species (An. coluzzii versus An. stephensi?).

      The added value of our field experiments, a more relevant mosquito species and more relevant parasite isolates, is also a weakness in terms of understanding possible differences between in vitro experiments and field experiments with naturally circulating parasite strains. We only conclude that our in vitro experiments do not over-estimate sporozoite expelling by using a highly receptive mosquito source and artificially high gametocyte densities. We have clarified this in the revised Discussion.

      39% of sporozoite-positive mosquitoes failed to expel, irrespective of infection densities. Could the authors discuss possible explanations for this observation?

      In paragraph 304-307 we now write that: “This finding broadly aligns with an earlier study of Medica and Sinnis that reported that 22% of P. yoelii infected mosquitoes failed to expel sporozoites. For highly infected mosquitoes, this inefficient expelling has been related to a decrease of apyrase in the mosquito saliva”.

      In Figure 3, it would be interesting to zoom in the 0-1k window, below the apparent threshold for successful expelling.

      We have generated correlation estimates for different ranges of oocyst and sporozoite densities and added these in Supplementary Table 5. We agree that this helps the reader to appreciate the contribution of different ranges of parasite burden to the observed associations.

      In Fig S8. Did they observe intact oocysts with fixed samples? These could be shown as well in the figure.

      We have incorporated this comment. An intact oocyst from fixed samples was now added to Fig S10.

      Minor points

      • line 119: LOD and LOQ could be defined here.

      We agree that this should have been defined. We changed line 119 to explain LOD and LOQ to: …“the limit of detection (LOD) and limit of quantification (LOQ)”….

      • line 126: the title does not reflect the content of this paragraph.

      We have changed the title “Immunolabeling allows quantification of ruptured oocysts” into: “A comparative analysis of oocyst densities using mercurochrome staining and anti-CSP immunostaining”.

      • line 269: infectivity is not appropriate. The data show colonization of SG.

      Line 269: ‘infectivity’ has been replaced with ‘colonization of salivary glands’.

      There seems to be a problem with Fig S6. The graph seems to be the same as Fig 3C. Please check whether the graph and legends are correct.

      Supplementary Figure 6 shows the sporozoite expelling density in relation to infection burden with a threshold set at > 20 sporozoites while Fig 3C shows the total sporozoite density (residual salivary gland sporozoites + sporozoites expelled, X-axis) in relation to the number of expelled sporozoites (Y-axis) by COX-1 qPCR without any threshold density. We have explained this in more detail in the revised supplemental figure, where we now state:

      “Of note, this figure differs from Figure 3C in the main text in the following manner. This figure presents sporozoite expelling density in relation to infection burden with a threshold set at > 20 sporozoites to conclude sporozoite positivity while Figure 3C shows the total sporozoite density (residual salivary gland sporozoites + sporozoites expelled, X-axis) in relation to the number of expelled sporozoites (Y-axis) by COX-1 qPCR without any threshold density and thus includes all observations with a qPCR signal.”

      Reviewer #3 (Recommendations For The Authors):

      Congratulations to the authors for the really excellently designed and rigorously conducted studies.

      My main concern is in regards to the relatively high oocyst numbers in their experimental mosquitoes (from both sources of gametocytes) compared to what has been reported from wild-caught mosquitoes in previous studies in Burkina Faso.

      We have addressed this concern above. For completeness, we include the main points here again. We enriched gametocytes for efficiency reasons; experiments on gametocytes at physiological concentrations would have resulted in a lower oocyst density (and thus a more ‘natural’ one), although a minority of individuals achieves very high oocyst densities in all studies that included a broad range of oocyst densities (e.g. doi: 10.1016/j.exppara.2014.12.010; doi: 10.1016/S1473-3099(18)30044-6). Of note, we did include 15 skins from low oocyst densities (1-4 oocysts). Whilst low oocyst densities were thus not very uncommon in our sample set, we acknowledge that this may have rendered some comparisons underpowered. At the same time, we observe a strong positive trend between oocyst density and sporozoite density and between salivary gland sporozoite density and mosquito inoculum. This makes it very likely that this trend is also present at lower oocyst densities. An association where sporozoite inoculation saturates at high densities is plausible and has been observed before for rodent malaria (DOI: 10.1371/journal.ppat.1008181), whilst we consider it less likely that sporozoite expelling would be more efficient at low (unmeasured) sporozoite densities. In the revised manuscript we have also performed our analysis including only the subset of mosquitoes with low oocyst burden.

      The best way to address this would be to do comparable artificial skin-feeding experiments on such wild-caught mosquitoes, but I appreciate that this is very difficult to do.

      This would indeed be difficult to do, mostly because infection status can only be examined post-hoc and it is likely that >95% of mosquitoes are sporozoite negative at the moment experiments are conducted (in many settings this will even be >99%). Importantly, also in wild-caught mosquitoes very high oocyst burdens are observed in a small but relevant subset of mosquitoes (doi: 10.1016/j.ijpara.2020.05.012).

      Instead, I would suggest the authors conduct additional analysis of their data using different cut-offs for maximum oocyst numbers (e.g. <5, <10, <20) to determine if these correlations hold across the entire range of observed oocyst sheets and salivary gland sporozoite load.

      We have provided these calculations for the proposed range of oocyst numbers. In addition, we also provided them for a range of sporozoite densities. These findings are now provided in

      A minor point on the regression lines in Figures 3 & 4: both variables in these plots have inherent variation (measurement & natural), so regression techniques such as reduced major axis (RMA) regression that allow error in both x and y variables may be preferable to a standard linear regression. Also, as it is implausible that mosquitoes with zero sporozoites in salivary glands expel several hundred sporozoites at feeding, the regression should probably also be constrained to pass through the 0,0 point.

      Since the main priority of the analyses is the correlation, and not the fit of the regression line – which is only for indication, and also because of the availability of software, we did not change the type of regression. We have however added a disclaimer to the legend, and we have also forced the intercept to 0 – which does indeed better reflect the biological association. Additionally we added 95% confidence intervals to all Spearman’s correlation coefficients in the legends.
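
      The two regression options discussed in this exchange can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and the toy numbers are hypothetical. It contrasts an ordinary least-squares slope forced through the origin with a reduced major axis slope, which allows for error in both variables.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope of y = b * x, with the intercept fixed at 0."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def rma_slope(x, y):
    """Reduced major axis slope: sign(r) * sd(y) / sd(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # The sign of the covariance equals the sign of Pearson's r.
    return math.copysign(math.sqrt(syy / sxx), sxy)


# Hypothetical toy values, not data from the study.
gland_load = [1.0, 2.0, 3.0, 4.0]
expelled = [2.0, 4.0, 6.0, 8.0]
print(slope_through_origin(gland_load, expelled))
print(rma_slope(gland_load, expelled))
```

      On perfectly proportional toy data the two slopes coincide; with noisy data the reduced major axis slope is at least as steep in magnitude as the ordinary least-squares slope with a free intercept, which is why the choice can matter when both axes carry measurement error.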

    1. Author Response

      Reviewer #1 (Public Review):

      The authors propose a hypothesis for ovarian carcinogenesis based on epidemiological data, and more specifically they suggest that the latter relates to ascending genital tract "infection" or "dysbiosis", the resulting fallopian tube inflammation ultimately predisposing to ovarian cancer.

      While this hypothesis would ideally be addressed in a longitudinal set-up with repeated female genital tract sampling, such an approach is obviously hard to realize. Rather, the authors present this hypothesis as a rationale for a cross-sectional study involving 81 patients with ovarian cancer (most with the most common subtype of high grade serous ovarian carcinoma, though other subtypes were also included), as well as 106 control patients with various non-infectious conditions including endometriosis and benign ovarian cysts. In all patients there was comprehensive microbiome sampling of ovarian surface/fallopian tube, cervix and peritoneal cavity, as well as sampling of a number of potential sources of contamination, including surgery sites, ambient environment, consumables used in the DNA extraction and sequencing pipeline, etc. In line with the hypothesis presented at the outset, species with a threshold of at least 100 reads in both at least one cervical and at least one fallopian tube sample, while absent from environmental swabs, were considered relevant to the postulated pathway.
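
      The selection rule described above (at least 100 reads in at least one cervical and at least one fallopian tube sample, and absent from environmental swabs) can be sketched as a simple filter. This is an illustrative Python sketch only: the data layout, the site labels, and the species names are hypothetical and do not come from the study.

```python
def candidate_species(read_counts, min_reads=100):
    """Return species that pass the read threshold in cervix and tube
    samples and have zero reads in every environmental swab.

    read_counts maps species -> site -> list of per-sample read counts.
    """
    selected = []
    for species, by_site in read_counts.items():
        in_cervix = any(c >= min_reads for c in by_site.get("cervix", []))
        in_tube = any(c >= min_reads for c in by_site.get("fallopian_tube", []))
        in_environment = any(c > 0 for c in by_site.get("environment", []))
        if in_cervix and in_tube and not in_environment:
            selected.append(species)
    return sorted(selected)


# Hypothetical toy counts, not data from the study.
toy_counts = {
    "species_A": {"cervix": [250, 0], "fallopian_tube": [120], "environment": [0, 0]},
    "species_B": {"cervix": [400], "fallopian_tube": [300], "environment": [15]},
    "species_C": {"cervix": [90], "fallopian_tube": [500], "environment": [0]},
}
print(candidate_species(toy_counts))
```

      In this toy input only species_A is retained: species_B appears in an environmental swab and species_C never reaches the threshold in a cervical sample.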

      Remarkably, fallopian tube microbiota in ovarian cancer patients tended to cluster more closely to those retrieved from the paracolic gutter, than fallopian tube microbiota in non-cancer controls, which showed more relative similarity to vaginal/genital tract microbiota.

      Although not really addressed by the authors, there also seem to be quite a few differences, at least in terms of abundance, in cervical microbiota between ovarian cancer patients and controls as well, which is an interesting finding, even when accounting for differences in age distribution between ovarian cancer patients and included control patients.

      Overall, very few data are available thus far on the upper genital tract/fallopian tube microbiome, while also invariably controversial, as it has proven extremely difficult to obtain pelvic samples in a valid, "sterile" manner, i.e. without affecting a resident low-biomass microbiome to be analyzed. The authors took a number of measures to counter this, and in this respect, this is likely the largest and most valid study on the subject, even though biases and contamination can never be completely excluded in this context.

      As such, I believe the strength of this study and paper primarily relates to the rigour of the methodology, thereby giving us a valuable insight in the presumed fallopian tube/ovarian surface microbiome, which may definitely serve as an impetus and a reference to future translational ovarian cancer research, or ovarian microbiome research for that matter.

      I believe that the authors should acknowledge in more detail, that the data obtained from their cross-sectional study, valid as these are, do not provide any direct support to the hypothesis - albeit also plausible - set forth, a discussion that I somehow missed to a certain extent. It is important to realize in this and related contexts that neoplasia may well induce microbiome alterations through a variety of mechanisms, hence microbiome alterations not per se being causative. Conclusions should therefore be more reserved. Along the same lines, potential biases introduced through the selection of control patients (some detail here would be insightful) also deserve some discussion, as it is not known, whether other conditions such as benign ovarian cysts or endometriosis have some relationship with the human microbiome, be it causative or 'reversely causative', see for instance very recent work in Science Translational Medicine.

      We appreciate the reviewer’s detailed review and thoughtful comments. We have added the following sentences in the Discussion to address the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

      Reviewer #2 (Public Review):

      The authors aimed to investigate the microbiota present in the fallopian tubes (FT) and its potential association with ovarian cancer (OC). They collected swabs intraoperatively from the FT and other surgical sites as controls to profile the FT microbiota and assess its relationship with OC.

      They observed a clear shift in the FT microbiota of OC patients compared to non-cancer patients. Specifically, the FT of OC patients had more types of bacteria typically found in the gastrointestinal tract and the mouth. In contrast, vaginal bacterial species were more prevalent in non-cancer patients. Serous carcinoma, the most common OC subtype, showed a higher prevalence of almost all FT bacterial species compared to other OC subtypes.

      The strengths of the study include its large sample size, rigorous collection methods, and use of controls to identify the possible contaminants. Additionally, the study employed advanced sequencing techniques for microbiota analysis. However, there are some weaknesses to consider. The study relied on swabs collected intraoperatively, which may not fully represent the microbiota in the FT during normal physiological conditions. The study also did not establish causality between the identified bacteria and OC but rather demonstrated an association. Regardless, the findings are important and these questions need to be addressed by future studies. A few additions in data representation and analysis are instead recommended.

      Overall, the authors achieved their aims of identifying the FT microbiota and assessing its relationship with OC. The results support the conclusion that there is a clear shift in the FT microbiota in OC patients, paving the way for further investigations into the role of these bacteria in the pathogenesis of ovarian cancer.

      The identification of specific bacterial species associated with OC could contribute to the development of novel diagnostic and therapeutic approaches. The study design and the data generated here can be valuable to the research community studying the microbiota and its impact on cancer development. However, further research is needed to validate these findings and elucidate the underlying mechanisms linking the FT microbiota shift and OC.

      We appreciate the reviewer’s detailed review and positive comments.

      Reviewer #3 (Public Review):

      The findings of Bo Yu and colleagues titled "Identification of fallopian tube microbiota and its association with ovarian cancer: a prospective study of intraoperative swab collections from 187 patients" describe the identification of the fallopian tube microbiome and its relationship with ovarian cancer. The studies are highly rigorous, obtaining specimens from the fallopian tube, ovarian surfaces, paracolic gutter of patients of known or suspected ovarian cancer or benign tumor patients. The investigators took great care to ensure there was no or limited contamination, including testing the surgical suite air, as the test locations are from low abundance microbiota. The findings provide evidence that the microbiota in the fallopian tube, especially in ovarian cancer, has similarities to gut microbial communities. This is a potentially novel observation.

      The studies investigate the microbiome of >1000 swabs from 81 ovarian cancer and 106 non-cancer patients. The sites collected are low biomass microbiota, making the study particularly challenging. The studies provide descriptive evidence that the ovarian cancer fallopian tube microbiota contain species that are similar to the gut microbiota. In contrast, the fallopian tube microbiota of non-cancer patients exhibit more similarity to the uterine/cervical microbiota. This may be a relevant observation but is highly descriptive with limited insights on the functional relevance.

      The data indicate the presence of low biomass FT microbiota. The findings support the existence of FT microbiota in ovarian cancer that appears to be related to gut microbial species. While interesting, there are no insights on how and why these microbial species are found in the FT. The studies only identify the species but there is no transcriptomic analysis to provide an indication on whether the bacteria are activating DNA damage pathways. This is an interesting observation that requires more insights to address how these bacteria reach the fallopian tube and a related question is whether these bacteria are found in the peritoneum.

      An additional concern is whether these data can be used to develop biomarkers of disease and early detection of disease. Can the investigators detect the ovarian cancer FT microbiota in cervical/vaginal secretions? That may yield more significant insights for the field.

      We appreciate the reviewer’s detailed review and thoughtful comments. We have added the following sentences in the Discussion to acknowledge the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

      Reviewer #1 (Recommendations For The Authors):

      I have no additional comments here.

      Reviewer #2 (Recommendations For The Authors):

      The data analysis and data representation could be improved by the following points:

      1. To compare the microbiota and assess the overall microbiota structure difference between the cancer vs non cancer cohort, alpha- and beta-diversity analyses of the microbial communities can be conducted.

      2. A differential abundance analysis could also be conducted to assess the differences at the genera and taxa level between the cancer vs non cancer cohorts.

      3. The analysis suggested above can also be conducted in the serous vs non serous cancer cohorts.

      4. In Figure 4 and 5 it would be more intuitive to show the predominant niche of each bacterium by color coding

      We appreciate these helpful suggestions from the reviewer. We have added Figure 2B to address the diversity as well as the differences between cancer versus non-cancer cohorts. We have added in the Results section the description of our findings in Figure 2B. We have added color coding to Figure 4 and 5 as the reviewer suggested.
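
      For readers unfamiliar with the diversity measures requested in point 1, a minimal Python sketch follows. The Shannon index (alpha diversity) and Bray-Curtis dissimilarity (beta diversity) are common choices and are assumptions here: the response does not state which metrics Figure 2B uses, and the function names and toy counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Alpha diversity of one sample: -sum(p * ln p) over taxa with reads."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Beta diversity between two samples: 0 = identical, 1 = no shared taxa."""
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must list the same taxa in the same order")
    denominator = sum(counts_a) + sum(counts_b)
    if denominator <= 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b)) / denominator


# Hypothetical read counts for three taxa in two samples.
sample_one = [120, 30, 0]
sample_two = [10, 40, 100]
print(shannon_index(sample_one), shannon_index(sample_two))
print(bray_curtis(sample_one, sample_two))
```

      Alpha diversity is computed per sample and then compared between cohorts, whereas beta diversity is computed for every pair of samples and is typically summarized with an ordination or a permutation-based group test.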

      Reviewer #3 (Recommendations For The Authors):

      These studies are interesting but are very descriptive with no obvious approaches for understanding the mechanisms of FT microbiota in ovarian cancer. The identification of these bacteria is not sufficient to draw implications on their impact on ovarian cancer development or progression. This needs to be addressed.

      We agree with the reviewer and have added the following sentences in the Discussion to acknowledge the reviewer’s concern: “Due to the cross-sectional nature of the study, we have limited ability to link specific bacteria to ovarian carcinogenesis, as we would need to demonstrate that exposure to bacteria precedes the cancer. However, identifying associations between FT microbiota and OC is a critical first step. Further investigations, especially backed by in vitro studies, are needed to test our initial hypotheses.”

    1. Author Response

      Responses to public reviews

      Reviewer 1

      We thank the reviewer for the valuable and constructive comments and are pleased that the reviewer finds our study timely and our behavioral results clear.

      1) The RSA basically asks on the lowest level, whether neural activation patterns (as measured by EEG) are more similar between linked events compared to non-linked events. At least this is the first question that should be asked. However, on page 11 the authors state: "We examined insight-induced effects on neural representations for linked events [...]". Hence, the critical analysis reported in the manuscript fully ignores the non-linked events and their neural activation patterns. However, the non-linked events are a critical control. If the reported effects do not differ between linked and non-linked events, there is no way to claim that the effects are due to experimental manipulation - neither imagination nor observation. Hence, instead of immediately reporting on group differences (sham vs. control) in a two-way interaction (pre vs. post X imagination vs. observation), the authors should check (and report) first, whether the critical experimental manipulation had any effect on the similarity of neural activation patterns in the first place.

      We completely agree that the non-link items are a critical control. Therefore, we had reported not only the results for linked but also for non-linked events on page 15, lines 336-350. We clarified this important point now on page 12 lines 283-286:

      “Subsequently, we examined insight-induced effects on neural representations for linked (vs. non-linked) events by comparing the change from pre- to post-insight (post-pre) and the difference between imagination and observation (imagination - observation) between cTBS and sham groups using an independent cluster-based permutation t-test.”

      Moreover, to directly compare linked and non-linked events we performed a four-way interaction including link vs. non-link. This analysis yielded a significant four-way interaction, showing that the interaction of time (pre vs. post), mode of insight (imagination vs. observation) and cTBS differed for linked vs. non-linked items. We then report the follow-up analyses, separately for linked and non-linked events. Please see pages 12-13, lines 287-294:

      “First, we included the within-subject factors time (pre vs. post), mode of insight (imagination vs. observation) and link (vs. non-link) by calculating the difference waves. Subsequently we conducted a cluster-based permutation test comparing the cTBS and the sham groups. This analysis yielded a four-way interaction within a negative cluster in a fronto-temporal region (electrode: FT7; p = 0.007, ci-range = 0.00, SD = 0.00). This result indicates that the impact of cTBS over the angular gyrus on the neural pattern reconfiguration following imagination- vs. observation-based insight may differ between linked and non-linked events. For linked events, this analysis yielded a […]”
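
      The logic of collapsing within-subject factors into difference scores and then comparing groups by permutation can be sketched in Python as follows. This is a deliberately simplified illustration and not the authors' pipeline: it tests a single value per participant using the difference in group means, without the cluster formation across electrodes and time points that a cluster-based permutation test performs. All names and numbers are hypothetical.

```python
import random


def double_difference(pre_imag, post_imag, pre_obs, post_obs):
    """(post - pre) for imagination minus (post - pre) for observation."""
    return (post_imag - pre_imag) - (post_obs - pre_obs)


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical per-participant difference scores, not data from the study.
ctbs_scores = [10, 11, 12, 13, 14, 15]
sham_scores = [0, 1, 2, 3, 4, 5]
print(permutation_p_value(ctbs_scores, sham_scores))
```

      Adding the link factor amounts to computing one more difference (linked minus non-linked) per participant before the group comparison, which is the idea behind the difference waves described in the quoted passage.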

      2) Overall, the focus on the targeted three-way interaction is poorly motivated. Also, a functional interpretation is largely missing.

      In order to better explain our motivation for the three-way interaction, we emphasized in the introduction the importance of disentangling potential differences due to the mode of insight, given the known role of the angular gyrus in imagination on pages 4-5, lines 107-115:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked events.”

      As discussed on page 21 (starting from line 478; see also the intro on page 4), we expected that the angular gyrus would be particularly implicated in imagination-based insight, given its known role in imagination (e.g.: Thakral et al., 2017). Moreover, given the angular gyrus’s strong connectivity with other regions, the results observed may not be driven by this region alone but also by interconnected regions, such as the hippocampus. We clarified these important points at the very end of the discussion on pages 23-24, lines 543-560:

      “Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014).”

      3) "Interestingly, we observed a different pattern of insight-related representational pattern changes for non-linked events." It is not sufficient to demonstrate that a given effect is present in one condition (linked events) but not the other (non-linked events). To claim that there are actually different patterns, the authors would need to compare the critical conditions directly (Nieuwenhuis et al., 2011).

      We completely agree and now compared the two conditions directly. Specifically, we now report the significant four-way interaction, including the factor link vs. non-link, before delving into separate analyses for linked and non-linked events on pages 12-13, lines 287-294:

      “First, we included the within-subject factors time (pre vs. post), mode of insight (imagination vs. observation) and link (vs. non-link) by calculating the difference waves. Subsequently we conducted a cluster-based permutation test comparing the cTBS and the sham groups. This analysis yielded a four-way interaction within a negative cluster in a fronto-temporal region (electrode: FT7; p = 0.007, ci-range = 0.00, SD = 0.00). This result indicates that the impact of cTBS over the angular gyrus on the neural pattern reconfiguration following imagination- vs. observation-based insight may differ between linked and non-linked events. For linked events, this analysis yielded a […]”

      4) "This analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B)." (p. 11). The authors report results with specificity for certain topographical locations. However, this is in stark contrast to the fact that the authors derived time X time RSA maps.

      We did derive time × time similarity maps for each electrode within each participant, which allowed us to find a cluster consisting of specific electrodes. We apologize for not making this aspect clear enough and have, therefore, modified the respective part of our methods section on page 38, lines 951-952:

      “In total, this analysis produced eight Representational Dissimilarity Matrices (RDMs) for each electrode and each participant.”

      5) "These theta power values were then combined to create representational feature vectors, which consisted of the power values for four frequencies (4-7 Hz) × 41 time points (0-2 seconds) × 64 electrodes. We then calculated Pearson's correlations to compare the power patterns across theta frequency between the time points of linked events (A with B), as well as between the time points of non-linked events (A with X) for the pre- and the post-phase separately, separately for stories linked via imagination and via observation. To ensure unbiased results, we took precautions not to correlate the same combination of stories twice, which prevented potential inflation of the data. To facilitate statistical comparisons, we applied a Fisher z-transform to the Pearson's rho values at each time point. This yielded a global measure of similarity on each electrode site. We, thus, obtained time × time similarity maps for the linked events (A and B) and the non-linked events (A and X) in the pre- and post-phases, separately for the insight gained through imagination and observation." (p. 34+35).

      If RSA values were calculated at each time point and electrode, the Pearson correlations would have been computed effectively between four samples only, which is by far not enough to derive reliable estimates (Schönbrodt & Perugini, 2013). The problem is aggravated by the fact that due to the time and frequency smoothing inherent in the time-frequency decomposition of the EEG data, nearby power values across neighboring theta frequencies are highly similar to start with. (e.g., Schönauer et al., 2017; Sommer et al., 2022).

      Alternative approaches would be to run the correlations across time for each electrode (resulting in the elimination of the time dimension) or to run the correlations at each time point across electrodes (resulting in the elimination of topographic specificity).

      At least, the authors should show raw RSA maps for linked and non-linked events in the pre- and post-phases separately for the insight gained through imagination and observation in each group, to allow for assessing the suitability of the input data (in the supplements?) before progressing to reporting the results of three-way interactions.

      Although we do see the reviewer’s point, we think that an RSA specific to the theta range yielding electrode specific time × time similarity maps must be run this way, otherwise, as you pointed out, one or the other dimension is compromised. Running an RSA across time for each electrode will lead to computing a similarity measure between the events without information on when these stimuli become more or less similar, thereby ignoring the temporal dynamics crucial to EEG data and not taking advantage of the high temporal resolution. Conversely, conducting an RSA across electrodes might result in an overall similarity measure per participant, disregarding the spatial distribution and potential variations among electrodes. Although EEG has limited spatial resolution, different electrodes can capture differences that may aid in understanding neural processing. However, as suggested by the reviewer, we included the raw RSA maps for linked and non-linked events separately for pre- and post-phases, imagination and observation and link and non-link in the supplement and refer to these data in the results section on pages 12-13, lines 293-295:

      “For linked events, this analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B; Figure 3 – Figure supplement 1).”

      And on page 15, lines 339-341:

      “This analysis yielded a positive cluster (p = 0.035, ci-range = 0.00, SD = 0.00) in a fronto-temporal region (electrode: FT7; Fig. 3C; Figure 3 – Figure supplement 2).”
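
      To make the computation debated under point 5 concrete, the following Python sketch builds a time × time similarity map for a single electrode. It assumes, following the reviewer's reading of the methods, that each correlation is taken across the four theta frequencies at a given pair of time points and is then Fisher z-transformed. The function names, the clipping constant, and the toy values are hypothetical, and this is not the authors' code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a flat pattern as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, eps=1e-6):
    """Fisher z-transform, clipping r so that |r| = 1 stays finite."""
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return math.atanh(r)


def time_by_time_similarity(power_a, power_b):
    """power_a[t] and power_b[t] hold one power value per theta frequency.

    Returns a matrix whose entry [i][j] compares event A at time i
    with event B at time j.
    """
    return [[fisher_z(pearson_r(pa, pb)) for pb in power_b] for pa in power_a]


# Hypothetical power values: two time points, four frequencies (4-7 Hz).
event_a = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
event_b = [[2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 2.0, 4.0]]
for row in time_by_time_similarity(event_a, event_b):
    print(row)
```

      With only four values per correlation, each entry is a noisy estimate and can easily reach the clipping bound, which illustrates the reviewer's concern; averaging maps over story pairs and participants before statistical testing is what makes such estimates usable in practice.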

      Reviewer 2

      We thank the reviewer for the very helpful and constructive comments and appreciate that the reviewer finds our study relevant to all areas of cognitive research.

      1) While the observed memory reconfiguration/changes are attributed to the angular gyrus in this study, it remains unclear whether these effects are solely a result of the AG's role in reconfiguration processes or to what extent the hippocampus might also mediate these memory effects (e.g., Tambini et al., 2018; Hermiller et al., 2019).

      We agree that, in addition to the critical role of the angular gyrus, there may be an involvement of the hippocampus. We point now explicitly to the modulatory capacities of angular gyrus stimulation on the hippocampus. Please see page 4, lines 81-88:

      “One promising candidate that may contribute to insight-driven memory reconfiguration is the angular gyrus. The angular gyrus has extensive structural and functional connections to many other brain regions (Petit et al., 2023), including the hippocampus (Coughlan et al., 2023; Uddin et al., 2010). Accordingly, previous studies have shown that stimulation of the angular gyrus resulted in altered hippocampal activity (Thakral et al., 2020; Wang et al., 2014). Furthermore, the angular gyrus has been implicated in a myriad of cognitive functions, including mental arithmetic, visuospatial processing, inhibitory control, and theory-of-mind (Cattaneo et al., 2009; Grabner et al., 2009; Lewis et al., 2019; Schurz et al., 2014).”

      We further added a new paragraph to the discussion pointing at the possibility that not solely the angular gyrus but another brain region, such as the hippocampus, may have mediated the changes observed in our study on pages 23-24, lines 546-562:

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      2) Another weakness in this manuscript is the use of different groups of participants for the key TMS intervention, along with underspecified or incomplete hypotheses/predictions.

      In our view, the chosen between-subjects design is to be preferred over a crossover design for several reasons. First, our choice aimed to eliminate potential sequence effects that may have adversely affected performance in the narrative-insight task (NIT). Second, this approach ensured consistency in expectations regarding the story links while also mitigating potential differences induced by fatigue. Additionally, we accounted for the potential advantage of a within-subject design – the stimulation of the same brain – by utilizing neuro-navigated TMS for targeting the stimulation coordinate. Finally, it is important to note that we measured the event representations pre- and post-insight and that also the mode of insight was manipulated within-subject. Thus, our design did include a within-subject component and we are convinced that the chosen paradigm balances the different strengths and weaknesses of within-subject and between-subjects designs in the best possible manner. We specified our rationale for choosing a between-subjects approach in the introduction on page 5, lines 122-126:

      “We intentionally adopted a mixed design, combining both between-subjects and within-subject methodologies. The between-subjects approach was chosen to minimize the risk of carry-over effects and sequence biases. Simultaneously, we capitalized on the advantages of a within-subject design by altering the pre- to post-insight comparison and the mode of insight (imagination vs. observation) within each participant.”

      Moreover, to provide a comprehensive portrayal of the two groups, we incorporated descriptions concerning trait and state variables alongside age and motor thresholds and included t-test comparisons between these variables on page 7, lines 157-160:

      “Notably, the groups did not differ on levels of subjective chronic stress (TICS), state and trait anxiety (STAI-S, STAI-T), depressive mood (BDI), imaginative capacities (FFIS), personality dimensions (BFI), age, and motor thresholds (for descriptive statistics see Table 1; all p > 0.053).”

      And further included age and motor thresholds as control variables in Table 1 on page 18, lines 402-404:

      “Overall, levels of subjective chronic stress, anxiety, and depressive mood were relatively low and not different between groups. The groups did further not differ in terms of personality traits, imagination capacity, age or motor thresholds (all p > 0.053; see Table 1).”

      For greater precision in outlining our hypotheses, we specified these at the end of the introduction on pages 4-5, lines 107-118:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked events. We further predicted that cTBS to the left angular gyrus would reduce the impact of (imagination-based) insight into the link of initially unrelated events on memory performance during free recall, given its higher variability compared to other memory measures.”

      3) Furthermore, in some instances, the types of analyses used do not appear to be suitable for addressing the questions posed by the current study, and there is limited explanation provided for the choice of analyses and questionnaires.

      We addressed this concern by inserting a new section “control variables” in the methods explaining our rationale for employing the different questionnaires as control variables on pages 40-41, lines 1003-1019:

      “Control variables
      In order to ensure that the observed effects were solely attributable to the TMS manipulation and not influenced by other factors, we comprehensively evaluated several trait and state variables. To account for potential variations in anxiety levels that could impact our results, we specifically measured state and trait anxiety using STAI-S and STAI-T (Laux et al., 1981), thus minimizing the potential confounding effects of anxiety on our findings (Charpentier et al., 2021). Additionally, we evaluated participants’ chronic stress levels using the TICS (Schulz & Schlotz, 1999) to exclude any group variations that might explain the effect on memory, considering the well-established impact of stress on memory (Sandi & Pinelo-Nava, 2007; Schwabe et al., 2012). Moreover, we assessed participants’ depressive symptoms employing the BDI (Hautzinger et al., 2006), to guarantee group comparability on this clinical measure. We further assessed fundamental personality dimensions using the BFI-2 (Danner et al., 2016) to exclude any potential group discrepancies that could account for differences observed. Lastly, we assessed participants’ imaginative capacities using the FFIS (Zabelina & Condon, 2019), to ensure uniformity across groups regarding this central variable, considering the significant role of imagination in relation to the cTBS-targeted angular gyrus (Thakral et al., 2017).”

      We further specified why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses we opted to employ linear-mixed models (LMM), given their high robustness regarding the underlying distribution and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Moreover, we added an explanation on why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has been previously identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      Reviewer 3

      We thank the reviewer for the constructive and very helpful comments. We are pleased that the reviewer considered our experimental design to be strong and our behavioral results to be striking.

      1) My major criticism relates to the main claim of the paper regarding causality between the angular gyrus and the authors' behavior of interest. Specifically, I am not convinced by the evidence that the effects of stimulation noted in the paper are attributable specifically to the angular gyrus, and not other regions/networks.

      While our results showed specific changes after cTBS over the angular gyrus, demonstrating a causal involvement of the angular gyrus in these effects, we completely agree that this does not rule out an involvement of additional areas. In particular, there is evidence suggesting that cTBS over parietal regions, such as the angular gyrus, could potentially influence hippocampal functioning. We address this issue now in a new paragraph that we have added to the discussion, on pages 23-24, lines 546-564:

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus. Expanding upon this idea, it is conceivable that targeting a more dorsal segment of the angular gyrus might exert a stronger influence on observation-based linking – an aspect that warrants future investigations.”

      Responses to reviewer recommendations

      Reviewer 1

      1) On page 26, the authors write: "[...] different video events (A, B, and X) were recalled from day one [...]". I may have missed this point, but I had the impression that the task was conducted within one day.

      Indeed, this study was conducted within a single day. We rephrased the respective statement accordingly. Please see page 7, lines 149-153:

      “To test this hypothesis and the causal role of the angular gyrus in insight-related memory reconfigurations, we combined the life-like video-based narrative-insight task (NIT) with representational similarity analysis of EEG data and (double-blind) neuro-navigated TMS over the left angular gyrus in a comprehensive investigation within a single day.”

      We further included this information in the methods section on page 27, lines 634-635:

      “In total, the experiment took about 4.5 hours per participant and was completed within a single day.”

      Reviewer 2

      1) There is a substantial disconnection between the introduction and the methods/results section. One reason is that there is not sufficient detail regarding the hypotheses/predictions and the specific types of analyses chosen to test these hypotheses/predictions. Additionally, it is not explained what comparisons and outcomes would be informative/expected. This should be made clear. Second and related to the above, the rationale for conducting certain types of analyses (correlation, coherence, see below) sometimes is not specified.

      To address this concern, we elaborated on our hypotheses incorporating specific predictions for the free recall, given its higher variability than the other memory measures, and for imagination vs. observation at the end of the introduction on pages 4-5, lines 107-122:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to interfere with the increase in neural similarity for linked events and with the decrease of neural similarity for non-linked events. We further predicted that cTBS to the left angular gyrus would reduce the impact of (imagination-based) insight into the link of initially unrelated events on memory performance during free recall, given its higher variability compared to other memory measures. Considering the high connectivity profile of the angular gyrus within the brain (Seghier, 2013), we conducted an EEG connectivity analysis building upon prior findings concerning alterations in neural reconfigurations. To establish a link between neural and behavioral findings, we chose a correlational approach to relate observations from these two domains.”

      Moreover, we made our rationale for the employed analyses more explicit and specified why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses we opted to employ linear-mixed models (LMM), given their high robustness regarding the underlying distribution and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Moreover, we added an explanation on why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has been previously identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      2) The authors suggest that besides Branzi et al. (2021), this is one of the first studies showing that memory update is linked to the AG. I suggest having a look at work from Tambini, Nee, & D'Esposito, 2018, JoCN, and other papers from Joel Voss' group that target a similar region of AG/Inferior parietal cortex. Many studies, using multiple TMS protocols, have now shown this brain region is causally involved in episodic and associative memory encoding.

      As mentioned above, further consideration of this literature is important as it delves into the region's hippocampal connectivity (and other network properties), and how that mediates the memory effects. Indeed, because of the nature of the methods employed in this study, we do not know if the memory-related behavioural effects are due to TMS-induced changes at the AG's versus the hippocampus's level, or both. How do the current findings square with the existing TMS effects from this region? Can the connectivity profile of the target region highlighted by previous studies provide further insight into how the current behavioural effect arises? Some comments on this could be added to the discussion.

      We completely agree that the other studies showing enhanced associative memory after TMS to parietal regions need to be addressed. Therefore, we updated the discussion on page 20, lines 449-453:

      “Interestingly, recent work has additionally indicated that targeting parietal regions with TMS led to alterations in hippocampal functional connectivity, thereby enhancing associative memory (Nilakantan et al., 2017; Tambini et al., 2018; Wang et al., 2014), potentially shedding light on the underlying mechanisms involved.”

      Moreover, we included a section specifically addressing the possibility that the effects observed may pertain to having modulated other regions via the targeted region and updated the discussion on pages 23-24, lines 543-562:

      “Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      3) Another comment I have regards the results observed for the observation vs imagination insight conditions. The authors mention that the 'changes in representational similarity for the observation condition should be interpreted with caution, as these seemingly opposite changes appeared to be at least in part driven by group differences already in the pre-phase before participants gained insight.' I wonder what these group differences are and whether the authors have any hypothesis about what factors determined them.

      We could only speculate about the basis of the observed pre-insight phase differences. However, we provide now the raw RSA data as supplemental material to make the pattern of the (raw) RSA findings in the pre- and post-insight phases more transparent. We refer the interested reader to this material on pages 12-13, lines 293 to 295:

      “For linked events, this analysis yielded a negative cluster (p = 0.032, ci-range = 0.00, SD = 0.00) in the parieto-temporal region (electrodes: T7, Tp7, P7; Fig. 3B; Figure 3 – Figure supplement 1).”

      And on page 15, lines 339-341:

      “This analysis yielded a positive cluster (p = 0.035, ci-range = 0.00, SD = 0.00) in a fronto-temporal region (electrode: FT7; Fig. 3C; Figure 3 – Figure supplement 2).”

      Furthermore, the age of participants is not reported separately for the two groups (cTBS to AG vs Sham), I think. This should be reported including a t-test showing that the two groups have the same age.

      We agree and report now explicitly that groups did not significantly differ in relevant control variables including age. Please see page 7, lines 157-160:

      “Notably, the groups did not differ on levels of subjective chronic stress (TICS), state and trait anxiety (STAI-S, STAI-T), depressive mood (BDI), imaginative capacities (FFIS), personality dimensions (BFI), age, and motor thresholds (for descriptive statistics see Table 1; all p > 0.053).”

      And further included age and motor thresholds as control variables in Table 1 on page 18, lines 402-412:

      “Overall, levels of subjective chronic stress, anxiety, and depressive mood were relatively low and not different between groups. The groups did further not differ in terms of personality traits, imagination capacity, age or motor thresholds (all p > 0.053; see Table 1).”

      The fact this study is not a within-subject design makes difficult the interpretation of the results and this should be recognised as an important limitation of the study.

      As outlined above, a within-subject design would in our view come with several disadvantages, such as significant sequence/carry-over effects. Moreover, the neural representation change was measured in a pre-post design, enabling us to measure the insight-driven neural reconfiguration at the individual level.

      We clarify our rationale for the between-subjects factor TMS in the introduction on page 5, lines 122-126:

      “We intentionally adopted a mixed design, combining both between-subjects and within-subject methodologies. The between-subjects approach was chosen to minimize the risk of carry-over effects and sequence biases. Simultaneously, we capitalized on the advantages of a within-subject design by altering the pre- to post-insight comparison and the mode of insight (imagination vs. observation) within each participant.”

      Furthermore, we included our rationale for choosing a between-subjects approach for the crucial TMS manipulation in the methods section on page 25, lines 601-604:

      “We implemented a mixed-design including the within-subject factors link (linked vs. non-linked events), session (pre- vs. post-link), and mode (imagination vs. observation) as well as the between-subjects factor group (cTBS to the angular gyrus vs. sham) to mitigate the risk of carry-over effects and sequence biases of the crucial cTBS manipulation.”

      4) The angular gyrus is a heterogeneous region with multiple graded subregions. The one targeted in the present study is the ventral AG which has strong connections with the episodic-hippocampal memory system. I was wondering if this might explain why the AG TMS effects on representational changes have been observed for events linked via imagination but not direct observation. Perhaps the stimulation of a more 'visual' AG subregion (see Humphreys et al., 2020, Cerebral Cortex) would have resulted in a different (opposite) pattern of results. It would be good to add some comments on this in the discussion.

      We appreciate this interesting perspective offered regarding the potential outcomes of our study, particularly in relation to the activation of a more ventral subregion of the angular gyrus. We incorporated this idea into our discussion, alongside considerations regarding the potential effects of a more dorsal angular gyrus stimulation on observation-based linking. However, caution is warranted, recognizing the inherent limitations posed by the precision of TMS manipulations, which is further underscored by our electric field simulations, utilizing a 10 mm radius. We included this section in the discussion on pages 23-24, lines 546-569:

      “Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus. Expanding upon this idea, it is conceivable that targeting a more dorsal segment of the angular gyrus might exert a stronger influence on observation-based linking – an aspect that warrants future investigations. Yet, while acknowledging the functional heterogeneity within the angular gyrus (Humphreys et al., 2020), pinpointing specific subregions via TMS remains challenging due to its limited focal precision at the millimeter level (Deng et al., 2013; Thielscher & Kammer, 2004), as reinforced by our electric field simulations utilizing a 10 mm radius. Hence, drawing definitive conclusions regarding distinct angular gyrus subregions requires future research employing rigorous checks to assess the focality of their stimulation.”

      5) Regarding the methods section, I have the following specific queries. It is unclear what is the purpose of the coherence and correlation analyses (pages 35, 36). Could the authors provide further clarification on this? These analyses seem not to be mentioned anywhere in the introduction. This should be clarified briefly in the introduction and then in the methods section. The same for the questionnaires (anxiety, stress, etc): It is unclear the reason for collecting this type of data. This should be clarified in the introduction as well.

      We agree, and have updated the introduction as follows on page 5, lines 118-122:

      “Considering the high connectivity profile of the angular gyrus within the brain (Seghier, 2013), we conducted an EEG connectivity analysis building upon findings from the RSA analyses concerning alterations in neural reconfigurations. To establish a link between neural and behavioral findings, we chose a correlational approach to relate observations from these two domains.”

      We additionally provided an explanation for including these questionnaires in the introduction on page 5, lines 126-129:

      “To control for any group differences beyond the TMS manipulation, we gathered various control variables through questionnaires, including trait- and state-anxiety, depressive symptoms, chronic stress levels, personality dimensions, and imaginative capacities.”

      Moreover, we elaborated on the underlying rationale guiding our chosen analytical approaches. Therefore, we specified why we chose to analyze our behavioral data using LMMs on page 34, lines 849-851:

      “For our behavioral analyses we opted to employ linear-mixed models (LMM), given their high robustness regarding the underlying distribution and high sensitivity to individual variation (Pinheiro & Bates, 2000; Schielzeth et al., 2020).”

      Furthermore, we added an explanation on why we opted for the RSA approach in the methods section on page 37, lines 920-923:

      “This method is ideally suited to measure neural representation changes and was specifically chosen as it has been previously identified as the preferred approach for quantifying insight-induced neural changes (Grob et al., 2023b; Milivojevic et al., 2015).”

      To clarify the rationale behind our coherence analysis, we incorporated an explanatory sentence in the methods section on page 39, lines 966-967:

      “Due to the robust connectivity between the angular gyrus and other brain regions (Petit et al., 2023; Seghier, 2013), we proceeded with a connectivity analysis as a next step.”

      6) The preregistration webpage is in German. This is not ideal as it means that the information is available only to German speakers.

      This webpage can easily be switched to English by changing the settings in the top right corner.

      To address this issue, we included a description of how to set the webpage to English in the methods section on page 25, lines 581-582:

      “For translation to English, please adjust the page settings located in the top right corner.”

      7) Page 18. 'NIT' and 'MAT' - avoid abbreviations when possible.

      We included the full name for the narrative-insight task (NIT) on page 7, line 151, line 153, and line 165, page 8, lines 177-178 and line 187, page 19, line 427, page 26, line 615, line 629 and line 632, page 27, line 653, page 30, lines 730-731, page 31, line 754, page 35, line 870 and line 873, and page 36, line 885.

      We further included the full name for the multi-arrangements task (MAT) on page 19, lines 428-429.

      8) Line 21....we further observed DECREASED...should be replaced with INCREASED, if I am not wrong.

      We checked the sentence again and it looks correct to us, since it describes the change for observation-based insight, not imagination-based insight. We clarified that this finding pertains to observation-based linking by modifying the sentence on page 23, lines 525-528, as follows:

      “Following cTBS to the angular gyrus, we further observed decreased pattern similarity for non-linked events in the observation-based condition, resembling the pattern change observed in the sham group for linked events, which may highlight the role of the angular gyrus in representational separation during observation-based linking.”

      Reviewer 3

      1) The major claim of the paper is that the angular gyrus is causally involved in insight-driven memory reconfiguration. To the authors' credit, they localized stimulation to the angular gyrus using an anatomical scan, the strength of the estimated electromagnetic field in the angular gyrus correlated with their behavioral results, and there were also brain-behavior correlations involving sensors located in the parietal lobe. However, the minimum evidence needed to claim causality is 1) evidence of a behavioral change (which the authors found) and 2) evidence of target engagement in the angular gyrus. It is also important to show brain-behavior correlations between target engagement and behavior. Although the authors stimulated the angular gyrus, that does not mean that rTMS specifically affected this region or that the behavioral results can be attributed to rTMS effects on the angular gyrus. As the authors point out, the angular gyrus has dense connections with other regions such as the hippocampus. In fact, several studies have shown that angular gyrus (or near AG) stimulation affects the hippocampal network (Wang et al., 2014, Science; Freedberg et al. 2019, eNeuro; Thakral et al., 2020, PNAS). EEG also has a poor spatial resolution, so even though the results were attributable to parieto-temporal sensors, this is not sufficient evidence to claim that the angular gyrus was modulated. Source localization would be required to reconstruct the signal specifically from the AG. Thus, with the manuscript written as is, the authors can claim that "cTBS to the angular gyrus modulates insight-driven memory reconfiguration," but the current claim is not sufficiently substantiated.

      While acknowledging the potential role of the angular gyrus in driving the observed changes, we recognize that the available evidence may not be sufficient. Consequently, we have introduced several modifications within our manuscript to address this concern.

      In the revised Introduction, we now explicitly address the possibility of a stimulation of the hippocampus via the angular gyrus on page 4, lines 84-85:

      “Accordingly, previous studies have shown that stimulation of the angular gyrus resulted in altered hippocampal activity (Thakral et al., 2020; Wang et al., 2014).”

      Additionally, we included relevant evidence demonstrating previous instances of targeted stimulation of the angular gyrus, which led to alterations in hippocampal connectivity and associative memory. These insights have been included in the discussion on page 20, lines 449-453:

      “Interestingly, recent work has additionally indicated that targeting parietal regions with TMS led to alterations in hippocampal functional connectivity, thereby enhancing associative memory (Nilakantan et al., 2017; Tambini et al., 2018; Wang et al., 2014), potentially shedding light on the underlying mechanisms involved.”

      Next, we have integrated crucial modifications essential for establishing a conclusive inference of causality in our study. Moreover, we now explore the potential mediation of the effects observed from angular gyrus stimulation through other brain regions, like the hippocampus. In addition, we have highlighted prior work where such stimulation coincided with alterations in associative memory. For the updated discussion section, please see pages 23-24, lines 538-562:

      “Although our study provided evidence suggesting a causal role of the angular gyrus in insight-driven memory reconfigurations – highlighted by behavioral changes after cTBS to the angular gyrus, neural changes in left parietal regions, and relevant brain-behavior associations – it is important to acknowledge the limitations imposed by the spatial resolution of EEG. Consequently, the precise source of the observed signal changes in the parietal regions remains uncertain, potentially tempering the definitive nature of these findings. Furthermore, the differential impact of cTBS to the angular gyrus on neural reconfigurations between events linked via imagination and those linked via observation may be attributed to its crucial role in imaginative processes (Ramanan et al., 2018; Thakral et al., 2017). Another intriguing aspect to consider is that the stimulated site was situated in the more ventral portion of the angular gyrus, recognized for its stronger connectivity to the episodic hippocampal memory system in contrast to its more dorsal counterpart (Seghier, 2013; Uddin et al., 2010). This stronger connectivity between the ventral angular gyrus and the hippocampus may shed light on the greater impact of cTBS to the angular gyrus on imagination-based insight. Given the angular gyrus’s robust connectivity with other brain regions, including the hippocampus (Seghier, 2013), it is plausible that the observed changes might not solely stem from alterations within the angular gyrus itself, but could also originate from these interconnected regions. This notion may bear particular importance given the required accessibility to the hippocampus during imaginative processes (Benoit & Schacter, 2015; Grob et al., 2023a; Zeidman & Maguire, 2016). Interactions between the angular gyrus and the hippocampus may give rise to rich memory representations (Ramanan et al., 2018). In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      We further replaced terms that imply inhibition of the angular gyrus with a more operationally descriptive phrase:

      “cTBS to the angular gyrus”

      2) The authors frequently claim that cTBS is "inhibitory stimulation" and that inhibition of the angular gyrus caused their effects. There is a common misconception within the cognitive neuroscience literature that stimulation is either "inhibitory" or "excitatory," but there is no such thing as either. The effects of rTMS are dependent on many physiological, state, and trait-specific variables and the location of stimulation. For example, while cTBS does reproducibly inhibit behavior supported by the motor cortex (Wilkinson et al., 2010, Cortex; Rosenthal et al., 2009, J Neurosci), cTBS of the posterior parietal cortex reproducibly enhances hippocampal network functional connectivity and episodic memory (Hermiller et al., 2019, Hippocampus; Hermiller et al., 2020, J Neurosci). The authors reference the Huang et al. (2005) paper as evidence of its inhibitory effects but work in this paper is not sufficient to broadly categorize cTBS as inhibitory. First, Huang et al. stimulated the motor cortex and measured the effects on corticospinal excitability, which is significantly different from what the current authors are measuring. Furthermore, this oft-cited study only included 9 subjects. Other studies have found that the effects of theta-burst are significantly more variable when more subjects are used. For example, intermittent theta-burst, which is assumed to be excitatory based on the Huang paper, was found to produce unreliable excitatory effects when more subjects were examined (Lopez-Alonso, 2014, Brain Stimulation). Thus, the a priori assumption that stimulation would be inhibitory is weak and cTBS should not be discussed as "inhibitory."

      We agree and included now a statement in the methods section that explicitly states that cTBS effects may be region-specific on page 33, lines 817-819:

      “Nonetheless, the effects of cTBS appear to vary based on the targeted region, with cTBS to parietal regions demonstrating the capability to enhance hippocampal connectivity (Hermiller et al., 2019, 2020).”

      We further substituted all terminology suggestive of an inhibitory effect with the phrase:

      “cTBS to the angular gyrus”.

      However, it is important to note that while other studies (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014) found increased hippocampal connectivity after rTMS to a parietal region as well as enhanced associative memory, we observed impaired memory for the linked events. We included this clarification in the discussion on page 24, lines 558-562:

      “In line with this, recent studies have demonstrated that cTBS to the angular gyrus resulted in enhanced hippocampal connectivity and improved associative memory (Hermiller et al., 2019; Tambini et al., 2018; Wang et al., 2014). However, it should be noted that our study detected impaired associative memory following cTBS to the angular gyrus.”

      3) The hypothesis at the end of the introduction did not strike me as entirely clear. From this hypothesis, it seems that the authors are just comparing the differences in memory and reconfiguration during imagination-based insight links. However, the authors also include observation-based links and a non-linking condition, which seem ancillary to the main hypothesis. Thus, I am confused about why these extra factors were included and exactly what statistical results would confirm the authors' hypothesis.

      We agree, and have clarified our hypotheses on pages 4-5, lines 107-115:

      “Considering this involvement of the angular gyrus in imaginative processes, we expected that the effect of cTBS on the change in representational similarity from pre- to post-insight will differ based on the mode of insight – whether this insight was gained via imagination or observation. Specifically, we expected a more pronounced impairment in the neural reconfigurations when insight is gained via imagination, as this function may depend more on angular gyrus recruitment than insight gained via observation. Additionally, we expected cTBS to the left angular gyrus to reduce the increase in neural similarity for linked events and increase of neural dissimilarity for non-linked events.”

      4) Many of the distributions throughout the paper do not look normal. Was normality checked? Are non-parametric stats warranted?

      We evaluated and reported the normality assumption in our behavioral analyses. Despite the non-normal distribution of our data, we chose to utilize linear-mixed models due to their robust performance even in case of deviations from normal distributions. This update in our methods section can be found on page 36, lines 890-896:

      “After outlier correction, we identified non-normality in our data using a Shapiro-Wilk test (narrative-insight task: W = 0.92, p < 0.001; multi-arrangements task: W = 0.94, p < 0.001; forced-choice recognition: W = 0.50, p < 0.001; free recall details: W = 0.85, p < 0.001; free recall naming of linking events: W = 0.94, p < 0.001). However, we mitigated this by employing linear-mixed models (LMMs), recognized for their robustness even with non-normally distributed data (Schielzeth et al., 2020).”

      We recalculated the correlational analysis between the RSA data and the behavioral recall of linking events by using the Spearman method on page 13, lines 306-308:

      “Furthermore, to address a deviation from the normality assumption, the correlational analysis was repeated using the Spearman method, which indicated an even stronger correlation (r(59) = 0.32, p = 0.012).”

      We further recalculated the correlation between the change in coherence for linked events and the recall of details for events linked via imagination on page 16, lines 376-378:

      “Please note that for addressing a deviation from the normality assumption, the correlational analysis was repeated using the Spearman method, which yielded a significant correlation of similar strength (r(59) = 0.31, p = 0.015).”

      Our EEG analyses, including RSA and coherence analyses, utilized a cluster-based permutation test (Fieldtrip; Oostenveld et al., 2011). These tests do not assume a normal distribution by utilizing empirical sampling for statistical inference. This approach ensures robustness without constraints imposed by specific distributional assumptions. Subsequent t-tests, stemming from significant clusters identified in the initial non-parametric analyses, were extensions of the robust non-parametric approach and did not require additional normality testing.
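The Spearman re-analysis mentioned in this response amounts to a Pearson correlation computed on ranks rather than raw values. The following is only a minimal standard-library sketch of that idea, not the authors' analysis code; the function names (`average_ranks`, `pearson`, `spearman`) are hypothetical, and ties are assumed to receive averaged ranks.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the computation, any monotonic relationship yields a coefficient of magnitude 1 regardless of the shape of the underlying distributions, which is why the method is a common fallback when normality is violated.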

      5) Can the authors include more detail about the sham coil? Was it subthreshold? Did the EMF cross the skull?

      The sham coil, also obtained from MAG & More GmbH, München, Germany, provided a similar sensory experience; however, the company did not specify any field strength (n.a.) as this coil was purposefully designed to prevent the induction of an electromagnetic field (EMF) capable of penetrating the skull, thereby ensuring it had no impact on the brain. We clarified this point in the methods section on pages 31-32, lines 772-778:

      “Two identically looking but different 70 mm figure-of-eight-shaped coils were used depending on the TMS condition: The PMD70-pCool coil (MAG & More GmbH, München, Germany) with a 2T maximum field strength was used for cTBS, while the PMD70-pCool-SHAM coil (MAG & More GmbH, München, Germany), with minimal magnetic field strength, was employed for sham, providing a similar sensory experience, with stimulation pulses being scattered over the scalp and not penetrating the skull.”

      6) There are differences between exclusion criteria in pre-registration and report. For example, BMI is an exclusion factor in the report, but not in the pre-registration. Can the authors provide a reason for this deviation?

      This discrepancy is due to (partial) participant recruitment from previous fMRI studies conducted in our lab that involved a stress induction protocol (as a structural MRI image was needed for the ‘neuronavigated’ TMS). Owing to the distinct cortisol stress reactivity observed in individuals with varying body mass indices (BMIs), participants with a BMI below 19 or above 26 kg/m² were excluded from these studies. To maintain consistency within our sample, only participants meeting these criteria were included. We elaborated on this point in the methods section on page 25, lines 586-592:

      “Participants were screened using a standardized interview for exclusion criteria that comprised a history of neurological and psychiatric disease, medication use and substance abuse, cardiovascular, thyroid, or renal disease, evidence of COVID-19 infection or exposure, and any contraindications to MRI examination or TMS. Additionally, participants with a body mass index (BMI) below 19 or above 26 kg/m² were excluded. This decision stemmed from recruiting some participants from prior studies that incorporated stress induction protocols, which imposed this specific criterion (Herhaus & Petrowski, 2018; Schmalbach et al., 2020).”

      7) Were impedances monitored and minimized during EEG?

      Yes, they were monitored. We clarified this point in the methods section on page 34, lines 845-847:

      “We maintained impedances within a range of ± 20 μV using the common mode sense (CMS) and driven right leg (DRL) electrodes, serving as active reference and ground, respectively”

      8) I think there may be a typo related to the Thakral coordinates. I believe Thakral used MNI coordinates -48,-64, 30, whereas the authors stated they used -48,-67,30. Is this a mistake?

      Upon reevaluation of our study coordinates, we identified a slight deviation in our stimulation coordinates compared to those reported by Thakral et al. (2017; +3mm on the y-axis). This variance resulted from the required MNI to Talairach (TAL) transformations necessary for utilizing the neuronavigation software Powermag View! (MAG & More GmbH, München, Germany). Notably, this deviation was consistent across all participants in our study. While TMS is more precise than tDCS, its focality is not as fine-grained down to the millimeter level. Despite this, our electric field simulations, adopting a 10mm radius, effectively encompassed the original coordinates specified by Thakral et al. (2017). This radius ensured coverage over the intended target area, mitigating the impact of this minor deviation on the overall study outcomes. We updated the methods section accordingly on page 33, lines 800-806:

      “Based on the individual T1 MR images, we created 3D reconstructions of the participants' heads, allowing us to precisely locate the left angular gyrus coordinate (MNI: -48, -67, 30), initially derived from previous work (Thakral et al., 2017), for TMS stimulation. Despite a minor deviation in coordinates due to necessary MNI to Talairach transformations for software compatibility (Powermag View! by MAG & More GmbH, München, Germany), our methodology ensured precise localization of the angular gyrus target area.”
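The coverage argument in this response can be checked with simple arithmetic. The sketch below takes the two coordinates quoted in the exchange and the 10 mm simulation radius, and assumes (as a simplification that is not stated in the response) that the simulated field can be treated as a sphere centred on the stimulated coordinate. Variable names are illustrative only.

```python
import math

# Coordinates in MNI space, in millimetres, as quoted in the exchange above.
thakral_mni = (-48, -64, 30)      # target attributed to Thakral et al. (2017)
stimulated_mni = (-48, -67, 30)   # coordinate reported by the authors

# Radius adopted in the electric field simulations; treating it as a sphere
# centred on the stimulated coordinate is an assumption of this sketch.
simulation_radius_mm = 10.0

offset_mm = math.dist(thakral_mni, stimulated_mni)
covered = offset_mm <= simulation_radius_mm
```

Under that assumption the Euclidean offset between the two points is well inside the simulated radius, which is the substance of the authors' argument.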

      9) How was the tail of the coil positioned during stimulation? Was it individualized so that the lobes of the coil are perpendicular to the nearest gyrus, as is commonly done?

      The coil handle always pointed upwards to maintain optimal positioning with the coil holder. We followed the positioning procedure in the neuronavigation software Powermag View!, which did not indicate any positioning of the coil handle but specified the position and angle of the coil itself. To incorporate this aspect, we updated the legend of figure 2 on page 11, lines 260-261:

      “Please note that in the study, the coil handle was oriented upwards; however, in this illustration, it has been intentionally depicted as pointing downwards for better visibility purposes.”

      We further updated the method section on page 33, lines 723-824:

      “The coil was positioned tangentially on the head and mechanically fixed in a coil holder, with its handle pointing upwards to maintain its position”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      This work describes a new and powerful approach to a central question in ecology: what are the relative contributions of resource utilisation vs interactions between individuals in the shaping of an ecosystem? This approach relies on a very original quantitative experimental set-up whose power lies in its simplicity, allowing an exceptional level of control over ecological parameters and of measurement accuracy.

      In this experimental system, the shared resource corresponds to 10^12 copies of a fixed single-stranded target DNA molecule to which 10^15 random single-stranded DNA molecules (the individuals populating the ecosystem) can bind. The binding process is cycled, with a 1000x-PCR amplification step between successive binding steps. The composition of the population is monitored via high-throughput DNA sequencing. Sequence data analysis describes the change in population diversity over cycles. The results are interpreted using estimated binding interactions of individuals with the target resource, as well as estimated binding interactions between individuals and also self-interactions (that can all be directly predicted as they correspond to DNA-DNA interactions). A simple model provides a framework to account for ecosystem dynamics over cycles. Finally, the trajectory of some individuals with high frequency in late cycles is traced back to the earliest cycles at which they are detected by sequencing. Their propensities to bind the resource, to form hairpins, or to form homodimers suggest how different interaction modes shape the composition of the population over cycles.

      The authors report a shift from selection for binding to the resource to interactions between individuals and self-interactions over the course of cycles as the main drivers of their ecosystem. The outcome of the experiment is far from trivial as the individual resource binding energy initially determines the relative enrichment of individuals, and then seems to saturate. The richness of the population dynamics observed with this simple system is thus comparable to that found in some natural ecosystems. The findings obtained with this new approach will likely guide the exploration of natural ecosystems in which parameters and observables are much less accessible.

      My review focuses mainly on the experimental aspects of this work given my own expertise. The introduction exposes very convincingly the scientific context of this work, justifying the need for such an approach to address questions pertaining to ecology. The manuscript describes very clearly and rigorously the experimental setup. The main strengths of this work are (i) the outstanding originality of the experimental approach and (ii) its simplicity. With this setup, central questions in ecology can be addressed in a quantitative manner, including the possibility of running trajectories in parallel to generalize the findings, as reported here. Technical aspects have been carefully implemented, from the design of random individuals bearing flanking regions for PCR amplification, binding selection and (low error) amplification protocols, and sequencing read-out whose depth is sufficient to capture the relevant dynamics.

      We thank the reviewer for summarizing our work and the main findings in a very clear and effective manner.

      One missing aspect in the data analysis is the quantification of the effect of PCR amplification steps in shaping the ecosystem (to be modeled if significant). In addition, as it stands the current work does not fully harness the power of the approach. For instance, with this setup, one can tune the relative contributions of binding selection vs amplification for instance (to disentangle forces that shape the ecosystem). One can also run cycles with new DNA individuals, designed with arbitrarily chosen resource binding vs self-binding, that are predicted to dominate depending on chosen ecological parameters. I have three main recommendations to the authors:

      1) PCR amplification steps (and not only binding selection steps) should be taken into account when interpreting the outcome of experiments.

      2) More generally, a systematic analysis of the possible modes of propagation of a DNA molecule from one cycle to the next, including those considered as experimental noise, would help with interpreting the results.

      3) Testing experimentally the predictions from the analysis and the modelling of results would strengthen the case for this approach.

      Despite its conceptual simplicity, our approach has indeed a few experimental handles that enable exploring a relevant variety of conditions much beyond those described in this paper, of which we are very aware. These involve selection vs. amplification or set the stage to explore competition, parasitism or cooperation among specific species, as the reviewer points out, but also introduce mutations and explore the kinetics of evolution in static or dynamic environments. Ongoing experiments are considering some of these conditions. We modified the text to mention more explicitly these possibilities, which are now mentioned in p11 lines 376-378 and lines 416-417. The three points raised by the reviewer helped us to further improve and clarify strengths and limitations of our work, as detailed below.

      Regarding the first point, here are my suggestions :

      • Run one cycle of just amplification vs 'binding + amplification', or simply increase the number of PCR cycles (and subsample the product) to check whether it impacts the population composition, in particular for sequences with predictions derived from the current analysis.

      The point raised by the reviewer is indeed very relevant and not discussed in our manuscript. Prompted by the reviewer’s comment, we performed two new experiments to distinguish resource-binding selection from PCR amplification effects.

      First, we performed a negative control experiment in which we carried out the “selection step” with bare beads, i.e. beads with no DNA grafted on them. We then compared the results with the corresponding results of the original experiments on Oligo 1 and 2.

      After 6 cycles, the most abundant sequence in the negative dataset has a relative occurrence of 0.05%, whereas the dominant strand in Oligo 1 and Oligo 2 has an abundance of 8% and 16%, respectively, i.e. 40-80 times larger.

      This indicates that the drift due to non-specific binding + PCR amplification is at least two orders of magnitude smaller than the selection induced by the affinity with the resource.

      These results are now cited in p14 lines 468-470, and described in Appendix 1, Experimental controls.

      Second, we tested the effect of PCR amplification on the selection process. We exploited the fact that we have aliquots for each generation of our evolution experiment, which we sampled and saved after PCR and before sequencing. We thus chose a specific generation (generation 9 from the Oligo 1 experiments), performed another PCR round, and proceeded directly to sequencing with no bead-selection step. We then compared the ensemble of oligos obtained in this way, which we named Oligo 1 “cycle 9 replica”, with both the original Oligo 1 cycle 9 and Oligo 1 cycle 10.

      We sampled 20 times 4 x 10^5 sequences from the cycle 9 dataset, from cycle 9 replica and from cycle 10 with a bootstrap approach. To compare the three systems we extracted the fraction of each population covered by its 10 most abundant individuals. The results are shown in Figure 2-figure supplement 4; further details on the analysis can be found in the figure caption. The similarity between cycle 9 and cycle 9 replica and the marked difference between cycle 9 replica and cycle 10 indicate that the relevant part of the selection is indeed performed by the resource-binding mechanism, while drifts induced by PCR play a secondary role.

      As a further check, we compared the specific sequences across the 20 samples in cycle 9 and cycle 9 replica datasets and found that the 10 most abundant sequences are almost always the same. In particular, the first 8 or 9 are always the same, possibly in a different order.

      These new pieces of evidence are now cited in p14 lines 483-484 and described in Appendix 1, Experimental controls.
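The bootstrap comparison described in this response (20 resamples per dataset, then the share of reads taken by the 10 most abundant sequences) can be sketched as follows. This is an illustrative outline rather than the authors' pipeline: the function names are hypothetical, reads are assumed to be available as a flat list of sequence strings, and sampling with replacement is assumed because the response only says "bootstrap".

```python
import random
from collections import Counter


def top_k_fraction(reads, k=10):
    """Fraction of reads belonging to the k most abundant sequences."""
    counts = Counter(reads)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(reads)


def bootstrap_top_k(reads, sample_size, n_resamples=20, k=10, seed=0):
    """Resample reads with replacement; return the top-k fraction of each resample."""
    rng = random.Random(seed)
    return [
        # random.choices draws sample_size reads with replacement.
        top_k_fraction(rng.choices(reads, k=sample_size), k)
        for _ in range(n_resamples)
    ]
```

Applying `bootstrap_top_k` to each of the three datasets with the same `sample_size` gives three distributions of top-10 fractions that can be compared directly, since equal sampling depth removes sequencing depth as a confound.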

      • Sequencing read-out includes the same PCR protocol as the one used for amplification steps, so read-out potentially has an effect on the composition of the ecosystem. Again, varying the number of PCR cycles is a direct way to test this.

      The PCR amplification involved in the read-out might have a minor effect on the sequencing outcome but not on the composition of the ecosystem. In fact, the sample that undergoes sequencing is taken from the pool at each cycle, and not inserted back into it. Thus, it does not participate in the following selection steps. This is specified in the text at p3 line 104.

      • Could self-interactions (hairpins or homodimers) benefit individuals during amplification steps? The role of self-interactions during binding selection steps could also be tested directly over one cycle (again varying the relative weight of the binding vs amplification to disentangle both).

      Our conditions for PCR amplification were chosen to minimize effects of this type. PCR amplification is carried out at 68 °C, a temperature at which, given the level of self and mutual complementarity in the sequences analyzed in the text, hairpins or homodimers should be melted and thus have no effect. This is specified in the text at p. 14 lines 479-480. However, if an effect is present, it gives a disadvantage (rather than an advantage) to self-interacting individuals. For the amplification step we used Q5® Hot Start High-Fidelity DNA Polymerase, which does not possess strand displacement activity. Therefore, in theory, if during amplification the polymerase encounters a double-stranded portion, it stops and synthesizes only a truncated product, which will then be lost during the purification step. In other words, sequences with secondary and/or tertiary structures are less likely to be amplified during the polymerization step. As a consequence, a DNAi characterized by this kind of structure will be negatively selected even in the case of optimal binding to the resource, and will be underrepresented in the pool.

      About the second point:

      • Regarding the effect of sampling (sequencing read-out), PCR amplification errors: explicitly check the consistency of observations with the expected outcome, in the methods section (right now these aspects are only briefly mentioned in the main text), which would highlight again the level of control and accuracy of the system.

      Hoping to have interpreted the request correctly, we performed a technical replicate by sequencing Oligo 1 cycle 9 again and analyzed the sequences that have at least 100 reads (corresponding to 27.42% of the total reads). We find that among the 800 DNA species that have at least 100 reads, 93.6% are found in both replicates. All the non-overlapping sequences have very low abundance, close to 100.

      Moreover, we compared the population size of each DNA species between the two replicates, after having equalized the database sizes. The results are now cited in p14 lines 509-510 and in Appendix 1, Experimental controls, and shown in Figure 2-figure supplement 3, where we plot the ratio of the number of reads in the two replicates for each sequence as a function of the number of reads in one. We found an average of 0.965 with a standard deviation of 0.119. High fluctuations are found in the rarest species, as expected.

      We think this evaluation indeed strengthens the solidity of our results.
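The replicate comparison described in this response (per-sequence read ratios after equalizing dataset sizes) could be sketched along the following lines. The function name and the dictionary-of-counts input format are hypothetical, and depth equalization is implemented here as a simple rescaling of counts, which is only one possible reading of "equalized the database sizes" (subsampling would be another).

```python
def replicate_ratios(counts_a, counts_b, min_reads=100):
    """Per-sequence read ratio b/a after rescaling replicate b to the depth of a.

    counts_a and counts_b map sequence -> read count in each replicate.
    Only sequences with at least min_reads in replicate a, and present in
    both replicates, are reported.
    """
    scale = sum(counts_a.values()) / sum(counts_b.values())
    ratios = {}
    for seq, n_a in counts_a.items():
        if n_a >= min_reads and seq in counts_b:
            ratios[seq] = counts_b[seq] * scale / n_a
    return ratios
```

A mean ratio close to 1 with a small spread, as the authors report, would indicate that sequencing itself adds little noise for species above the read threshold; `statistics.mean` and `statistics.stdev` applied to `ratios.values()` would give the two summary numbers.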

      • I have a small concern about target resource accessibility: is there any spacer between the ssDNA and the bead? The methods section does not mention any, and I would expect such a proximity between the target DNA and the bead to yield steric repulsion that impedes interactions with random DNA individuals.

      Yes, there is a 12-carbon spacer between the bead and the resource, which was inserted precisely to make the resource more accessible. This information is now available in Table 1 of Supplementary Information detailing the sequences used in the experiment. However, as now described in the text (p8 lines 284-286), we observe that the interaction with the resource is always shifted towards the 3' end, the terminus furthest from the bead, indicating some residual issue of accessibility to the resource sections closest to the bead.

      • Regardless of the existence of a spacer, binding of random DNA molecules to beads instead of the target DNA constitutes a potential source of noise (described for now as '1-x' in the IBEE model), which can be probed by swapping targets, selecting without target etc.

      This issue is addressed by the test with bare beads described above, in which we found little effect, corresponding to a small 1−𝑥 value.

      • Is there any recombination potentially occurring during amplification steps? This could be tested with a set of known molecules amplified over 24 amplification steps in a row (no binding step).

      It is possible for recombination to occur during the amplification steps. In Appendix 2, the section "By-Product Formation from PCR Amplification" discusses PCR by-products as aberrant forms of amplification, such as recombination events. We adopted several strategies to limit by-product formation, such as: i) use of “blockers” characterized by a phosphate group at the 3’ end (thus inhibiting their usage during the amplification and allowing a better control of the reaction conditions over the PCR cycles), ii) a high annealing temperature (to limit the possibility of a spurious primer annealing to the random region), iii) fewer PCR cycles, iv) a high primer concentration, v) a very short elongation step (all these strategies have been implemented to avoid a possible mispriming event between different DNAi, and the formation of concatemers). However, the formation of by-products is a problem inherent to the technique: in fact, it is a known issue for classical SELEX technology (Tolle et al. 2014), mainly due to the random region within the DNAi. Q5® Hot Start High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, MA, USA) has an error rate of <0.44 x 10^-6/base.

      In classic SELEX technology, the average number of selection cycles is 10. This limitation is partly due to the increase in PCR by-products. As we can see from Figure 2-figure supplement 1, the percentage of PCR by-products is less than 20% at cycle 12, and then increases dramatically in the following cycles. We are performing a series of experiments with known and limited sequences to verify and better understand the phenomenon for future applications of the SEDES platform. On this issue we decided not to modify the manuscript since we think it is already well discussed in Appendix 1.

      And the third point:

      • Perform one cycle (or a few cycles) with random DNA individuals, the most frequent individuals at the end of the current experiment, newly designed individuals with higher binding affinity to the target than currently dominating individuals, newly designed individuals with higher propensity to form hairpins or to form homodimers. Such experimental testing of predictions from the data analysis/modeling, typical of a physics approach, would illustrate the level of understanding one can reach with a simple yet powerful experimental setup.

      We perfectly agree that the approach we propose and the set of results we obtained call for further investigations that could strengthen analysis and modeling. The final aim we envisage is the understanding, within this simplified approach, of key evolutionary factors such as fitness. Indeed, becoming able to write an explicit fitness function would be a significant new contribution to the understanding of evolutionary processes, even within the limited settings of the ADSE approach, as discussed in the conclusions of the manuscript.

      However, carrying out such an analysis is a long and expensive undertaking, which we have started but will not complete in the near future. For this reason, given the already significant body of results we are presenting here, we prefer to keep this paper confined to the study of the evolution of a random DNAi population and discuss in a future contribution the behavior of smaller designed sets of competing, collaborating or parasitic individuals.

      Looking ahead, additional stages of investigations will also include mutations - to investigate the kinetics of speciation, and, in an even further stage, the interplay between evolution kinetics and dynamical mutation of resources.

      I have a few smaller points:

      • It would be very useful to provide the expected dynamic range of binding free energies (in terms of DeltaG and omega): what is the maximum binding free energy for the perfect complement?

      The NUPACK-computed binding free energy of a 20-base-long oligomer complementary to the resource (𝜔=20) is -24.36 kcal/mol for Oligo 1 and -23.08 kcal/mol for Oligo 2. This is the best answer we can offer to the reviewer’s request, since the maximum binding free energy of DNAi individuals (much longer than the target strand) would include contributions from the unpaired bases. Indeed, the values given above are approached by the left tail of the distribution of Fig. 3a, which however includes DNAi self-energies. The perfect complement binding free energy is now cited in the text as a reference for the dynamical range of DeltaG (p4 lines 151-152).

      • How is the number of captured DNA molecules quantified? Is 10^12 measured, estimated, or hypothesized?

      The number of sequences was calculated from data obtained from 260 nm absorbance quantification. We have now added this information in the Methods, “Selection Phase” section.

      Reviewer #2:

      Summary:

      In this manuscript, the authors introduced ADSE, a SELEX-based protocol to explore the mechanism of emergence of species. They used DNA hybridization (to the bait pool, "resources") as the driving force for selection and quantitatively investigated the factors that may contribute to the survival during generation evolution (progress of SELEX cycle), revealing that besides individual-resource binding, the inter- and intra-individual interactions were also important features along with mutualism and parasitism.

      Strengths:

      The design of using pure biochemical affinity assay to study eco-evolution is interesting, providing an important viewpoint to partly explain the molecular mechanism of evolution.

      Weaknesses:

      Though the evidence of the study is somewhat convincing, some aspects still need to be improved, mostly technical issues.

      Major:

      1) There are a few technical issues that the authors should clarify in the manuscript to make the analysis more transparent:

      1.1) To my understanding, it is difficult to guarantee the even distribution of different species (individuals) in the initial individual pool. Even though the authors have shown in Fig. 2a that the top 10 sequences take up ~ 0% in the pool, it remains unclear how abundant these top and bottom representative sequences are, given the huge number of the pool (10E15). Can the author show the absolute number of these sequences in different quantiles? Please show both Oligo sets.

      First, we thank the reviewer for both positive and critical comments that have guided us in reformulating or clarifying some messages of our work.

      As for this specific point: 10E15 is a small number compared to 4^50 = 10E30, the number of possible sequences of length 50. Thus, we don’t expect more than one individual per sequence in the initial pool. However, sequencing requires a preparatory amplification, which may lead to detecting a few sequences with more than one individual.

      Specifically, in the initial pool of Oligo 1, the most abundant individual (of sequence GAACTAAAGGGGCGGTGTCCACTTGCCTGTAGTGGTTATCAGTCCGGTTG) has 3 copies. About 0.7% of the sequences have 2 copies, while the vast majority of strings (99.3% of a sample of about 1.5 x 10E6 sequenced DNAi) is present in one copy only. A similar situation holds for Oligo 2, with 4 DNAi present in 3 copies and 0.8% of the sequences (in a pool of 2 x 10E6 DNAi) in 2 copies.

      It is worth noticing that none of the 10 most abundant species in the last cycle is present in the sample. Indeed, the fraction of the pool which is sequenced is removed from the population that undergoes evolution (as now specified in p2, line 104). We specified in the text (p2, lines 69-70, p3 lines 94-96) the fact that in the initial pool no sequence is expected to be present in more than one individual.
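The order-of-magnitude argument in this response can be made concrete with a short calculation. The sketch below assumes a 50-nt random region, a pool of 10^15 molecules, and uniform, independent sampling of sequences (an idealization that ignores any synthesis bias); all variable names are illustrative.

```python
random_region_length = 50   # length of the random region, in nucleotides
pool_size = 10**15          # number of molecules in the initial pool

# Number of distinct sequences of that length over a 4-letter alphabet.
sequence_space = 4**random_region_length  # about 1.27e30

# Expected number of pairs of molecules that share a sequence,
# assuming each molecule is drawn uniformly and independently.
expected_duplicate_pairs = pool_size * (pool_size - 1) / (2 * sequence_space)

# Chance that one given molecule has at least one twin elsewhere in the pool
# (first-order approximation, valid because pool_size << sequence_space).
per_molecule_twin_probability = (pool_size - 1) / sequence_space
```

Under these assumptions fewer than one duplicated pair is expected in the entire pool, so multi-copy sequences seen in the sequenced sample are more plausibly a product of the preparatory amplification than of the starting pool, in line with the authors' explanation.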

      1.2) The author claimed that they used two different oligo sets (Oligo1 and Oligo2) in this study. It is unclear which data was used in the presentation. How reproducible are they? Similar to this concern, how reproducible if the same oligo set was used to repeat the experiment?

      The oligo used in the main text was declared in Methods, Replica section. It is now declared also in the main text (p3 lines 106-108 and in the captions of Figure 2, Figure 3 and Figure 4). Reproducibility is addressed in: Figure 2-figure supplement 5; Figure 2-figure supplement 6; Appendix 2: Results of the experimental replica.

      It should also be noted that two starting pools of random 50mers are necessarily disjoint sets for the same reason discussed in the previous answer: the probability of finding common sequences in two selections of 10E15 individuals from a space of 10E30 possible sequences is negligibly small. Thus, it is expected that each time a new evolution experiment is started, different dominant sequences are found. However, the statistical properties of the DNAi pool during the evolution process of Oligo1 and Oligo2 are similar, as discussed in Appendix 2 of the paper.

      1.3) PCR and illumina sequencing itself introduced selection bias. How would the analysis eliminate them? The authors only discussed the errors created during PCR cycles (page 3, lines 115-122). However the PCR itself would prefer to amplify some sequences over the others (e.g. with high GC content). Similarly, the illumina sequencing would be difficult to sequence the low complexity sequences. How would this be circumvented?

      Yes, both PCR and Illumina sequencing have some known biases in the amplification process (e.g. sequencing of homopolymers or amplification of GC-rich sequences) that are intrinsic to the used techniques. Regarding PCR, we implemented a thermal protocol optimized for our chosen experimental setup, characterized by very short denaturation, annealing and amplification steps performed at high temperatures. Regarding Illumina sequencing, we can’t rule out a bias against specific sequences (e.g, homopolymers), which however should not be captured during the selection step, due to the design of the resource. Also, the libraries subjected to sequencing are characterized by a low complexity: according to the experimental design, the first and last 25 nucleotides are the same for all DNAi, the only differences being in the central 50 nt-long sequence. It is known that a low complexity library might encounter problems during sequencing due to the design of Illumina instruments: nucleotide diversity, especially in the first sequencing cycles, is critical for cluster filtering, optimal run performance and high-quality data generation. To overcome this limitation, the obtained libraries were run together with more complex and diverse library preparations: the ADSE sequences were about 1-2% of the total reads per run, corresponding to only a few million reads.

      This discussion is now in Appendix 1, Intrinsic limitations of the molecular approach.

      1.4) Some DNA sequences would bind to the beads instead of the resource sequence coated on them. Should the author run the experiment using bead alone as a control?

      We performed a negative control experiment in which we carried out the “selection step” with bare beads, i.e. beads with no DNA grafted on them. We then compared the results with the corresponding results of the original experiments on Oligo 1 and 2.

      After 6 cycles, the most abundant sequence in the negative dataset has a relative occurrence of 0.05%, whereas the dominant strand in Oligo 1 and Oligo 2 has an abundance of 8% and 16%, respectively, i.e. 40-80 times larger.

      This indicates that the drift due to non-specific binding (+ PCR amplification) is at least two orders of magnitude smaller than the selection induced by the affinity with the resource.

      This part is now discussed in Appendix 1, Experimental controls.

      2) It would be interesting to study the impact of environmental factors, for example, changing pH, salt concentration, and detergent. Would these factors accelerate/decelerate the evolution?

      We agree that the approach we propose and the set of results we obtained call for further investigations. However, performing these additional experiments, which would require a minimum of 6 generations each, is a long and expensive job, which we have started and will not be completed in the near future. For this reason, given the already significant body of results we are presenting here, we prefer to keep this paper confined to the study of the evolution of a random DNAi population in the selected conditions and leave the exploration of new conditions, potentially opening new evolutionary scenarios, to a future contribution. In fact, our aim was to show that through our platform we can indeed observe fundamental elements of evolution in a non-biological system, which, in the set of chosen parameters, we do.

      3) The concentration of individual oligo is apparently one of the most important factors in determining the interactions. In later cycles, some oligos become dominant, namely with extremely higher concentrations compared to their concentration in earlier cycles. This would definitely affect its interaction with resources, or self-interaction, or interaction with other oligos in the pool. However, the authors failed to discuss this factor, which may explain the exponential enrichment in later cycles.

      We agree with the reviewer that this is an important point, but we disagree that we have not discussed it. We introduce the topic at the end of the “Null Model and Eco-evolutionary Algorithm”, where we comment on the change of the gamma parameter by saying that there must be a shift in the evolution process, first dominated by the interactions with the resources, and in later stages by some other factors (lines 227-230) that we then discuss in “Self and mutual DNAi interactions are evolutionary drivers”. In this latter chapter and in the following, we indeed discussed the effects of mutual and self interactions between DNAi.

      Indeed, a key point in our paper is the change in the gamma parameter necessary to match the IBEE model to experiments, as it is now more openly stated (p5 lines 217-218, where we also mention figure 2-supplement 8, which clearly shows the necessity of a variable gamma). The two regimes highlighted by the gamma value must reflect a change in the competition for the resources and interactions among species. In the first generations, where the diversity of species is large (there are few strings for each species) and binding to the resources is generally very weak (small <omega>), the affinity with the resource is the main driving force (fast growth of <omega>), while mutual interactions remain too random to favor any species in particular. In the later cycles instead, when <omega> becomes large enough to provide a significant stability to the resource-binding of the majority of species, the dominating species compete more intensively on the basis of their structure and capacity of self-defense, parasitism and mutualism, a condition in which evolution affects more modifications in sequences than in <omega>.

      Certainly, our understanding of this shift is based on statistical behavior and it is inferential, based on the study of specific DNAi described in the last part of the manuscript. For a better molecular model, more experiments with selected DNAi competing, cooperating or being parasitic would be necessary, with the final aim of defining a predictive fitness function. Alas, this requires months of further investigation.

      4) The author observed the different behaviors of medium 𝜔 in early and late cycles, referring to Fig 2h. Using the IBEE model, they found out it is the change of gamma. However, the authors did not further discuss the molecular mechanism. It could be very interesting to understand the evolutionary change of these individuals.

      This comment might be related to the previous one. It is true that our discussion and understanding of the whole process is statistical, and misses a molecular model to predict the value of gamma.

      However, the specific behavior that the reviewer asks about (those in Fig. 2h) is not related to the change in gamma. Even if gamma remains as in the first part of the evolution (gamma = 3), the species with overlap between 6 and 10 would first grow in number and later decrease. Indeed, during the first cycles they have an advantage with respect to the majority of species with lower maximum overlap, a condition that favors their amplification. However, in the second stage of the evolution dominant species with a larger affinity emerge and outcompete the individuals of this class. We added a sentence in the text to clarify this point (p7 lines 227-229).

      5) In Figure 2f, some high w become quite missing. Should the authors give some interpretation? It is not observed in cycle 12 though (panel e).

      Such an effect is just due to under-sampling. In a pool of 10^n oligomers, any sequence with a given 𝜔 with P(omega) < 10E-n will have a vanishing probability to appear in that sample.

      At cycle 12 the overall number of sequenced strands is larger than at cycle 24, due to the growing presence of PCR by-products. Thus, the right tail of the cyan distribution at the last cycle is sampled with less accuracy than at cycle 12. We have added a sentence in the revised manuscript (p5 lines 177-178) to clarify this point.
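
      The under-sampling argument can be made concrete with a small sketch. It assumes that reads are independent draws and that a given sequence class has a fixed per-read probability p; the read depth and the probabilities below are hypothetical values chosen for illustration, not figures from the experiment. The probability of seeing the class at least once in N reads is 1 - (1 - p)^N, which collapses once p falls well below 1/N.

```python
import math


def prob_observed(p: float, n_reads: int) -> float:
    """Probability that a class with per-read probability p shows up at least once.

    Computes 1 - (1 - p)**n_reads using log1p/expm1 so that very small p
    does not suffer from floating-point cancellation.
    """
    return -math.expm1(n_reads * math.log1p(-p))


n_reads = 10**6  # hypothetical sequencing depth
for p in (1e-4, 1e-6, 1e-8):  # hypothetical per-read probabilities
    print(f"p = {p:.0e}: P(seen at least once) = {prob_observed(p, n_reads):.4f}")
```

      With a smaller number of usable reads the same tail probabilities are sampled less reliably, which is the effect invoked for the last cycle.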

      6) It would be interesting to further explore if another type of selection resource is used, for example protein that binds to particular sequences, i.e. transcription factors. Previous studies have used a large amount of sequence-specific transcription factors to run SELEX. Since the data have existed there, why not explore?

      This is an interesting suggestion: can we use data from “ordinary” SELEX favoring specific sequences to explore sequence evolution? Two limitations make us a bit skeptical on this path: first, the consensus sequences of DNA-binding proteins are rather short and typically target dsDNA rather than ssDNA; second, the free energy of interaction is known only for the consensus sequence but not for sequences with all possible mutations with respect to the consensus sequence, making it very hard to develop any molecular understanding of the process.

      Minor:

      1) There is no figure legend or in-text citation of Figure 2b.

      2) Please correct "⁃C" with "{degree sign}C" in lines 470, 471, 472, 477 et al.

      3) Typos and grammar issues should be corrected. Examples are shown below (but not limited to these only):

      • mixed use of past and present tense.

      • Line 152, "basis" should be "bases".

      • Line 277, "a impediment" should be "an impediment"

      • Line 278, "a major deadly threats" should be "major deadly threats"

      We are sorry for the mistakes, and we have corrected them. Many thanks to the reviewer!

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of the current study was to evaluate the effect of neuronal activity on blood-brain barrier permeability in the healthy brain, and to determine whether changes in BBB dynamics play a role in cortical plasticity. The authors used a variety of well-validated approaches to first demonstrate that limb stimulation increases BBB permeability. Using in vivo-electrophysiology and pharmacological approaches, the authors demonstrate that albumin is sufficient to induce cortical potentiation and that BBB transporters are necessary for stimulus-induced potentiation. The authors include a transcriptional analysis and differential expression of genes associated with plasticity, TGF-beta signaling, and extracellular matrix were observed following stimulation. Overall, the results obtained in rodents are compelling and support the authors' conclusions that neuronal activity modulates the BBB in the healthy brain and that mechanisms downstream of BBB permeability changes play a role in stimulus-evoked plasticity. These findings were further supported with fMRI and BBB permeability measurements performed in healthy human subjects performing a simple sensorimotor task. While there are many strengths in this study, there is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions. The authors only used males in this study and do not discuss whether they would also expect sex differences in stimulation-evoked BBB changes in the healthy brain. Another minor limitation is the authors did not address the potential impact of anesthesia which can impact neurovascular coupling in rodent studies. The authors could have also better integrated the RNAseq findings into mechanistic experiments, including testing whether the upregulation of OAT3 plays a role in cortical plasticity observed following stimulation.
      Overall, this study provides novel insights into how neurovascular coupling, BBB permeability, and plasticity interact in the healthy brain.

      While there are many strengths in this study, there is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions. The authors only used males in this study and do not discuss whether they would also expect sex differences in stimulation-evoked BBB changes in the healthy brain.

      We agree with the reviewer regarding the importance of examining sex differences in stimulation-evoked BBB changes. To address this issue we have: (1) clarified in the methods section that the human study involved both males and females; (2) added a section to the discussion highlighting the male bias as a key limitation of our animal experiments; and (3) stated that future work should examine whether stimulation-evoked BBB changes differ between males and females.

      Another minor limitation is the authors did not address the potential impact of anesthesia which can impact neurovascular coupling in rodent studies.

      We are grateful for this comment and agree with the reviewer that the potential effects of anesthesia should be discussed. We have added the following discussion paragraph:

      “A key limitation of our animal experiments is the fact they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can affect various receptors within the NVU, potentially altering neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which unlike other anesthetics, was shown to generate robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018).”

      Reviewer #2 (Public Review):

      Summary:

      This study builds upon previous work that demonstrated that brain injury results in leakage of albumin across the blood-brain barrier, resulting in activation of TGF-beta in astrocytes. Consequently, this leads to decreased glutamate uptake, reduced buffering of extracellular potassium, and hyperexcitability. This study asks whether such a process can play a physiological role in cortical plasticity. They first show that stimulation of a forelimb for 30 minutes in a rat results in leakage of the blood-brain barrier and extravasation of albumin on the contralateral but not ipsilateral cortex. The authors propose that the leakage is dependent upon neuronal excitability and is associated with an enhancement of excitatory transmission. Inhibiting the transport of albumin or the activation of TGF-beta prevents the enhancement of excitatory transmission. In addition, gene expression associated with TGF-beta activation, synaptic plasticity, and extracellular matrix are enhanced on the "stimulated" hemisphere. That this may translate to humans is demonstrated by a breakdown in the blood-brain barrier following activation of brain areas through a motor task.

      Strengths:

      This study is novel and the results are potentially important as they demonstrate an unexpected breakdown of the blood-brain barrier with physiological activity and this may serve a physiological purpose, affecting synaptic plasticity.

      The strengths of the study are:

      1) The use of an in vivo model with multiple methods to investigate the blood-brain barrier response to a forelimb stimulation.

      2) The determination of a potential functional role for the observed leakage of the blood-brain barrier from both a genetic and electrophysiological viewpoint.

      3) The demonstration that inhibiting different points in the putative pathway from activation of the cortex to transport of albumin and activation of the TGF-beta pathway, the effect on synaptic enhancement could be prevented.

      4) Preliminary experiments demonstrating a similar observation of activity-dependent breakdown of the blood-brain barrier in humans.

      Weaknesses:

      There are both conceptual and experimental weaknesses.

      1) The stimulation is in an animal anesthetized with ketamine, which can affect critical receptors (ie NMDA receptors) in synaptic plasticity.

      We agree that the potential effects of anesthesia should be considered. The Discussion was revised to address this point: “A key limitation of our animal experiments is the fact they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can affect various receptors within the NVU, potentially altering neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which unlike other anesthetics, was shown to generate robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018)”

      2) The stimulation protocol is prolonged and it would be helpful to know if briefer stimulations have the same effect or if longer stimulations have a greater effect ie does the leakage give a "readout" of the stimulation intensity/length.

      Thank you for this important comment. We are also very curious about the potential relationship between stimulation magnitude/duration and subsequent leakage and have added the following statement to the discussion:

      “Future studies should also explore the effects of stimulation magnitude/duration on BBB modulation, as well as the stimulation threshold between physiological and pathological increase in BBB permeability.”

      Our current findings indicate that a one-minute stimulation does not affect vascular permeability or SEP and we aim to test additional stimulation paradigms in future studies.

      3) For some of the experiments (see below), the numbers of animals are low and the statistical tests used may not be the most appropriate, making the results less clear cut.

      We appreciate this comment and have revised the statistical analysis of Figure 1J,K. We now use a nested t-test to test for differences between rats (as opposed to sections). The differences remain significant (EB, p=0.0296; Alexa, p=0.0229). The text was modified accordingly.

      4) The experimental paradigms are not entirely clear, especially the length of time of drug application and the authors seem to try to detect enhancement of a blocked SEP.

      Thank you for pointing this out. Figures 2&3 were revised for clarification and a ‘Drug Application’ subsection was added to the methods section.

      5) It is not clear how long the enhancement lasts. There is a remark that it lasts longer than 5 hours but there is no presentation of data to support this.

      Thank you for this comment. As the length of experiments differed between animals, the exact length could not be specifically stated. To clarify this point, we revised the text to indicate that LTP was recorded until the end of each experiment (between 1.5-5 hours, depending on the condition the animal was in). We also added a panel to figure 2 (Figure 2d) with exemplary data showing potentiation 60, 90, and 120 min post stimulation.

      6) The spatial and temporal specificity of this effect is unclear (other than hemispheric in rats) and even less clear in humans.

      Our animal experiments (using both in vivo imaging and histological analysis) showed no evidence of BBB modulation outside the cortical somatosensory area corresponding to the limbs. We looked at the entirety of the coronal section of the brain and found enhancement solely in the somatosensory area corresponding to limb. The right side of panels h and i in Figure 1 show an x20 magnification of the section, focusing on the enhanced area. The whole section was not shown, as no fluorescence was found outside the magnified area. Moreover, our quantification showed that the enhancement was specific to the contralateral and not ipsilateral somatosensory cortex (Figure 1 j-k).

      We agree that temporal specificity needs to be further explored, and we have now stated that in the discussion: “Future studies are needed to explore the BBB modulating effects of additional stimulation protocols – with varying durations, frequencies, and magnitudes. Such studies may also elucidate the temporal and ultrastructural characteristics that may differentiate between physiological and pathological BBB modulation.”

      We also agree that larger studies are needed to better understand the specificity of the observed effect in humans, and to account for potential inter-human variability in vascular integrity and brain function due to different schedules, diets, exercise habits, etc.

      8) The experimenters rightly use separate controls for most of the experiments but this is not always the case, also raising the possibility that the application of drugs was not done randomly or interleaved, but possibly performed in blocks of animals, which can also affect results.

      Thank you for pointing out this lack of clarity. We have now highlighted that drug application was done randomly.

      9) Methyl-beta-cyclodextrin clears cholesterol so the effect on albumin transport is not specific, it could be mediating its effect through some other pathway.

      We agree that the effect of mβCD may not be specific. To mitigate this issue, we used a very low mβCD concentration (10µM). Notably, this is markedly lower than the concentrations reported by Koudinov et al, showing that cholesterol depletion is only observed at a concentration of 5mM (Koudinov & Koudinova, 2001). This point was added to the discussion.

      10) Since the breakdown of the blood-brain barrier can be inhibited by a TGF-beta inhibitor, then this implies that TGF-beta is necessary for the breakdown of the blood-brain barrier. This does not sit well with the hypothesis that TGF-beta activation depends upon blood-brain barrier leakage.

      Thank you for pointing out this lack of clarity. We have added a discussion paragraph that clarifies our hypothesis: “As mentioned above, albumin is a known activator of TGF-β signaling, and TGF-β has a well-established role in neuroplasticity. Interestingly, emerging evidence suggests that TGF-β also increases cross-BBB transcytosis (Betterton et al., 2022; Kaplan et al., 2020; McMillin et al., 2015; Schumacher et al., 2023). Hence, we propose the following two-part hypothesis for the TGF-β/BBB-mediated synaptic potentiation observed in our experiments: (1) prolonged stimulation triggers TGF-β signaling and increased caveolae-mediated transcytosis of albumin; and (2) extravasated albumin induces further TGF-β signaling, leading to synaptogenesis and additional cross-BBB transport – in a self-reinforcing positive feedback loop. Future research is needed to examine the validity of this hypothesis.”

      Reviewer #3 (Public Review):

      Summary:

      This study used prolonged stimulation of a limb to examine possible plasticity in somatosensory evoked potentials induced by the stimulation. They also studied the extent that the blood-brain barrier (BBB) was opened by prolonged stimulation and whether that played a role in the plasticity. They found that there was potentiation of the amplitude and area under the curve of the evoked potential after prolonged stimulation and this was long-lasting (>5 hrs). They also implicated extravasation of serum albumin, caveolae-mediated transcytosis, and TGFb signalling, as well as neuronal activity and upregulation of PSD95. Transcriptomics was done and implicated plasticity-related genes in the changes after prolonged stimulation, but not proteins associated with the BBB or inflammation. Next, they address the application to humans using a squeeze ball task. They imaged the brain and suggested that the hand activity led to an increased permeability of the vessels, suggesting modulation of the BBB.

      Strengths:

      The strengths of the paper are the novelty of the idea that stimulation of the limb can induce cortical plasticity in a normal condition, and it involves the opening of the BBB with albumin entry. In addition, there are many datasets and both rat and human data.

      Weaknesses:

      The conclusions are not compelling however because of a lack of explanation of methods and quantification. It also is not clear whether the prolonged stimulation in the rat was normal conditions. To their credit, the authors recorded the neuronal activity during stimulation, but it seemed to show excessive excitation. Since seizures open the BBB, this result calls into question one of the conclusions, namely that the results reflect a normal brain. The authors could either conduct studies with stimulation that is more physiological or discuss the caveats of using a supraphysiological stimulus to infer healthy brain function.

      The conclusions are not compelling however because of a lack of explanation of methods and quantification.

      Thank you for this comment. In the revised paper, we expanded the Methods section to better describe the procedures and approaches we used for data analysis.

      It also is not clear whether the prolonged stimulation in the rat was normal conditions.

      We believe that the used stimulation protocol is within the physiological range (and relevant to plasticity, learning and memory) for the following reasons:

      1) In our continuous electrophysiological recordings, we did not observe any form of epileptiform or otherwise pathological activity.

      2) Memory/training/skill acquisition experiments in humans often involve similar training duration or longer (Bengtsson et al., 2005), e.g., a 30 min thumb training session performed by (Classen et al., 1998).

      3) The levels of SEP potentiation we observed are similar to those reported in:

      a) Rats following a 10-minute whisker stimulation (one hour post stimulation, (Mégevand et al., 2009)).

      b) Humans following a 15 min task (McGregor et al., 2016).

      This important point is now presented in the discussion.

      Reviewer #1 (Recommendations For The Authors):

      The discussion would benefit from additional discussion of the potential impacts of sex and anesthesia in their findings.

      We agree with the reviewer and have added the following paragraph to the discussion:

      “A key limitation of our animal experiments is the fact they were performed under anesthesia, due to the complex nature of the experimental setup (i.e., simultaneous cortical imaging and electrophysiological recordings). Anesthetic agents can potentially alter neuronal activity, SEPs, CBF, and vascular responses (Aksenov et al., 2015; Lindauer et al., 1993; Masamoto & Kanno, 2012). To minimize these effects, we used ketamine-xylazine anesthesia, which unlike other anesthetics, was shown to maintain robust BOLD and SEP responses to neuronal activation (Franceschini et al., 2010; Shim et al., 2018). Another limitation of our animal study is the potentially non-specific effect of mβCD – an agent that disrupts caveola transport but may also lead to cholesterol depletion (Keller & Simons, 1998). To mitigate this issue, we used a very low mβCD concentration (10µM), orders of magnitude below the concentration reported to deplete cholesterol (Koudinov et al). Lastly, our animal study is limited by the inclusion of solely male rats. While our findings in humans did not point to sex-related differences in stimulation-evoked BBB modulation, larger animal and human studies are needed to examine this question.”

      The figure text is quite small.

      Thank you for pointing this out, we revised all figures and increased font size for clarity.

      Including pharmacological concentrations within the figure legends would improve the readability of the manuscript.

      Thank you for this suggestion, the figure legends were modified accordingly.

      In methods for immunoassays the 5 groups could be more clear by stating that there are 3 timepoints for stimulation experiments. There is a typo in this section where the 24-hour post is stated twice in the same sentence.

      Thank you for pointing this out, the text was modified accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1) In Figure 1, J and K seem to indicate that in these experiments the statistics were done per slice and not per animal. This is not a reasonable approach; a repeated-measures ANOVA or averaging for each animal are more appropriate statistical approaches.

      We thank the reviewer for pointing this out. The statistical analysis for Figure 1j,k was modified. We now use a nested t-test to test for differences between rats and not sections. The differences are still significant (EB, p=0.0296; Alexa, p=0.0229). The manuscript was modified accordingly.
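
      To illustrate why the unit of analysis matters, the sketch below implements the simpler of the reviewer's two suggestions, averaging the sections of each animal before testing, so that the sample size is the number of rats rather than the number of sections. It is not the nested t-test used in the revised analysis, the function name is hypothetical, and the intensity values are invented purely for illustration.

```python
import math
import statistics


def paired_t_on_animal_means(sections):
    """Paired t statistic computed on per-animal means.

    `sections` maps an animal ID to per-section measurements for the
    contralateral ("contra") and ipsilateral ("ipsi") hemispheres. Sections
    are averaged within each animal first, so the degrees of freedom reflect
    the number of animals, not the number of sections.
    """
    diffs = [
        statistics.mean(sides["contra"]) - statistics.mean(sides["ipsi"])
        for sides in sections.values()
    ]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two animals are required")
    t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical fluorescence intensities (arbitrary units), not data from the study.
hypothetical_sections = {
    "rat1": {"contra": [2.0, 2.5, 1.5], "ipsi": [1.0, 1.0, 1.0]},
    "rat2": {"contra": [3.0, 3.0], "ipsi": [1.0, 1.0]},
    "rat3": {"contra": [4.0, 4.0], "ipsi": [1.0, 1.0]},
}

t_stat, dof = paired_t_on_animal_means(hypothetical_sections)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

      Treating each section as an independent observation would inflate the degrees of freedom; aggregating per animal, or fitting a nested model as the authors did, avoids that pseudo-replication.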

      2) In Figure 2, the protocol does not seem to give much idea about time course. There was a stimulation test for 1 minute before and then 1 minute after the 30-minute stimulation train. How was potentiation assessed for the next 5 hours and where are the data?

      Potentiation was assessed by repeating a 1 min test stimulation every 30 min for the duration of the experiment. We added a panel to show late potentiation (see response above).

      3) In Figure 2, there is a notable lack of controls eg the effect of sham stimulation and application of saline. These are important as the drift of response magnitude can be a problem in long experiments.

      We did test for the potential presence of response drift, by examining whether SEPs of non-stimulated animals change over time (at baseline, 30 or 60 minutes of recording; n=6). No statistical differences were found. Our analysis focused on using each animal as its own control (i.e., comparing baseline SEP to SEP post albumin perfusion), because SEP studies highlight the importance of comparing each animal to its own baseline, due to the large inter-animal variability (All et al., 2010; Mégevand et al., 2009; Zandieh et al., 2003).

      4) Figure 3 a is not clear – were the drugs applied throughout?

      Thank you for pointing this out. We have revised Figure 3 a to show that the drugs were applied for 50 min before the stimulation.

      5) In Figure 3 panel d is repeated in panel j. This needs correcting

      Thank you. This mistake was fixed.

      6) In LTP-type experiments usually the antagonist is applied during the stimulation and then washed out. This avoids the problem in this figure in which CNQX effectively blocks transmission and so it is not possible to detect any enhancement if it were there. Eg in panel e, CNQX blocks transmission, and then the assessment is performed when the AMPA receptors are blocked after 30 minutes of stimulation. If receptors are blocked no enhancement will be detectable. Moreover, surely the question is the ratio of the effect of 30-minute stimulation on the SEP in the presence of CNQX and so the statistics should be done on the fold change in the SEP following 30-minute stimulation in the presence of CNQX.

      Thank you. The protocol might have been misrepresented in the original figure. We modified Fig 3a to clarify that the antagonists were indeed washed out upon stimulation start to make sure the receptors are not blocked during the test stimulation following the 30 min stimulation. In addition, we tested for the difference in fold change between 30 min stim, and 30 min stimulation following antagonists wash-in (Fig 3f and Fig S2a).

      7) Interesting in Figure f, stimulation, albumin, and AP5 all seem to have the same enhancement of the SEP. Is the lack of effect of 30-minute stimulation in the presence of AP5, a ceiling effect ie AP5 has enhanced the SEP, and no further enhancement from stimulation is possible.

      This is a very interesting point that will require further research.

      8) SJN seems to block neurotransmission. What is the mechanism? The same analysis as for CNQX should be performed ie what is the fold change not compared to baseline but in the presence of SJN.

      Our quantification showed that SJN did not significantly reduce the SEP max amplitude, and we therefore did not include this graph in the figure.

      9) Please acknowledge that the effect of mbetaCD is non-specific. There is a large literature on the effects of cholesterol depletion on LTP.

      We agree that the effect of mβCD may not be specific. To mitigate this issue, we used a very low mβCD concentration (10 µM). Notably, this is markedly lower than the concentration reported by Koudinov and Koudinova, who observed cholesterol depletion only at 5 mM (Koudinov & Koudinova, 2001). This point is now addressed in the Discussion paragraph describing the study’s limitations.

      10) k&l seem to have used the same control in which case they should not be analysed separately (they are all part of the same experiment).

      We agree with the reviewer and have revised the figure accordingly.

      11) The difference in gene expression in Figure 4 would be more convincing if it could be prevented by for example a TGFbeta inhibitor.

      We agree and acknowledge the impact such experiments could provide. We plan to incorporate these experiments into our future studies.

      12) Figure 5 seems to indicate bilateral and widespread BBB modulation arguing that this may be a non-specific effect. Panel g should look at other neocortical regions eg occipital cortex.

      We agree and thank the reviewer for this comment. We revised the figure to include other cortical areas, such as the frontal and occipital cortices (Figure 5g).

      Minor comments

      1) Paired data eg in Fig 2D are better represented by pairing the dots usually with a line.

      2) Please correct the %fold baseline in axes in graphs which show % change for baseline.

      3) Figure 4 is not correctly referred to in the text.

      We agree with all the points raised by the reviewer and revised the figures and text accordingly.

      Reviewer #3 (Recommendations For The Authors):

      The conclusions are not compelling, however, because of a lack of explanation of methods and quantification. It also is not clear whether the prolonged stimulation in the rat reflected normal conditions. To their credit, the authors recorded the neuronal activity during stimulation, but it seemed to be excessive excitation. Since seizures open the BBB, this result calls into question one of the conclusions, namely that the results reflect a normal brain. The authors could either conduct studies with stimulation that is more physiological or discuss the caveats of using a supraphysiological stimulus to infer healthy brain function.

      Major concerns:

      Methods need more explanation. Rationales need more justification. Examples are provided below.

      Throughout many sections of the paper, sample sizes and stats are often missing. For stats, please provide p-values and other information (tcrit, U statistic, F, etc.)

      Thank you, we added the relevant information where it was missing throughout the manuscript.

      For transcriptomics, they might have found changes in BBB-related genes if they assayed vessels but they assayed the cortex.

      We agree with the reviewer that this would be a very interesting future direction. The present study could not include this kind of analysis due to lack of access to vasculature isolation methods or single-cell RNA seq.

      What were the inclusion/exclusion criteria for the subjects?

      Thank you for pointing out this lack of clarity. The methods section (under ‘Magnetic Resonance Imaging’ – ‘Participants’) was expanded to include the following:

      “Male and female healthy individuals, aged 18-35, with no known neurological or psychiatric disorders were recruited to undergo MRI scanning while performing a motor task (n=6; 3 males and 3 females). MRI scans of 10 sex- and age-matched individuals (with no known neurological or psychiatric disorders) who did not perform the task were used as control data (n=10; 5 males and 5 females).”

      Were they age and sex-matched?

      They were, indeed, age and sex-matched. This was now clarified in the relevant Methods section.

      Were there other factors that could have influenced the results?

      Certainly. Human subjects are difficult to control for due to different schedules, diets, exercise habits, and other factors that may impact vascular integrity and brain function. Larger multimodal studies are needed to better understand the observed phenomenon.

      Fig. 1. Images are very dim. Text here and in other figures is often too small to see. Some parts of the figures are not explained.

      Our apologies. Figures and legends were revised accordingly.

      Fig 2a, f. I don't see much difference here- do the authors think there was?

      We agree that the difference may not be visually obvious. The quantification of trace parameters (amplitude and area under curve) does, however, reveal a significant SEP difference in response to both stimulation (panels X and y) and albumin (panels z and q).

      Fig 3 d and j seem the same.

      We thank the reviewer for noticing. This was a copy mistake that was now rectified.

      Lesser concerns and examples of text that need explanation:

      Introduction

      Insulin-like growth factor is transported. From where to where?

      The text was edited to clarify that this was cross-BBB influx of insulin-like growth factor-I.

      RMT that underlies the transport of plasma proteins was induced by physiological or non-physiological stimulation.

      This was shown without stimulation, in normal physiology of young and aged healthy mice. The text was edited to clarify this point.

      What was the circadian modulation that was shown to implicate BBB in brain function?

      The text was edited for clarity.

      Results

      When the word stimulation is used please be specific if whiskers are moved by an experimenter, an electrode is used to apply current, etc.

      We have now moved the ‘Stimulation protocol’ section closer to the beginning of the Methods and emphasized that we administered electrical stimulation to the forepaw or hindlimb using subdermal needle electrodes.

      Please explain how the authors are convinced they localized the vascular response.

      The vascular response was localized via: (1) visual detection of arterioles that dilated in response to stimulation (due to functional hyperemia / neurovascular coupling) [Figure 1 d]; and (2) quantitative mapping of increased hemoglobin concentration (Bouchard et al., 2009) [Figure 1 b]. This is now mentioned in the Methods (under ‘In vivo imaging’) and Results (under ‘Stimulation increases BBB permeability’).

      "30 min of limb stimulation" means what exactly? 6 Hz 2mA for 30 min?

      Thank you. The text was revised for clarity (Methods under ‘Stimulation protocol’):

      “The left forelimb or hind limb of the rat was stimulated using Isolated Stimulator device (AD Instruments) attached with two subdermal needle electrodes (0.1 ms square pulses, 2-3 mA) at 6 Hz frequency. Test stimulation consisted of 360 pulses (60 s) and delivered before (as baseline) and after long-duration stimulation (30 min, referred throughout the text as ‘stimulation’). In control and albumin rats, only short-duration stimulations were performed. Under sham stimulation, electrodes were placed without delivering current.”
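
      As a quick arithmetic check of this protocol, the minimal Python sketch below counts the pulses implied by a fixed 6 Hz rate. The function name `pulse_count` is hypothetical, and the 30 min total is simply derived from the stated rate and duration, assuming pulses were delivered continuously at 6 Hz; it is not a figure reported in the manuscript.

```python
def pulse_count(frequency_hz: float, duration_s: float) -> int:
    """Number of pulses delivered at a fixed rate over a given duration."""
    return round(frequency_hz * duration_s)


# Test stimulation: 6 Hz for 60 s (the protocol states 360 pulses).
test_pulses = pulse_count(6, 60)

# Long-duration stimulation: 6 Hz for 30 min (derived value, assuming
# continuous delivery at the same rate).
long_pulses = pulse_count(6, 30 * 60)

print(test_pulses, long_pulses)
```

      The first value reproduces the 360 pulses stated for the 60 s test stimulation.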

      Histology that was performed to confirm extravasation needs clarification because if tissue was removed from the brain, and fixed in order to do histology, what is outside the vessels would seem likely to wash away.

      Thank you for pointing out the need to clarify this point. The Histology description in the Methods section was revised in the following manner:

      “Albumin extravasation was confirmed histologically in separate cohorts of rats that were anesthetized and stimulated without craniotomy surgery. Assessment of albumin extravasation was performed using a well-established approach that involves peripheral injection of either labeled-albumin (bovine serum albumin conjugated to Alexa Fluor 488, Alexa488-Alb) or albumin-labeling dye (Evans blue, EB – a dye that binds to endogenous albumin and forms a fluorescent complex), followed by histological analysis of brain tissue (Ahishali & Kaya, 2020; Ivens et al., 2007; Lapilover et al., 2012; Obermeier et al., 2013; Veksler et al., 2020). Since extravasated albumin is taken up by astrocytes (Ivens et al., 2007; Obermeier et al., 2013), it can be visualized in the brain neuropil after brain removal and fixation (Ahishali & Kaya, 2020; Ivens et al., 2007; Lapilover et al., 2012; Veksler et al., 2020). Five rats were injected with Alexa488-Alb (1.7 mg/ml) and five with EB (2%, 20 mg/ml, n=5). The injections were administered via the tail vein. Following injection, rats were transcardially perfused with…”

      It is not clear why there was extravasation contralateral but not ipsilateral if there are cortical-cortical connections.

      We also did not observe ipsilateral SEP in response to limb stimulation, with evidence of SEP and BBB permeability only in the contralateral sensorimotor region. This finding is consistent with electrophysiological and fMRI studies showing that peripheral stimulation results in predominantly contralateral potentials (Allison et al., 2000; Goff et al., 1962).

      After injection of Evans blue or Alexa-Alb, how was it shown that there was extravasation?

      Extravasation in cortical sections was visualized using a fluorescence microscope (Figure 1 h-i). Since extravasated albumin is taken up by astrocytes, fluorescent imaging can be used for visualizing and quantifying labeled albumin (Ahishali & Kaya, 2020; Ivens et al., 2007; Knowland et al., 2014). Here is the relevant methods excerpt:

      “Coronal sections (40-μm thick) were obtained using a freezing microtome (Leica Biosystems) and imaged for dye extravasation using a fluorescence microscope (Axioskop 2; Zeiss) equipped with a CCD digital camera (AxioCam MRc 5; Zeiss).”

      How is a sham control not stimulated - what is the sham procedure?

      In the sham stimulation protocol electrodes were placed, but current was not delivered. A section titled ‘Stimulation protocol’ was added to the methods to clarify this point.

      What was the method for photothrombosis-induced ischemia?

      The procedure for photothrombosis-induced ischemia is described under the Methods section ‘Immunoassays’ – ‘Enzyme-linked immunosorbent assay (ELISA) for albumin extravasation’:

      “Rats were anesthetized and underwent … photothrombosis stroke (PT) as previously described (Lippmann et al., 2017; Schoknecht et al., 2014). Briefly, Rose Bengal was administered intravenously (20 mg/kg) and a halogen light beam was directed for 15 min onto the intact exposed skull over the right somatosensory cortex.”

      Fig 1d. All parts of d are not explained.

      Thank you for pointing this out. In the revised manuscript, the panels of this figure were slightly reordered, and we made sure all panels are explained in the legend.

      e. Is the LFP a seizure? How physiological is this- it does not seem very physiological.

      Thank you for your comment. We believe that this activity is not a seizure because it lacks the typical slow activity that corresponds to the “depolarization shift” observed during seizures (Ivens et al., 2007; Milikovsky et al., 2019; Zelig et al., 2022).

      f. Permeability index needs explanation. How was the area chosen for each rat? Randomly? Was it the same across rats?

      We have now revised the Methods section to provide a clearer description of the permeability index calculation and the choice of the imaging area:

      “Across all experiments, acquired images were the same size (512 × 512 pixel, ~1x1 mm), centered above the responding arteriole. Images were analyzed offline using MATLAB as described (Vazana et al., 2016). Briefly, image registration and segmentation were performed to produce a binary image, separating blood vessels from extravascular regions. For each extravascular pixel, a time curve of signal intensity over time was constructed. To determine whether an extravascular pixel had tracer accumulation over time (due to BBB permeability), the pixel’s intensity curve was divided by that of the responding artery (i.e., the arterial input function, AIF, representing tracer input). This ratio was termed the BBB permeability index (PI), and extravascular pixels with PI > 1 were identified as pixels with tracer accumulation due to BBB permeability.”
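
      To make the ratio step concrete, the following is a minimal Python sketch of the permeability index idea described above. It is not the authors’ MATLAB pipeline (Vazana et al., 2016): registration and segmentation are omitted, the function names are hypothetical, and the choice to summarize the ratio curve by its mean over the last few frames is purely an illustrative assumption, since the excerpt does not state how the curve ratio is reduced to a single value.

```python
from statistics import mean


def permeability_index(pixel_curve, aif_curve, late_frames=3):
    """Divide a pixel's intensity curve by the arterial input function (AIF)
    and summarize the ratio over the last `late_frames` time points.

    Assumption: the late-frame mean is an illustrative summary, not the
    published definition.
    """
    if len(pixel_curve) != len(aif_curve):
        raise ValueError("curves must have the same number of time points")
    # Skip time points where the AIF is zero to avoid division by zero.
    ratio = [p / a for p, a in zip(pixel_curve, aif_curve) if a > 0]
    return mean(ratio[-late_frames:])


def permeable_fraction(extravascular_curves, aif_curve, threshold=1.0):
    """Fraction of extravascular pixels whose PI exceeds the threshold."""
    flags = [permeability_index(c, aif_curve) > threshold
             for c in extravascular_curves]
    return sum(flags) / len(flags)


# Hypothetical curves (arbitrary intensity units, five time points).
aif = [10, 8, 6, 4, 2]            # tracer washes out of the artery
washout_pixel = [5, 4, 3, 2, 1]   # follows the artery: ratio stays at 0.5
accumulating_pixel = [2, 4, 5, 5, 5]  # tracer is retained in the tissue

print(permeability_index(washout_pixel, aif))
print(permeability_index(accumulating_pixel, aif) > 1.0)
```

      In this toy example only the pixel that retains tracer while the arterial signal declines exceeds the PI > 1 threshold.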

      g. For Evans blue and Alexa-Alb was the sample size rats or sections?

      Thank you for this question. We revised the statistical analysis for Figure 1j,k to appropriately assess the differences between rats. We used a nested t-test to test for differences between rats (and not sections). The differences remained significant (EB, p=0.0296; Alexa, p=0.0229) and the text was modified accordingly.

      h, i, j need more contrast and/or brightness to appreciate the images. Arrows would help. The text is too small to read.

      Thank you. This issue was addressed in the revised paper.

      To induce potentiation, 6 Hz 2 mA stimuli were used for 30 min. Please justify this as physiological.

      Thank you for the comment. We believe that the stimulation protocol used is within the physiological range (and relevant to plasticity, learning and memory) for the following reasons:

      1. In our continuous electrophysiological recordings, we did not observe any form of epileptiform or otherwise pathological activity.

      2. Memory/training/skill acquisition experiments in humans often involve similar training duration or longer (Bengtsson et al., 2005), e.g., a 30 min thumb training session performed by (Classen et al., 1998).

      3. The levels of SEP potentiation we observed are similar to those reported in:

      a. Rats following a 10-minute whisker stimulation (one hour post stimulation, (Mégevand et al., 2009)).

      b. Humans following a 15 min task (McGregor et al., 2016).

      We have revised the Discussion of the paper to clarify this important point.

      The test stimulus to evoke somatosensory evoked potentials was 1 min. Was this 6 Hz 2 mA for 1 min? Please justify.

      Yes. We chose these parameters because these ranges were shown to induce the largest changes in blood flow (measured with laser Doppler flowmetry) and summated SEP (Ngai et al., 1999), consistent with our findings. We also show that these stimulation parameters induce neither changes in BBB permeability nor synaptic potentiation, and they therefore served as the test control.

      How long after the 30 min was the test stimulus triggered- immediately? 30 sec afterwards?

      The test stimulus was applied 5 min afterwards to allow for the BBB imaging protocol (now explained in the Methods section).

      How were amplitude and AUC measured? Baseline to peak? For AUC is it the sum of the upward and downward deflections comprising the LFP?

      Yes, and yes. This is now clarified in the ‘Analysis of electrophysiological recordings’ section in the Methods.
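
      The two measures confirmed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis code used in the study: the function name `sep_metrics` is hypothetical, the baseline is assumed to be the mean of a pre-stimulus window, and the AUC is computed with the trapezoidal rule on the rectified trace so that upward and downward deflections both add to the total.

```python
def sep_metrics(trace, dt, n_baseline):
    """Return (peak amplitude, |AUC|) for an averaged SEP trace.

    trace      : sequence of voltage samples
    dt         : sampling interval (same time unit throughout)
    n_baseline : number of leading pre-stimulus samples used as baseline
                 (assumed convention)
    """
    baseline = sum(trace[:n_baseline]) / n_baseline
    # Rectified deflection from baseline: upward and downward both count.
    deflection = [abs(v - baseline) for v in trace[n_baseline:]]
    amplitude = max(deflection)  # baseline-to-peak
    # Trapezoidal integration of the rectified trace.
    auc = sum((a + b) / 2 * dt for a, b in zip(deflection, deflection[1:]))
    return amplitude, auc


# Hypothetical trace: two baseline samples, then a biphasic response.
example_trace = [0, 0, 0, 2, 4, -2, 0]
amplitude, auc = sep_metrics(example_trace, dt=1.0, n_baseline=2)
print(amplitude, auc)
```

      Rectifying before integration prevents the positive and negative phases of the potential from cancelling each other.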

      How was the same site in the somatosensory cortex recorded for each animal? Potentiation was said to last >5 hrs. How often was it measured? Was potentiation the same for the amplitude and the AUC?

      The location of the cranial window over the somatosensory cortex was the same in all rats. The location of the specific responding arteriole may change between animals, but the recording electrode was placed near the responding arteriole at the same approach angle and depth for all animals.

      As the length of experiments differed between animals, the exact duration could not be stated. We therefore revised the text to clarify that LTP was recorded until the end of each experiment (between 1.5 and 5 hours, depending on the animal’s condition) and added a panel to Figure 2 (Figure 2f) with example data showing potentiation 120 min (2 hr) post stimulation.

      Why was 25% of the serum level of albumin selected- does the brain ever get exposed to that much? Was albumin dissolved in aCSF or was aCSF chosen as a control for another reason?

      Yes, albumin was dissolved in aCSF and the solution was allowed to diffuse through the brain. The relatively high concentration of albumin was chosen to account for factors that lower its effective tissue concentration:

      1. The low diffusion rate of albumin (Tao & Nicholson, 1996).

      2. The likelihood of albumin to encounter a degradation site or a cross-BBB efflux transporter (Tao & Nicholson, 1996; Zhang & Pardridge, 2001).

      Figure 2.

      a. Please show baseline, the stimulus, and after the stimulus.

      Please point out when there was stimulation.

      What is the inset at the top?

      The inset on top is the example trace of the stimulus waveform; the legend of the figure was modified for clarity.

      b. Please show when the stimulus artifact occurred. The end of the 1-minute test stimulus period is fine. Why are the SEPs different morphologies? It suggests the different locations in the cortex were recorded.

      What is shown is the SEP response averaged over the 1 min test stimulus, with each SEP time-locked to its stimulus. Regarding the SEP waveform, it does indeed show different morphology between animals, as sometimes different arterioles respond to the stimulation, and we localize the recording to the responding vessel in each rat. However, in each rat the recording is only from one location. Once the electrode was positioned near the responding arteriole it was not moved.

      d, e. What are the stats?

      h, i. Add stats. Are all comparisons Wilcoxon? Please provide p values.

      The comparisons were performed with the Wilcoxon test. We now state that and provide the exact p values.
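
      For readers unfamiliar with the test, here is a minimal Python sketch of an exact two-sided Wilcoxon signed-rank test for paired data. It is a generic textbook implementation written with the standard library only, not the software used in the study; the function name and the sample values are hypothetical, and the brute-force enumeration is only practical for small samples.

```python
from itertools import product


def wilcoxon_signed_rank(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped and tied absolute differences
    receive average ranks.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Exact null distribution: every assignment of signs is equally likely.
    total = sum(ranks)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** n


# Hypothetical paired amplitudes (arbitrary units) for six animals.
baseline_amp = [10, 12, 9, 11, 10, 13]
post_amp = [15, 16, 10, 18, 14, 19]
w_stat, p_value = wilcoxon_signed_rank(baseline_amp, post_amp)
print(w_stat, p_value)
```

      With six pairs that all change in the same direction, the smallest attainable two-sided exact p-value is 2/64 = 0.03125, which is why exact p-values from small paired samples cluster at a few discrete levels.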

      j. What was selected from the baseline and what was selected during Albumin and how long of a record was selected?

      What program was used to create the spectrogram?

      What is meant by changes at frequencies above 200 Hz, the frequencies of HFOs?

      The Methods section (under ‘Electrophysiology – Data acquisition and analyses’) has been revised for clarification. The spectrogram was created with MATLAB and graphed with Prism. For analysis, we selected a 10 min recorded segment before starting albumin perfusion, and 10 min after terminating albumin perfusion.

      When the cortical window was exposed to drugs, what were concentrations used that were selective for their receptor? How long was the exposure?

      Was the vehicle tested?

      We have revised the Methods section (under ‘Animal preparation and surgical procedures - Drug application’) to clarify the duration and concentrations used, along with their justification. All blockers were applied for 50 min. The vehicle was an artificial cerebrospinal fluid solution (aCSF).

      For PSD-95, what was the area of the cortex that was tested?

      Were animals acutely euthanized and the brain dissected, frozen, etc?

      We have revised the Methods section (under ‘Immunoassays’) for clarity.

      What is mbetaCD?

      The full term was added to the results section. It is also mentioned in the Methods.

      Is SJN specific at the concentration that was chosen? Did it inhibit the SEP?

      At the concentration used in our experiments, SJN is a selective TGF-β type I receptor ALK5 inhibitor (see (Gellibert et al., 2004)).

      Fig. 3b. It looks like CNQX increased the width of the vessels quite a bit. Please explain.

      For AP5, very large vessels were imaged, making it hard to compare to the other data.

      The vascular dilation in response to the stimulation under CNQX was similar to that seen under “normal” conditions (i.e. aCSF). As for AP5, in some experiments the responding arteriole was in close proximity to a large venule that could not be avoided during imaging. For quantification we always measured arterioles within the same diameter range.

      e. Sometimes CNQX did not block the response after 30 min stimulation. Why?

      CNQX is washed out before the 30 min stimulation starts, so it is not expected to block the response to stimulation. However, in some cases the response to stimulation was lower in amplitude, likely due to residual CNQX that did not wash out completely.

      Regarding DEGs, on the top of p 10 what are the percentages of?

      In this analysis we tested, in each hemisphere, how many genes were differentially expressed between 1 and 24 hours post stimulation (either up- or down-regulated). The results were presented as the percentages of differentially expressed genes in each hemisphere (13.2% contralateral, and 7.3% ipsilateral). The text was rephrased for clarity.

      Please add a ref for the use of the JSD metric methods and support for its use as the appropriate method. Other methods need explanation/references.

      References were added to the text to clarify. The Jensen-Shannon Divergence (JSD) metric is commonly used to calculate the statistical pairwise distance between two distributions (Sudmant et al., 2015). When we compared several distance metrics, including JSD, the results were similar irrespective of the metric applied. Therefore, we use JSD to demonstrate the variability between paired samples of stimulated and non-stimulated cortex of each animal at two time points following stimulation (24 h vs. 1 h).
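
      For reference, the Jensen-Shannon divergence between two discrete distributions can be sketched in a few lines of Python. This is the standard textbook definition with a base-2 logarithm, so values lie between 0 and 1; it is not the authors’ pipeline. How expression values were normalized before the comparison is not stated in this letter, so the internal normalization below is an assumption, and the input vectors are hypothetical. Some tools report the square root of this quantity (the Jensen-Shannon distance) instead.

```python
from math import log2


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q are sequences of non-negative weights of equal length; each is
    normalized to sum to 1 before the comparison (assumed preprocessing).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical expression profiles over four genes.
stimulated = [4, 3, 2, 1]
non_stimulated = [1, 2, 3, 4]

print(jensen_shannon_divergence(stimulated, stimulated))   # identical profiles
print(jensen_shannon_divergence([1, 0], [0, 1]))           # disjoint profiles
print(jensen_shannon_divergence(stimulated, non_stimulated))
```

      Unlike the Kullback-Leibler divergence, this measure is symmetric and remains finite when one distribution has zeros where the other does not, which makes it convenient for pairwise sample comparisons.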

      What synaptic plasticity genes were selected for assay and what were not?

      What does "largely unaffected" mean? Some of the genes may change a small amount but have big functional effects.

      The selected genes of interest were taken from a large list compiled from previous publications (see (Cacheaux et al., 2009; Kim et al., 2017)) and are well documented in gene ontology databases and tools (e.g., Metascape, (Zhou et al., 2019)).

      We agree that the term ‘largely unaffected’ is suboptimal, and we rephrased this section of the results to indicate that “No significant differences were found in BBB or inflammation related genes between the hemispheres”. We also agree that a small number of genes can have big functional effects. Future studies are needed to better understand the genes underlying the observed BBB modulation.

      Please note that Slc and ABCs are not only involved in the BBB.

      Thank you. We modified the text to no longer specify that these are BBB-specific transporters.

      Please explain the choice of the stress ball squeeze task, and DCE.

      DCE is a well-established method for BBB imaging in living humans, and it is cited throughout the manuscript. The ball squeeze task was chosen as it is presumed to involve primarily sensory motor areas, without high-level processing (Halder et al., 2005). This is now stated in the discussion.

      What is Gd-DOTA?

      Gd-DOTA is a gadolinium-based contrast agent (gadoterate meglumine, AKA Dotarem). Text was revised for clarity. Please see the Methods section under ‘Magnetic Resonance Imaging’ - ‘Data Acquisition’.

      What does a higher percentage of activated regions mean- how was activation defined and how were regions counted?

      A higher percentage of activated regions refers to regions in which voxels showed significant BOLD changes due to the motor task performed. The statistical approaches and analyses are detailed in the Methods section under ‘Magnetic Resonance Imaging - Preprocessing of functional data, and fMRI Localizer Motor Task’.

      Figure. 4

      Was stimulation 1 min or 30 min.?

      30 min. The text has been revised for clarity.

      What is the Wald test and how were p values adjusted-please add to the Stats section.

      The Methods section under ‘Statistical analysis’ was revised to clarify this point.

      Is there a reason why p values are sometimes circles and otherwise triangles?

      The legend was revised to explain that “Circles represent genes with no significant differences between 1 and 24 h post-stimulation. Upward and downward triangles indicate significantly up- and down-regulated genes, respectively.”

      How can a p-value be zero? Please explain abbreviations.

      The p-value is very low (~10⁻¹⁰) and therefore appears to be zero due to the scale of the y-axis.

      Fig. 5b.

      There are unexplained abbreviations.

      The x on the ball and hand is not clear relative to the black ball and hand.

      Thank you for noticing. We revised the figure for clarity.

      c. What was the method used to make an activator map and what is meant by localizer task?

      The explanation of the “fMRI Localizer Motor Task” section in the methods was revised for added clarity.

      f. What is the measurement "% area" that indicates " BBB modulation"?

      Is it in f, the BBB permeable vessels (%)? f. Please explain: "Heatmap of BBB modulated voxels percentage in motor/sensory-related areas of task vs. controls."

      The %area measurement indicates the percentage of voxels within a specific brain region that have a leaky BBB. See Methods.
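
      The %area measure amounts to a simple proportion, sketched below in Python. The function name, region names, and voxel flags are hypothetical; how a voxel is classified as leaky is defined in the manuscript’s Methods and is taken here as a precomputed boolean per voxel.

```python
def percent_leaky(voxel_flags):
    """Percentage of voxels in a region flagged as having a leaky BBB."""
    flags = list(voxel_flags)
    if not flags:
        raise ValueError("region contains no voxels")
    return 100.0 * sum(bool(f) for f in flags) / len(flags)


# Hypothetical regions, each with one leaky/not-leaky flag per voxel.
regions = {
    "region_a": [True, False, False, True],
    "region_b": [False] * 10,
}
percent_by_region = {name: percent_leaky(flags)
                     for name, flags in regions.items()}
print(percent_by_region)
```

      Expressing the result per region, rather than as a raw voxel count, makes regions of different sizes comparable.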

      Is Task - the control?

      Yes.

      Supplemental Fig. 2.

      Why is AUC measured, not amplitude?

      Both the amplitude and, now, the AUC are shown in Figure 3.

      b. There is no comparison to baseline. The arrowhead points to the start of stimulation but there is no arrowhead marking the end.

      In the revised paper we added a grey shade over the stimulation period to better visualize the difference from baseline. In this panel we wanted to show that the NMDA receptor antagonist did not block the SEP, whereas the AMPA receptor antagonist did.

      c. In the blot there are two bands for PSD95- which is the one that is PSD95? There is no increase in PSD95 until 24 hrs but in the graph in d there is. In the blot, there is a strong expression of PSD95 ipsilateral compared to contralateral in the sham-why?

      What is the percent change fold?

      PSD-95 is the top, larger band. The lower band was disregarded in the analysis. The example we show may not fully reflect the group statistics presented in panel d. Upon quantification of 8 animals, PSD-95 is significantly higher 30 min and 24 hours post stimulation in the contralateral hemisphere. No significant changes were found in sham animals. The % change fold refers to the AUC change compared to baseline. This panel has now been incorporated in Figure 3 (panel h), and the title was corrected to “|AUC|, % change from baseline”.

      Supplemental Fig. 4.

      a. If ipsilateral and contralateral showed many changes why do the authors think the effects were only contralateral?

      Our gene analysis was designed to complement our in vivo and histological findings, by assessing the magnitude of change in differentially expressed genes (DEGs). This analysis showed that: (1) the hemisphere contralateral to the stimulus has significantly more DEGs than the ipsilateral hemisphere; and (2) the DEGs were related to synaptic plasticity and TGF-β signaling. These findings strengthen the hypothesis raised by our in vivo and histological experiments.

      Supplemental Fig. 5 includes many processes not in the results. Examples include dorsal cuneate and VPL, dynamin, Kir, mGluR, etc. The top right has numbers that are not mentioned. If the drawings are from other papers they should be cited.

      The drawings of Figure 5 are original and were not published before. This hypothesis figure points to mechanisms that may drive the phenomena described in the paper. The legend of the figure was revised to include references to mechanisms that were not tested in this study.

      Papers referenced in this letter:

      Ahishali, B., & Kaya, M. (2020). Evaluation of Blood-Brain Barrier Integrity Using Vascular Permeability Markers: Evans Blue, Sodium Fluorescein, Albumin-Alexa Fluor Conjugates, and Horseradish Peroxidase. Methods in Molecular Biology, 2367, 87–103. https://doi.org/10.1007/7651_2020_316

      Aksenov, D. P., Li, L., Miller, M. J., Iordanescu, G., & Wyrwicz, A. M. (2015). Effects of anesthesia on BOLD signal and neuronal activity in the somatosensory cortex. Journal of Cerebral Blood Flow and Metabolism, 35(11), 1819–1826. https://doi.org/10.1038/jcbfm.2015.130

      All, A. H., Agrawal, G., Walczak, P., Maybhate, A., Bulte, J. W. M., & Kerr, D. A. (2010). Evoked potential and behavioral outcomes for experimental autoimmune encephalomyelitis in Lewis rats. Neurological Sciences, 31(5), 595–601. https://doi.org/10.1007/s10072-010-0329-y

      Allison, J. D., Meador, K. J., Loring, D. W., Figueroa, R. E., & Wright, J. C. (2000). Functional MRI cerebral activation and deactivation during finger movement. Neurology, 54(1), 135–142. https://doi.org/10.1212/wnl.54.1.135

      Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148–1150. https://doi.org/10.1038/nn1516

      Betterton, R. D., Abdullahi, W., Williams, E. I., Lochhead, J. J., Brzica, H., Stanton, J., Reddell, E., Ogbonnaya, C., Davis, T. P., & Ronaldson, P. T. (2022). Regulation of Blood-Brain Barrier Transporters by Transforming Growth Factor-β/Activin Receptor-Like Kinase 1 Signaling: Relevance to the Brain Disposition of 3-Hydroxy-3-Methylglutaryl Coenzyme A Reductase Inhibitors (i.e., Statins). Drug Metabolism and Disposition, 50(7), 942–956. https://doi.org/10.1124/dmd.121.000781

      Bouchard, M. B., Chen, B. R., Burgess, S. A., & Hillman, E. M. C. (2009). Ultra-fast multispectral optical imaging of cortical oxygenation, blood flow, and intracellular calcium dynamics. Optics Express, 17(18), 15670. https://doi.org/10.1364/oe.17.015670

      Cacheaux, L. P., Ivens, S., David, Y., Lakhter, A. J., Bar-Klein, G., Shapira, M., Heinemann, U., Friedman, A., & Kaufer, D. (2009). Transcriptome profiling reveals TGF-β signaling involvement in epileptogenesis. Journal of Neuroscience, 29(28), 8927–8935. https://doi.org/10.1523/JNEUROSCI.0430-09.2009

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. Journal of Neurophysiology, 79(2), 1117–1123. https://doi.org/10.1152/jn.1998.79.2.1117

      Franceschini, M. A., Radhakrishnan, H., Thakur, K., Wu, W., Ruvinskaya, S., Carp, S., & Boas, D. A. (2010). The effect of different anesthetics on neurovascular coupling. NeuroImage, 51(4), 1367–1377. https://doi.org/10.1016/j.neuroimage.2010.03.060

      Gellibert, F., Woolven, J., Fouchet, M. H., Mathews, N., Goodland, H., Lovegrove, V., Laroze, A., Nguyen, V. L., Sautet, S., Wang, R., Janson, C., Smith, W., Krysa, G., Boullay, V., De Gouville, A. C., Huet, S., & Hartley, D. (2004). Identification of 1,5-naphthyridine derivatives as a novel series of potent and selective TGF-β type I receptor inhibitors. Journal of Medicinal Chemistry, 47(18), 4494–4506. https://doi.org/10.1021/jm0400247

      Goff, W. R., Rosner, B. S., & Allison, T. (1962). Distribution of cerebral somatosensory evoked responses in normal man. Electroencephalography and Clinical Neurophysiology, 14(5), 697–713. https://doi.org/10.1016/0013-4694(62)90084-6

      Halder, P., Sterr, A., Brem, S., Bucher, K., Kollias, S., & Brandeis, D. (2005). Electrophysiological evidence for cortical plasticity with movement repetition. European Journal of Neuroscience, 21(8), 2271–2277. https://doi.org/10.1111/J.1460-9568.2005.04045.X

      Ivens, S., Kaufer, D., Flores, L. P., Bechmann, I., Zumsteg, D., Tomkins, O., Seiffert, E., Heinemann, U., & Friedman, A. (2007). TGF-β receptor-mediated albumin uptake into astrocytes is involved in neocortical epileptogenesis. Brain, 130(2), 535–547. https://doi.org/10.1093/brain/awl317

      Kaplan, L., Chow, B. W., & Gu, C. (2020). Neuronal regulation of the blood–brain barrier and neurovascular coupling. In Nature Reviews Neuroscience (Vol. 21, Issue 8, pp. 416–432). Nature Research. https://doi.org/10.1038/s41583-020-0322-2

      Keller, P., & Simons, K. (1998). Cholesterol is required for surface transport of influenza virus hemagglutinin. Journal of Cell Biology, 140(6), 1357–1367. https://doi.org/10.1083/jcb.140.6.1357

      Kim, S. Y., Senatorov, V. V., Morrissey, C. S., Lippmann, K., Vazquez, O., Milikovsky, D. Z., Gu, F., Parada, I., Prince, D. A., Becker, A. J., Heinemann, U., Friedman, A., & Kaufer, D. (2017). TGFβ signaling is associated with changes in inflammatory gene expression and perineuronal net degradation around inhibitory neurons following various neurological insults. Scientific Reports, 7(1), 7711. https://doi.org/10.1038/s41598-017-07394-3

      Knowland, D., Arac, A., Sekiguchi, K. J., Hsu, M., Lutz, S. E., Perrino, J., Steinberg, G. K., Barres, B. A., Nimmerjahn, A., & Agalliu, D. (2014). Stepwise Recruitment of Transcellular and Paracellular Pathways Underlies Blood-Brain Barrier Breakdown in Stroke. Neuron, 82(3), 603–617. https://doi.org/10.1016/j.neuron.2014.03.003

      Koudinov, A. R., & Koudinova, N. V. (2001). Essential role for cholesterol in synaptic plasticity and neuronal degeneration. The FASEB Journal, 15(10), 1858–1860. https://doi.org/10.1096/r.00-0815re

      Lapilover, E. G., Lippmann, K., Salar, S., Maslarova, A., Dreier, J. P., Heinemann, U., & Friedman, A. (2012). Periinfarct blood-brain barrier dysfunction facilitates induction of spreading depolarization associated with epileptiform discharges. Neurobiology of Disease, 48(3), 495–506. https://doi.org/10.1016/j.nbd.2012.06.024

      Lindauer, U., Villringer, A., & Dirnagl, U. (1993). Characterization of CBF response to somatosensory stimulation: Model and influence of anesthetics. American Journal of Physiology - Heart and Circulatory Physiology, 264(4 33-4), H1223–H1228. https://doi.org/10.1152/ajpheart.1993.264.4.h1223

      Lippmann, K., Kamintsky, L., Kim, S. Y., Lublinsky, S., Prager, O., Nichtweiss, J. F., Salar, S., Kaufer, D., Heinemann, U., & Friedman, A. (2017). Epileptiform activity and spreading depolarization in the blood-brain barrier-disrupted peri-infarct hippocampus are associated with impaired GABAergic inhibition and synaptic plasticity. Journal of Cerebral Blood Flow and Metabolism, 37(5), 1803–1819. https://doi.org/10.1177/0271678X16652631

      Masamoto, K., & Kanno, I. (2012). Anesthesia and the quantitative evaluation of neurovascular coupling. In Journal of Cerebral Blood Flow and Metabolism (Vol. 32, Issue 7, pp. 1233–1247). SAGE Publications. https://doi.org/10.1038/jcbfm.2012.50

      McGregor, H. R., Cashaback, J. G. A., & Gribble, P. L. (2016). Functional Plasticity in Somatosensory Cortex Supports Motor Learning by Observing. Current Biology, 26(7), 921–927. https://doi.org/10.1016/j.cub.2016.01.064

      McMillin, M. A., Frampton, G. A., Seiwell, A. P., Patel, N. S., Jacobs, A. N., & DeMorrow, S. (2015). TGFβ1 exacerbates blood-brain barrier permeability in a mouse model of hepatic encephalopathy via upregulation of MMP9 and downregulation of claudin-5. Laboratory Investigation, 95(8), 903–913. https://doi.org/10.1038/labinvest.2015.70

      Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M., & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. Journal of Neuroscience, 29(16), 5326–5335. https://doi.org/10.1523/JNEUROSCI.5965-08.2009

      Milikovsky, D. Z., Ofer, J., Senatorov, V. V., Friedman, A. R., Prager, O., Sheintuch, L., Elazari, N., Veksler, R., Zelig, D., Weissberg, I., Bar-Klein, G., Swissa, E., Hanael, E., Ben-Arie, G., Schefenbauer, O., Kamintsky, L., Saar-Ashkenazy, R., Shelef, I., Shamir, M. H., … Friedman, A. (2019). Paroxysmal slow cortical activity in Alzheimer’s disease and epilepsy is associated with blood-brain barrier dysfunction. Science Translational Medicine, 11(521), eaaw8954–eaaw8954. https://doi.org/10.1126/scitranslmed.aaw8954

      Ngai, A. C., Jolley, M. A., D’Ambrosio, R., Meno, J. R., & Winn, H. R. (1999). Frequency-dependent changes in cerebral blood flow and evoked potentials during somatosensory stimulation in the rat. Brain Research, 837(1–2), 221–228. https://doi.org/10.1016/S0006-8993(99)01649-2

      Obermeier, B., Daneman, R., & Ransohoff, R. M. (2013). Development, maintenance and disruption of the blood-brain barrier. In Nature Medicine (Vol. 19, Issue 12, pp. 1584–1596). Nature Publishing Group. https://doi.org/10.1038/nm.3407

      Schoknecht, K., Prager, O., Vazana, U., Kamintsky, L., Harhausen, D., Zille, M., Figge, L., Chassidim, Y., Schellenberger, E., Kovács, R., Heinemann, U., & Friedman, A. (2014). Monitoring stroke progression: In vivo imaging of cortical perfusion, blood-brain barrier permeability and cellular damage in the rat photothrombosis model. Journal of Cerebral Blood Flow and Metabolism, 34(11), 1791–1801. https://doi.org/10.1038/jcbfm.2014.147

      Schumacher, L., Slimani, R., Zizmare, L., Ehlers, J., Kleine Borgmann, F., Fitzgerald, J. C., Fallier-Becker, P., Beckmann, A., Grißmer, A., Meier, C., El-Ayoubi, A., Devraj, K., Mittelbronn, M., Trautwein, C., & Naumann, U. (2023). TGF-Beta Modulates the Integrity of the Blood Brain Barrier In Vitro, and Is Associated with Metabolic Alterations in Pericytes. Biomedicines, 11(1), 1–19. https://doi.org/10.3390/biomedicines11010214

      Shim, H. J., Jung, W. B., Schlegel, F., Lee, J., Kim, S., Lee, J., & Kim, S. G. (2018). Mouse fMRI under ketamine and xylazine anesthesia: Robust contralateral somatosensory cortex activation in response to forepaw stimulation. NeuroImage, 177, 30–44. https://doi.org/10.1016/J.NEUROIMAGE.2018.04.062

      Sudmant, P. H., Alexis, M. S., & Burge, C. B. (2015). Meta-analysis of RNA-seq expression data across species, tissues and studies. Genome Biology, 16(1), 287. https://doi.org/10.1186/s13059-015-0853-4

      Tao, L., & Nicholson, C. (1996). Diffusion of albumins in rat cortical slices and relevance to volume transmission. Neuroscience, 75(3), 839–847. https://doi.org/10.1016/0306-4522(96)00303-X

      Vazana, U., Veksler, R., Pell, G. S., Prager, O., Fassler, M., Chassidim, Y., Roth, Y., Shahar, H., Zangen, A., Raccah, R., Onesti, E., Ceccanti, M., Colonnese, C., Santoro, A., Salvati, M., D’Elia, A., Nucciarelli, V., Inghilleri, M., & Friedman, A. (2016). Glutamate-mediated blood–brain barrier opening: Implications for neuroprotection and drug delivery. Journal of Neuroscience, 36(29), 7727–7739. https://doi.org/10.1523/JNEUROSCI.0587-16.2016

      Veksler, R., Vazana, U., Serlin, Y., Prager, O., Ofer, J., Shemen, N., Fisher, A. M., Minaeva, O., Hua, N., Saar-Ashkenazy, R., Benou, I., Riklin-Raviv, T., Parker, E., Mumby, G., Kamintsky, L., Beyea, S., Bowen, C. V., Shelef, I., O’Keeffe, E., … Friedman, A. (2020). Slow blood-to-brain transport underlies enduring barrier dysfunction in American football players. Brain, 143(6), 1826–1842. https://doi.org/10.1093/brain/awaa140

      Zandieh, S., Hopf, R., Redl, H., & Schlag, M. G. (2003). The effect of ketamine/xylazine anesthesia on sensory and motor evoked potentials in the rat. Spinal Cord, 41(1), 16–22. https://doi.org/10.1038/sj.sc.3101400

      Zelig, D., Goldberg, I., Shor, O., Ben Dor, S., Yaniv-Rosenfeld, A., Milikovsky, D. Z., Ofer, J., Imtiaz, H., Friedman, A., & Benninger, F. (2022). Paroxysmal slow wave events predict epilepsy following a first seizure. Epilepsia, 63(1), 190–198. https://doi.org/10.1111/epi.17110

      Zhang, Y., & Pardridge, W. M. (2001). Mediated efflux of IgG molecules from brain to blood across the blood–brain barrier. Journal of Neuroimmunology, 114(1–2), 168–172. https://doi.org/10.1016/S0165-5728(01)00242-9

      Zhou, Y., Zhou, B., Pache, L., Chang, M., Khodabakhshi, A. H., Tanaseichuk, O., Benner, C., & Chanda, S. K. (2019). Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications, 10(1), 1–10. https://doi.org/10.1038/s41467-019-09234-6

    1. Author Response

      Reviewer #1 (Public Review)

      Midbrain dopamine neurons have attracted attention as a part of the brain's reward system. A different line of research, on the other hand, has shown that these neurons are also involved in higher cognitive functions such as short-term memory. However, these neurons are thought not to encode short-term memory itself because they just exhibit a phasic response in short-term memory tasks, which cannot seem to maintain information during the memory period. To understand the role of dopamine neurons in short-term memory, the present study investigated the electrophysiological property of these neurons in rodents performing a T-maze version of a short-term memory task, in which a visual cue indicated which arm (left or right) of the T-maze was associated with a reward. The animal needed to maintain this information while they were located between the cue presentation position and the selection position of the T-maze. The authors found that the activity of some dopamine neurons changed depending on the information while the animals were located in the memory position. This dopamine neuron modulation was unable to explain the motivation or motor component of the task. The authors concluded that this modulation reflected the information stored as short-term memory.

      I was simply surprised by their finding because these dopamine neurons are similar to neurons in the prefrontal cortex that store memory information with sustained activity. Dopamine neurons are an evolutionally conserved structure, which is seen even in insects, whereas the prefrontal cortex is developed mainly in the primate. I feel that their findings are novel and would attract much attention from readers in the field. But the authors need to conduct additional analyses to consolidate their conclusion.

      We thank reviewer #1 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Reviewer #1 (Recommendations to The Authors)

      (1) The authors found the dopamine neuron modulation that reflected the memory information during the delay period. Here the dopamine neuron activity was aligned by the position, not by time, in which the animals needed to maintain the information. Usually, the activity was aligned by time, and many studies found that dopamine neurons exhibited a short duration burst in response to rewards and behaviorally relevant stimuli including visual cues presented in short-term memory tasks. For comparison, I (and probably other readers) want to see the time-aligned dopamine neuron modulation that reflected the memory information. Did the modulation still exist? Did it have a long duration? The authors just showed the time-aligned "population" activity that exhibited no memory-dependent modulation.

      We agree that the point raised by the reviewer is important. To address this question, we added a new paragraph to the Methods section titled “Methodological considerations” (in line 793 of the revised manuscript), where we explain the caveats of using time alignment in the T-maze task study. We also created a new Supplementary Figure 5 to clarify our argument. As the figure shows, we did not observe major differences in the firing rates when they were arranged by position or by time. More importantly, we did not detect brief bursts of activity in response to the visual cue that could reflect a reward prediction error (RPE) signaling scheme. Our interpretation is that in the T-maze task, DA neurons encode “miniature” RPE signals between successive states in the T-maze, which are hard to detect, especially when neurons receive continuous sensory input during trials.

      (2) Several studies have reported that dopamine neurons at different locations encode distinct signals even within the VTA or SNr. Were the locations of dopamine neurons maintaining the memory information different from those of other dopamine neurons?

      We thank the reviewer for this comment. Indeed, recent studies provide evidence that DA neurons form functional and anatomical clusters in the VTA and SN. Following the reviewer’s advice, we report the anatomical organization of memory-specific and non-memory-specific neurons in the revised manuscript. These results appear in the paragraph “Anatomical organization of trajectory-specific neurons.” in the “Results” section (in line 383 of the revised manuscript) and in the new Supplementary Figure 11. We observed a clear functional-anatomical segregation only in GABA neurons, not in DA neurons. We should note, however, that the absence of segregation in the DA neurons could be explained by the fact that we recorded mostly from the lateral VTA; therefore, we do not have any numbers from the medial VTA.

      (3a) Did the dopamine neurons maintaining the memory information respond to reward?

      We believe that we have already provided data that can partially answer this question by correlating the firing rate difference between the reward and memory delay sections. This result was described in the “Neuronal activities in delay and reward are unrelated.” paragraph and in Figure 6. Moreover, motivated by the reviewer’s question, we performed an additional analysis, which is included in the revised manuscript. Briefly, we grouped significant responses by trajectory preference and by maze section (Category 1: left-significant, right-significant, or non-significant; Category 2: memory delay or reward). We found that only a very small number of neurons showed the same significant trajectory preference in the memory delay and reward sections (i.e., a significant preference for left trials in the memory delay and a significant preference for the left reward). In fact, more neurons showed a significant preference for opposite trajectories (i.e., a significant preference for left trials in the memory delay and a significant preference for right rewards). A description of the new results is included in the “Neuronal activities in delay and reward are unrelated.” paragraph (in line 349 of the revised manuscript) and in the new Supplementary Figure 11.
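
      As an illustration only (this is not the analysis code used in the study), the comparison of trajectory preferences between the memory delay and reward sections can be sketched in Python as a simple cross-tabulation. The label names ("L", "R", "NS"), the function names, and the example labels are all hypothetical.

```python
from collections import Counter


def cross_tabulate(delay_pref, reward_pref):
    """Count neurons for each (delay preference, reward preference) pair.

    Each argument holds one label per neuron: "L" (left-significant),
    "R" (right-significant), or "NS" (non-significant).
    """
    if len(delay_pref) != len(reward_pref):
        raise ValueError("need one label per neuron in both sections")
    return Counter(zip(delay_pref, reward_pref))


def summarize(table):
    """Number of neurons with the same vs. the opposite significant preference."""
    same = table[("L", "L")] + table[("R", "R")]
    opposite = table[("L", "R")] + table[("R", "L")]
    return {"same": same, "opposite": opposite}


# Hypothetical labels for six neurons (not data from the study).
delay = ["L", "R", "NS", "L", "R", "NS"]
reward = ["R", "R", "L", "NS", "L", "NS"]

table = cross_tabulate(delay, reward)
print(summarize(table))  # {'same': 1, 'opposite': 2}
```

      In this toy example, one neuron keeps its preference across sections and two reverse it; neurons that are non-significant in either section count toward neither group.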

      (3b) Did they encode reward prediction error? The relationship between the present data and the conventional theory may be valuable.

      We understand that the readers of this study will come up with the question of how memory-specific activities are related to RPE signaling. However, the T-maze task we used in this research was designed for studying working memory and was not adequate to extract information about the RPE signaling of DA neurons.

      RPE signaling is mainly studied in Pavlovian conditioning. These are low-dimensional tasks with usually four (4) states (state1: ITI, state2: trial start, state3: stimulus presentation, state4: reward delivery). Evidence of RPE signaling is extracted from the firing activity of states 3 and 4 (which is theorized to be related to the difference in the values for states 3 and 4).

      However, in the T-maze task, the number of states is hard to define and practically countless. Under these conditions, it has been suggested that numerous small RPEs are signaled while the mice navigate the maze; thus, they are very difficult to detect. To our knowledge, only Kim et al. (2020, Cell, vol. 183, p. 1600) managed to detect the RPE signaling activity of DA neurons while mice were teleported in a virtual corridor.

      Another confounding factor in extracting RPE signals in the T-maze task is that the environment is high-dimensional and DA neurons are multitasking. Therefore, it is likely that RPE signaling could be masked by other parallel encoding schemes.

      We have added these descriptions in the “Methodological considerations” (in line 793 of the revised manuscript).
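
      For readers less familiar with the RPE framework, the four-state Pavlovian case described above can be sketched with the standard temporal-difference error, delta = r + gamma * V(next state) - V(current state). The sketch below is a generic textbook formulation, not an analysis from the manuscript; the state values, rewards, and the function name are hypothetical and chosen to represent a fully learned task with an unpredictable cue.

```python
def td_errors(values, rewards, gamma=1.0):
    """Temporal-difference error for each transition between successive states.

    values[t]  : learned value V of state t
    rewards[t] : reward received on the transition from state t to state t + 1
    """
    if len(rewards) != len(values) - 1:
        raise ValueError("need one reward per transition")
    return [
        rewards[t] + gamma * values[t + 1] - values[t]
        for t in range(len(values) - 1)
    ]


# States: ITI, trial start, stimulus presentation, reward delivery.
# Hypothetical values after learning: only the stimulus state predicts reward.
values = [0.0, 0.0, 1.0, 0.0]
# The reward arrives on the transition from stimulus to reward delivery.
rewards = [0.0, 0.0, 1.0]

print(td_errors(values, rewards))  # [0.0, 1.0, 0.0]
```

      With these assumed values, the error is positive at the transition into the stimulus state and zero at the fully predicted reward. In a maze with many finely spaced states, the same computation would spread the error over many small transitions, which is the situation described above.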

      (4) Did the dopamine neurons maintaining the memory information (left or right) prefer a contralateral direction like neurons in the motor cortex?

      We thank the reviewer for this comment. Indeed, the majority of the memory-specific DA neurons showed a preference for the contralateral direction. We report this result in the legend of the new Supplementary Figure 10 (in line 1668 of the revised manuscript).

      (5) As shown in Table S2, the proportion of GABA neurons maintaining the memory information (left or right during delay) was much larger than that of dopamine neurons. It seems to be strange because the main output neurons in the VTA are dopaminergic. What is the role of these GABA neurons?

      We thank the reviewer for pointing this out. The present study shows that in both populations a sizeable portion of neurons show memory-specific encoding activities. However, the percentage of memory-encoding GABA neurons is more than twice as large as in the DA neurons. Moreover, we show that GABA neurons are functionally and anatomically segregated.

      From this evidence, one could raise the hypothesis that the GABA neurons have a primary role and that the activity of DA neurons is a collateral phenomenon, triggered in a sequence of events within the VTA network. To characterize the (1) role and (2) importance of GABA neurons in memory-guided behavior, one should first identify the afferent and efferent projections of these cells in great detail. Unfortunately, we do not provide anatomical evidence.

      So far, with the electrophysiological data we have collected (unit and field recordings), we can address an alternative hypothesis. It has been reported earlier (and we have also observed) that the VTA circuit engages in behaviorally related network oscillations ranging from 0.4 Hz to 100 Hz. Converging evidence from different brain regions, from in vitro preparations as well as in vivo recordings, indicates that local networks of inhibitory neurons are crucial for the generation, maintenance, and spectral control of network oscillations. Ongoing analysis, which we hope will lead to a publication, is looking for the behavioral correlates of network oscillations in the T-maze task, as well as the correlation of single-unit firing activity with the field oscillations. We expect to detect a higher field-unit coherence in GABA neurons, which could explain their stronger engagement in memory-specific encoding activity.

      The potential role of GABA neurons in network oscillations is discussed in the revised manuscript in a newly added paragraph in line 564.

      Reviewer #2 (Public Review)

      The authors phototag DA and GABA neurons in the VTA in mice performing a t-maze task, and report choice-specific responses in the delay period of a memory-guided task, more so than in a variant task w/o a memory component. Overall, I found the results convincing. While showing responses that are choice selective in DA neurons is not entirely novel (e.g. Morris et al NN 2006, Parker et al NN 2016), the fact that this feature is stronger when there is a memory requirement is an interesting and novel observation.

      I found the plots in 3B misleading because it looks like the main result is the sequential firing of DA neurons during the Tmaze. However, many of the neurons aren't significant by their permutation test. Often people either only plot the neurons that are significant, or plot with cross-validation (ie sort by half of the trials, and plot the other half).

      Relatedly, the cross-task comparisons of sequences (Figs. 4, 5) are hampered by the fact that they sort in one task, then plot in the other, which will make the sequences look less robust even if they were equally strong. What happens if they swap which task's sequences they use to order the neurons? I do realize they also show statistical comparisons of modulated units across tasks, which is helpful.

      We thank reviewer #2 for the valuable and constructive comments on our manuscript. If, as the reviewer suggests, the rate difference between left and right trajectories were the only result we wanted to claim, we could show only the neurons with a significant left-right difference. However, the sequential activity is also one of the points we wanted to display. We did not emphasize this result because it has already been shown by Engelhard et al. 2019. However, after reading the reviewer's comments, we decided to add a few lines in the "Results" (in lines 205 - 215 of the revised manuscript) and "Discussion" (in line 453 of the revised manuscript) describing the sequential activity of the VTA circuit. In those lines, we explain that DA activity is position-specific (resulting in sequential activity) and that a fraction of the DA neurons also have left-right specificity.
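
      The cross-validated ordering mentioned in the reviewer's comment (sort by half of the trials, plot the other half) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the manuscript: firing rates are assumed to be already binned by maze position, the even/odd trial split is one arbitrary choice, and all names and numbers are hypothetical.

```python
def mean_rate(trials):
    """Average a list of trials (each a list of per-bin rates) bin by bin."""
    n_bins = len(trials[0])
    return [sum(trial[b] for trial in trials) / len(trials) for b in range(n_bins)]


def peak_bin(profile):
    """Index of the position bin with the highest mean rate."""
    return max(range(len(profile)), key=lambda b: profile[b])


def cross_validated_order(rates):
    """rates[neuron][trial][bin] -> (neuron order, held-out mean profiles).

    Even-indexed trials determine the order; odd-indexed trials provide the
    profiles to plot, so a visible sequence is not guaranteed by the sorting.
    Each neuron needs at least two trials.
    """
    train = [mean_rate(neuron[0::2]) for neuron in rates]
    held_out = [mean_rate(neuron[1::2]) for neuron in rates]
    order = sorted(range(len(rates)), key=lambda n: peak_bin(train[n]))
    return order, [held_out[n] for n in order]


# Hypothetical rates: 3 neurons x 4 trials x 3 position bins.
rates = [
    [[0, 1, 5], [0, 2, 4], [1, 0, 6], [0, 1, 5]],  # peaks late
    [[5, 1, 0], [4, 2, 0], [6, 0, 1], [5, 1, 0]],  # peaks early
    [[1, 5, 0], [0, 4, 1], [0, 6, 1], [1, 5, 0]],  # peaks in the middle
]

order, profiles = cross_validated_order(rates)
print(order)                           # [1, 2, 0]
print([peak_bin(p) for p in profiles])  # [0, 1, 2]
```

      If the held-out peaks still progress along the maze, as in this toy example, the sequence reflects a stable position preference rather than the sorting step alone.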

      Overall, the introduction was scholarly and did a good job covering a vast literature. But the explanation of t-maze data towards the end of the introduction was confusing. In Line 87, I would not say "in the same task" but "in a similar task" because there are many differences between the tasks in question.

      We thank the reviewer for pointing out this mistake. In the revised manuscript, we replaced “in the same task” with “in a similar task” (in line 85 of the revised manuscript).

      And not clear what is meant by "by averaging neuronal population activities, none of these computational schemes would have been revealed. " There was trial averaging, at least in Harvey et al. I thought the main result of that paper related to coding schemes was that neural activity was sequential, not persistent. I think it would help the paper to say that clearly.

      We admit that this sentence leaves room for misunderstanding. We were mainly referring to DA studies using microdialysis or fiber photometry techniques. We decided to delete this sentence in the revised manuscript.

      Also, I'm not aware it was shown that choice selectivity diminishes when the memory demand of the task is removed - please clarify if that is true in both referenced papers.

      The reviewer’s remark is correct. None of these reports show explicitly that memory-specific activities are diminished without the memory component. Therefore, we deleted this sentence in the revised manuscript.

      If so, an interpretation of this present data could be found in Lee et al biorxiv 2022, which presents a computational model that implies that the heterogeneity in the VTA DA system is a reflection of the heterogeneity found in upstream regions (the state representation), based on the idea that different subsets of DA neurons calculate prediction errors with respect to different subsets of the state representation.

      We thank the reviewer for sharing this interpretation. We agree that this theory would support our results. In the revised manuscript we briefly discuss the Lee et al. report (in line 460 of the revised manuscript).

      I am surprised only 28% of DA neurons responded to the reward - the reward is not completely certain in this task. This seems lower than other papers in mice (even Pavlovian conditioning, when the reward is entirely certain). It would be helpful if the authors comment on how this number compares to other papers.

      In Pavlovian conditioning, neuronal responses to rewards are compared to a relatively quiet period of firing activity (usually the inter-trial interval epoch). As the reviewer pointed out, in the present study, the number of DA neurons responding to reward is smaller compared to the earlier studies. We hypothesize that this is due to our comparison method. We compared the post-reward response to an epoch when the animal was running along the side arms and the majority of neurons were highly active, instead of comparing it to a quiescent baseline epoch.

      Reviewer #2 (Recommendations to The Authors)

      Can you clarify what disparity you are referring to here? "Disparities between this and our study in the proportions of modulated neurons could be attributed to the different recording techniques applied as well as the maze regions of interest; for example, Engelhard et al. analyzed neuronal firing activities in the visual-cue period (Engelhard et al., 2019), whereas we focused on memory delay." Is it the fact that Engelhard et al did not report choice-selective activity? They did report cue-side-selective activity, with some neurons responsive to cues on one side, and other neurons responsive to cues on the other side. Because there are more cues on the left when the mouse turns left, these neurons do indeed have choice-selective responses.

      We thank the reviewer for this comment. We agree that we need to clarify our argument further. As the reviewer pointed out, Engelhard et al. identified choice-specific DA neurons. However, they reported the encoding properties of DA neurons only in the visual-cue period and the reward period. Remarkably, although their task has a memory delay, they did not report the neuronal firing activities for this delay period. In contrast, in the present study we dedicated most of our analysis to characterizing the firing properties of VTA neurons in the delay period.

      Also, in response to your comment, we edited the paragraph where we describe the disparities between our study and Engelhard et al (in line 466 in the revised manuscript).

      I don't think this sentence of intro is needed since it doesn't really contain new info: "Therefore, we looked for hints of memory-related encoding activities in single DA and GABA neurons by characterizing their firing preference for opposite behavioral choices."

      We agree with the reviewer. Therefore, we deleted this sentence in the revised manuscript.

      I didn't understand this line of discussion: "Our evidence does not question the validity of this computational model, since we do not provide evidence of how the selective preference for one response over the other translates into the release site.".

      The gating theory is based on experimental evidence of neuronal firing activities of DA neurons but also takes into consideration (to a lesser degree) the pre- and post-synaptic processes at the DA release sites (inverted U-shape of D1R activity). We thought that the reader may come to the conclusion that we question the validity of the gating theory. But this is not our intention, especially when we do not provide important evidence such as (1) the projection sites of DA and GABA neurons and (2) the sequence of events that take place at the synaptic triads following the DA and GABA release.

      After reading your comment we came to the conclusion that this sentence should be omitted because it is not within the scope of this study to question the validity of the gating theory. Instead, we dedicated a few lines of text to explaining which components of the gating theory (“update”, “maintenance & manipulation” and “motor preparation”) could be attributed to the trajectory-specific activities in the memory delay of the T-maze task. (section “Activities of midbrain DA neurons in short-term memory” in line 417 of the revised manuscript).

      In 1B, please illustrate when the light pulses are on & off?

      Following the reviewer’s instruction, we added colored bars on top of the raster plots in Figure 1B, indicating the light induction conditions.

      In legend for 6C, please clarify it's a correlation between the difference in R and L choice activity across the epochs (if my understanding is correct).

      The reviewer’s understanding is correct. We took this advice into consideration to further clarify the methods of analysis that led to the plot in Figure 6C (in line 1246 in the revised manuscript).
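
      As a sketch of the quantity discussed here, the snippet below computes, for each neuron, the right-minus-left firing-rate difference in two epochs and then correlates these differences across neurons. Pearson correlation is an assumption made for illustration (the exchange above does not specify the statistic), and the rates and function names are hypothetical.

```python
import math


def rate_difference(right_rates, left_rates):
    """Per-neuron difference between right-trial and left-trial mean rates."""
    return [r - l for r, l in zip(right_rates, left_rates)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical mean rates (Hz) for four neurons, not data from the study.
delay_diff = rate_difference([5.0, 2.0, 7.0, 4.0], [3.0, 4.0, 6.0, 4.0])
reward_diff = rate_difference([1.0, 6.0, 3.0, 2.0], [4.0, 2.0, 3.0, 2.5])

r = pearson(delay_diff, reward_diff)
print(round(r, 3))
```

      A coefficient near zero would indicate that a neuron's left-right preference in one epoch says little about its preference in the other; a negative coefficient, as in this toy example, would indicate a tendency toward opposite preferences.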

    1. Author Response

      eLife assessment

      The important work by Aballay et al. significantly advances our understanding of how G protein-coupled receptors (GPCRs) regulate immunity and pathogen avoidance. The authors provide convincing evidence for the GPCR NPR-15 to mediate immunity by altering the activity of several key transcription factors. This work will be of broad interest to immunologists.

      The authors express their sincere appreciation to Timothy Behrens (Senior Editor), the Reviewing Editor, and the original reviewers for their considerate and favorable assessment of our manuscript.

      Reviewer #1 (Public Review):

      Summary:

      Otarigho et al. presented a convincing study revealing that in C. elegans, the neuropeptide Y receptor GPCR/NPR-15 mediates both molecular and behavioral immune responses to pathogen attack. Previously, three npr genes were found to be involved in worm defense. In this study, the authors screened mutants in the remaining npr genes against P. aeruginosa-mediated killing and found that npr-15 loss-of-function improved worm survival. npr-15 mutants also exhibited enhanced resistance to other pathogenic bacteria but displayed significantly reduced avoidance to S. aureus, independent of aerotaxis, pathogen intake and defecation. The enhanced resistance in npr-15 mutant worms was attributed to upregulation of immune and neuropeptide genes, many of which were controlled by the transcription factors ELT-2 and HLH-30. The authors found that NPR-15 regulates avoidance behavior via the TRPM gene, GON-2, which has a known role in modulating avoidance behavior through the intestine. The authors further showed that both NPR-15-dependent immune and behavioral responses to pathogen attack were mediated by the NPR-15-expressing neurons ASJ. Overall, the authors discovered that the NPR-15/ASJ neural circuit may regulate distinct defense mechanisms against pathogens under different circumstances. This study provides novel and useful information to researchers in the fields of neuroimmunology and C. elegans research.

      The authors are grateful for the thoughtful and insightful comments on our manuscript. Your feedback has been instrumental in refining our work, and we appreciate the time and expertise you have invested in evaluating our study.

      Strengths:

      1) This study uncovered specific molecules and neuronal cells that regulate both molecular immune defense and behavior defense against pathogen attack and indicate that the same neural circuit may regulate distinct defense mechanisms under different circumstances. This discovery is significant because it not only reveals regulatory mechanisms of different defense strategies but also suggests how C. elegans utilize its limited neural resources to accomplish complex regulatory tasks.

      The authors express gratitude to the reviewer for recognizing that the present study revealed specific molecules and neuronal cells involved in regulating both molecular immune defense and behavioral defense against pathogen attacks. Additionally, the acknowledgment that the same neural circuit may oversee distinct defense mechanisms under different circumstances is appreciated.

      2) The conclusions in this study are supported by solid evidence, which are often derived from multiple approaches and/or experiments. Multiple pathogenic bacteria were tested to examine the effect of NPR-15 loss-of-function on immunity; the impacts of pharyngeal pumping and defecation on bacterial accumulation were ruled out when evaluating defense; RNA-seq and qPCR were used to measure gene expression; gene inactivation was done in multiple strains to assess gene function.

      The authors thank the reviewer for appreciating that this study is supported by solid evidence.

      3) Gene differential expression, gene ontology, and pathway analyses were performed to demonstrate that NPR-15 controls immunity by regulating immune pathways.

      The authors thank the reviewer for appreciating the differential gene expression, gene ontology, and pathway analyses performed in the study.

      4) Elegant approaches were employed to examine avoidance behavior (partial lawn, full lawn, and lawn occupancy) and the involvement of neurons in regulating immunity and avoidance (the use of a diverse array of mutant strains).

      The authors thank the reviewer for appreciating the approaches used in this study.

      5) Statistical analyses were appropriate and adequate.

      The authors thank the reviewer for appreciating the statistical analyses used in this study.

      Reviewer #2 (Public Review):

      Summary:

      The authors are studying the behavioral response to pathogen exposure. They and others have previously described the role that G-protein coupled receptors in the nervous system play in detecting pathogens and initiating behavioral patterns (e.g. avoidance/learned avoidance) that minimize contact. The authors study this problem in C. elegans, which is amenable to genetic and cellular manipulations and allows the authors to define cellular and signaling mechanisms. This paper extends the original idea to now implicate signaling and transcriptional pathways within a particular neuron (ASJ) and the gut in mediating avoidance behaviour.

      Strengths:

      The work is rigorous and elegant and the data are convincing. The authors make superb use of mutant strains in C. elegans, as well as tissue-specific gene inactivation and expression and genetic methods of cell ablation, to demonstrate how a gene, NPR15, controls behavioral changes in pathogen infection. The results suggest that ASJ neurons and the gut mediate such effects. I expect the paper will constitute an important contribution to our understanding of how the nervous system coordinates immune and behavioral responses to infection.

      The authors sincerely thank the reviewer for the thoughtful and positive review of our manuscript. We greatly appreciate the time and effort you dedicated to evaluating our work, and we are pleased that you find our study to be a rigorous and elegant contribution to the understanding of behavioral responses to pathogen exposure.

      Reviewer #1 (Recommendations For The Authors):

      The authors have adequately addressed my concerns and questions. I have no more comments or recommendations for the authors.

      The authors thank the reviewer for the constructive comments on the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors have adequately addressed my concerns.

      The authors express their appreciation to the reviewer for the valuable and constructive comments provided on the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript entitled 'Safb1 regulates cell fate determination in adult neural stem cells by enhancing Drosha cleavage of NFIB mRNA' by Iffländer et al, represents a solid piece of work addressing a non-canonical function of Drosha on NFIB mRNA processing via a newly identified Drosha partner, Safb1. The authors provide particularly systematic and convincing evidence on the biochemical interactions among the key players in this cascade. However, the significance of these interactions for NSC fate determination is not adequately supported by the data, hence, I have some remarks that would need to be addressed in order to clarify the impact of these events on NSC biology.

      1) One of my main concerns is related to the nature of the DG NSCs used in all in vitro assays. The authors refer to their previous work on how these cells are isolated using a Hes5 mouse reporter line. However, both recent scRNAseq data (http://linnarssonlab.org/dentate/ from Hochgerner et al) and the authors' own immunostainings (Fig. 7A), clearly show that Hes5 does not label only adult NSCs in the DG, but also (if not primarily) astrocytes. Considering that the initial cultures could contain a high proportion of mature astrocytes, most of the major conclusions and hypotheses should be reformulated.

      We thank the reviewer for their comment. We think that there is a misunderstanding about how the DG neural stem cells were isolated and cultured. In this manuscript we did not use the Hes5::GFP allele to isolate the stem cells. We isolated DG neural stem cells from C57Bl6 mice according to the protocol of Babu et al. (Babu et al. 2007 doi: 10.1371/journal.pone.0000388) and maintained and differentiated these according to our previous manuscripts (Rolando et al. 2016). This was not clear in the methods section of the original manuscript and, therefore, we have added the reference Babu et al. In order to address potential contamination with astrocytes, we have added images of the stem cells and their progeny immunostained with astrocytic markers (GFAP and S100b) in undifferentiated and differentiated states. These new data show that these neurogenic cells and their progeny do not express astrocytic markers until differentiation is induced.

      2) Along these lines, Safb1 expression is quite widespread in the mouse DG (Fig. 7A) and does not display any specificity towards any type of progenitor cells compared to its expression in DGCs within the GCL. The authors should discuss this and integrate this expression information into their conclusions and interpretations, highlighting all pertinent limitations.

      We appreciate and agree with the reviewer’s comment. SAFB1 is indeed broadly expressed by most if not all cells in the hippocampus. We quantified levels of SAFB1 expression across progenitors, astrocytes and neurons in the adult DG and in the SVZ, and show that SAFB1 levels differ across different neural stem cell populations and neural cells. We believe that our data show both in vitro and in vivo that the levels of SAFB1 are critical for determining the function of SAFB1 in regulating neural stem cell fate. We also showed that elevating SAFB1 levels in SVZ-derived neural stem cells suppresses their differentiation into oligodendrocytes. We have made this clearer in the text. However, how cells sense the levels of SAFB1 remains to be shown and it is difficult to speculate on the mechanism.

    1. Author Response

      Reviewer #1 (Public Review):

      In this analysis derived from the BLADE study, a Phase IV investigation using the LHRH antagonist Degarelix, the authors revealed additional insights into the relationship between FSH and body composition.

      The primary strength of the study lies in its prospective nature and the utilization of human subjects.

      We thank the reviewer for the positive evaluation.

      However, some weaknesses exist in the study.

      First, the authors presented results from a simple correlation study without accounting for potential confounding factors in fat metabolism. Particularly, readers may be intrigued to understand how testosterone or estradiol interact with FSH in relation to fat mass.

      As for the evaluation of circulating levels of testosterone and estradiol, unfortunately the protocol did not include measurement of these hormones. The evaluation of testosterone, in particular, would have required mass spectrometry, as testosterone values during therapy with degarelix fall below the sensitivity of the methods used in clinical practice. Therefore, the correlation/association analysis between testosterone and body composition would not have been reliable and would not have been useful for the study. All patients were considered to have hypogonadism due to the significant decrease in PSA values and the limited testosterone data available.

      The inverse relationship between ALMI/FBM and FSH was previously documented in a paper by the same group (Palumbo et al, Prostate Cancer Prostatic Dis 2021). In that earlier publication, the authors reported no correlation between FSH and lean mass or ALMI, suggesting that the significant correlation between FSH and ALMI/FBM arises from changes in fat body mass, a factor somehow not included in the prior paper, and not necessarily from sarcopenia.

      The referee is correct, as there is no correlation between lean mass and FSH, nor between lean mass variations and FSH variations. The correlation between ALMI/FBM and FSH is mostly due to the effect on fat mass. The text now includes a statement that emphasizes this concept (see Discussion page 8, lines 19-22).

      Reviewer #2 (Public Review):

      This manuscript reports the results of an ancillary study of a prospective trial assessing the effects of androgen deprivation therapy (ADT) with Degarelix (a GnRH antagonist) on body composition in patients with prostate cancer. An interesting relationship between FSH levels, which were suppressed by Degarelix treatment, and body composition parameters (particularly fat body mass) was described after 12 months of therapy. Therefore, the authors conclude that FSH could be a promising marker to monitor the risk of sarcopenic obesity and cardiovascular complications in prostate cancer patients undergoing ADT. As acknowledged by the authors, the main limitation of the study is the limited sample of patients. However, since testosterone levels were not assessed it is not possible to firmly establish whether the changes in fat mass observed with treatment are directly or indirectly associated with a reduction in FSH (and therefore in the latter case mediated by testosterone). Moreover, it is not clear whether the relationship between the change in FSH levels during the study and the body composition parameters achieved at 12 months was evaluated (instead of assessing the relationship between FSH changes and changes in body composition parameters). Finally, tests on bone muscle mass and strength were not performed, so the hypothesis that variation of FSH levels in prostate cancer patients in ADT may affect sarcopenia remains speculative.

      We appreciate the reviewer's positive assessment of our manuscript. We evaluated the correlation between FSH changes and body composition values after 12 months of Degarelix, as requested by the reviewer. No significant correlation was observed (see the attached table). Therefore, we have decided not to include this additional statistical analysis in the revised paper.

    1. Author Response

      Reviewer #1 (Public Review):

      Using a HFD mouse model, the authors examined the H3K4me3 mark in sperm and placental tissues followed by correlation to the transcriptomic changes in the placental tissues of the male and female offspring. The hypothesis that the authors tried to test was that sperm histone epimutations affect placental function, thereby leading to metabolic disorders in offspring. The strength of this work includes the interesting idea and the initial data generated. However, the entire study remains purely correlative without any validation experiment to support the correlation. The conclusion needs to be further supported by bigger sample size and more functional analyses demonstrating the causal relationship among the histone epimutations detected, the dysregulated mRNA expression in the placenta, and the phenotypes in offspring.

      Functional data: We appreciate that we should have emphasized and written more clearly that we had indeed phenotyped the placentas and offspring metabolic health in the same model from which we derived the placenta tissue, as we reported in (Jazwiec et al., 2022) (PMID: 35377412). This was referenced in our submitted manuscript (Lines 105-107; 131-133; 135-139; 147-150; 232-235; 270-273; 297-300; 384-386; 433-435; 441-448; 507-514). We have made this more apparent in the manuscript by expanding our description of the offspring phenotypes in the introduction and clarifying that the placentas used in this study were derived from this model (Jazwiec et al., 2022) (PMID: 35377412).

      Regarding effect and sample size: It appears that on review the animal numbers used for the ChIP-seq were confused with the number of replicates by the reviewers. These details were in Supplementary file 1a. There were 3 replicates per experimental group and each replicate contained sperm from pooled samples that were equalized in cell number and comprised sperm from n=7 control males or n=16 HFD males. For the RNA-seq, n=4 placentas were used from each experimental group from both males and females for a total N of 16. Although the sample size is moderate, we followed the Canadian Council on Animal Care guideline which calls for the use of the lowest animal number that elicits significant effects (CCAC guidelines p6 “Consideration must also be given to reduction, to determine the fewest number of animals appropriate to provide valid information and statistical power, while still minimizing the welfare impact for each animal”).

      Validation: We used a high standard of computational validation and visualization strategies, to ensure confidence in genomic data. This also allowed for a comprehensive understanding of the biological and physiological impacts of paternal obesity on the sperm epigenome and placenta transcriptome. In our experimental design we also included biological and technical replicates. Together these methods provide robustness checks of the experimental data and support our conclusions. These are the validation strategies we used:

      Technical and experimental validation

      • We evaluated the quality of sequencing data using metrics of read quality, alignment and coverage. These are summarized in Supplementary file 1a.

      • We visualized the data and performed statistical analyses to check for anomalies and discrepancies. Pearson correlation analysis, shown as heatmaps, was used to look for variance and patterns across samples, all of which were highly correlated (Figure 2 – Figure supplement 1 B and Figure 4 – Figure supplement 1 A). We checked for batch effects and normalized the data (Figure 4 – Figure supplement 1 B), and we used PCA plots as a second check for samples behaving oddly (Figure 2 – Figure supplement 1 C and Figure 4 – Figure supplement 1 C).

      • We used a deconvolution approach to improve the biological meaning of our bulk RNA-seq data (Figure 6, Figure 5 – Figure supplement 1 and 2).

      • Performed functional enrichment analysis to gain insight into biological functions, pathways, and genome ontology and visualized individual regions identified to be altered as a confirmation (Figure 2 D and 2 E; Figure 4 E and F; Figure 6, Figure 2 – Figure supplement 1 E; Figure 3 – Figure supplement 1).

      Comparison to external data sets:

      • We compared our data with external data sets generated from the same tissues and cell types, and with our prior studies: a) we compared ChIP-seq data from this obesity model with our former obesity ChIP-seq data (Figure 2 – Figure supplement 1); b) we re-analyzed and compared placenta RNA-seq data from an in utero hypoxia exposure model that shared offspring and placenta phenotypes similar to those we observed in the obesity model (Figure 6 and Figure 6 – Figure supplement 1).

      • We used a deconvolution approach to improve the biological meaning of our bulk RNA-seq data (Figure 6, Figure 5 – Figure supplement 1 and 2).

      Statistical Significance and False Discovery Rate (FDR):

      • We applied statistical tests and multiple testing corrections to reduce the likelihood of false positives (see also response 1 for additional testing added to the revised manuscript).
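
      The sample-level correlation check described in the list above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the use of the median correlation as a summary, and the 0.9 threshold are all assumptions made for the example.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(samples):
    """Pairwise Pearson correlations; `samples` maps sample name -> profile."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


def flag_outliers(samples, threshold=0.9):
    """Return samples whose median correlation with the others is below
    `threshold` (a hypothetical cut-off chosen only for illustration)."""
    matrix = correlation_matrix(samples)
    flagged = []
    for name, row in matrix.items():
        others = [r for other, r in row.items() if other != name]
        if others and median(others) < threshold:
            flagged.append(name)
    return flagged
```

      In practice the profiles would be normalized read counts per region or gene, and the matrix would be drawn as a heatmap; the median is used here so that one discordant sample does not cause the concordant samples to be flagged as well.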

      Causation versus correlation: We agree that the relationship between the sperm epigenome and placenta transcriptome is correlative; however, this is the current state of the field for studies of paternal epigenetic transmission of environmental information. To take this study to the point where causation can be implied would require the generation of a sperm epigenome edited mouse model where we target genes implicated in placental function. Indeed, this targeting approach is well underway in our research program.

      Reviewer #2 (Public Review):

      This study follows up on previous work from this group, and others, relating paternal diet to changes in sperm epigenetics, and offspring phenotypes. The authors focus on paternal diet (high-fat diet versus a control chow), sperm chromatin, and molecular changes in the placenta associated with offspring development.

      The text is well written and the figures are generally well presented and clear. The sperm epigenetic analyses and analysis of the placenta epigenetics and gene expression are generally well performed. The study provides new insight into how paternally mediated intergenerational epigenetic inheritance could involve placenta-embryo signaling.

      A major weakness is that the high-fat diet used was from a different manufacturer than the control (lower fat) diet. Therefore, it is difficult to judge whether the effects are due to a change in fat levels, or the many other molecules that are likely to differ in chow between different manufacturers. Other weaknesses include lack of methodological detail in parts, low n values for some experiments, and the need for more mechanistic data.

      Diets: It is worth noting that we are studying the effects of obesity and not diet. Indeed, HFD induces metabolic dysfunction while the control does not. Although it is fair to point out that the composition of the control diet should be kept in mind, considering the desired outcomes within the scope of the study, the diets elicited the desired phenotypic effects, serving as a model for obesity. We see this experimental design as a strength, as in this study we compared this model to our previously published obesity model (Pepin, Lafleur, Lambrot, Dumeaux, & Kimmins, 2022) (PMID: 35183795), and there was significant overlap in the regions of differential enrichment detected between both models even though they were conducted in different research settings, with different mouse substrains and different diet combinations. In our opinion this demonstrates that we are measuring robust effects of paternal obesity that can be replicated under different conditions. This comparative study design has been lacking in the field of epigenetic inheritance.

      Animal numbers and replicates: It appears that on review the animal numbers used for the ChIP-seq were confused with the number of replicates by the reviewers. These details were in Supplementary file 1a. There were 3 replicates per experimental group and each replicate contained sperm from pooled samples that were equalized in cell number and comprised sperm from n=7 control males or n=16 HFD males. For the RNA-seq, n=4 placentas were used from each experimental group from both males and females for a total N of 16. Although the sample size is moderate, we followed the Canadian Council on Animal Care guideline which calls for the use of the lowest animal number that elicits significant effects (CCAC guidelines p6 “Consideration must also be given to reduction, to determine the fewest number of animals appropriate to provide valid information and statistical power, while still minimizing the welfare impact for each animal”).

      Whilst the authors may have achieved their aims, more data is needed to inform a potential mechanism.

      It is difficult in studies on paternal epigenetic inheritance to attribute a mechanism, and we agree that the relationship between the obesity-altered sperm epigenome and the placenta abnormalities is correlative. However, the novelty in our study is that we postulate a new mechanism for paternal transmission of metabolic disease that implicates the placenta and demonstrate this via an altered placenta transcriptome and placenta developmental abnormalities described here and in our previous paper on this model (Jazwiec et al., 2022; PMID: 35377412). The next steps for the field to address causation/mechanism require generation of a sperm epigenome edited mouse model where we induce and track histone methylation changes at specific genes to the tissues in the next generation. Indeed, this targeting approach is underway in our research program.

      Reviewer #3 (Public Review):

      This study represents a useful addition to the authors' previous study examining the effects of paternal high-fat diet on offspring metabolism and gene expression in offspring (PMID: 35183795). It differs from the previous study in some of the details of the experimental model (age of sire when exposed to the diet manipulation, mouse substrain, and the nature of the control diet) and the results are largely in line with previous findings. The major finding is that many genes at which sperm H3K4me3 signal is altered also have altered expression in the placenta; some of these genes are paternally imprinted, providing a paternal-specific epigenetic signature. Strengths of the study include establishment of an important dataset correlating the sperm epigenome with gene expression in placental tissue, leading to an interesting and provocative conclusion. Weaknesses include a relatively superficial analysis of the dataset, revealing broad patterns but few specific conclusions, reliance on correlative analysis to draw conclusions, and absence of validation studies. Deconvolution analysis of bulk RNA-seq data helps to account for differences in cell composition between placental datasets, but does not add additional insight toward the central question of how sperm epigenetic state contributes to offspring gene expression. Overall the advance over previous work is relatively small.

      Specific points:

      1) The analysis as it stands is limited. To compare sperm H3K4me3 and placental expression, numbers of overlapping genes are provided, but no statistical analysis is done to indicate the significance of the overlap.

      We performed a Fisher’s exact test to assess the overlap of paternal obesity-associated differentially enriched regions of H3K4me3 (deH3K4me3) with female and male placenta differentially expressed genes (Figure 4 – Figure supplement 1 Di and ii).
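
      For readers unfamiliar with this kind of overlap test, a minimal sketch is given below. It assumes the one-sided (enrichment) form of Fisher’s exact test, which is equivalent to the upper tail of a hypergeometric distribution; the function name and all counts in the usage example are hypothetical and are not taken from the manuscript.

```python
from math import comb


def overlap_p_value(n_universe, n_a, n_b, n_overlap):
    """One-sided Fisher's exact test for the overlap of two gene sets.

    n_universe: number of genes considered in total (the background)
    n_a:        genes in set A (e.g. genes with altered H3K4me3)
    n_b:        genes in set B (e.g. differentially expressed genes)
    n_overlap:  genes present in both sets
    Returns P(overlap >= n_overlap) if B were drawn at random from the
    background, i.e. the hypergeometric upper tail.
    """
    if max(n_a, n_b) > n_universe:
        raise ValueError("a gene set cannot be larger than the background")
    if not 0 <= n_overlap <= min(n_a, n_b):
        raise ValueError("overlap must lie between 0 and the smaller set size")
    total = comb(n_universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p = overlap_p_value(n_universe=20000, n_a=500, n_b=400, n_overlap=30)
print(f"one-sided overlap p-value: {p:.3g}")
```

      The choice of background strongly affects the result: restricting `n_universe` to genes that are detectable in both assays is generally more conservative than using every annotated gene.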

      2) There is little direct connection to biological systems or validation of differential enrichment/expression analysis. Gene ontology enrichments for genes differentially enriched for H3K4me3 in sperm or differentially expressed in placenta (broken up by sex) are performed, but the biological significance of these categories is not clear.

      We used a high standard of computational validation and visualization strategies, to ensure confidence in genomic data. This also allowed for a comprehensive understanding of the biological and physiological impacts of paternal obesity on the sperm epigenome and placenta transcriptome. In our experimental design we also included biological and technical replicates. Together these methods provide robustness checks of the experimental data and support our conclusions. The validation strategies we used are detailed in response 17.

      We revised the text to expand discussion on the observed enriched gene ontology terms, as well as the biological significance and functions of the genes we refer to in this section:

      Lines 222-227: “The placenta is a rich source of hormone production, is highly vascularized, and secretes neurotransmitters (Hemberger, Hanna, & Dean, 2020; Rosenfeld, 2021). Disruption in these functions is suggested in the significantly enriched pathways that included genes involved in the transport of cholesterol, angiogenesis, and neurogenesis (Figure 4 C-D, Supplementary file 1e-f). Other significantly enriched processes included genes implicated in nutrient and vitamin transport (Figure 4 C-D).”

      Lines 441-463:“Many of the DEGs in the paternal obese-sired placentas were involved in the regulation of the heart and brain. This is in line with paternal obesity associated to the developmental origins of neurological, cardiovascular, and metabolic disease in offspring (Andescavage & Limperopoulos, 2021; Binder, Beard, et al., 2015; Binder et al., 2012; Chambers et al., 2016; Cropley et al., 2016; de Castro Barbosa et al., 2016b; T. Fullston et al., 2012; Tod Fullston et al., 2013; Grandjean et al., 2015; Huypens et al., 2016; Jazwiec et al., 2022; Mitchell, Bakos, & Lane, 2011; Ng et al., 2010; Pepin et al., 2022; Perez-Garcia et al., 2018; Terashima et al., 2015; Thornburg et al., 2016; Thornburg & Marshall, 2015; Ueda et al., 2022; Wei et al., 2014). The brain-placenta and heart-placenta axes refer to their developmental linkage to the trophoblast which produces various hormones, neurotransmitters, and growth factors that are central to brain and heart development (Parrettini, Caroli, & Torlone, 2020; Rosenfeld, 2021). This is further illustrated in studies where placental pathology is linked to cardiovascular and heart abnormalities (Andescavage & Limperopoulos, 2021; Thornburg et al., 2016; Thornburg & Marshall, 2015). For example, in a study of the relationship between placental pathology and neurodevelopment of infants, possible hypoxic conditions were a significant predictor of lower Mullen Scales of Early Learning (Ueda et al., 2022). A connecting factor between the neural and cardiovascular phenotypes is the neural crest cells which make a critical contribution to the developing heart and brain (Hemberger et al., 2020; Perez-Garcia et al., 2018). Notably, neural crest cells are of ectodermal origin which arises from the TE (Prasad, Charney, & García-Castro, 2019), which is in turn governed by paternally-driven gene expression. 
      It is worth considering the routes by which TE dysfunction may be implicated in the paternal origins of metabolic and cardiovascular disease. First, altered placenta gene expression beginning in the TE could influence the specification of neural crest cells which are a developmental adjacent cell lineage in the early embryo. TE signaling to neural crest cells could alter their downstream function. Second, altered trophoblast endocrine function will influence cardiac and neurodevelopment (Hemberger et al., 2020).”

      3) The overall effect size is small. In most cases the magnitude of differences is minor, and it is not clear which of these changes are significant over noise. For example, the y-axis for the metagene plots in Figure 2B does not start at zero, so the total range of the difference in H3K4me3 is small. In Figure 6C, DEGs detected in hypoxic placenta after deconvolution analysis do not look very different compared to control.

      Thank you for pointing out that the scales were different in Figure 2 Bi and ii. They have been revised to show the same Y-axis scale beginning at zero for comparison of regions that gained and lost H3K4me3, making the differences in H3K4me3 more readily visible. The heatmap shown in Figure 6 C visualizes the DEGs in hypoxic vs control placenta, where 1477 DEGs were identified in our re-analysis using a deconvolution approach applied to the bulk RNA-seq data set from Chu et al., 2019. We do not share the view that they are not well visualized in the heat map.

      4) Deconvolution analysis was done on bulk RNA-seq data from placenta, and the numbers of DEGs identified with this analysis compared to the original analysis are shown, but it is not clear how the deconvolution analysis changes the specific biological conclusions. In addition, the reference dataset for deconvolution is a published dataset generated in another lab, and it is unclear how comparable the reference sample is to the samples analyzed in this study, or how robust this analysis is when using a dataset generated under different conditions.

      The deconvolution analysis allows us to infer cellular composition within a tissue. It suggests that there are changes in cell-type proportions that could alter placenta function, and it improves the detection of differentially expressed genes (Aliee & Theis, 2021; Campbell et al., 2023; Kuhn, Thu, Waldvogel, Faull, & Luthi-Carter, 2011) (PMID: 34293324; 36914823; 21983921).
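
      To make the idea of inferring cellular composition concrete, the sketch below estimates cell-type proportions from a bulk expression profile by non-negative least squares, solved with projected gradient descent. This is a generic, simplified stand-in and not the method used in the manuscript; the function name, the signature matrix, and the iteration count are assumptions made for illustration.

```python
def estimate_proportions(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions from bulk expression.

    signature: list of rows, one per marker gene, each holding the reference
               expression of that gene in every cell type
    bulk:      list with the bulk expression of the same genes
    Minimizes ||signature @ p - bulk||^2 subject to p >= 0 by projected
    gradient descent, then rescales p so that it sums to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    if len(bulk) != n_genes:
        raise ValueError("bulk and signature must cover the same genes")

    # trace(S^T S) bounds the largest eigenvalue of S^T S from above,
    # so 1 / trace is a step size that is guaranteed not to diverge.
    trace = sum(v * v for row in signature for v in row)
    if trace == 0:
        raise ValueError("signature matrix is all zeros")
    step = 1.0 / trace

    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(s * q for s, q in zip(row, p)) - b
            for row, b in zip(signature, bulk)
        ]
        gradient = [
            sum(signature[g][t] * residual[g] for g in range(n_genes))
            for t in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, q - step * gr) for q, gr in zip(p, gradient)]

    total = sum(p)
    if total == 0:
        raise ValueError("all estimated proportions are zero")
    return [q / total for q in p]
```

      Once proportions are estimated per sample, they can be included as covariates in the differential expression model, which is one way that expression differences driven by shifts in cell composition can be separated from changes within a cell type.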

      The published dataset used as the reference for the deconvolution analysis was ideal: we specifically chose it because the tissue of origin matched the mouse strain and developmental time points of our samples and of those used in the Chu et al., 2019 analysis. We used the Chu et al., 2019 data set for comparative validation, and to further explore whether the biological effects of paternal obesity were like those of a hypoxic placenta. We have revised the text to more clearly show the biological relevance and interpretation of this analysis (see author response 12).

      We revised the text to clarify the biological implications of this analysis:

      Lines 282-290: “This reduction in the number of detected DEGs before versus after accounting for cellular composition suggests that changes in cell-type proportions at least partly drive tissue-level differential expression. This is consistent with the recent finding that preeclampsia-associated cellular heterogeneity in human placentas mediates previously detected bulk gene expression differences (Campbell et al., 2023). There were similarities between the bulk RNA-seq and deconvoluted analysis in that there was overlap of DEGs detected before and after adjusting for cell-type proportions (Figure 5 – Figure supplement 3 G and H, Fisher’s exact test P=1.8e-105 and P=0e+00, respectively). This differential gene expression analysis accounting for cellular composition provides insight into how paternal obesity may impact placental development and function and underscores the contribution of cellular heterogeneity in this process.”

      Reviewer #4 (Public Review):

      The members of the Kimmins lab perform a dietary study in mice to investigate the impact of obesity of fathers on the development of their offspring. To do so, they expose male mice to a high fat diet and determine the distribution and occupancy levels of the histone H3 lysine 4 trimethylation (H3K4me3) mark in spermatozoa and perform gene expression studies on placenta tissue obtained from mouse embryos during mid-gestation development. The authors report changes in H3K4me3 occupancy in sperm as well as in transcriptomes of placentas of male and female embryonic offspring. While the authors perform extensive computational analysis of the transcriptomic and chromatin immunoprecipitation data, the authors do not go much beyond making correlative statements at mainly the genome wide level between changes for H3K4me3 in sperm and transcriptional changes in placenta, the latter of which are in part related to changes in cellular composition (as deduced from transcriptional data). Given that both parental mice had the same genetic background, it was not possible to deduce parental specific contributions to transcriptional changes as observed in placentas of offspring. In all, the study falls short in increasing mechanistic insights into this important biological phenomenon.

      It is difficult in studies on paternal epigenetic inheritance to attribute a mechanism, and we agree that the relationship between the obesity-altered sperm epigenome and the placenta abnormalities is correlative. However, the novelty in our study is that we postulate a new mechanism for paternal transmission of metabolic disease that implicates the placenta and demonstrate this via an altered placenta transcriptome and placenta developmental abnormalities described here and in our previous paper on this model (Jazwiec et al., 2022; PMID: 35377412). The next steps for the field to address causation/mechanism require generation of a sperm epigenome edited mouse model where we induce and track histone methylation changes at specific genes to the tissues in the next generation. Indeed, this targeting approach is underway in our research program.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Benner et al. identify OVO as a transcriptional factor instrumental in promoting the expression of hundreds of genes essential for female germline identity and early embryo development. Prior data had identified both ovo and otu as genes activated by OVO binding to the promoters. By combining ChIP-seq, RNA-seq, and analysis of prior datasets, the authors extend these data to hundreds of genes and therefore propose that OVO is a master transcriptional regulator of oocyte development. They further speculate that OVO may function to promote chromatin accessibility to facilitate germline gene expression. Overall, the data compellingly demonstrate a much broader role for OVO in the activation of genes in the female germline than previously recognized. By contrast, the relationship between OVO, chromatin accessibility, and the timing of gene expression is only correlative, and more work will be needed to determine the mechanisms by which OVO promotes transcription.

      We fully agree with this summary.

      Strengths:

      Here Benner et al. convincingly show that OVO is a transcriptional activator that promotes expression of hundreds of genes in the female germline. The ChIP-seq and RNA-seq data included in the manuscript are robust and the analysis is compelling.

      Importantly, the set of genes identified is essential for maternal processes, including egg production and patterning of the early embryo. Together, these data identify OVO as a major transcriptional activator of the numerous genes expressed in the female germline, deposited into the oocyte and required for early gene expression. This is an important finding as this is an essential process for development and prior to this study, the major drivers of this gene expression program were unknown.

      We are delighted that this aspect of the work came across clearly. Understanding the regulation of maternal effect genes has been something of a black-box, despite the importance of this class of genes in the history of developmental genetics. The repertoire of essential oogenesis/embryonic development genes that are bound by and respond to OVO are well characterized in the literature, but nothing is known about how they are transcriptionally regulated. We feel the manuscript will be of great interest to readers working on these genes.

      Weaknesses:

      The novelty of the manuscript is somewhat limited as the authors show that, like two prior, well-studied OVO target genes, OVO binds to promoters of germline genes and activates transcription. The fact that OVO performs this function more broadly is not particularly surprising.

      Clearly, transcription factors regulate more than one or two genes. Nevertheless, we were surprised at how many of the genes involved in oogenesis per se, and how many maternal effect genes, were OVO targets. It was our hypothesis that OVO would have a transcriptional effect genome-wide; however, it was less clear whether OVO would always bind at the core promoter, as is the case with ovo and otu. Our results strongly support the idea that core promoter proximal binding is essential for OVO function, a conclusion of work done decades ago that has not been revisited using modern techniques.

      A major challenge to understanding the impact of this manuscript is the fact that the experimental system for the RNA-seq, the tagged constructs, and the expression analysis that provides the rationale for the proposed pioneering function of OVO are all included in a separate manuscript.

      This is a case where we ended up with a very, very long manuscript which included a lot of revisiting of legacy data. It was a tough decision on how to break up all the work we had completed on ovo to date. In our opinion, it was too much to put everything into a single manuscript unless we wanted a manuscript-length supplement (we were also worried that supplemental data is often overlooked and sometimes poorly reviewed). We therefore decided to split the work into a developmental localization/characterization paper and a functional genomics paper. As it stands, both papers are long. Certainly, readers of this manuscript will benefit from reading our previous OVO paper, which we submitted before this one. The earlier manuscript is under revision at another journal and we hope that this improved manuscript will be published and accessible shortly.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Benner et al. interrogate the transcriptional regulator OVO to identify its targets in the Drosophila germline. The authors perform ChIP-seq in the adult ovary and identify established as well as novel OVO binding motifs in potential transcriptional targets of OVO. Through additional bioinformatic analysis of existing ATAC-seq, CAGE-seq, and histone methylation data, the authors confirm previous reports that OVO is enriched at transcription start sites and suggest that OVO does not act as part of the core RNA polymerase complex. Benner et al. then perform bulk RNA-seq in OVO mutant and "wildtype" (GAL4 mediated expression of OVO under the control of the ovo promoter in OVO mutants) ovaries to identify genes that are differentially expressed in the presence of OVO. This analysis supports previous reports that OVO likely acts at transcription start sites as a transcriptional activator. While the authors propose that OVO activates the expression of genes that are important for egg integrity, maturation, and for embryonic development (nanos, gcl, pgc, bicoid), this hypothesis is based on correlation and is not supported by in vivo analysis of the respective OVO binding sites in some of the key genes. A temporal resolution for OVO's role during germline development and egg chamber maturation in the ovary is also missing. Together, this manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis but lacks important in vivo experimental evidence that would validate the high-quality datasets.

      We thank reviewer 2 for the appreciation of the genomics data and analysis. Some of the suggested in vivo experiments are clear next steps, which are well underway. These are beyond the scope of the current manuscript.

      Temporal analysis of ovo function in egg chamber development is not easy, as only the weakest ovo alleles have any egg chambers to examine. However, we will also point out the long-known phenotypes of some of those weak alleles in the text (e.g. ventralized chambers in ovoD3/+). We will need better tools for precise rescue/degradation during egg chamber maturation.

      Strengths:

      The manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis

      Thank you. We went to great lengths to do our highly replicated experiments in multiple ways (e.g. independent pull-down tags) and spent considerable time coming up with an optimized and robust informatic analysis.

      Weaknesses:

      1) The authors propose that OVO acts as a positive regulator of essential germline genes, such as those necessary for egg integrity/maturation and embryonic/germline development. Much of this hypothesis is based on GO term analysis (and supported by the authors' ChIP-seq data). However accurate interpretation of GO term enrichment is highly dependent on using the correct background gene set. What control gene set did the authors use to perform GO term analysis (the information was not in the materials and methods)? If a background gene set was not previously specified, it is essential to perform the analysis with the appropriate background gene set. For this analysis, the total set of genes that were identified in the authors' RNA-seq of OVO-positive ovaries would be an ideal control gene set for which to perform GO term analysis. Alternatively, the total set of genes identified in previous scRNA-seq analysis of ovaries (see Rust et al., 2020, Slaidina et al., 2021 among others) would also be an appropriate control gene set for which to perform GO term analysis. If indeed GO term analysis of the genes bound by OVO compared to all genes expressed in the ovary still produces an enrichment of genes essential for embryonic development and egg integrity, then this hypothesis can be considered.

      We feel that this work on OVO as a positive regulator of genes like bcd, osk, nos, png, gnu, plu, etc., is closer to a demonstration than a proposition. These are textbook examples of genes required for egg and early embryonic development. Hopefully, this is not lost on the readers by an over-reliance on GO term analysis, which is required but not always useful in genome-wide studies.

      We used GO term enrichment analysis as a tool to help focus the story on some major pathways that OVO is regulating. On the specific criticism of the reference gene set: the GO term enrichment analysis in this work is robust to the choice of background gene set. In the revision, we will update the GO term enrichment text to state this, add a table that uses the genes expressed in our RNA-seq dataset as the background, and describe the robustness to background gene set in greater detail in the methods. We will also try to focus the reader’s attention on the actual target genes rather than the GO terms in the revised text.
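
      To illustrate why the choice of background matters (a minimal sketch, not the analysis pipeline used in the manuscript): GO enrichment is commonly scored with a one-sided hypergeometric test, and the same overlap can give different p-values under different backgrounds. The function name and the toy gene identifiers below are hypothetical.

```python
from math import comb

def enrichment_pvalue(background, targets, term_genes):
    """One-sided hypergeometric P(X >= k) for a GO term among target genes.

    Only genes present in the background are counted, so changing the
    background changes the population size and the p-value.
    """
    background = set(background)
    targets = set(targets) & background
    term = set(term_genes) & background
    N = len(background)        # population size
    K = len(term)              # term-annotated genes in the background
    n = len(targets)           # target genes drawn from the background
    k = len(targets & term)    # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Toy data (hypothetical): 20 annotated genes, 10 of them expressed.
genome = [f"g{i}" for i in range(20)]
expressed = genome[:10]
term = ["g0", "g1", "g2", "g3"]
targets = ["g0", "g1", "g2", "g5"]

p_genome = enrichment_pvalue(genome, targets, term)        # 65/4845, about 0.013
p_expressed = enrichment_pvalue(expressed, targets, term)  # 25/210, about 0.119
```

      In this toy case the enrichment looks weaker against the expressed-gene background, which is the reviewer's concern; a result that is robust to background would remain significant under both.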

      2) The authors provide important bioinformatic analysis of new and existing datasets that suggest OVO binds to specific motifs in the promoter regions of certain germline genes. While the bioinformatic analysis of these data is thorough and appropriate, the authors do not perform any in vivo validation of these datasets to support their hypotheses. The authors should choose a few important potential OVO targets based on their analysis, such as gcl, nanos, or bicoid (as these genes have well-studied phenotypes in embryogenesis), and perform functional analysis of the OVO binding site in their promoter regions. This may include creating CRISPR lines that do not contain the OVO binding site in the target gene promoter, or reporter lines with and without the OVO binding site, to test if OVO binding is essential for the transcription/function of the candidate genes.

      Exploring mechanism using in vivo phenotypic assays is awesome, so this is a very good suggestion. But, it is not essential for this work -- as has been pointed out in the reviews, in vivo validation of OVO binding sites has been comprehensively done for two target genes, ovo and otu. The “rules” appear similar for both genes. That said, we are already following up specific OVO target genes and the detailed mechanism of OVO function at the core promoter. We removed some of our preliminary in vivo figures from the already long current manuscript. We continue to work on OVO and expect to include this type of analysis in a new manuscript.

      3) The authors perform de novo motif analysis to identify novel OVO binding motifs in their ChIP-seq dataset. Motif analysis can be significantly strengthened by comparing DNA sequences within peaks to sequences that are just outside of peak regions, thereby generating motifs that are specific to peak regions compared to other regions of the promoter/genome. For example, the 200 nt sequence on either side of an OVO peak could be used as a negative control sequence set. What control sequence set did the authors use for their de novo motif analysis? More detail on this is necessary in the materials and methods section. Re-analysis with an appropriate negative control sequence set is suggested if not previously performed.

      We apologize for being unclear on negative sequence controls in the methods. We used shuffled OVO ChIP-seq peak sequences as the background for the de novo motif analysis, which we will better outline in the methods of the revision. This is a superior background set of sequences as it exactly balances GC content in the query and background sequences. We are not fond of the idea of using adjacent DNA that won’t be controlled for GC content and shadow motifs. Furthermore, the de novo OVO DNA binding motifs are clear, statistically significant variants of the characterized in vitro OVO DNA binding motifs previously identified (Lu et al., 1998; Lee and Garfinkel, 2000; Bielinska et al., 2005), which lends considerable confidence. We also show that the OVO ChIP-seq read density is highly enriched for all our identified motifs, as well as for the in vitro motifs. We provide multiple lines of evidence, through multiple methods, that the core OVO DNA binding motif is 5’-TAACNGT-3’. We have high confidence in the motif data.
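
      The GC-balancing property of a shuffled background can be sketched as follows. This is an illustration under stated assumptions, not the tool or settings used for the manuscript: it performs a simple per-sequence mononucleotide shuffle, and the function names and toy sequences are hypothetical.

```python
import random

def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def shuffled_background(peaks, seed=0):
    """Shuffle the bases within each peak sequence.

    Each background sequence keeps the exact base composition (and hence
    GC content and length) of its query sequence, while any motif is
    destroyed on average.
    """
    rng = random.Random(seed)
    background = []
    for seq in peaks:
        bases = list(seq)
        rng.shuffle(bases)
        background.append("".join(bases))
    return background

# Toy peak sequences (hypothetical); the first contains TAACTGT.
peaks = ["GGCTAACTGTCCAT", "ATATTAACGGTGCGC"]
background = shuffled_background(peaks)
```

      Because only the order of bases changes, each query and background pair has identical GC content, which flanking genomic sequence would not guarantee.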

      4) The authors mention that OVO binding (based on their ChIP-seq data) is highly associated with increased gene expression (lines 433-434). How many of the 3,094 peaks (conservative OVO binding sites), and what percentage of those peaks, are associated with a significant increase in gene expression from the RNA-seq data? How many are associated with a decrease in gene expression? This information should be added to the results section.

      Not including the numbers of the overlapping ChIP peaks and expression changes in the text was an oversight on our part. The relevant numbers are shown in Figure 4C and will be added to the text: 666 peaks overlap genes that significantly increased in expression (a significant enrichment according to Fisher’s exact test), and 564 peaks overlap genes that significantly decreased in expression (a significant depletion according to Fisher’s exact test).

      5) The authors mention that a change in endogenous OVO expression cannot be determined from the RNA-seq data due to the expression of the OVO-B cDNA rescue construct. Can the authors see a change in endogenous OVO expression based on the presence/absence of OVO introns in their RNA-seq dataset? While intronic sequences are relatively rare in RNA-seq, even a 0.1% capture rate of intronic sequence is likely to be enough to determine the change in endogenous OVO expression in the rescue construct compared to the OVO null.

      This is a good point. The GAL4 transcript is downstream of ovo expression in the hypomorphic ovoovo-GAL4 allele. We state in the text that there is a nonsignificant increase in GAL4 expression with ectopic rescue OVO, although the trend is positive. We calculated the RPKM of RNA-seq reads mapping to the intron spanning exon 3 and exon 4 in ovo-RA and found that there is also a nonsignificant increase in intronic RPKM with ectopic rescue OVO (we will add to the results in the revision). We would expect OVO to be autoregulatory and potentially increase the expression of GAL4 and/or intronic reads, but the ovoovo-GAL4>UASp-OVOB is not directly autoregulatory like the endogenous locus. It is not clear to us how the intervening GAL4 activity would affect OVOB activity in the artificial circuit. Dampening? Feed-forward? Is there an effect on OVOA activity? Regardless, this result does not change our interpretation of the other OVO target genes.
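
      For readers unfamiliar with the unit, RPKM (reads per kilobase per million mapped reads) normalizes a read count by feature length and sequencing depth. The sketch below shows only the standard formula; the function name and the example numbers are hypothetical and are not values from the manuscript.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Hypothetical example: 50 reads on a 1,000 bp intron in a library of
# 10 million mapped reads.
example = rpkm(50, 1000, 10_000_000)
```

      Because the value scales with library size, intronic RPKM can be compared between the rescue and mutant libraries even when their sequencing depths differ.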

      6) The authors conclude with a model of how OVO may participate in the activation of transcription in embryonic pole cells. However, the authors did not carry out any experiments with pole cells that would support/test such a model. It may be more useful to end with a model that describes OVO's role in oogenesis, which is the experimental focus of the manuscript.

      We did not complete any experiments in embryonic pole cells in this manuscript, and we base our discussion on the potential dynamics of OVO transcriptional control and on our previous work showing maternal and zygotic OVO protein localization in the developing embryonic germline. Obviously, we are highly interested in this question and continue to work on the role of maternal OVO. We agree that we extended too far and will remove the embryonic germ cell model from the figure. We will instead focus on the possible mechanisms of OVO gene regulation in light of the evidence we have shown in the adult ovary, as suggested.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      This paper now provides a convincing presentation of valuable results of the drivers of nest construction for one termite species, and they briefly discuss possible relevance to other termite species. However, the authors have not yet addressed how their results may be important outside the field of termite nest construction. I could imagine the significance of the paper being elevated to important if there is a broader discussion about the impact of this work, e.g., the relevance of the results, the approach, and/or next steps to related fields outside of termite nest construction.

      Reading our manuscript again, we have to agree with the reviewer that we mostly focused the discussion of our results in the context of termite construction, without attempting to generalise to other systems. To some extent we still defend this choice, as we prefer not to make too many claims on the relevance of our results beyond what we can reasonably support with our own experimental results. However, we thought that it would be appropriate – as suggested by the reviewer – to add at least one paragraph to indicate how our results could be extrapolated to other systems. This new paragraph is now at the end of the discussion section.

      Here we elaborate a bit further on this point. While termites certainly build the most complex structures found in the natural world, few other animals are capable of collectively building complex structures. Collective building activity is typically limited to highly social (usually eusocial) animals. Other social insects, such as ants and wasps, are phylogenetically distant from termites, and their nests are often different: the large majority of ant nests comprise only excavated galleries with little construction, while wasp nests tend to comprise multiple repeated patterns that could be produced by stereotyped individual behaviour. Because of these differences, drawing a comparison between the mechanisms that regulate termite architecture and those that regulate other forms of animal architecture would be too speculative. One domain, however, where mechanisms similar to those that we describe here could operate is pattern formation at the cellular and tissue level, where surface curvature has been shown to drive phenomena ranging from cell migration to tissue growth. A comment on this has been added to the manuscript at the very end of the discussion.

      Similarly, on a related note, as someone not directly in the field of termite nest construction but wanting to understand the system (and the results) presented here in a broader context, I found the additional information about species and natural habitat very helpful and interesting, though I was rather disappointed to find it relegated to supplementary material where most readers will not see it.

      We considered this suggestion to move more information about the natural nesting habits of the termites that we study into the main text, but eventually we decided to leave it in the supplementary material only. We feel that the nesting habits of the termites that we studied here are not central to the problem that we want to focus on, namely how they coordinate their building activity. In fact, there is a large variety of nesting habits across termite genera and species, but we believe that, at a basic level, the mechanisms that we describe here would also apply to species with different nesting habits, because our results are consistent with what is described in the scientific literature for other termite species. As our introduction is already a bit long, we left this description of Coptotermes nesting habits in the supplementary material, where, hopefully, it will still be accessible and useful to readers interested in finding this information.

      When providing responses to reviewers, please directly address the reviewers’ comments point-by-point rather than summarizing comments and responding to summaries.

      We apologize for the way we previously responded to comments, and we thank the reviewer for this remark as we learn to navigate the eLife reviewing system (where some comments are repeated in the overall assessment and in the feedback of individual reviewers).

      Figure 2 colors: Panels A and E and maybe B do not seem colorblind-friendly. I suggest modifying the colormaps to address this.

      We have changed the colormaps of panels A, B and E of Figure 2, which are now colorblind-friendly.

      Line 180: This system is not in equilibrium. Perhaps the authors mean "steady-state?" I suggest reviewing language to ensure that the correct technical terms are used.

      We have now corrected this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This valuable study, of interest for students of the biology of genomes, uses simulations in combination with published data to examine how many TADs remain after cohesin depletion. The authors suggest that a significant subset of chromosome conformations do not require cohesin, and that knowledge of specific epigenetic states can be used to identify regions of the genome that still interact in the absence of cohesin. The theoretical approaches and quantitative analysis are state-of-the-art, and the data quality and strength of the conclusions are solid. However, because "physical boundaries (of domains?)" in the model appear to be a consequence of preserved TADs, rather than the other way around, the functional insights are limited.

      Summary of the reviewer discussion for the authors:

      While the simulations are state of the art and the reviewers appreciated that the approaches used here might help to resolve apparent discrepancies between conclusions from single-cell and bulk/ensemble techniques to study chromosome conformation, the work would benefit from clarification of what precisely is meant with "physical boundaries" and from a comparison of CCM and HIPPS models to understand commonalities and differences between them. In addition, more discussion of the relation of the current work to previous studies, such as Schwarzer et al., 2017, and Nuebler et al., 2018, would elevate the work and make the key claims more compelling. Please see also the detailed comments from the expert reviewers.

      We thank the editor for the assessment and the reviewers for the incisive comments. We address these comments one by one below. In particular, we attempt to clarify the concept of “physical boundaries” and its relevance in our study. We hope our responses are satisfactory. We believe that the manuscript has benefitted substantially from the revisions made in response to the reviewers’ comments.

      Below is our point-by-point response to the comments:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Jeong et al. investigate the prevalence and cause of TADs that are preserved in eukaryotic cells after cohesin depletion. The authors perform an extensive analysis of previously published Hi-C data, and find that roughly 15% of TADs are preserved in both mouse liver cells and in HCT-116 cells. They confirm previous findings that epigenetic mismatches across the boundaries of TADs can cause TAD preservation. However, the authors also find that not all preserved TADs can be explained this way. Jeong et al. provide an argument based on polymer simulations that "physical boundaries" in 3D structures provide an additional mechanism that can lead to TAD preservation. However, in its current form, we do not find the argumentation and evidence that leads to this claim to be fully compelling.

      Strengths:

      We appreciate the extensive statistical analysis performed by the authors on the extent to which TADs are preserved; this does seem like a novel and valuable contribution to the field.

      We thank the reviewer for a succinct and comprehensive summary of our work and for appreciating its value.

      Weaknesses:

      1) As the authors briefly note, the fact that compartmentalization due to epigenetic mismatches can cause TAD-like structures upon cohesin depletion has already been discussed in the literature; see for example Extended Data Figure 8 in (Schwarzer et al., 2017) or the simulation study (Nuebler et al., 2018). We are hence left with the impression that the novelty of this finding is somewhat overstated in this manuscript.

      It is not clear to us, from the results in Extended Data Figure 8, that the authors have shown that epigenetic mismatches cause TAD-like structures. As far as we can discern, the data, presented without a quantitative analysis, show that new TAD-like structures that are absent in the wild type may appear when cohesin is deleted.

      The studies by Schwarzer et al 2017 and Nuebler et al 2018 are relevant to our own investigation, which we undertook after scrutinizing the experiments in Schwarzer et al 2017 and the related work by Rao et al. in 2017 on a different cell line. In the summary of the Reviewer discussion, it is suggested that we discuss the relation to the experimental study by Schwarzer et al 2017 and the computational work by Nuebler et al 2018.

      (1) The results and the corresponding discussion in these two studies are different from (and may be complementary to) our results. When referring to Extended Data Figure 8, Schwarzer and co-authors state in the main text, “The finer compartmentalization explains most of the remaining or new domains and boundaries seen in Nipbl Hi-C maps”. We are not 100% sure what “remaining” means in this context. Extended Data Fig. 8(a) shows that the “common boundaries” are correlated with the eigenvectors of compartmentalization. If this indeed is what the reviewer is referring to, we believe that our study differs from theirs in two important ways. First, Extended Data Fig. 8(a) is a statistical analysis at the “ensemble” level. In our study, we examined the preservation of TADs at both the individual and the ensemble level, with more detailed analysis. Second, in Extended Data Fig. 8(a), the “common boundaries” (incidentally, we are uncertain how these were calculated) are compared to the eigenvectors of the PCA analysis of the compartments (larger length scales). In contrast, in our study, we examined the correlation between TAD boundaries and the epigenetic profiles. We believe that this is an important distinction. The PCA analysis of compartments and the “common boundaries” defined using (presumably) the insulation score are both derived from the Hi-C contact map. The epigenetic profile, on the other hand, is independent of the Hi-C data. We believe our contribution is to connect epigenetic profiles with the preservation of TADs, and to link both to 3D structures. For these reasons, we assert that our results are novel, and are not contained (or even implied) in the Schwarzer et al 2017 study.

      The simulations in Nuebler et al 2018, which were undertaken to rationalize the experiments, revealed that compartmentalization of small segments is enhanced after cohesin depletion, while TADs disappear, which supports the broad claims made in the experiments. They assert that the structures generated are non-equilibrium. They address neither the emergence of preserved TADs nor the observation of TAD-like structures at the single cell level. Our goal, in contrast, was to elucidate the reasons for the preservation of TADs, that is, why certain TADs are present in both the wild type and cohesin-depleted cells. Through detailed analyses of two cell types and polymer simulations, we have provided a structural basis to answer this question. Finally, we have provided a plausible link between TAD preservation and maintenance of enhancer-promoter interactions by analyzing the Micro-C data. For all these reasons, we believe that our study is different from the results in Extended Data Figure 8 and from the simulations described by Nuebler et al.

      Let us summarize the new results in our study that are not contained in the studies referred to by this Reviewer. (1) We showed, by analyzing the Hi-C data for mouse liver and HCT-116, that a non-negligible fraction of TADs is preserved, which set in motion our detailed investigation. (2) Then, using polymer simulations on a different cell type, we generated quantitative insights (epigenetic mismatches as well as a structural basis) into the preservation of TADs. Although not emphasized, we showed that deletion of cohesin in the GM12878 cells also gives rise to P-TADs, a prediction that suggests that the observations might be “universal”. (3) Rather than perform time-consuming polymer simulations, we calculated 3D structures directly from Hi-C data for the mouse liver and HCT-116 cells, which provided a structural basis for TAD preservation. (4) The 3D structures also showed how TAD-like features appear at the single cell level, which is in accord with imaging experiments. (5) Finally, we suggest that P-TADs may be linked to the maintenance of enhancer-promoter and promoter-promoter interactions by calculating the 3D structures using the recent Micro-C data.

      For the reasons given above, we assert that our results are novel, and bring new perspectives that are not in the aforementioned insightful studies cited by the Reviewer.

      2) It is not quite clear what the authors conceptually mean by "physical boundaries" and how this could offer additional insight into preserved TADs. First, the authors use the CCM model to show that TAD boundaries correlate with peaks in the single cell boundary probability distribution of the model. This finding is consistent with previous reports that TAD-like structures are present in single cells, and that specific TAD boundaries only arise as a population average.

      The finding based on the CCM simulations hence seems to be that preserved TADs also arise as a population average in cohesin-depleted cells, but we do not follow what the term "physical boundaries" refers to in this context. The authors then use the Hi-C data to infer a maximumentropy-based HIPPS model. They find that preserved TADs often have boundaries that correspond to peaks in the single cell boundary probabilities of the inferred model. The authors seem to imply that these peaks in the boundary probability correspond to "physical boundaries" that cause the preservation of TADs. This argument seems circular; the model is based on inferring interaction strengths between monomers, such that the model recreates the input Hi-C map. This means that the ensemble average of the model should have a TAD boundary where one is present in the input Hi-C data. A TAD boundary in the Hi-C data would then seem to imply a peak in the model's single-cell boundary probability. (The authors do display two examples where this is not the case in Fig.3h, but looking at these cases by eye, they do not seem to correspond to strong TAD boundaries.) "Physical boundaries" in the model are hence a consequence of the preserved TADs, rather than the other way around, as the authors seem to suggest. At the very least the boundary probability in the HIPPS model is not an independent statistic from the Hi-C map (on which their model is constrained), so we have concerns about using the physical boundaries idea to understand where some of the preserved TADs come from.

      This long comment contains many statements that we need to unpack separately. First, using both the CCM simulations and, even more importantly, a data-driven approach, we established that TAD-like structures are present in single cells with and without cohesin. The latter finding is fully consistent with imaging experiments. We are unaware of other computational efforts, before our work, demonstrating that this is the case. Perhaps the Reviewer can point us to such papers in the literature.

      Regarding the statement that our arguments are circular, and the lack of clarity about the meaning of physical boundary, please allow us to explain. First, we apologize for the confusion. Let us clarify our approach. We first used CCM to understand the potential origin of a substantial fraction of P-TADs in the GM12878 cells. The simulations allowed us to generate plausible mechanisms for the origin of P-TADs. Because the CCM does reproduce the Hi-C data, we surmised that the general mechanisms inferred from these simulations could be profitably used to analyze the experiments. The simulations also showed that knowledge of 3D structures produces a much-needed structural basis for P-TADs, and potentially for the emergence of new TADs upon cohesin depletion.

      Because 3D coordinates are needed to obtain structural insights into the role of cohesin, we used a novel method to obtain them without the need for simulations. In particular, we used the HIPPS method to obtain 3D coordinates with the Hi-C data as the sole input, which allowed us to calculate the boundary probabilities directly. The excellent agreement between the predicted 3D structures and imaging experiments suggests that meaningful information, not available in Hi-C, may be gleaned from the ensemble of calculated 3D structures.

      Although the “physical boundary”, a notion introduced by Zhuang, is defined in the Methods section, it was apparently unclear, for which we apologize. Because this is an important technical tool, we have added a summary in the main text in the revision. We did not mean to imply that the physical boundaries cause the preservation of TADs, although we found that maintenance of the enhancer-promoter contacts (see Fig. 8 in the revision) could be one of the potential reasons for the emergence of physical boundaries. We agree with the reviewer that physical boundaries are structural evidence of preserved TADs (not the cause); that is, when a TAD is preserved, we can detect it by a prominent physical boundary. The purpose and benefit of physical boundary analysis, and of using HIPPS in general, is to obtain three-dimensional structures of chromosomes. Although both CCM simulations and HIPPS use Hi-C contact maps, three-dimensional structures provide additional information that is not present in the Hi-C data.

      The arguments that the authors use to justify their claims could be clarified and strengthened. Here are some suggestions:

      • Explain the concept of "physical boundaries" more clearly in the main text.

      As explained above, we have revised the text to clarify the concept and purpose of physical boundaries analysis. See Page 7.

      • Justify why the boundary probabilities and the physical boundaries concept can be used to offer novel insight into where preserved TADs may come from.

      Boundary probabilities and physical boundaries provide previously unavailable 3D structural information on TAD structures at both the single-cell and population levels. This provides a direct structural basis for determining which TADs are preserved. However, physical boundary analysis alone is not sufficient to understand where P-TADs may come from. As we have shown in the analysis of enhancer-promoter contacts, which uses physical boundary analysis from 3D structures, conservation of enhancer-promoter contacts could be one of the reasons for the P-TADs.
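
      To make the notion concrete, the following is a minimal sketch of how a boundary probability could be computed from an ensemble of 3D structures. It is an illustration only: the boundary-strength definition (ratio of the mean across-window distance to the mean within-window distance), the window size `w`, the `threshold`, and the toy two-domain ensemble are all assumptions made here, not the definitions or data used in the manuscript.

```python
import math
import random


def distance_matrix(coords):
    """Pairwise Euclidean distances for one 3D structure."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def boundary_strength(dm, i, w):
    """Assumed definition: mean distance across locus i divided by the
    mean distance within the two flanking windows of w loci each."""
    left = range(i - w, i)
    right = range(i, i + w)
    across = [dm[a][b] for a in left for b in right]
    within = [dm[a][b] for a in left for b in left if a < b]
    within += [dm[a][b] for a in right for b in right if a < b]
    return (sum(across) / len(across)) / (sum(within) / len(within))


def boundary_probability(structures, w=3, threshold=2.0):
    """Fraction of structures in which each locus is called a boundary."""
    n = len(structures[0])
    counts = [0] * n
    for coords in structures:
        dm = distance_matrix(coords)
        for i in range(w, n - w + 1):
            if boundary_strength(dm, i, w) > threshold:
                counts[i] += 1
    return [c / len(structures) for c in counts]


# Hypothetical ensemble: 12 loci forming two 6-locus domains that are
# spatially separated in 70% of the structures and merged in the rest.
random.seed(0)


def toy_structure(separated):
    offset = 10.0 if separated else 0.0
    coords = []
    for locus in range(12):
        cx = offset if locus >= 6 else 0.0
        coords.append((cx + random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)))
    return coords


structures = [toy_structure(separated=(s % 10 < 7)) for s in range(200)]
probs = boundary_probability(structures, w=3, threshold=2.0)
```

      In this toy ensemble the boundary probability peaks at the locus where the two domains meet, even though the boundary is absent in a subset of individual structures. This is the sense in which a peak in the boundary probability reflects a preferred, but not obligatory, single-cell boundary.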

      • Explain more clearly what the additional value of using the HIPPS model to study TAD preservation is.

      Our goal, as announced in the title, is to elucidate the structural basis for the emergence of P-TADs. The HIPPS method, which avoids simulations (such as CCM and other polymer models used in the literature), provides an ensemble of 3D conformations using the averaged contact map generated in Hi-C experiments. Even more importantly, HIPPS produces an ensemble of structures, which can be the basis for predicting the outcomes at the single-cell level. The accuracy of the generated structures has been shown in our previous work (Shi and Thirumalai, PRX 2021). In ensemble-averaged Hi-C experiments, TADs appear to be relatively stable. However, imaging experiments (Bintu et al., 2018) have revealed that TADs are not fixed structures present in every single cell, but instead exhibit variability at the single-cell level. TAD-like structures with distinct boundaries are observed in individual cells, and the location of these boundaries varies from cell to cell. However, these TAD-like structures still show a preferential positioning in 3D structures. Interestingly, the preferential positioning often corresponds to TAD boundaries observed in population-averaged Hi-C data. This suggests that while cohesin is involved in establishing the overall organization of TADs, other factors and mechanisms could also contribute to TAD formation at the individual cell level. In this study, we showed that some boundaries of P-TADs upon cohesin loss in the Hi-C maps align with preferential boundaries in individual 3D structures of chromosomes. This makes the finding that a subset of TADs is preserved upon cohesin loss robust.

      From a technical perspective, the use of HIPPS avoids time-consuming polymer simulations. HIPPS is rapid and can be used to generate an arbitrarily large ensemble of structures, allowing us to calculate properties at both the single-cell and ensemble levels.

      In addition, we'd like to offer the following feedback to the authors.

      3) The discussion of enhancer-promoter loops as a cause of TAD preservation is interesting, but it would be interesting to know what fraction of preserved TADs enhancer-promoter loops might explain.

      We thank the reviewer for the excellent suggestion. We have done the suggested calculation. The results are shown in a new Figure 8 in the main text. We also moved the results on enhancer-promoter contacts from the Discussion section to the main results section.

      4) The last paragraph of the introduction seems to state that only the HIPPS model was used to find single-cell 3D structures and boundary probabilities. However, the main text suggests that the CCM model was also used for these purposes.

      We have revised the text to clarify this point on pages 3-4. Also please see the discussion on the utility of HIPPS above.

      5) When referring to the boundary probability, it would be useful if the authors always specified whether they refer to the boundary probability before or after cohesin depletion (or loop depletion in the CCM model). Statements such as "This implies that peaks in the boundary probabilities should correspond to P-TADs" are ambiguous; it is unclear if the authors mean that boundary probabilities before cohesin depletion predict that the boundary will be preserved, rather than that preserved TAD boundaries correlate with peaks in the boundary probability after cohesin depletion.

      We thank the reviewer for the suggestion. Indeed, it may be confusing. Hence, we have revised the text in numerous places to clarify this point.

      6) It would be interesting to analyze all TAD boundaries that are present after cohesin depletion, rather than just those that overlap with TAD boundaries in WT cells. This would give better statistics for answering the question what causes TAD-like structures in cells without cohesin.

      We thank the reviewer for this excellent suggestion. First, we believe this would deviate from the primary goal of this study: what leads to TAD preservation after cohesin depletion? Second, this has to be done very systematically, as we did here for P-TADs, in order to draw meaningful conclusions. This is a very useful study for another occasion.

      7) The use of a plethora of acronyms (P-TAD, CM, DM, CCM, HLM...) makes the paper difficult to read.

      We have revised the text to change “CM” to “contact map” and “DM” to “distance map”. For P-TADs, CCM, and WLM, we would argue that P-TAD is a rather clear and intuitive abbreviation, and that CCM/WLM refer to specific methods/models; replacing them with full names would make the text more difficult to read. We hope the reviewer is okay with us keeping these acronyms.

      Reviewer #2 (Public Review):

      Summary:

      Here, Jeong et al. use a combination of theoretical and experimental approaches to define molecular contexts that support specific chromatin conformations. They seek to define features that are associated with TADs that are retained after cohesin depletion (the authors refer to these TADs as P-TADs). They were motivated by differences between single cell data, which suggest that some TADs can be maintained in the absence of cohesin, whereas ensemble HiC data suggest complete loss of TADs. By reanalyzing a number of HiC datasets from different cell types, the authors observe that in ensemble methods, a significant subset of TADs are retained. They observe that P-TADs are associated with mismatches in epigenetic state across TAD boundaries. They further observe that "physical boundaries" are associated with P-TAD maintenance. Their structure/simulation based approach appears to be a powerful means to generate 3D structures from ensemble HiC data, and provide chromosome conformations that mimic the data from single-cell based experiments. Their results also challenge current dogma in the field about epigenetic state being more related to compartment formation rather than TAD boundaries. Their analysis is particularly important because limited amounts of imaging data are presently available for defining chromosome structure at the single-molecule level, however, vast amounts of HiC and ChIP-seq data are available. By using HiC data to generate high quality simulated structural data, they overcome this limitation. Overall, this manuscript is important for understanding chromosome organization, particularly for contacts that do not require cohesin for their maintenance, and for understanding how different levels of chromosome organization may be interconnected. I cannot comment on the validity of the provided simulation methods and hope that another reviewer is qualified to do this.

      We appreciate the reviewer for a comprehensive summary of our work, and we are happy that the reviewer finds our work important, which provides valuable insights to the field.

      Specific comments

      • It is unclear what defines a physical barrier. From reading the text and the methods, it is not entirely clear to me how the authors have designated sites of physical barriers. It may help to define this on pg 7, second to last paragraph, when the authors first describe instances of P-TAD maintenance in the absence of epigenetic mismatch.

      We thank the reviewer for the suggestions. The details of physical boundary designation are provided in the appendix data analysis. To make the concept and idea of physical boundary easy to understand, we have revised the text on page 7 in the revised main text.

      • Figure 7 adds an interesting take to their approach. Here the authors use microC data to analyze promoter-enhancer/promoter-promoter contacts. These data are included as part of the discussion. I think this data could be incorporated into the main text, particularly because it provides a biological context where P-TADs would have a rather critical role.

      We thank the reviewer for the suggestion. We also agree that the results in Figure 7 provide novel insights into TAD formation and its possible preservation upon perturbation. We have followed the reviewer’s suggestion and moved them to an independent section, the last subsection of the main results.

      • Figure 3a- the numbers here do not match the text (page 6, second to last paragraph). The numbers have been flipped for either chromosome 10 or chromosome 13 in the text or the figures.

      We thank the reviewer for pointing out this error. In the revised main text, it has been corrected.

      Reviewer #3 (Public Review):

      This manuscript presents a comprehensive investigation into the mechanisms that explain the presence of TADs (P-TADs) in cells where cohesin has been removed. In particular, to study TADs in wildtype and cohesin depleted cells, the authors use a combination of polymer simulations to predict whole chromosome structures de novo and from Hi-C data. Interestingly, they find that those TADs that survive cohesin removal contain a switch in epigenetic marks (from compartment A to B or B to A) at the boundary. Additionally, they find that the P-TADs are needed to retain enhancer-promoter and promoter-promoter interactions.

      Overall, the study is well-executed, and the evidence found provides interesting insights into genome folding and interpretations of conflicting results on the role of cohesin on TAD formation.

      We are pleased with the reviewer’s positive assessment of our work.

      To strengthen their claims, the authors should compare their de-novo prediction approach to their data-driven predictions at the single cell level.

      We thank the reviewer for the very good suggestion. We are assuming that the Reviewer is asking us to compare the CCM simulations with HIPPS generated structures at the single cell level. We have shown, using the GM12878 cell data, that the polymer simulations reproduce the Hi-C contact maps (an average quantity) well (see Appendix Fig. 2 and Fig. 3). In addition, we show in Appendix Fig. 8 the comparison with ensemble averaged distance maps as well as at the single cell level for Chr 13 from the GM12878 cell. There are TAD-like structures at the single cell level just as we find for HCT-116 cell (Fig. 5 in the main text). Thus, the conclusions from de-novo prediction and data-driven predictions are consistent. In addition, in our previous publication introducing HIPPS in Phys Rev X 11: 011051 (2021), we showed that the method is quantitatively accurate in reproducing experimental data for all the interphase chromosomes.

      Having demonstrated this consistency, we used the computationally simple data-driven predictions to analyze the HCT-116 and mouse liver cell lines, for which Hi-C data with and without cohesin are available, rather than performing multiple laborious polymer simulations.

      Please see below for our response to specific comments.

      1) It is confusing that the authors change continuously their label for describing B-A and A-B switches. They should choose one expression. I think that the label "switch" between A and B is more precise than "mismatch".

      We have revised the text to make it consistent. Now it all reads “A-B”. Yes, the suggestion that we use switch is good but we think that mismatch is more concise. We trust that this Reviewer will indulge us on this point.

      2) In the Abstract, the authors mention HCT-116 cells but do not specify which cells are these.

      We have changed “HCT-116” in the abstract to “human colorectal carcinoma cell line”.

      3) In the Abstract, it is unclear what the authors mean by "without any parameters"

      In the theoretically based HIPPS method, there is no “free” parameter. In other words, the only parameter is uniquely determined. To avoid confusion, we have removed “without any parameters” from abstract.

      4) In Results, what do the authors mean by 16% (26%)?

      This refers to the percentage of TADs that are preserved after Nipbl and RAD21 removal in mouse and HCT-116 cells, respectively. Using the TopDom method, we identified TAD boundaries in wild-type and cohesin-depleted cells. We found that 16% (959 out of 4176 – Fig. 1a) and 26% (1266 out of 4733 – Fig. 1b) of TADs are preserved after Nipbl and RAD21 removal in mouse and HCT-116 cells, respectively. We removed the percentages in the revised version.

      5) In Results, the authors mention "more importantly, we did tune the value of any parameter to fit the experimental CMs". Did they mean that instead they didn't tune any parameter?

      We apologize for the confusion. In the CCM, there is a single controlled parameter. We have changed the sentence to reflect this correctly.

      6) In Results, section "CCM simulations reproduce wild-type Hi-C maps", Kullback-Leibler (KL) divergence is used to assess the correlation between two loci, but it is unclear what the value 0.04 stands for; is it a good or a bad correlation?

      The Kullback-Leibler divergence can vary from 0 to infinity, with 0 corresponding to perfect agreement between the two maps. Thus, a value of 0.04 means that the agreement is excellent.
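
      For reference, a minimal sketch of a Kullback-Leibler divergence between two contact maps is given below. The normalization (each map is scaled to sum to one and treated as a probability distribution over locus pairs) and the toy power-law maps are assumptions made for illustration; they may differ from the definition and data used in the manuscript.

```python
import math


def normalize(cmap):
    """Scale a contact map so that all entries sum to one."""
    total = sum(sum(row) for row in cmap)
    return [[v / total for v in row] for row in cmap]


def kl_divergence(p_map, q_map, eps=1e-12):
    """KL(P || Q) between two contact maps treated as distributions.
    Entries with zero probability in P contribute nothing; eps guards
    against division by zero in Q."""
    p = normalize(p_map)
    q = normalize(q_map)
    kl = 0.0
    for p_row, q_row in zip(p, q):
        for pv, qv in zip(p_row, q_row):
            if pv > 0:
                kl += pv * math.log(pv / max(qv, eps))
    return kl


# Hypothetical maps: contact probability decaying as a power law of the
# genomic separation |i - j|, with slightly different exponents.
n = 20


def power_law_map(exponent):
    return [[0.0 if i == j else abs(i - j) ** -exponent for j in range(n)]
            for i in range(n)]


reference = power_law_map(1.0)
model = power_law_map(1.1)

kl_same = kl_divergence(reference, reference)   # identical maps
kl_model = kl_divergence(reference, model)      # slightly different maps
```

      Identical maps give a divergence of zero, and the value grows as the two maps differ more, which is why a small value indicates close agreement.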

      7) The authors use two techniques to obtain 3D structures, one is CCM, which takes the cohesin as constraints, and another is HIPPS, which reconstructs from Hi-C maps. Both seem to have good agreement with the Hi-C contact maps. However, did the authors compare the CCM with the HIPPS 3D structures?

      This is detailed in our response at the start of the reply to this Reviewer. As explained there and in the main text, we used the CCM to generate hypotheses for the origin of P-TADs. In the process, we established the accuracy of CCM, which gives us confidence in the hypotheses. As explained above and emphasized in the revised version, CCM simulations are time consuming, whereas generating 3D structures using HIPPS is computationally simple. Because HIPPS is also accurate, we used it to analyze the Hi-C data on mouse liver and HCT-116 cells, as well as the Micro-C data on mESC.

      In our paper in Phys Rev X 11: 011051 (2021) we showed that HIPPS reproduces Hi-C data. In the current manuscript, we showed in Appendix Fig. 2 and Fig. 3 as well as in a study in 2018 (Shi and Thirumalai, Nat Comm.) that CCM is accurate as well. Thus, there is little doubt about the accuracies of the methods that we have developed.

      8) In Results, section "P-TADs have prominent spatial domain boundaries", the authors constructed individual spatial distance matrices (DMs) using 10,000 simulated 3D structures. What are the differences among these 10,000 simulations? Do they start them with different initial structures?

      The structures are generated using HIPPS, which is a data-driven method that uses the Hi-C contact map as constraints. The method, which is based on maximum entropy theory, samples from a distribution that describes the structural ensemble of the chromosome. The 10,000 structures are randomly sampled and are independent of each other. The HIPPS method is not a simulation, and hence the issue of initial structures does not arise.

      9) In Methods, when the authors mention the "unknown parameter", do they use one parameter for all simulations (+/- cohesin) or is this parameter different for each system? Would this change the results?

      We apologize for the confusion. The “unknown parameter” is the energy scale 𝜖 that describes the interaction strength between chromosome loci. We have revised the text in the Methods (page 27) to clarify it. The same value of 𝜖 is used for all CCM simulations, with or without cohesin.

      10) In Methods, when the authors perform DBSCAN clustering, they mention that they optimize the clustering parameters for each system. However, if they want to compare between different systems, the clustering parameters should be the same.

      The purpose of DBSCAN is to capture the spatial clustering topology of chromosome loci. However, different cell types and chromosomes may have different overall densities, which affect the average distance between loci. If the same parameters were used for all systems, such global changes would dominate the clustering result and distort the intended spatial clustering topology. Hence, we tune the clustering parameters for each system in order to ignore the global effect and capture only the local clustering topology of chromosome loci.
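
      The reasoning can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the pure-Python `dbscan`, the rule of setting `eps` to a fixed multiple of the median nearest-neighbor distance, and the toy point clouds are not the procedure or data of the manuscript. The sketch only shows that, when a system is uniformly expanded (lower overall density), reusing the same `eps` labels at least as many loci as noise, whereas a per-system `eps` recovers the same clustering topology.

```python
import math
import random
import statistics


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point, -1 for noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1              # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])
    return labels


def median_nn_distance(points):
    """Median distance from each point to its nearest neighbor."""
    return statistics.median(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


# Hypothetical systems: B is A uniformly expanded by a factor of two,
# i.e. same clustering topology but lower overall density.
random.seed(1)


def blob(center, n=30):
    return [tuple(c + random.gauss(0, 1) for c in center) for _ in range(n)]


system_a = blob((0, 0, 0)) + blob((10, 0, 0))
system_b = [tuple(2.0 * c for c in p) for p in system_a]

eps_a = 3 * median_nn_distance(system_a)   # assumed tuning rule
eps_b = 3 * median_nn_distance(system_b)

labels_a = dbscan(system_a, eps_a, min_pts=4)
labels_b_same_eps = dbscan(system_b, eps_a, min_pts=4)   # parameters reused
labels_b_tuned = dbscan(system_b, eps_b, min_pts=4)      # parameters per system
```

      With the per-system `eps`, the expanded system receives exactly the same labels as the original one, so the comparison between systems reflects the clustering topology rather than the global density.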

      Grammar comments:

      1) "structures, with sharp boundaries are present, at.."

      We thank the reviewer for pointing out the error. We have fixed it.

      2) "Three headlines emerge from these studies are:"

      We have fixed it.

      3) "both the cell lines"

      We have fixed it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study explores the relationship between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 pathogenic strains. G4 structures were found to be non-randomly distributed within PAIs and conserved within the same strains. Positive correlations were observed between G4s and GC content across various genomic features, suggesting a link between G4 structures and GC-rich regions. Differences in GC content between PAIs and the core genome underscored the unique nature of PAIs. High-confidence G4 structures in Escherichia coli's regulatory regions were identified, influencing DNA integration within PAIs. These findings shed light on the molecular mechanisms of G4-PAI interactions, enhancing our understanding of bacterial pathogenicity and G4 structures in infectious diseases.

      Strengths:

      The findings of this study hold significant implications for our understanding of bacterial pathogenicity and the role of guanine-quadruplex (G4) structures.

      Molecular Mechanisms of Pathogenicity: The study highlights that G4 structures are not randomly distributed within pathogenicity islands (PAIs), suggesting a potential role in regulating pathogenicity. This insight into the uneven distribution of G4s within PAIs provides a basis for further research into the molecular mechanisms underlying bacterial pathogenicity.

      Conservation of G4 Structures: The consistent conservation of G4 structures within the same pathogenic strains suggests that these structures might play a vital and possibly conserved role in the pathogenicity of these bacteria. This finding opens doors for exploring how G4s influence virulence across different pathogens.

      Unique Nature of PAIs: The differences in GC content between PAIs and the core genome underscore the unique nature of PAIs. This distinction suggests that factors such as DNA topology and G4 structures might contribute to the specialized functions and characteristics of PAIs, which are often associated with virulence genes.

      Regulatory Role of G4s: The identification of high-confidence G4 structures within regulatory regions of Escherichia coli implies that these structures could influence the efficiency or specificity of DNA integration events within PAIs. This finding provides a potential mechanism by which G4s can impact the pathogenicity of bacteria.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Overall, the study provides fundamental insights into the pathogenicity island and conservation of G4 motifs.

      Thank you for your thorough review of our manuscript exploring the relationship between G4 structures and PAIs in 89 pathogenic strains. We appreciate your recognition of the strengths of our study and its potential implications for understanding bacterial pathogenicity. We are pleased that you highlighted the significance of our findings in revealing the non-random distribution and conservation of G4 structures within PAIs across various pathogenic strains.

      Your insightful comments about the molecular mechanisms of pathogenicity, the conservation of G4 structures, the unique nature of PAIs, and the regulatory role of G4s within Escherichia coli are invaluable. We are encouraged by your positive evaluation of these aspects, which underscores the potential impact of our work on advancing the understanding of bacterial pathogenicity.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript entitled "The Intricate Relationship of G-Quadruplexes and Pathogenicity Islands: A Window into Bacterial Pathogenicity" Bo Lyu explored the interactions between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 bacterial genomes through a rigorous computational approach. This paper handles an intriguing and complex topic in the field of pathogenomics. It has the potential to contribute significantly to the understanding of G4-PAI interactions and bacterial pathogenicity.

      Strengths:

      • The chosen research area.

      • The summarizing of the results through neat illustrations.

      Weaknesses:

      This reviewer did not find any significant weaknesses.

      Thank you for your positive and encouraging feedback on our manuscript. We appreciate your specific mention of the strengths, particularly highlighting the chosen research area and the effectiveness of our illustrations in summarizing the results. Your acknowledgment of these aspects is motivating, and we are pleased that the content and presentation resonated well with you.

      Reviewer #3 (Public Review):

      The main problem with the work is that the results are only descriptive and do not allow any inferences or conclusions about the importance of the function of G4 structures. The discussion and conclusions are poor. The results are preliminary and in order to try to make the analysis more interesting, it should be further extended and the data must be explored in a much greater depth.

      Thank you for your constructive feedback on our manuscript. We appreciate the time and effort you dedicated to evaluating our work. We acknowledge your concern regarding the descriptive nature of the results and the limitations in making inferences about the importance of G4 structures. To address this, we plan to enhance the depth of our analysis and provide more insightful interpretations in the discussion and conclusion sections. It's important to note that this study is intentionally a short report, emphasizing data mining findings rather than laboratory results. We understand the value of in-depth investigations and concur that our work lays the groundwork for more extensive studies in this area, aiming to provide a real-world scenario. We are committed to addressing your comments and refining our manuscript to contribute meaningfully to this field. Your insights are invaluable, and we look forward to presenting an improved version of our study.

      Reviewer #2 (Recommendations For The Authors):

      The authors could try a higher G-quadruplex score of 1.4 or higher values to substantiate their findings or pick up the bacterial genomes that relied on G4s for their pathogenicity.

      We acknowledge your recommendation to explore a higher G-quadruplex score, and we would like to assure you that we have already conducted analyses using thresholds of 1.4 and 1.6. The findings consistently support the observations presented in the manuscript. We have updated the text to reflect this additional analysis, and the results are included in the revised version of the manuscript (Figure S1).
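
      For readers unfamiliar with the score being thresholded, a simplified sketch of G4Hunter-style scoring is shown below. It follows the published scoring idea (each G in a run of length n scores min(n, 4), each C scores the negative of that, A and T score 0, and the score of a window is the mean over its bases), but it is not the G4Hunter program itself: the 25-nt window, the absence of hit merging, and the toy sequence are assumptions made for illustration.

```python
def base_scores(seq):
    """Per-base score: +min(run, 4) for G runs, -min(run, 4) for C runs."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:
            j += 1                      # j ends one past the current run
        if base in "GC":
            value = min(j - i, 4)
            if base == "C":
                value = -value
            for k in range(i, j):
                scores[k] = value
        i = j
    return scores


def window_hits(seq, window, threshold):
    """Sliding windows whose absolute mean score reaches the threshold."""
    scores = base_scores(seq)
    hits = []
    running = sum(scores[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            running += scores[start + window - 1] - scores[start - 1]
        mean_score = running / window
        if abs(mean_score) >= threshold:
            hits.append((start, mean_score))
    return hits


# Hypothetical sequence: a telomere-like G-rich stretch flanked by AT repeats.
sequence = "ATATAT" + "GGGTTAGGGTTAGGGTTAGGG" + "ATATATAT"
hits_by_threshold = {
    t: window_hits(sequence, window=25, threshold=t) for t in (1.4, 1.6)
}
```

      Raising the threshold can only remove windows, never add them, so the set of putative G4s at 1.6 is a subset of the set at 1.4. Stricter thresholds therefore test whether a trend is carried by the highest-scoring motifs.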

      Reviewer #3 (Recommendations For The Authors):

      Minor points

      Introduction

      Q1. The introduction is shallow. The concept and the importance of PAIs are vague. Why should these genes be different from other genes?

      A1: Thank you for your valuable feedback. We have incorporated additional content in the Introduction section to provide a more comprehensive understanding of PAIs and their distinctiveness from other genes.

      Changes: Lines 44-49 “G4 structures are ...innovative technologies.” were added.

      Lines 51-55 “PAIs are distinct...such as plasmids.” were added.

      Lines 60-66 “PAIs typically contain...recipient genome” were added.

      Lines 77-80 “Growing evidence has...CpG islands, and PAIs” were added.

      Material and Methods

      Q2. It is not clear if the author used the TBTools or the G4Hunter software to identify G4 structures. It would be interesting to include references to published articles that used this software.

      A2: Thank you! Corrected and added more references that used TBTools to extract sequences and G4Hunter to identify G4 structures.

      Q3. The statistical significance must not be based only on p-values. P-values are influenced by sample sizes. I strongly recommend the use of other parameters such as confidence interval and ROC analysis.

      A3: Thank you! We have incorporated confidence intervals and ROC analysis to complement p-values, enhancing the robustness of our statistical analysis.

      Changes: Lines 265-267 “The correlation's significance... sensitivity and specificity.” were added.
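
      As an illustration of why a confidence interval complements a p-value, the sketch below computes a Pearson correlation and its 95% confidence interval with the Fisher z-transformation. The choice of the Fisher method and the GC-content and G4-density values are assumptions made for illustration (the values are hypothetical, not data from the manuscript), and the sketch is not the analysis code used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for r via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical values: GC content (%) and G4 density for ten genomes.
gc_content = [35, 38, 41, 45, 50, 52, 57, 60, 64, 68]
g4_density = [0.10, 0.15, 0.12, 0.30, 0.35, 0.50, 0.48, 0.70, 0.80, 0.95]

r = pearson_r(gc_content, g4_density)
ci_small = fisher_ci(r, n=len(gc_content))   # interval for n = 10
ci_large = fisher_ci(r, n=100)               # same r, larger sample
```

      For the same correlation coefficient, the interval narrows as the sample size grows. The interval thus reports both the estimated strength of the correlation and its precision, which a p-value alone does not convey.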

      Results and discussion

      Q4. The stability of G4 structures seems to be important for its function (doi:10.1111/febs.15065). Therefore it would be interesting if the analysis were carried out separating the G4 according to stability.

      A4: Thank you for highlighting the importance of G4 structure stability for its function and suggesting an analysis based on stability. We have carefully reviewed the referenced paper (doi:10.1111/febs.15065) and note that their study focused on the stability analysis of individual G4s. In our current study, we identified a large number of G4s, and while stability analysis for each G4 is indeed an interesting avenue, it goes beyond the scope of this particular investigation. However, we agree that exploring the relationship between G4 stability and function is a valuable topic. We plan to delve deeper into this aspect in future work, as discussed in our response to your previous comment.

      Changes: Lines 217-221 “Lastly, the stability of G4...molecular engineering.” were added.

      Q5. The quality of the figures is poor. Is not possible to read the correlation and p-values from Figure 2.

      A5: The revised figure is now submitted with enhanced clarity to ensure that correlation and p-values can be easily discerned.

      Q6. The analysis of promoter regions should be performed taking into account the distance between the G4 and the beginning of the gene.

      A6: Thank you. We have elaborated on this point in the revision.

      Changes: Lines 198-106 “Additionally, considering the distance...of G4 structures in promoters.” were added.

      Q7. The topic "Putative origin, transfer mechanisms, and functions of G4s in PAIs". The comments made on this topic are purely speculative and not backed up by data or any type of experimental analysis.

      A7: We appreciate the feedback and have revised the title to emphasize the focus on the functions of G4s in PAIs. We acknowledge that the content related to the putative origin and transfer mechanisms of G4s in PAIs is purely descriptive and speculative, and we have relocated this information to the discussion section for a more appropriate treatment.

      Q8. The supplemental material is hard to follow. The meaning of each column should be better explained. Why was the data divided into 10 parts?

      A8: Following your suggestion, we have revised the tables for better clarity. To address concerns about the division into 10 parts, we have decided to remove this data from the tables as it was deemed unnecessary for presentation.

      Q9. Why was the data of E. coli strains 1 and 2 shown in Tables S3 and S4 and the other bacterial strains were not?

      A9: We appreciate your inquiry. The data of E. coli strains 1 and 2 were specifically highlighted in Tables S3 and S4 as illustrative examples to demonstrate the putative functions of G4s in PAIs within the scope of our study. Given the extensive nature of function annotation analyses across various pathogenic strains, presenting additional tables for each strain would have resulted in an impractical volume of supplementary material.

      Q10. The Results and Discussion should be separated.

      A10: Thank you! Corrected as suggested.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major changes:

      Removed any claim of label-free detection, clarifying that ADeS can predict apoptotic events without apoptotic probes

      Provided a github repository with the executable code ( https://github.com/mariaclaudianicolai/ADeS )

      Uploaded all imaging data used to train and benchmark ADeS on Zenodo ( https://zenodo.org/uploads/10260643 )

      Added supplementary movie showing degraded performance on noisy movie in vivo (Supplementary Movie 3)

      Generated a supplementary figure showing the effect of noise on prediction accuracy (Supplementary Figure 4)

      Minor changes:

      Line 6: added Benjamin Grädel and Mariaclaudia Nicolai to the list of authors

      Line 44: dynamics

      Line 54: updated reference to a published paper

      Line 65: fixed spelling of "chronic"

      Line 74: fixed spelling of "limitations"

      Line 76: changed “biochemical reporters” to “fluorescent probes”

      Line 77: changed “label-free” to “probe-free”

      Line 85: “can apply” to "can be applied"

      Line 109: The citation is updated to appear in the reference

      Lines 143-144: Fixed statement about apoptotic cells having non-significant displacement compared to arrested cells

      Line 156: Figure 3 is cited

      Line 185 and Fig 3 legends: “chore” to "core"

      Lines 187 and 248: “withouth” to "without"

      Lines 177-178: introduced acronyms for deep learning networks

      Lines 276-277: Added interval ranges to clarify subgroups observed in Figure 6F

      Line 284: substituted “SNR” with “signal-to-noise ratio”

      Line 286: mentioned “Supplementary Movie 3”

      Line 515: explicitly defined “field of view” instead of “FOVs”

      Lines 604-606: Added data availability section

      Line 822: modified caption of Figure 1D to explain the estimation of nuclear area over time

      Lines 911-912: Explained gray area in caption of figure 8B-C

      Supplementary figure 1: removed “Neu” and “Eos” acronyms from caption. Introduced definition of “FOV” and “SNR” acronyms

      Editorial assessment

      This valuable work by Pulfer et al. advances our understanding of spatial-temporal cell dynamics both in vivo and in vitro. The authors provide convincing evidence for their innovative deep learning-based apoptosis detection system, ADeS, that utilizes the principle of activity recognition. Nevertheless, the work is incomplete due to the authors' claim that their system is valid for non-fluorescently labeled cells, without evidence supporting this notion. After revisions, this work will be of broad interest to cell biologists and neuroscientists

      We acknowledge that the “label-free” claim was misleading, and in the revised manuscript we addressed this aspect by stating that ADeS is “probe-free”, not requiring any apoptotic marker. For this reason we kindly ask the editor to modify their assessment concerning the work being incomplete, as our tool was specifically meant for fluorescence microscopy.

      Reviewer #1 (Public Review):

      Summary:

      Pulfer et al., describe the development and testing of a transformer-based deep learning architecture called ADeS, which the authors use to identify apoptotic events in cultured cells and live animals. The classifier is trained on large datasets and provides robust classification accuracies in test sets that are comparable to and even outperform existing deep learning architectures for apoptosis detection. Following this validation, the authors also design use cases for their technique both in vitro and in vivo, demonstrating the value of ADeS to the apoptosis research space.

      Strengths:

      ADeS is a powerful tool in the arsenal of cell biologists interested in the spatio-temporal co-ordinates of apoptotic events in vitro, since live cell imaging typically generates densely packed fields of view that are challenging to parse by manual inspection. The authors also integrate ADeS into the analysis of data generated using different types of fluorescent markers in a variety of cell types and imaging modalities, which increases its adaptability by a larger number of researchers. ADeS is an example of the successful deployment of activity recognition (AR) in the automated bioimage analysis space, highlighting the potential benefits of AR to quantifying other intra- and intercellular processes observable using live cell imaging.

      Weaknesses:

      A major drawback was the lack of access to the ADeS platform for the reviewers; the authors state that the code is available in the code availability section, which is missing from the current version of the manuscript. This prevented an evaluation of the usability of ADeS as a resource for other researchers.

      We acknowledge that having access to the code is pivotal, and therefore in this revised version we deposited the Python code deploying our DL model on GitHub (link). Moreover, we included in the revised manuscript the training datasets (in vitro and in vivo), as well as all the testing videos used to benchmark ADeS.

      The authors also emphasize the need for label-free apoptotic cell detection in both their abstract and their introduction but have not demonstrated the performance of ADeS in a true label-free environment where the cells do not express any fluorescent markers.

      The system was developed primarily to analyze data acquired via fluorescence microscopy, which relies on fluorescent staining to visualize cells. Therefore, it is not possible to evaluate our methodology in a 100% label-free environment. What we meant by the term “label-free” is that our method can detect apoptotic events based exclusively on morphological cues, without the use of fluorescent apoptotic reporters. We acknowledge that this terminology was misleading and we apologize for the misunderstanding. To amend this, in our revised paper we avoid using the term “label-free”, referring instead to “probe-free” detection.

      While Pulfer et al., provide a wealth of information about the generation and validation of their DL classifier for in vitro movies, and the utility of ADeS is obvious in identifying apoptotic events among FOVs containing ~1700 cells, the evidence is not as strong for in vivo use cases. They mention the technical challenges involved in identifying apoptotic events in vivo, and use 3D rotation to generate a larger dataset from their original acquisitions. However, it is not clear how this strategy would provide a suitable training dataset for understanding the duration of apoptotic events in vivo since the temporal information remains the same.

      One of the main challenges encountered in vivo was the difficulty of capturing rare events such as apoptosis in physiological conditions. Moreover, the lack of publicly available datasets further prevented us from collecting an extended training dataset suitable for data-hungry techniques such as supervised deep learning. Resorting to 3D rotations was a strategy to exploit the visual information within acquisition volumes to train our classifiers for 2D detection. This approach is a common data augmentation technique that can naturally increase the size of a dataset by displaying the same object from different angles. However, this technique does not explicitly address temporal aspects of the apoptotic events, such as their duration. The duration of the apoptotic events was empirically estimated to obtain a temporal window suitable for detection (Supplementary Figure 1K-L).
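
The rotation-based augmentation described above can be illustrated with a minimal sketch. It is not the ADeS implementation: the rotation angles and projection method used by the authors are not stated in this response, so the sketch assumes 90-degree viewing changes and a max-intensity projection, and the names `max_project`, `rot90`, and `augment_views` are hypothetical.

```python
def max_project(volume, axis):
    """Max-intensity projection of a [z][y][x] nested-list volume along one axis."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:  # look down z: result is indexed [y][x]
        return [[max(volume[z][y][x] for z in range(nz)) for x in range(nx)]
                for y in range(ny)]
    if axis == 1:  # look down y: result is indexed [z][x]
        return [[max(volume[z][y][x] for y in range(ny)) for x in range(nx)]
                for z in range(nz)]
    if axis == 2:  # look down x: result is indexed [z][y]
        return [[max(volume[z][y][x] for x in range(nx)) for y in range(ny)]
                for z in range(nz)]
    raise ValueError("axis must be 0, 1, or 2")


def rot90(image):
    """Rotate a 2D nested-list image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment_views(volume):
    """Return 2D views of one volume: 3 viewing axes x 4 in-plane rotations."""
    views = []
    for axis in range(3):
        view = max_project(volume, axis)
        for _ in range(4):
            views.append(view)
            view = rot90(view)
    return views
```

For a time-lapse, the same view would be applied to every frame of a sequence, so the number of frames per sequence is unchanged. This matches the point made above: the augmentation multiplies spatial viewpoints but adds no new temporal information.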

      The authors also provide examples of in vivo acquisitions in their paper, where the cell density appears to be quite low, questioning the need for automated apoptotic detection in those situations. In the use cases for in vivo apoptotic detection using ADeS (Fig 8), it appears that the location of the apoptotic event itself was obvious and did not need ADeS, as in the case of laser ablation in the spleen and the sparse distribution of GFP labeled neutrophils in the lymph nodes.

      Before addressing the need for these methodologies in vivo, we provide a proof of concept for their applicability. Accordingly, in vivo acquisitions present several visual artifacts and challenges that can hamper activity recognition techniques. Therefore, from a computer vision perspective, the successful implementation of ADeS in vivo is an achievement per se.

      Concerning its need, we showed in supplementary figure 3 that ADeS is robust to increasingly populated fields of view, and might be useful in detecting hindered apoptotic events as well as in reducing human bias.

      Finally, the authors also mention that video quality altered the sensitivity of ADeS in vivo (Fig 6L) but fail to provide an example of ADeS implementation on a video of poor quality, which would be useful for end users to assess whether to adopt ADeS for their own live cell movies.

      In figure 6L we showed quantitatively that low video quality reduced the sensitivity of ADeS. In this revised version we included a supplementary movie (supplementary movie X) depicting ADeS performance in low signal-to-noise conditions. We also addressed this aspect in vitro, by generating a synthetic degradation of the movie quality and measuring the effect on performance (supplementary figure 4).
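
A synthetic degradation of this kind can be sketched as follows. The procedure actually used for supplementary figure 4 is not described in this response, so the sketch makes two assumptions: the degradation is additive zero-mean Gaussian noise, and signal-to-noise ratio is the mean signal above background divided by the background standard deviation. The names `degrade_frame` and `snr` are hypothetical.

```python
import random
import statistics


def degrade_frame(frame, noise_sigma, rng):
    """Add zero-mean Gaussian noise to a 2D frame, clipping to [0, 255]."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, noise_sigma))) for px in row]
            for row in frame]


def snr(signal_pixels, background_pixels):
    """(mean signal - mean background) / background standard deviation.

    Assumed definition; raises ZeroDivisionError on a noise-free background.
    """
    bg_mean = statistics.fmean(background_pixels)
    bg_std = statistics.pstdev(background_pixels)
    return (statistics.fmean(signal_pixels) - bg_mean) / bg_std
```

Applying `degrade_frame` with increasing `noise_sigma` to every frame of a movie, and re-running the classifier at each level, would give an accuracy-versus-noise curve under these assumptions.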

      Reviewer #2 (Public Review):

      Summary:

      Pulfer A. et al. developed a deep learning-based apoptosis detection system named ADeS, which outperforms the currently available computational tools for in vitro automatic detection. Furthermore, ADeS can automatically identify apoptotic cells in vivo in intravital microscopy time-lapses, preventing manual labeling with potential biases. The authors trained and successfully evaluated ADeS in packed epithelial monolayers and T cells distributed in 3D collagen hydrogels. Moreover, in vivo, training and evaluation were performed on polymorphonucleated leukocytes in lymph nodes and spleen.

      Strengths:

      Pulfer A. and colleagues convincingly presented their results, thoroughly evaluated ADeS for potential toxicity assay, and compared its performance with available state-of-the-art tools.

      Weaknesses:

      The use of ADeS is still restricted to samples where cells are fluorescently labeled either in the cytoplasm or in the nucleus, which limits its use for in vitro toxicity assays that are performed on primary cells or organoids (e.g., iPSCs-derived systems) that are normally harder to transfect. In conclusion, ADeS will be a useful tool to improve output quality and accelerate the evaluation of assays in several research areas with basic and applied aims.

      As addressed in the answer to reviewer one, we primarily focused on fluorescence microscopy, which implies fluorescent labeling of the cells. The application to other imaging platforms was not within the scope of our study. However, a model to infer apoptosis within other imaging solutions, e.g. brightfield, could be explored in future analogous studies.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their remarks. Please find our detailed answers below.

      1) The authors' continued refusal to acknowledge the other reports before the final sentence of the Discussion, which has been pointed out in two previous rounds of review as a major flaw, detracts from the manuscript significantly.

      We now acknowledge and discuss the other SIRT6-nucleosome reports in the introduction as requested by the reviewer.

      2) While some of the grammatical errors in previous versions have been corrected, many remain, especially in the Methods section

      We corrected the remaining grammatical errors.

      3) Multiple statements of fact not supported by data shown in this work continue to lack appropriate references.

      We added references where facts were not supported by our data.

    1. Author Response

      We appreciate the thoughtful comments from the reviewers. All reviewers express common support for the study’s meaningful contribution to understanding interoceptive neurocircuitry in health and in psychiatric disorders. Specifically, the reviewers highlight the strong theoretical backing and the novel combination of tasks and analytical methods. In turn, the reviewers identify several areas for improvement that we plan to address in our resubmission. These include a more detailed demographic characterization of the study participants, increased clarity when describing the statistics that support each conclusion, and additional discussion when interpreting the resting state findings, as we did not include a separate control condition for the effect of time. One reviewer commented that we largely cite our previous work with the isoproterenol paradigm; while we will provide an updated and broader view of the literature in our resubmission, there remains a limited number of comparable interoceptive perturbation studies. Finally, one comment referred to our reliance on ratings of interoceptive intensity without including additional behavioral measures. While our measures of interest were chosen for their relevance to our hypotheses, we will consider adding measures such as interoceptive accuracy (correspondence between heart rate and dial ratings) that were collected during the perturbation task, should they provide additional insight into the insular responses of the participants.
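
One way to quantify the correspondence between heart rate and dial ratings mentioned above is a lagged correlation. The sketch below is an illustration only: the response does not specify how interoceptive accuracy will be computed, so the use of Pearson correlation, the lag search, and the names `pearson` and `best_lag_correlation` are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag_correlation(heart_rate, dial, max_lag):
    """Highest Pearson r when the dial trace lags heart rate by 0..max_lag samples.

    Returns (r, lag). Both traces are assumed to share one sampling rate.
    """
    best_r, best_lag = -2.0, 0
    for lag in range(max_lag + 1):
        hr = heart_rate[:len(heart_rate) - lag]  # drop the last `lag` samples
        dl = dial[lag:]                          # drop the first `lag` samples
        r = pearson(hr, dl)
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag
```

Searching over lags allows for a delay between a change in heart rate and the participant's dial response, which a zero-lag correlation would penalize.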

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript presents the first evidence for a plastic enhancement in the response of pial cortical arterioles to external stimulation. Specifically, they show (p8; Figure 3A-C) that repeated application of a visual stimulus at 0.25 Hz, at the upper edge of the vasomotor response, leads to a greater change in the diameter of pial arterioles at that frequency. This adds to the earlier, referenced work of Mateo et al (2017) that showed locking - or entrainment of pial arteriole vasomotion - by stimuli at different (0.0 to 0.3 Hz) frequencies.

      We thank the reviewer for positively identifying the value of our manuscript.

      The manuscript has a major flaw. Much as there is plasticity that leads to an increase in the amplitude of vasomotion at the drive frequency, the authors need to show reversibility. This could possibly be accomplished by driving the visual system at a different frequency, say 0.15 Hz, and observing if the 0.25 Hz response is then diminished. The authors could then test if their observation is repeatable by again driving at 0.25 Hz. Unless I missed the presentation on this point, there is no evidence for reversibility.

      The reviewer has raised a very important point. In our experiments, the visually induced vasomotion (or visual stimulus-triggered vasomotion) was always entrained by repeated trials of the 0.25 Hz temporal frequency stimuli. When the visual stimulation stops, the vasomotion frequency lock to 0.25 Hz quickly dissipates. After saturated training with this stimulus, the parameters of the visual stimulus were switched, for example to 0.15 Hz. The animal quickly adapted to this new stimulus paradigm and the vasomotion was frequency-locked to 0.15 Hz. The adaptation to this new paradigm occurred well within 5 minutes. In Fig. 5, various paradigms were randomly tested. In some of the trials, the 0.25 Hz stimulus was tested after 0.15 Hz. The vasomotion also quickly adapted back to 0.25 Hz. We agree with the reviewer that this reversibility could have been explicitly documented in the manuscript.

      Drew, P. J., A. Y. Shih, J. D. Driscoll, P. M. Knutsen, D. Davalos, P. Blinder, K. Akassoglou, P. S. Tsai, and D. Kleinfeld. 2010. 'Chronic optical access through a polished and reinforced thinned skull', Nature Methods, 7: 981-84.

      Morii, S., A. C. Ngai, and H. R. Winn. 1986. 'Reactivity of rat pial arterioles and venules to adenosine and carbon dioxide: With detailed description of the closed cranial window technique in rats', Journal of Cerebral Blood Flow & Metabolism, 6: 34-41.

      Reviewer #2 (Public Review):

      Sasaki et al. investigated methods to entrain vasomotion in awake wild-type mice across multiple regions of the brain using a horizontally oscillating visual pattern which induces an optokinetic response (HOKR) eye movement. They found that spontaneous vasomotion could be detected in individual vessels of their wild-type mice through either a thinned cranial window or intact skull preparation using a widefield macro-zoom microscope. They showed that low-resolution autofluorescence signals coming from the brain parenchyma could be used to capture vasomotion activity using a macro-zoom microscope or optical fibre, as this signal correlates well with the intensity profile of fluorescently-labelled single vessels. They show that vasomotion can also be entrained across the cortical surface using an oscillating visual stimulus with a range of parameters (with varying temporal frequencies, amplitudes, or spatial cycles), and that the amplitude spectrum of the detected vasomotion frequency increases with repeated training sessions. The authors include some control experiments to rule out fluorescence fluctuations being due to artifacts of eye movement or screen luminance and attempt to demonstrate some functional benefit of vasomotion entraining as HOKR performance improves after repeat training. These data add in an interesting way to the current knowledge base on vasomotion, as the authors demonstrate the ability to entrain vasomotion across multiple brain areas and show some functional significance to vasomotion with regards to information processing as HOKR task performance correlates well with vascular oscillation amplitudes.

      We thank the reviewer for summarizing the value of our study and recognizing its significance.

      The aims of the paper are mostly well supported by the data, but some streamlining of the data presentation would improve overall clarity. The third aim to establish the functional significance of vasomotion in relation to plasticity in information processing could be better supported by the inclusion of some additional control experiments.

      We thank the reviewer for recognizing our vast amount of data supporting our findings. We agree that better data presentation could have improved the clarity of the manuscript.

      Specifically:

      1) The clarity and comprehensibility of the paper could be significantly enhanced by incorporating additional details in both the introduction and discussion sections. In the introduction, a succinct definition of the frequency range of vasomotion should be provided, as well as a better description of the horizontal optokinetic response (i.e. as they have in the results section in the first paragraph below the 'Entrainment of vasomotion with visual stimuli presentation' sub-heading). The discussion would benefit from the inclusion of a clear summary of the results presented at the start, and the inclusion of stronger justification (i.e. more citations) with regards to the speculation about vasomotion and neuronal plasticity (e.g. paragraph 5 includes no citations).

      We agree that a better description of vasomotion and horizontal optokinetic response could have been provided in the introduction. As the reviewer suggests, the discussion could also have started with the following summary of the results.

      “We show that visually induced vasomotion can be frequency-locked to the visual stimulus and can be entrained with repeated trials. The initial drive for the vasomotion, or the sensory-evoked hyperemia, must be coming from the neuronal activity in the visual system. The vasomotion is likely triggered by activation of the neurovascular interaction (Kayser, 2004; van Veluw et al., 2020). Surprisingly, the entrained vasomotion was observed not only in the visual cortex but also widely throughout the surface of the brain and deep in the cerebellar flocculus. The global entrainment could be realized through separate mechanisms from the local neurovascular coupling. What is also unknown is where the plasticity occurs. The neuronal visual response in the primary visual cortex could potentially decrease with repeated visual stimulation presentation as the adaptive movement of the eye should decrease the retinal slip. With repeated training sessions, a more static projection of the presented image will likely be shown to the retina. The neurovascular coupling could be enhanced with increased responsiveness of the vessels and vascular-to-vascular coupling could also be potentiated.”

      2) The novel methods for detecting vasomotion using low-resolution imaging techniques are discussed across the first four figures, but this gets a little bit confusing to follow as the authors jump back and forth between the different imaging and analysis techniques they have employed to capture vasomotion. The data presentation could be better streamlined - for instance by presenting only the methods most relevant for the functional dataset (in Figures 5-7), with the additional information regarding the various controls to establish the use of autofluorescence intensity imaging as a valid method for capturing vasomotion reduced to fewer figure panels, or moved to supplementary figures so as to not detract from the main novel findings contributed in this study.

      We apologize for the confusing presentation of the data. Many of the initial figures were technical; however, we feel that following these steps was necessary to logically conclude that shadow imaging of the autofluorescence could be used as an indicator of vasomotion. We do agree with the reviewer that going back and forth between different techniques can be confusing. We could have added separate supplementary figures to introduce the various methods used upfront before going into the findings.

      3) The authors heavily rely on representative traces from individual vessels to illustrate their findings, particularly evident in Figures 1-4. While these traces offer a valuable visualization, augmenting their approach by presenting individual data points across the entire dataset, encompassing all animals and vessels, would significantly enhance the robustness of their claims. For instance, in Figures 1 and 2, where average basal and dilated traces are depicted for a representative vessel, supplementing these with graphs showcasing peak values across all measured vessels would enable the authors to convey a more holistic representation of their data. Or in Figure 3, where the amplitude spectrum is presented for individual Texas red fluorescence intensity changes in V1 across novice, trained, and expert mice, incorporating a summary graph featuring the amplitude spectrum value at 0.25Hz for each individual trace (across animals/imaging sessions), followed by statistical analysis, would fortify the strength of their assertions. Moreover, providing explicit details on sample sizes for each individual figure panel (where not a representative trace), including the number of animals or vessels/imaging sessions, would contribute to transparency and aid readers in assessing the generalisability of the findings.

      We agree with the reviewer that summarization of the data across a number of vessels/imaging sessions would lead to more generalization of the findings. However, contrary to what the reviewer described, we did summarize the vessel diameter expansion events across multiple vessel observations in Fig. 1F, G. The vasomotion parameters were not summarized for observation in intact skull shown in Fig. 2. However, this figure was intended just to show that vessel boundary cannot be well defined in intact skull imaging and Texas Red intensity or autofluorescence intensity fluctuation would give a better indication of vessel diameter fluctuation. In Fig. 3G, the peak ratio of 0.25 Hz was calculated for individual animals at Novice, Trained, and Expert levels and summarized for n = 5 animals. Statistical analysis was also done. The variability between imaging sessions within individual animals was not analyzed; thus, this could have been indicated.
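
The per-animal summary of the 0.25 Hz component discussed above can be sketched in code. The exact definition of the "peak ratio" in Fig. 3G is not given in this response, so the sketch assumes it is the amplitude at the stimulus frequency divided by the mean amplitude at nearby flanking frequencies. The flanking frequencies and the names `amplitude_at` and `peak_ratio` are hypothetical.

```python
import cmath
import math


def amplitude_at(trace, sample_rate_hz, freq_hz):
    """Single-frequency DFT amplitude of a mean-subtracted intensity trace."""
    n = len(trace)
    mean = sum(trace) / n
    acc = sum((v - mean) * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
              for k, v in enumerate(trace))
    return 2.0 * abs(acc) / n


def peak_ratio(trace, sample_rate_hz, target_hz=0.25,
               flank_hz=(0.15, 0.20, 0.30, 0.35)):
    """Amplitude at target_hz relative to the mean amplitude at flank_hz.

    Assumed definition; a perfectly flat trace would divide by zero.
    """
    target = amplitude_at(trace, sample_rate_hz, target_hz)
    flank = sum(amplitude_at(trace, sample_rate_hz, f) for f in flank_hz) / len(flank_hz)
    return target / flank
```

Computing one such value per imaging session, rather than per animal, would also expose the within-animal variability that the response notes was not analyzed.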

      4) In the experiments where mice are classed as "novice", "trained" or "expert", the inclusion of the specific range of the number of training sessions for each category would improve replicability.

      We agree with the reviewer that classification on the level of training should have been explicitly indicated. Mice experiencing the first visual training session were defined as “Novice”. The mice that have experienced 3 training sessions are the “Trained” mice and the performance of the “Trained” mice during the 4th training session was evaluated. Mice that experienced 8 to 11 rounds of visual training sessions are the “Expert” mice.

      5) The authors don't state whether mice were habituated to the imaging set-up prior to the first data collection, as head-fixation and restraint can be stress-inducing for animals, especially upon first exposure, which could impact their neurovascular coupling responses differentially in "novice" versus "trained" imaging sessions (e.g. see Han et al., 2020, DOI: https://doi.org/10.1523/JNEUROSCI.1553-20.2020). The stress associated with a tail vein injection prior to imaging could also partially explain why mice didn't learn very well if Texas Red was injected before the training session. If no habituation was conducted in these experiments, the study would benefit from the inclusion of some control experiments where "novice" responses were compared between habituated and non-habituated animals.

      We agree with the reviewer that stress could well affect spontaneous vasomotion as well as visually induced vasomotion (or visual stimulus-triggered vasomotion). As the reviewer suggested, we could have compared the habituated and non-habituated mice to the initial visually induced vasomotion response. In addition, whether the experimentally induced increase in stress would interfere with the vasomotion or not could also be studied. With the Texas Red experiments, we observed that tail-vein injection stress appeared to interfere with the HOKR learning process. In the experiments presented in Fig. 3, Texas Red was injected before session 1. Vasomotion entrainment likely progressed with sessions 2 and 3 training. Before session 4, Texas Red was injected again to visualize the vasomotion. The vasomotion was clearly observed in session 4, indicating that the stress induced by tail-vein injection could not interfere with the generation of visually induced vasomotion.

      6) The experiments regarding the brain-wide vasomotion entrainment across the cortical surface would benefit from some additional information about how brain regions were identified (e.g. particularly how V1 and V2 were distinguished given how close together they are).

      The brain regions were identified by referring to the Mouse Brain Atlas. As the skull was intact, the location of bregma, lambda, and midline was clearly visible. We agree with the reviewer that strict separation of V1 and V2 could be difficult if we rely on the brain atlas alone. However, what we wanted to emphasize was that there was no specific localization of the vasomotion entrainment effect.

      7) Whilst the authors show that HOKR task performance and vasomotion amplitude are increased with repeated training to provide some support to their aim of investigating the functional significance of vasomotion with regards to information processing plasticity, the inclusion of some additional control experiments would provide stronger evidence to address this aim. For instance, if vasomotion signalling is blocked or reduced (e.g. using optogenetics or in an AD mouse model where arteriole amyloid load restricts vasomotion capacity), does flocculus-dependent task performance (e.g. HOKR eye movements) still improve with repeated exposure to the external stimulus.

      We agree that experimental intervention to vasomotion is ideal to test the functional significance of vasomotion. As pharmacological intervention lacks specificity, we are currently exploring the optogenetic approach. We have never thought of using the AD mouse as a model of restricted vasomotion by amyloid, and we agree this would be an interesting model to study. However, the AD mouse model would also have deficits other than the restricted vasomotion. On the other hand, we could test whether the repeated presentation of slowly oscillating visual stimuli can have beneficial effects in improving the cognitive abilities of AD model mice.

      Reviewer #3 (Public Review):

      Summary:

      Here the authors show global synchronization of cerebral blood flow (CBF) induced by oscillating visual stimuli in the mouse brain. The study validates the use of endogenous autofluorescence to quantify the vessel "shadow" to assess the magnitude of frequency-locked cerebral blood flow changes. This approach enables straightforward estimation of artery diameter fluctuations in wild-type mice, employing either low magnification wide-field microscopy or deep-brain fibre photometry. For the visual stimuli, awake mice were exposed to vertically oscillating stripes at a low temporal frequency (0.25 Hz), resulting in oscillatory changes in artery diameter synchronized to the visual stimulation frequency. This phenomenon occurred not only in the primary visual cortex but also across a broad cortical and cerebellar surface. The induced CBF changes adapted to various stimulation parameters, and interestingly, repeated trials led to plastic entrainment. The authors control for different artefacts that may have confounded the measurements such as light contamination and eye movements but found no influence of these variables. The study also tested horizontally oscillating visual stimuli, which induce the horizontal optokinetic response (HOKR). The amplitude of eye movement, known to increase with repeated training sessions, showed a strong correlation with CBF entrainment magnitude in the cerebellar flocculus. The authors suggest that parallel plasticity in CBF and neuronal circuits is occurring. Overall, the study proposes that entrained "vasomotion" contributes to meeting the increased energy demand associated with coordinated neuronal activity and subsequent neuronal circuit reorganization.

      We thank the reviewer for providing a thorough summarization of our manuscript.

      Strengths:

      • The paper describes a simple and useful method for tracking vasomotion in awake mice through an intact skull.

      • The work controls for artefacts in their primary measurements.

      • There are some interesting observations, including the nearly brain-wide synchronization of cerebral blood flow oscillations to visual stimuli and that this process only occurs after mice are trained in a visual task.

      • This topic is interesting to many in the CBF, functional imaging, and dementia fields.

      We thank the reviewer for positively recognizing the strength of the paper.

      Weaknesses:

      • I have concerns with the main concepts put forward, regarding whether the authors are actually studying vasomotion as they state, as opposed to functional hyperemia which is sensory-induced changes in blood flow, which is what they are actually doing. I recommend several additional experiments/analyses for them to explore. This is mostly further characterizing their effect which will benefit the interpretations.

      We recognize that the terminology used in our paper was not explicitly explained. Traditionally, “vasomotion” is defined as the dilation and constriction of the blood vessels that occurs spontaneously at low frequencies in the 0.1 Hz range without any apparent external stimuli. Sensory-induced changes in the blood flow are usually called “hyperemia”. However, in our paper, we used the term “vasomotion” literally, to indicate both forms of “vascular” “motion”. Therefore, the traditional vasomotion was called “spontaneous vasomotion” and the hyperemia induced with slow oscillating visual stimuli was called “visually induced vasomotion”.

      Using our newly devised methods, we show the presence of “spontaneous vasomotion”. However, this spontaneous vasomotion was often fragmented and did not last long at a specific frequency. With visual stimuli that slowly oscillated at temporal frequencies close to the frequency of spontaneous vasomotion, oscillating hyperemia, or “visually induced vasomotion” was observed.

      • Neuronal calcium imaging would also benefit the study and improve the interpretations.

      In our paper, we mainly studied the visually induced vasomotion (or visual stimulus-triggered vasomotion). Therefore, visual stimulation must first activate the neurons and, through neurovascular coupling, the initial drive for vasomotion is likely triggered. However, visually induced vasomotion is not observed in novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to neuronal activity in the primary visual cortex. We also do not know how the synchronized vasomotion can spread throughout the whole brain. Where the plasticity for vasomotion entrainment occurs is also unknown. To identify the extent of the neuronal contribution to the vasomotion triggering, whole brain synchronization, and vasomotion entrainment, simultaneous neuronal calcium imaging would be ideal. However, because fluorescent Ca2+ indicators expressed in neurons would also be distorted by the “shadow” effect from the vasomotion, exquisite imaging techniques would be required.

      • The plastic effects in vasomotion synchronization that occur with training are interesting but they could use an additional control for stress. Is this really a plastic effect, or is it caused by progressively decreasing stress as trials progress? I recommend a habituation control experiment.

      As also pointed out by reviewer #2, we agree that whether stress affects visually induced vasomotion could be studied. Studying the visually induced vasomotion in mice well-habituated to the experimental apparatus would give an idea of whether stress could truly be a confounding factor affecting vasomotion. On the other hand, whether acutely induced stress can interfere with the already entrained vasomotion could also be studied. In the experiments presented in Fig. 3, Texas Red was injected via the tail vein, which would be quite stressful for the mouse. However, in the trained mouse, visually induced vasomotion could be observed regardless of the stress. It is likely that stress can interfere with the acquisition of vasomotion entrainment, but the already acquired entrainment will not be canceled by acute stress induced by tail-vein injection. We agree that the relationship between stress and vasomotion, and the plasticity related to vasomotion entrainment, could be investigated further.

      Appraisal

      I think the authors have an interesting effect that requires further characterization and controls. Their interpretations are likely sound and additional experiments will continue to support the main hypothesis. If brain-wide synchrony of blood flow can be trained and entrained by external stimuli, this may have interesting therapeutic potential to help clear out toxic proteins from the brain as seen in several neurodegenerative diseases.

      We thank the reviewer for the positive evaluation of our manuscript. Strong entrainment of visually induced vasomotion was observed with a simple presentation of slowly oscillating visual stimuli for several days. This is a totally non-invasive method to train the vasomotion capacity. As the reviewer recognizes, potential benefits for the treatment of dementia and neurodegenerative diseases could be evaluated with further studies.

    1. Author Response:

      We thank the reviewers and editor for their careful analysis of our manuscript and their appreciation of its strengths. Our plans to address the reviewers’ concerns regarding the weaknesses of the study are outlined below.

      Reviewing Editor (Public Review):

      “Weaknesses mainly concern the experiments and arguments leading to the authors' notion that Cav3 channels may partially compensate for the loss of Cav1.4 calcium currents in cone synapses. It is possible that the non-conducting Cav1.4 variant supports synapse development and the Cav3 channel then provides the calcium influx. However, in its current state, the study does not unequivocally assess Cav3 expression in wild-type cones, it lacks direct evidence of Cav3 expression and upregulation, e.g. via single cell transcriptomics, immunolabeling, or an elaboration on electrophysiology, and it does not test the authors' earlier idea that Cav1.4 might couple to intracellular calcium stores at photoreceptor synapses.”

      Current transcriptomic studies indicate that Cav3 transcripts are present at extremely low levels compared to that for Cav1.4 in cones of young mice (PMID 26000488, summarized in PMID 35650675), adult mice (PMID: 36807640), macaque (PMID 30712875), and human (PMID 31075224). Thus, it was somewhat surprising that Davison et al reported the presence of low voltage activated (LVA) Cav3-like currents with amplitudes that were ~50% of that for the Cav1 current in mouse cones at -40 mV (PMID 35803735). Using similar pharmacological criteria as Davison et al, we did not find functional evidence for a LVA current in cones of wild-type (WT) mouse retina: the Ca2+ current in our recordings was suppressed by the Cav1 antagonist isradipine (Fig 3a) but minimally affected in the expected voltage range by the Cav3 antagonist ML218 (Fig 3b). In WT mouse, voltage clamp steps from -90 mV to more depolarized voltages failed to show a transient inward current at onset (Fig 2e), which is a hallmark of LVA calcium currents. In addition, by standard physiological and pharmacological criteria, we could not identify LVA currents in cones of ground squirrel (Fig.3c,d) and macaque retina (Supp. Fig.S3). Our results argue against a significant role for LVA currents in mammalian cones.

      A problem that we discovered (as did Davison et al, their Fig.2C) was that Cav3 blockers (e.g., ML218 and Z944) have non-specific actions on the high voltage activated (HVA) Ca2+ current (presumably mediated by Cav1.4) in WT mouse cones. This is clearly shown in our Supp. figure S1a-b where ML218 causes a dose-dependent negative shift in the I-V relationship but also inhibition of current density in HEK293T cells transfected with Cav1.4. We are planning a second study to thoroughly characterize these actions of ML218 and Z944 on Cav1 channels as the results are important for understanding the actions of these drugs in cell-types with mixed populations of Cav1 and Cav3 channels.

      A second problem is that dihydropyridines (DHP) used in both our study and that of Davison et al (e.g., isradipine, nifedipine) incompletely and slowly block Cav1 channels at negative membrane potentials (PMID: 12853422). Due to the slow kinetics of DHP block, Cav1 currents in the presence of such blockers can appear to inactivate rapidly (see Fig.6A in PMID 11487617). Thus, the Cav current recorded in the presence of DHP blockers in WT mouse cones may represent unblocked Cav1.4-mediated currents that appear rapidly inactivating, and may therefore be misconstrued as being mediated by Cav3 channels.

      Given the caveats of the pharmacological approach, we agree that stronger evidence is needed to rule out a small contribution of Cav3 channels in WT mouse cones. As mentioned in our text, we have found that currently available Cav3 antibodies produce similar patterns of immunofluorescence in WT and corresponding Cav3 KO retina so analysis at the level of Cav proteins is not possible. Thus, we are planning to compare the relative expression of Cav channel genes in cones using drop-seq experiments of G369i KI and WT mouse retina. We also plan to elaborate on our electrophysiological dissection of the HVA and LVA currents.

      Among the 3 Cav3 subtypes, Cav3.2 was the only one detected in mouse cones by Davison et al using nested RT-PCR (PMID 35803735). Thus, we obtained the Cav3.2 mouse strain from JAX (B6;129-Cacna1htm1Kcam/J) and generated a Cav3.2 KO/G369i KI double mutant mouse strain. If the Cav3 current that appears in the G369i KI cones is mediated by Cav3.2, then it should be undetectable in cones of the double mutant mice. Moreover, if these Cav3.2 channels contribute to the residual cone synaptic responses in G369i KI mice, then the double mutant mice should be deficient in this regard. We will test these predictions in patch clamp recordings and ERGs.

      Finally, we will conduct Ca2+ imaging experiments in cone terminals of the WT vs G369i KI mice to test whether increased coupling of Cav channels to intracellular Ca2+ release may be involved in cone synaptic responses of the G369i KI mice.

      Reviewer #1 (Public Review):

      Weaknesses:

      “The major criticism that I have of the study is that it infers Ca channel molecular composition based solely on pharmacological analysis, which, as the authors note, is confounded by the cross-reactivity of many of the "specific" channel-type antagonists. The authors note that Cav3 mRNAs have been found in cones, but here, they do not perform any analysis to examine Cav3 transcript expression after G369i-KI nor do they examine Ca channel transcript expression in monkey or squirrel cones, which serve as controls of sorts for the G369i-KI (i.e. like WT mouse cones, cones of these other species do not seem to exhibit LVA Ca currents).”

      Actually, we also used non-pharmacological (i.e., electrophysiological) criteria to back up our interpretation that Cav3 channels contribute to the Cav current in cones primarily in the absence of functional Cav1.4 channels. For example, in Fig.2, we show that the Ca2+ current in G369i KI and Cav1.4 KO mice exhibits the hallmarks of the Cav3 channel (negative activation and inactivation voltages and window current, rapid inactivation), which are quite distinct from the Ca2+ currents in WT cones. In recordings of ground squirrel and macaque cones (Supp.Figs.S2-3), negative holding voltages do not unmask a LVA current according to various criteria. In addition to the transcriptomic approaches described above, we plan to elaborate on the electrophysiological evidence for the absence of a LVA current in WT mouse cones as part of the revision.

      “Secondarily, in Maddox et al. 2020, the authors raise the possibility that G369i-KI, by virtue of having a functional voltage-sensing domain, might couple to intracellular Ca2+ stores, and it seems appropriate that this possibility be considered experimentally here.”

      We will conduct Ca2+ imaging experiments in cone terminals of the WT vs G369i KI mice to test whether increased coupling of Cav channels to intracellular Ca2+ release may be involved in cone synaptic responses of the G369i KI mice.

      “As a minor point: the authors might wish to note, in comparison to another retinal ribbon synapse, that Zhang et al. 2022 (in J. Neuroscience) performed a study of mouse rod bipolar cells and found a number of LVA and HVA Ca conductances in addition to the typical L-type conductance mediated by Cav1-containing channels.”

      We are aware of the extensive evidence for the expression of Cav3 channels in retinal bipolar cells (PMID 11604141, 22909426, 19275782, 35896423) and our recordings of cone bipolar cells in ground squirrel confirm this (Supp. Fig.S2D). We could add reference to this work in our revision.

      Reviewer #2 (Public Review):

      Weaknesses:

      “The major critiques are related to the description of the Cav1.4 knock-in mouse as "sparing" function, which can be remedied in part by a simple rewrite, and in certain places, the data may need to be examined more critically. In particular, the authors should address features in the data presented in Figures 6 and 7 that seem to indicate that the retina of the Cav1.4 knock-in is not intact, but the interpretation given by the authors as "intact" is not appropriate and made without rigorous statistical testing.”

      We intended to use “sparing” and “intact” to indicate that cone synapses are present and to some extent functional, in contrast to their complete absence in the Cav1.4 KO mouse. However, we recognize this may be misinterpreted as “normal”. As suggested by the reviewer, we will revise our statistical analyses and text to clarify that cone synaptic responses do indeed differ significantly in G369i KI as compared to WT mice. We feel that this will be a strong addition to the study and will emphasize the key point that Cav3 cannot fully compensate for loss of Cav1.4 with respect to cone synapse structure and function.

      Reviewer #3 (Public Review):

      Weaknesses:

      “The study has been expertly performed but remains descriptive without deciphering the underlying molecular mechanisms of the observed phenomena, including the proposed homeostatic switch of synaptic calcium channels. Furthermore, a relevant part of the data in the present paper (presence of T-type calcium channels in cone photoreceptors) has already been identified/presented by previous studies of different groups (Macosko et al., 2015; pmid 26000488; Davison et al., 2021; pmid 35803735; Williams et al., 2022; pmid 35650675). The degree of novelty of the present paper thus appears limited.”

      We respectfully disagree that our paper lacks novelty. As indicated by Reviewer 2, a major advance of our study is in providing a mechanism that can explain the longstanding conundrum that congenital stationary night blindness type 2 mutations that would be expected to severely compromise Cav1.4 function do not produce complete blindness. We also disagree that the presence of T-type channels in cone photoreceptors has been unequivocally demonstrated, as the non-biased transcriptomic approaches show very little Cav3 transcript expression in mouse cones (PMIDs 26000488, 35650675, 36807640), macaque cones (PMID 30712875), and human cones (PMID 31075224). Transcription may not equate to translation, particularly at low expression levels. We also note that the one study to date that suggests a functional contribution of Cav3 channels in mouse cones (Davison et al., 2021; pmid 35803735) used a DHP to isolate the “LVA” current, which is problematic as described above. Our demonstration of minimal or undetectable Cav3-type currents in mammalian cones using physiological and pharmacological approaches, while a negative result, adds important context to the recent literature. As described in our response to the editor’s review, our planned revisions include testing whether Cav3 transcripts are upregulated in G369i KI cones and whether the Cav3.2 subtype suggested to be present in cones (PMID 35803735) contributes to Cav currents in these cells using Cav3.2 KO and Cav3.2 KO/G369i KI double mutant mice.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviews

      We would like to extend our thanks to the reviewers who took the time to carefully read our paper and provide thoughtful insights and suggestions on how to strengthen our conclusions. All reviewers agreed that our study presented strong data supporting a role for triglyceride lipase brummer (bmm) in regulating testis lipid droplets and spermatogenesis in Drosophila, and that our findings advance our understanding of lipid biology during sperm development. Reviewers made several helpful suggestions on how to strengthen our manuscript even further. Below, we outline how we revised our manuscript in response to reviewer comments to ensure we clearly communicate our data and conclusions with readers, and properly contextualize our findings.

      REVIEWER 1

      In this study, the authors investigate the role of triglycerides in spermatogenesis. This work is based on their previous study (PMID: 31961851) on triglyceride sex differences in which they showed that somatic testicular cells play a role in whole body triglyceride homeostasis. In the current study, they show that lipid droplets (LDs) are significantly higher in the stem and progenitor cell (pre-meiotic) zone of the adult testis than in the meiotic spermatocyte stages. The distribution of LDs anti-correlates with the expression of the triglyceride lipase Brummer (Bmm), which has higher expression in spermatocytes than early germline stages. Analysis of a bmm mutant (bmm[1]) - a P-element insertion that is likely hypomorphic - and its revertant (bmm[rev]) as a control shows that bmm acts autonomously in the germline to regulate LDs. In particular, the number of LDs is significantly higher in spermatocytes from bmm[1] mutants than from bmm[rev] controls. Testes from males with global loss of bmm (bmm[1]) are shorter than controls and have fewer differentiated spermatids. The zone of bam expression, typically close to the niche/hub in WT, is now many cell diameters away from the hub in bmm[1] mutants. There is an increase in the number of GSCs in bmm[1] homozygotes, but this phenotype is probably due to the enlarged hub. However, clonal analyses of GSCs lacking bmm indicate that a greater percentage of the GSC pool is composed of bmm[1]-mutant clones than of bmm[rev]-clones. This suggests that loss of bmm could impart a competitive advantage to GSCs, but this is not explored in greater detail. Despite the increase in number of GSCs that are bmm[1]-mutant clones, there is a significant reduction in the number of bmm[1]-mutant spermatocyte and post-meiotic clones. This suggests that fewer bmm[1]-mutant germ cells differentiate than controls. To gain insights into triglyceride homeostasis in the absence of bmm, they perform mass spec-based lipidomic profiling. Analyses of these data support their model that triglycerides are the class of lipid most affected by loss of bmm, supporting their model that excess triglycerides are the cause of spermatogenetic defects in bmm[1]. Consistent with their model, a double mutant of bmm[1] and a diacylglycerol O-acyltransferase 1 called midway (mdy) reverts the bmm-mutant germline phenotypes.

      There are numerous strengths of this paper. First, the authors report rigorous measurements and statistical analyses throughout the study. Second, the authors utilize robust genetic analyses with loss-of-function mutants and lineage-specific knockdown. Third, they demonstrate the appropriate use of controls and markers. Fourth, they show rigorous lipidomic profiling. Lastly, their conclusions are appropriate for the results. In other words, they don't overstate the results.

      We thank the Reviewer for their positive assessment of our paper.

      There are a few weaknesses. Although the results support the germline autonomous role of bmm in spermatogenesis, one potential caveat is that the mdy rescue was global, i.e., in both somatic and germline lineages. The authors did not recover somatic bmm clones, suggesting that bmm may be required for somatic stem self-renewal and/or niche residency. While this is beyond the scope of this paper, it is possible that somatic bmm does impact germline differentiation in a global bmm mutant.

      In the revised manuscript, we made several changes to address these points.

      1) We now clearly state when we used global versus germline-only loss of mdy to rescue bmm mutant phenotypes in the testis.

      “Notably, at least some of the effects of global loss of mdy on bmm1 males can be attributed to the germline: RNAi-mediated knockdown of mdy in the germline of bmm1 males partially rescued the defects in testis size (Figure 4I; Kruskal-Wallis rank sum test with Dunn’s multiple comparison test) and GSC variance (Figure S5J; p=4.5 x 10^-5 and 8.2 x 10^-3 by F-test from the GAL4- and UAS-only crosses, respectively).”

      “Importantly, testes isolated from males with global loss of both bmm and mdy (mdyQX25/k03902;bmm1) had fewer LD than testes dissected from bmm1 males (Figures 5D, S5I; one-way ANOVA with Tukey multiple comparison test).”

      2) We also discuss the possibility that somatic bmm may play a role in germline differentiation in a global bmm mutant, and present phenotypic data on somatic bmm1 clones.

      “We also reveal a potential non-cell-autonomous role for somatic bmm. While there was no difference in the ratio of Zfh-1-positive cells between homozygous clones and heterozygous clones in animals carrying the bmm1 or bmmrev alleles at 14 days post clone induction (Figure S4O; Kruskal-Wallis rank sum test), the distance from the hub at which Zfh-1-positive clones reside was significantly decreased in bmm1 homozygous clones (Figure S4P; Kruskal-Wallis rank sum test). Together, these data indicate bmm may play a cell-autonomous role in germline cells, and potentially a non-cell-autonomous role in somatic cells, to regulate spermatogenesis.”

      3) Finally, we clarify that we were unable to assess somatic LD. Specifically, this was a technical issue as the dye we use to visualize testis LD is incompatible with staining protocols to identify somatic cells. As a result, we were unable to count LD in somatic clones with confidence.

      “While we were unable to assess LD in bmm1 somatic clones, our data when taken together reveals a previously unrecognized cell-autonomous role for bmm as a regulator of testis LD in germline cells.”

      Regarding data presentation, I have a minor point about Fig. 3L: why aren't all data shown as box plots (only Day 14 bmm[rev] does).

      In our revised manuscript Figure 4L does present a boxplot across all genotypes and times; the appearance of ‘no boxes’ is simply due to the large number of datapoints with a value of zero, which compress the box near the X-axis.

      Finally, the authors provide a detailed pseudotime analysis of snRNA-seq of the testis in Fig. S2A-D, but this analysis is not sufficiently discussed in the text.

      In the revised manuscript we added text to describe our pseudotime analysis of single-cell RNA seq data in more detail.

      “Using pseudotime analysis, we arranged the germline (Figure S2A) and the somatic cells (Figure S2B) based on their annotated developmental trajectory. The expression pattern of bmm in the germline matched our observation with bmm-GFP reporter (Figure S2C). While levels of the bmm-GFP reporter were lower in somatic cells, single-cell RNA sequencing data identified bmm expression in the somatic lineage that was higher in cells at later stages of development (Figure S2D). Additional neutral lipid- and lipid droplet-associated genes such as lipid storage droplet-2, Seipin, Lipin, and midway also showed differential regulation during differentiation (Figure S2C, S2D). Combined with our data on the location of testis LD, these data suggest that bmm upregulation in both somatic and germline cells during differentiation corresponds to the downregulation of testis LD. Supporting this, germline GFP levels were negatively correlated with testis LD in bmm-GFP flies (Figure 2A, 2C), suggesting regions with higher bmm expression had fewer LD.”

      Overall, the many strengths of this paper outweigh the relatively minor weaknesses. The rigorously quantified results support the major aim that appropriate regulation of triglycerides is needed in a germline cell-autonomous manner for spermatogenesis.

      This paper should have a positive impact on the field. First and foremost, there is limited knowledge about the role of lipid metabolism in spermatogenesis. The lipidomic data will be useful to researchers in the field who study various lipid species. Going forward, it will be very interesting to determine what triglycerides regulate in germline biology. In other words, what functions/pathways/processes in germ cells are negatively impacted by elevated triglycerides. And as the authors point out in the discussion, it will be important to determine what regulates bmm expression such that bmm is higher in later stages of germline differentiation.

      We agree with the reviewer about the many interesting future directions for this project. We added a model figure in the revised manuscript to visualize our findings and highlight remaining questions about how bmm and triglycerides support normal spermatogenesis in Drosophila (Fig. 6).

      REVIEWER 2

      Summary:

      Here, the authors show that neutral lipids play a role in spermatogenesis. Neutral lipids are components of lipid droplets, which are known to maintain lipid homeostasis, and to be involved in non-gonadal differentiation, survival, and energy. Lipid droplets are present in the testis in mice and Drosophila, but not much is known about the role of lipid droplets during spermatogenesis. The authors show that lipid droplets are present in early differentiating germ cells, and absent in spermatocytes. They further show a cell autonomous role for the lipase brummer in regulating lipid droplets and, in turn, spermatogenesis in the Drosophila testis. The data presented show that a relationship between lipid metabolism and spermatogenesis is congruous in mammals and flies, supporting Drosophila spermatogenesis as an effective model to uncover the role lipid droplets play in the testis.

      We thank the Reviewer for their positive assessment of our paper.

      Strengths and weaknesses:

      The authors do a commendably thorough characterization of where lipid droplets are detected in normal testes: located in young somatic cells, and early differentiating germ cells. They use multiple control backgrounds in their analysis, including w[1118], Canton S, and Oregon R, which adds rigor to their interpretations. The authors employ markers that identify which lipid droplets are in somatic cells, and which are in germ cells. The authors use these markers to present measured distances of somatic and germ cell-derived lipid droplets from the hub. Because they can also measure the distance of somatic and germ cells with age-specific markers from the hub, these results allow the authors to correlate position of lipid droplets with the age of cells in which they are present. This analysis is clearly shown and well quantified.

      The quantification of lipid droplet distance from the hub is applied well in comparing brummer mutant testes to wild type controls. The authors measure the number of lipid droplets of specific diameters, and the spatial distribution of lipid droplets as a function of distance from the hub. These measurements quantitatively support their findings that lipid droplets are present in an expanded population of cells further from the hub in brummer mutants. The authors further quantify lipid droplets in germline clones of specified ages; the quantitative analysis here is displayed clearly, and supports a cell autonomous role for brummer in regulating lipid droplets in spermatocytes.

      Data examining testis size and number of spermatids in brummer mutants clearly indicates the importance of regulating lipid droplets to spermatogenesis. The authors show beautiful images supported by rigorous quantification supporting their findings that brummer mutants have both smaller testes with fewer spermatids at both 29°C and 25°C. There is also significant data supporting defects in testis size for 14-day-old brummer mutant animals compared to controls. The comparison of number of spermatids at this age is not significant, which does not detract from the story but does not support sperm development defects specifically caused by brummer loss at 14 days. Their analysis clearly shows an expanded region beyond the testis apex that includes younger germ cells, supporting a role for lipid droplets influencing germ cell differentiation during spermatogenesis.

      We thank the reviewer for pointing out this inaccuracy in our manuscript. In the revised manuscript we chose more precise language to describe defects in 14-day-old bmm mutants:

      “Defects in testis size were also observed at 14 days post eclosion, suggesting testis size defects persist later into the life course (Figure S4C; Welch two-sample t-test). In contrast, the number of spermatid bundles per testis was not significantly different between bmm1 and bmmrev males at this age (Figure S4D; Welch two-sample t-test), potentially due to a large decrease in the number of spermatid bundles in 14-day-old bmmrev males (Figure 4C, S4D).”

      The authors present a series of data exploring a cell autonomous role for brummer in the germline, including clonal analysis and tissue specific manipulations. The clonal data indicating increased lipid droplets in spermatocyte clones, and a higher proportion of brummer mutant GSCs at the hub are convincing and supported by quantitation. The authors also show a tissue specific rescue of the brummer testis size phenotype by knocking down mdy specifically in germ cells, which is also supported by statistically significant quantitation. The authors present data examining the number of spermatocyte and post-meiotic clones 14 days after clonal induction. While the data they present is significant with a 95% confidence interval and a p value of 0.0496, its significance is not as robust as other values reported in the study, and it is unclear how much information can be gained from that specific result.

      We thank the reviewer for raising this point. In the revised manuscript we displayed the p-value clearly in the text and on the figure to ensure our statistical output is clear for readers to evaluate our conclusions regarding bmm mutant clones 14 days after clone induction. We also state that the finding should be reproduced by others given that the statistical significance of this result was not as strong as our other data.

      “Because we observed significantly fewer bmm1 spermatocyte and spermatid clones at 14 days after clone induction (Figure 4K,4L; p = 0.0496, Kruskal-Wallis rank sum test), these effects on germline development may represent a cell-autonomous role in regulating spermatogenesis for bmm in this cell type. Given that the statistical significance of this finding was not as strong as for our other data, future studies should repeat this experiment with more samples.”

      The authors do a beautiful job of validating where they detect brummer-GFP by presenting their own pseudotime analysis of publicly available single cell RNA sequencing data. Their data is presented very clearly, and supports expression of brummer in older somatic and germline cells of the age when lipid droplets are normally not detected. The authors also present a thorough lipidomic analysis of animals lacking brummer to identify triglycerides as an important lipid droplet component regulating spermatogenesis.

      Impact:

      The authors present data supporting the broad significance of their findings across phyla. This data represents a key strength of this manuscript. The authors show that loss of a conserved triglyceride lipase impacts testis development and spermatogenesis, and that these impacts can be rescued by supplementing diet with medium chain triglycerides. The authors point out that these findings represent a biological similarity between Drosophila and mice, supporting the relevance of the Drosophila testis as a model for understanding the role of lipid droplets in spermatogenesis. The connection buttresses the relevance of these findings and this model to a broad scientific community.

      We thank the Reviewer very much for their positive assessment of our paper!

      REVIEWER 3

      In this manuscript, Chao et al seek to understand the role of brummer, a triglyceride lipase, in the Drosophila testis. They show that Brummer regulates lipid droplet degradation during differentiation of germ and somatic cells, and that this process is essential for normal development to progress. These findings are interesting and novel, and contribute to a growing realisation that lipid biology is important for differentiation.

      We thank the Reviewer for their positive comments about our manuscript.

      Major comments:

      1) The data in Figs 1 and 2, while helpful in setting the scene, do not add much to what was previously shown by the same group, namely that lipid droplets are present in both early germ cells and early somatic cells in the testis, and that Bmm regulates their degradation (PMID: 31961851). Measuring the distance of lipid droplets from the hub, while helpful in quantifying what is apparent, that only stem and early differentiated stages have lipid droplets, is not as informative as the way data are presented later (Fig. 2I), where droplets in specific stages are measured. Much of this could be condensed without much overall loss to the manuscript.

      We thank the reviewer for this comment. In our revised manuscript we edited the first part of the paper while still preserving the detailed characterization that builds upon our previous paper.

      2) It would be important to show images of the clones from which the data in Fig. 2I are generated. The main argument is that Bmm regulates lipid droplets in a cell autonomous manner; these data are the strongest argument in support of this and should be emphasised at the expense of full animal mutants (which could be moved to supplementary data).

      We thank the reviewer for this comment. In the revised manuscript we added a figure showing lipid droplets in control and bmm mutant spermatocyte clones in Fig. 3A, 3B with a quantification of this data in Figure 3C.

      Similarly, the title of Fig. S2 ("brummer regulates lipid droplets in a cell autonomous manner") should be changed as the figure has no experiments with cell (or cell-type)-specific knockdowns/mutants. This figure does show changes in lipid droplets in both lineages in bmm mutants, so an appropriate title could be "brummer regulates lipid droplets in both germ and soma".

      We thank the reviewer for this comment. We adjusted the Figure 2 legend title in the revised manuscript to “brummer regulates lipid droplets in both germline and somatic cells of the testis”.

      3) Interestingly, the clonal data show that bmm is dispensable in germ cells until spermatocyte stages, as no increase in lipid droplet number is seen until then. This should be more clearly stated, as it indicates that the important function of Bmm is to degrade lipid droplets at the transition from spermatogonial to spermatocyte stages. This is consistent with the phenotypes observed in which late stage germ cells are reduced or missing. However, the effect on niche retention of the mutant GSCs at the expense of neighbouring wildtype GSCs is hard to explain. Are lipid droplets in mutant GSCs larger than in control? Is there any discernible effect of bmm mutation on lipids in GSCs? Additionally, bam expression is delayed, suggesting that bmm may have roles on cell fate in earlier stages than its roles that can be detected on lipid droplets.

      We thank the reviewer for this comment. We included more text in the revised manuscript to clarify the key role bmm plays in regulating lipid droplets at the spermatogonia-spermatocyte transition.

      “Because we observed no significant effect of cell-autonomous bmm loss on LD at any other stage of germline development (Figure 3C), this suggests bmm function is not required to regulate LD at early stages of germ cell development. Instead, our data suggests bmm plays a role in regulating LD at the spermatogonia-spermatocyte transition.”

      We also added more detail to our description of how bmm affects lipid droplets in cells at the earliest stages of germline development.

      “Given that we detected no effect of cell-autonomous bmm loss on the number of GSC LD (Fig. 3C), more work will be needed to understand how bmm regulates GSC at a stage prior to its effects on LD number.”

      4) The bmm loss-of-function phenotype could be better described. Some of the data is glossed over with little description in the text (see for example the reference to Fig. 3A-C). For instance, in the discussion, the text states "loss of bmm delays germline differentiation leading to an accumulation of early-stage germ cells" (p13, l.25960). However, this accumulation has not been clearly shown, or at least described in the manuscript. Most of the data show a reduction (or almost complete absence) of differentiated cell types. This could indeed be due to delayed differentiation, or alternatively to a block in differentiation or to death of the differentiated cells. The clonal data presented show a decrease in the number of cells recovered, but do not allow inferences as to the timing of differentiation, making it hard to distinguish between the various possibilities for the lack of differentiated spermatids. Apart from data showing that GSCs are more likely to remain at the niche, no further data are shown to support the fact that mutant germ cells accumulate in early stages. While additional experiments could help resolve some of these issues, much of this could also be resolved by tempering the conclusions drawn in the text.

      We thank the reviewer for these comments. In the revised manuscript we temper our conclusions regarding bmm’s precise role in spermatogenesis by discussing different mechanisms (e.g. differentiation or death) that could lead to the phenotypes we observe.

      “This regulation is important for sperm development, as our data indicates that loss of bmm causes a decrease in the number of differentiated cell types. This reduction in differentiated cell types may be attributed to a delay in differentiation, a block in differentiation, or to a loss of differentiated cells through cell death. Future studies will therefore be essential to resolve why bmm loss causes a reduction in differentiated cell types.”

      5) In the discussion (p.14, l-273 onwards), the authors suggest that products of triglyceride breakdown are important for spermatogenesis. However, an alternative interpretation of the results presented here (especially those using the midway mutant) could be that triglycerides impede normal differentiation directly. Indeed, preventing the cells' ability to produce triglycerides in the first place can rescue many of the defects observed. A better discussion of these results with a model for the function of triglycerides and their by-products would be a great improvement to this manuscript.

      We thank the reviewer for this comment. To ensure our data is clearly communicated to readers, we added a model to the paper suggesting how triglyceride and its by-products influence spermatogenesis (Fig. 6) and text to clarify that triglyceride could potentially impede differentiation.

      “It will also be important to determine whether it is the loss of metabolites produced by bmm’s enzymatic action, or an increase in triglycerides, that leads to the reduction in differentiated cell types during spermatogenesis. Together, these experiments will provide critical insight into how triglyceride stored within testis LD contributes to overall cellular lipid metabolism during spermatogenesis.”

      Together, these changes will strengthen our overall finding that bmm-mediated regulation of testis triglyceride is important for normal sperm development. Because our findings in flies align with and extend data from rodent models, the developmental mechanisms we uncovered about how triglyceride lipase bmm regulates testis lipid droplets and sperm development will likely operate in other species.  

      Reviewer #1 (Recommendations For The Authors):

      I have a minor concern about methodology: how were spermatocytes identified? I ask because data in Figure 3 indicate that there is a significant delay in germline differentiation in the bmm[1] mutant, with relatively smaller germ cells throughout the apical half of the testis. Typical large spermatocyte-like cells are not clearly obvious to me in Fig. 3.

      We thank the Reviewer for suggesting we add more clarity to how we identified spermatocytes. We state in the revised manuscript how we identify spermatocytes:

      “Cells in the testis region occupied by primary spermatocytes were identified by their large cell size and decondensed chromosome staining occupying three nuclear domains [120].”

      Also, we note that while it is difficult to see where the bmm1 testis has spermatocytes in Fig. 4E, this is due to the large number of early-stage cells in this close-up image. The spermatocytes can be more easily seen in Fig. 4I and 4I’ when the whole testis is included in the image.

      Reviewer #2 (Recommendations For The Authors):

      • Lines 197-198 mention "Boule-positive area," "individualization complexes," and "waste bags." It would be helpful to the reader to explain what these measurements are to help contextualize the data shown related to these statements.

      We thank the Reviewer for this comment. We added the following text to the revised manuscript:

      “Because Boule-positive area, individualization complexes, and waste bags are all markers for later stages in sperm development, these data indicate the loss of bmm causes a reduction in differentiated cell types.”

      • Line 162 states a defect in sperm development observed in 14-day-old bmm[1] males, but the data presented in Figure S3D does not show a significant difference. The words "sperm development" should be removed from this sentence.

      We thank the Reviewer for pointing out this inaccurate statement. We fixed the statement as follows in the revised manuscript:

      “Defects in testis size were also observed at 14-day post eclosion; suggesting testis size defects persist later into the life course (Figure S4C; Welch two-sample t-test). In contrast, the number of spermatid bundles per testis was not significantly different between bmm1 and bmmrev males at this age (Figure S4D; Welch two-sample t-test), potentially due to a large decrease in the number of spermatid bundles in 14-day-old bmmrev males (Figure 4C, S4D).”

      • Line 294 has a typo: "regulating" should likely be "regulated"

      We thank the Reviewer for pointing out this mistake, which we corrected.

      • Line 456 should include the length of time for heat shock

      We thank the Reviewer for pointing out this omission. We now include these details:

      “Adult males were collected at 3-5 days post-eclosion and heat-shocked three times at 37°C for 30 min followed by a 10 min rest period at room temperature between heat shocks.”

      • Methods section beginning on Line 442 might include an explanation of how hub area was quantified.

      We thank the Reviewer for this suggestion. We now include the following information:

      “Hub size was measured by quantifying FasIII-positive area of the testis.”

      • Figure 1 legend could benefit from adding a statement on how spermatocytes (arrowheads) were identified

      We thank the Reviewer for this suggestion; we now refer the reader to the more detailed description in the methods section.

      • Figure 2A should present the merged panel in A' first. The legend states that Panel A shows Lipid Droplets, but LipidTox is not shown until A'.

      We thank the Reviewer for this suggestion; we now clarify that the text refers to panels A-A''''.

      • Figure 2I would benefit from a key, to emphasize that these are individual cell clones, highlighting the idea of cell autonomous effects of bmm in the spermatocytes. Showing example images of spermatocyte clones with increased lipid droplets could also emphasize this result. The legend for this panel should note the statistical test done to confirm significance in the SC result.

      We agree with the Reviewer and have added images of the LD in bmm1 spermatocyte clones in Figure 3B, and the quantification in Figure 3C. We explicitly state the significance of this result and the statistical test in the Figure 3 legend.

      • In Figure 3, the cell autonomous data clearly indicates that there are higher proportions of bmm mutant GSCs occupying the hub compared to control GSCs. It could be worth stating whether this observation indicates an increased ability of bmm mutant GSCs to compete for occupying space at the hub.

      We thank the Reviewer for pointing out this potential implication of our data, which we acknowledge in the revised version of our manuscript:

      “Future studies will also need to confirm whether bmm1 mutant GSCs show an increased ability to occupy space at the hub.”

      • In Figure 4, I suggest changing the title of Panel B to "Proportion of significant species in each lipid class" for clarity.

      We made this change in the Figure 5 legend (Figure 5 is the corresponding figure in the revised manuscript).

      • It could be valuable to quantify the number of spermatids in the germline specific mdy knockdown, which would lend additional support to a cell autonomous requirement for bmm in spermatogenesis

      We added a sentence to the revised manuscript recognizing that this is an interesting experiment for studies on the role of germline triglyceride in promoting spermatogenesis.

      “While future studies will need to test whether germline-specific loss of mdy also rescues spermatid number defects in bmm1 males, our data suggest bmm-mediated regulation of testis triglyceride plays a previously unrecognized role in regulating sperm development.”

      Reviewer #3 (Recommendations For The Authors):

      1) bmm-GFP does not show expression in somatic cells yet previous work by the same group has shown a requirement for bmm in the testis soma using C587-Gal4.

      We thank the Reviewer for raising this issue. While the reporter shows low GFP expression in the somatic cells, the single-cell RNA sequencing data we analyze suggests bmm is expressed in these cells. We address this issue in the revised manuscript as follows:

      “While levels of the bmm-GFP reporter were lower in somatic cells, single-cell RNA sequencing data identified bmm expression in the somatic lineage that was higher in cells at later stages of development (Figure S2D).”

      2) p.11 l.200-202 "Because we recovered fewer bmm1 spermatocyte and spermatid clones 14 days after clone induction (Figure 3K,3L; Kruskal-Wallis rank sum test), this effect on germline development represents a cell-autonomous role for bmm." This sentence should be rephrased as the phenotype could be a combination of autonomous roles within the germline and non-autonomous roles in supporting cyst cells.

      “We also reveal a potential non-cell-autonomous role for somatic bmm. While there was no difference in the ratio of Zd-1-positive cells between homozygous clones and heterozygous clones in animals carrying the bmm1 or bmmrev alleles at 14 days post clone induction (Figure S4O; Kruskal-Wallis rank sum test), the distance from the hub at which the Zd-1-positive clones reside was significantly decreased in bmm1 homozygous clones (Figure S4P; Kruskal-Wallis rank sum test). Together, these data indicate bmm may play a cell-autonomous role in germline cells, and potentially a non-cell-autonomous role in somatic cells, to regulate spermatogenesis.”

      3) The labelling in Fig. 3 is confusing - presumably the graph in 3C refers to spermatid bundles [this comment applies to other figures showing spermatid bundle numbers], not individual spermatids, while the graph in 3G refers to the proportion of the total GSC pool that is contained within the clone. The data in Fig. 3C are not described in the main text.

      We adjusted the confusing labelling to ‘spermatid bundles’ from ‘number of spermatids’, as suggested. We also changed the title of panel Fig. 3G (now 4G) as suggested and mentioned Fig. 3C (now Fig. 4C) in the text.

      4) On p.9, comments are speculative or seek to draw comparisons with the broader literature and would seem to belong more to the discussion (eg "our data suggests flies are a good model to study how bmm/ATGL influences sperm development" - also there is a typo, it should be "suggest").

      We thank the Reviewer for raising concern about our speculative statement; we changed the text as follows in the revised manuscript:

      “This identifies similarities between flies and mice in fertility-related phenotypes associated with whole-body loss of bmm/ATGL.”

      5) The length of the heat shocks used for clone induction should be specified in the methods (rather than just the period in between heat shocks).

      We now include more information on clone induction:

      “Adult males were collected at 3-5 days post-eclosion and heat-shocked three times at 37°C for 30 min followed by a 10 min rest period at room temperature between heat shocks. After heat-shock, the flies were incubated at room temperature until dissection.”

      6) p.8 l.132 "bmm-GFP accurately reproduces changes to bmm mRNA levels". This sentence should be rephrased.

      We thank the Reviewer for this comment and rephrased the sentence:

      “We first examined bmm expression in the testis by isolating this organ from flies carrying a bmm promoter driven GFP transgene (bmm-GFP) that recapitulates many aspects of bmm mRNA regulation [77].”

      7) p.9 l.172 "we used germline-specific marker" should read "we used an antibody against the germline-specific marker".

      We corrected this inaccurate statement in our revised manuscript.

      8) p.10 several lines, "GSC" should be "GSCs".

      We corrected this inaccurate use of GSC in our revised manuscript.

      9) p.13 l.247 should read "variance in GSC numbers".

      Thank you, this error was fixed.

    1. Author Response

      We thank the editors and the reviewers for their assessment of our revised manuscript. Please see below our answers to the recommendations by reviewer #2.

      Figure S2F - Seems like a very narrow range of parameters. Is there some fine tuning here?

      The range of tau_P values that yields previous-trial biases is bounded from below and above for the following reasons. Above a certain value of tau_P (and therefore a long integration time), the bump that formed in the previous trial is not strong enough to remain stable for long, and therefore dissipates by the time the current trial starts (especially when adaptation is fast, towards the left of the third panel). Below a certain value, instead, the integration timescale is short enough for a representation of the current trial to form quickly, so the bump from the previous trial quickly dissipates (due to mutual inhibition). This interplay between the integration and adaptation timescales, together with the fact that the phenomenon we consider is itself bounded in time (how close the activity bump is to the second stimulus of the previous trial, which is presented between -22.4 and -5.6 seconds from the moment we are considering), yields a bounded region for tau_P. This region, however, appears narrow because of the limited number of points in the simulation grid.

      Regarding my comment on lapse at the boundaries (old line 221). Lapse parameters in psychometric curves correspond to errors on the "easy" trials. But the mechanistic explanation for lapse trials is that there is a non-zero probability for the subject to respond in a manner that is random and independent of the stimulus. In the case of extreme stimuli, this is the only reason for errors, and thus looking at the edges of the psychometric curves allows to calculate lapse rate. But - the usual assumption for underlying mechanism is that the subject lapses in all trials, regardless of stimulus. If I understand correctly, this is different than the mechanistic reason for lapses in the network model, which was described as something that happens more in the edges than in the center. Or more generally, to be a stimulus-dependent effect.

      We thank the reviewer for this clarification. The reviewer is right that in our mechanistic model, lapses (defined as errors on easy trials) are more likely to occur for extreme stimuli, due to their vicinity to the boundary of the attractor. Such errors also occur for non-extreme stimuli, when delay intervals are long enough for the bump in PPC to drift to the boundaries. In experiments, lapse trials as described by the reviewer occur for multiple different reasons. For lapses that are independent of the stimuli, mechanisms such as attention have been thought to play a role; this, however, is not included in our model.

      What are the parameters for the distributions (skewed, bimodal, ...)?

      These parameters are reported in the legend of Fig.6, where the distributions appear.

      Bump with adaptation. Sorry for the draft-like comment. I don't think the existing studies are in the form you describe. I do think it might be useful to point readers to these studies. If an interested reader wishes to understand network dynamics in this and similar scenarios, it might be useful to have the pointers. The reference I had in mind was Romani, S., & Tsodyks, M. (2015). Short‐term plasticity based network model of place cells dynamics. Hippocampus, 25(1), 94-105.

      We thank the reviewer for the clarification, and we will include this reference in the Version of Record.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study about the mechanisms underlying our capacity to represent and hold recent events in our memory and how they are influenced by past experiences. A key aspect of the model put forward here is the presence of discrete jumps in neural activity within the posterior parietal region of the cortex. The strength of evidence is largely solid, with some weaknesses noted in the methodology. Both reviewers suggested ways in which this aspect of the model can be tested further and resolve conflicts with previously published experimental results, in particular the study by Papadimitriou et al 2014 in Journal of Neurophysiology.

      We thank the editors for their assessment. As mentioned in the cover letter, we have addressed all the reviewers’ concerns and would like to request an update of the assessment to reflect the revisions we have made.

      Public Reviews:

      We thank both reviewers for their careful reading and feedback that helped clarify many aspects of the model. Below, we address their comments.

      Reviewer #1 (Public Review):

      This paper aims to explain recent experimental results that showed deactivating the PPC in rats reduced both the contraction bias and the recent history bias during working memory tasks. The authors propose a two-component attractor model, with a slow PPC area and a faster WM area (perhaps mPFC, but unspecified). Crucially, the PPC memory has slow adaptation that causes it to eventually decay and then suddenly jump to the value of the last stimulus. These discrete jumps lead to an effective sampling of the distribution of stimuli, as opposed to a gradual drift towards the mean that was proposed by other models. Because these jumps are single-trial events, and behavior on single events is binary, various statistical measures are proposed to support this model. To facilitate this comparison, the authors derive a simple probabilistic model that is consistent with both the mechanistic model and behavioral data from humans and rats. The authors show data consistent with model predictions: longer interstimulus intervals (ISIs) increase biases due to a longer effect over the WM, while longer intertrial intervals (ITIs) reduce biases. Finally, they perform new experiments using skewed or bimodal stimulus distributions, in which the new model better fits the data compared to Bayesian models.

      The mechanistic proposed model is simple and elegant, and it captures both biases that were previously observed in behavior, and how these are affected by the ISI and ITI (as explained above). Their findings help rethink whether our understanding of contraction bias is correct.

      On the other hand, the main proposal - discrete jumps in PPC - is only indirectly verified.

      We agree with the reviewer that the evidence for discrete jumps in PPC comes from behavioural results (short-term, n-back trial biases), and not from neural data. However, we believe electrophysiological investigations are outside the scope of the current manuscript, and future work is needed to further verify the results.

      The model predicts a systematic change in bias with inter-trial-interval. Unless I missed it, this is not shown in the experimental data. Perhaps the self-paced nature of the experiments allows to test this?

      We thank the reviewer for this great suggestion.

      We had not previously looked at this in the data for the reason that in the simulations, the ITI is set to either 2.2, 6 or 11 seconds, whereas the experiment is self-paced. Therefore, any comparison with the simulation should be made carefully.

      However, following the reviewer’s suggestion, we did look at the change in the bias with the inter-trial interval, by dividing trials into those with ITIs lower than 3 seconds (“short” ITI) and those with ITIs higher than 3 seconds (“long” ITI). This choice was motivated by the shape of the distribution of ITIs, which is bimodal, with a peak around 1 second and another after 3 seconds (new Fig 8F). Hence, we chose 3 seconds as it seemed a natural division. However, 3 seconds also happens to be approximately the 75th percentile of the distribution, which means that there is much more data in the “short” ITI set than in the “long” ITI set. In order to have sufficient data in the “long” ITI set for clearer effects, we used all of our dataset: the negatively skewed distribution, and also two bimodal distributions (of which only one was shown in the manuscript, for succinctness). This larger dataset allows us to clearly see not only a decreasing contraction bias with increasing ITI (Fig 8G), but also a decreasing one-trial-back attractive bias with increasing ITI (Fig 8H). We have uploaded all the datasets as well as scripts used to analyze them to this repository: https://github.com/vboboeva/ParametricWorkingMemory_Data.
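
      As a rough illustration of the split described above, the sketch below divides trials at a 3-second ITI and compares a simple bias measure between the two sets. It is a minimal sketch under stated assumptions, not the analysis code from the repository: the function names, the array layout, and in particular the `contraction_index` measure (accuracy on trials where a pull of s1 towards the stimulus mean would help, minus accuracy on trials where it would hurt) are hypothetical choices made for this example.

```python
import numpy as np


def split_by_iti(iti, threshold=3.0):
    """Boolean masks for "short" (< threshold) and "long" (>= threshold) ITIs.

    Assumption: an ITI of exactly `threshold` seconds is counted as "long".
    """
    iti = np.asarray(iti, dtype=float)
    short = iti < threshold
    return short, ~short


def contraction_index(s1, s2, correct, stim_mean):
    """Accuracy gap between trials where contraction helps and where it hurts.

    Illustrative measure only (an assumption, not the authors' metric).
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Pulling s1 towards the mean widens the s1-s2 gap when s1 lies between
    # s2 and the mean, and narrows it when s2 and the mean are on the same
    # side of s1.
    helps = ((s1 < s2) & (s1 > stim_mean)) | ((s1 > s2) & (s1 < stim_mean))
    hurts = ((s1 < s2) & (s1 < stim_mean)) | ((s1 > s2) & (s1 > stim_mean))
    if not helps.any() or not hurts.any():
        return float("nan")
    return float(correct[helps].mean() - correct[hurts].mean())


def bias_by_iti(iti, s1, s2, correct, stim_mean, threshold=3.0):
    """Contraction index for short- and long-ITI trials, plus the split size."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    short, long_ = split_by_iti(iti, threshold)
    result = {"fraction_short": float(short.mean())}
    for name, mask in (("short", short), ("long", long_)):
        result[name] = contraction_index(s1[mask], s2[mask], correct[mask], stim_mean)
    return result
```

      The `fraction_short` entry makes the imbalance between the two sets explicit; with a threshold near the 75th percentile of the ITI distribution it would come out close to 0.75, which is why pooling several stimulus distributions helps the "long" set.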

      The data in some of the figures in the paper are hard to read. For instance, Figure 3B might be easier to understand if only the first 20 trials or so are shown with larger spacing. Likewise, Figure 5C contains many overlapping curves that are hard to make out.

      We have limited the dynamics in Fig 3B to the first 50 trials for better visibility. Likewise, as suggested, we report the standard error of the mean instead of the standard deviation in old Fig 5C (new Fig 6C) – this allows for the different curves to be better discernible.

      There is a gap between the values of tau_PPC and tau_WM. First - is this consistent with reports of slower timescales in PFC compared to other areas?

      Recent studies by Xiao-Jing Wang and colleagues (Refs. 1-3 below) suggest that this may be the case. In Wang et al 2023 (Ref 1 below), the authors use a generative model to study the concept of bifurcation in space in working memory, which is accompanied by an inverted-V shape of the time constants as a function of cortical hierarchy.

      Briefly, they propose a generative model of the cortex with modularity, incorporating repeats of a canonical local circuit connected via long-range connections. In particular, the authors define a hierarchy for each local circuit. At a critical point in this hierarchy axis, there is a phase transition from monostability to bistability in the firing rate. This means that a local circuit situated below the critical point will only display a low activity steady state, while those above the critical point additionally display a persistent activity steady state.

      The model predicts a critical slowing down of the neural fluctuations at the critical point, resulting in an inverted-V shape of the time constants as a function of the hierarchy. They test the predictions of their model (the bifurcation in space and the inverted-V-shaped time constants as a function of the hierarchy) on connectome-based models of the macaque and mouse cortex. Interestingly, both datasets show similar behavior. In particular, during working memory, frontal areas (higher in the hierarchy, e.g. area 24c in macaques) have a smaller time constant relative to posterior parietal areas (lower in the hierarchy, like LIP or f7). We have now cited this new work.

      [1] https://www.biorxiv.org/content/10.1101/2023.06.04.543639v1

      [2] https://elifesciences.org/articles/72136

      [3] https://www.biorxiv.org/content/10.1101/2022.12.05.519094v3.abstract

      Second - is it important for the model, or is it mostly the adaptation timescale in PPC that matters?

      We have run simulations producing a phase diagram with tau_theta^P on the x-axis, tau^P on the y-axis, and in color, the fraction of trials in which the bump is in the vicinity of a target (Fig S2 F), before the network is presented with the second stimulus. This target can be the first stimulus s_1 (left), the mean over stimuli (middle) or the previous trial’s stimulus (right). The white point corresponds to the parameters of the default network.

      In this phase diagram, the lowest value that tau_P takes is tau_WM=0.01. When tau_P=tau_WM, the bump is rarely in the vicinity of the 1-trial-back stimulus, and we can see that tau_PPC should be greater than tau_WM in order for the model to yield 1-trial-back effects. We conclude that it is indeed important that tau_PPC > tau_WM.

      We have included this in Fig S2 F of the manuscript.
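
      The quantity plotted in color in such a phase diagram can be sketched as follows. This is a minimal sketch under stated assumptions rather than the simulation code itself: the function names, the vicinity radius of 0.05, and the choice to pool s1 and s2 when computing the mean over stimuli are hypothetical. The bump positions are assumed to have been read out just before the second stimulus on each trial, and the previous trial's stimulus is taken to be its second stimulus.

```python
import numpy as np


def fraction_near(bump, target, radius=0.05):
    """Fraction of trials in which the bump lies within `radius` of the target."""
    bump = np.asarray(bump, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(bump - target) <= radius))


def fractions_for_targets(s1, s2, bump, radius=0.05):
    """Fractions for the three targets, computed from the second trial onwards.

    The first trial is skipped because it has no previous trial.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    bump = np.asarray(bump, dtype=float)
    # Assumption: the "mean over stimuli" pools all s1 and s2 values.
    stim_mean = np.concatenate([s1, s2]).mean()
    return {
        "current_s1": fraction_near(bump[1:], s1[1:], radius),
        "stimulus_mean": fraction_near(bump[1:], stim_mean, radius),
        "previous_s2": fraction_near(bump[1:], s2[:-1], radius),
    }
```

      Evaluating `fractions_for_targets` once per (tau_theta^P, tau^P) pair on a parameter grid would give one value per panel and grid point; a coarse grid then naturally makes any bounded region look narrow.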

      Regarding the relation to other models, the model by Hachen et al (Ref 45) also has two interacting memory systems. It could be useful to better state the connection, if it exists.

      The model proposed by Hachen et al is conceptually different in that one module stores the mean of the sensory stimulus; it could be related to a variant of our model where adaptation is turned off in the PPC network (Fig S2 A). However, the task they model is also different: subjects have to learn the location of a boundary according to which the stimulus is classified as ‘weak’ or ‘strong’, set by the experimenter. Hence, it is a task where learning is needed - this contrasts with the task we are modelling, where only working memory is required. How task demands reconfigure existing circuits via dynamics and/or learning to perform different computations is a fascinating area of research that is outside the scope of this work.

      Reviewer #2 (Public Review):

      Working memory is not error free. Behavioral reports of items held in working memory display several types of bias, including contraction bias and serial dependence. Recent work from Akrami and colleagues demonstrates that inactivating rodent PPC reduces both forms of bias, raising the possibility of a common cause.

      In the present study, Boboeva, Pezzotta, Clopath, and Akrami introduce circuit and descriptive variants of a model in which the contents of working memory can be replaced by previously remembered items. This volatility manifests as contraction bias and serial dependence in simulated behavior, parsimoniously explaining both sources of bias. The authors validate their model by showing that it can recapitulate previously published and novel behavioral results in rodents and neurotypical and atypical humans.

      Both the modeling and the experimental work are rigorous, providing compelling evidence that a model of working memory in which reports sometimes sample past experience can produce both contraction bias and serial dependence, and that this model is consistent with behavioral observations across rodents and humans in the parametric working memory (PWM) task.

      Evidence for the model advanced by the authors, however, remains incomplete. The model makes several bold predictions about behavior and neural activity, untested here, that either conflict with previous findings or have yet to be reported but are necessary to appropriately constrain the model.

      First, in the most general (descriptive) formulation of the Boboeva et al. model, on a fraction of trials items in working memory are replaced by items observed on previous trials. In delayed estimation paradigms, which allow a more direct behavioral readout of memory items on a trial-by-trial basis than the PWM task considered here, reports should therefore be locked to previous items on a fraction of trials rather than display a small but consistent bias towards previous items. However, the latter has been reported (e.g., in primate spatial working memory, Papadimitriou et al., J Neurophysiol 2014). The ready availability of delayed estimation datasets online (e.g., from Rademaker and colleagues, https://osf.io/jmkc9/) will facilitate in-depth investigation and reconciliation of this issue.

      As pointed out by the reviewer, in the PWM task that we are modelling here, the activity in the network is used to make a binary decision. However, it is possible to directly analyse the network activity before the onset of the second stimulus.

      In their manuscript, Papadimitriou et al. study a memory-guided saccade task in nonhuman primates and argue that the animals display a small but consistent bias towards previous items (Fig 2). In that figure, the authors compute the error as the difference between the saccade direction and target direction in each trial. They compute this error for all trials in which the preceding trial’s target direction is between 35° and 85° relative to the current trial (counterclockwise with respect to the current trial’s target). They discover that the residual error distribution is unimodal with a mode at 1.29° and a mean at 2.21° (positive, so towards the preceding target’s direction), from which they deduce a small but systematic bias towards previous trial targets.

      We have computed a similar measure for our network with default parameters (Table 1), by subtracting the location of the bump at the end of the delay interval (s_hat(t), ‘saccade’) from the initial location of the first stimulus in the current trial (s1(t) or the ‘target’). We have done this for all trials where s1(t)=0.2, and where s2(t-1) takes specific values. These distributions are characterized by two modes. The first corresponds to those trials where the bump is not displaced in WM (i.e. mean of zero). We can also see the appearance of a second mode at the location of s1(t) - s2(t-1), corresponding to the displacements towards the preceding trial’s stimulus described in the main text. If, instead, we limit the analysis to a small range of previous trials close to s1(t) (similar to Papadimitriou et al) then the distribution of residual errors will appear unimodal, as the two modes merge. Importantly, note that there is a large variability around the second mode, expressing a more complex dynamics in the network. As can be seen in Fig 3B, the location of the bump is not always slaved to the one in the PPC in a straightforward way -- due to the adaptation in the PPC, the global inhibition in the connectivity kernel, as well as interleaved design for various delay intervals, the WM bump can be displaced in nontrivial ways (see also Recommendation no 4), yielding the dispersion around the second peak. It remains to be seen whether such patterns can be observed in the data from previous works on continuous working memory recall (including Papadimitriou et al). However, to our knowledge, such detailed and full analysis of errors at the level of individual trials has not been done.
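
      The bimodal shape described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the network model or the analysis code from the repository: the displacement probability, the two noise levels, and all function names are hypothetical. On each trial the remembered value either stays near s1(t) or lands near s2(t-1) with a larger dispersion, and the residual is s1(t) - s_hat(t).

```python
import random

def simulate_residuals(s1, s2_prev, p_displaced=0.3, sd_stay=0.01,
                       sd_displaced=0.05, n_trials=5000, seed=0):
    """Toy residuals s1 - s_hat for a fixed (s1, s2_prev) pair.

    All parameters are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    residuals = []
    for _ in range(n_trials):
        if rng.random() < p_displaced:
            # Memory is displaced towards the previous trial's stimulus.
            s_hat = rng.gauss(s2_prev, sd_displaced)
        else:
            # Memory stays at the current stimulus, with small noise.
            s_hat = rng.gauss(s1, sd_stay)
        residuals.append(s1 - s_hat)
    return residuals

def fraction_in_gap(residuals, mode_a, mode_b, half_width=0.05):
    """Fraction of residuals lying around the midpoint between two modes."""
    midpoint = 0.5 * (mode_a + mode_b)
    inside = sum(1 for r in residuals if abs(r - midpoint) <= half_width)
    return inside / len(residuals)

# Previous stimulus far from s1: the two modes are well separated.
far = simulate_residuals(0.2, 0.6)
# Previous stimulus close to s1: the two modes overlap and look unimodal.
near = simulate_residuals(0.2, 0.22)
print(fraction_in_gap(far, 0.0, 0.2 - 0.6))
print(fraction_in_gap(near, 0.0, 0.2 - 0.22))
```

      In this sketch the region between the two modes is almost empty when the previous stimulus is far from s1(t), and densely populated when it is close, which is one way the distribution can appear unimodal when only nearby previous trials are analysed.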

      In summary, this analysis shows that the type of dynamics in our network is not one of the two cases: 1) small and systematic bias in each and every trial or 2) large error that occurs only rarely; rather, the dispersion around both modes suggests that the dynamics in our model are a mixture of these two limit cases.

      We have also performed another typical analysis, reported in several continuous recall tasks (e.g. Jazayeri and Shadlen 2010) where contraction bias has been reported. We plot WM bump locations after the delay period for every trial (s_hat(t)), and their averages, against the nominal value of s1(t). We see that the mean WM location deviates from the identity line toward the mean values of s1(t), again showing contraction bias as an average effect, while individual trials follow the dynamics explained above.
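
      The averaging analysis can be sketched with the descriptive idea that, on a fraction of trials, the memory is replaced by the previous trial's stimulus. The replacement probability, the stimulus set, and the function name below are assumptions chosen for illustration only; the sketch does not reproduce the network simulations.

```python
import random
from collections import defaultdict

def mean_recall_by_stimulus(stimulus_values, p_replace=0.3,
                            n_trials=20000, seed=1):
    """Average recalled value for each s1, under a toy replacement model."""
    rng = random.Random(seed)
    recalled = defaultdict(list)
    previous = rng.choice(stimulus_values)
    for _ in range(n_trials):
        s1 = rng.choice(stimulus_values)
        # With probability p_replace, recall the previous stimulus instead.
        s_hat = previous if rng.random() < p_replace else s1
        recalled[s1].append(s_hat)
        previous = s1
    return {s: sum(v) / len(v) for s, v in sorted(recalled.items())}

values = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
means = mean_recall_by_stimulus(values)
for s1, mean_s_hat in means.items():
    print(f"s1 = {s1:.1f}  mean s_hat = {mean_s_hat:.3f}")
```

      Under these assumptions the mean recalled value lies above s1 for the smallest stimuli and below s1 for the largest, so the averages deviate from the identity line towards the mean of the stimulus set even though individual trials are either unchanged or fully replaced.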

      We have now included a new section on continuous recall (Sect. 1.5 and a new figure (Fig 5)), which details the two above-mentioned analyses. The analysis of freely available datasets of delayed estimation tasks, unfortunately, is out of the scope of this work, and we leave such analyses to future studies.

      Second, the bulk of the modeling efforts presented here are devoted to a circuit-level description of how putative posterior parietal cortex (PPC) and working-memory (WM) related networks may interact to produce such volatility and biases in memory. This effort is extremely useful because it allows the model to be constrained by neural observations and manipulations in addition to behavior, and the authors begin this line of inquiry here (by showing that the circuit model can account for effects of optogenetic inactivation of rodent PPC).

      Further experiments, particularly electrophysiology in PPC and WM-related areas, will allow further validation of the circuit model. For example, the model makes the strong prediction that WM-related activity should display 'jumps' to states reflecting previously presented items on some trials. This hypothesis is readily testable using modern high-density recording techniques and single-trial analyses.

      As mentioned in response to the previous comment, we note again that in the WM network, the bump ‘displacement’ has a complex dynamics -- the examples we have provided in Fig 1A and 2B mainly show the cases in which jumps occur in the WM network, but this is not the only type of dynamics we observe in the model. We do have instances in which the continuity of the model causes drift across values, and we have now replaced the right panel in Fig 2B with one such instance, in order to emphasize that this displacement towards the previous trial’s stimulus (s2(t-1)) can occur in various ways. For a more thorough analysis, we have analyzed the distance between s1(t) and the position of the bump in the WM network at the end of the delay period s_hat(t), conditioned on specific values of s1(t) and s2(t-1) (Fig 5C). In this figure, we can see the appearance of two modes: one centered around 0, corresponding to the correct trials where the stimulus is kept in WM (s1(t) = s_hat(t)), and another mode centered around s2(t-1), the location of the second stimulus of the previous trial, where the bump is displaced. Note, as we explain in Sect. 1.5, the large dispersion around this second mode, which suggests that the bump is not always displaced to that specific location and may undergo drift.

      We agree with the reviewer that future electrophysiological experiments (or analysis of existing datasets) are necessary for validation of these results.

      Finally, while there has been a refreshing movement away from an overreliance on p-values in recent years (e.g., Amrhein et al., PeerJ 2017), hypothesis testing, when used appropriately, provides the reader with useful information about the amount of variability in experimental datasets. While the excellent visualizations and apparently strong effect sizes in the paper mitigate the need for p-values to an extent, the paucity of statistical analysis does impede interpretation of a number of panels in the paper (e.g., the results for the negatively skewed distribution in 5D, the reliability of the attractive effects in 6a/b for 2- and 3- trials back).

      We share the reviewer’s criticism of the misuse of p-values. To allow a clearer interpretation of old Fig 5D (new Fig 7E), we have looked at the 2- and 3-trials-back biases using our full dataset: the negatively skewed distribution and two bimodal distributions (of which only one was shown in the manuscript). This larger dataset of 43 subjects (approximately 17,200 trials) allows us to clearly see the 2- and 3-trials-back attractive biases, and the effect that the delay interval exerts on them.

      Reviewer #1 (Recommendations For The Authors):

      Fig 5 A&C - It might be beneficial to separate the distribution of stimuli from the performance. It is hard to read the details of the performance, especially with error bars.

      Following the next recommendation, we have replaced the standard deviation with the standard error of the mean; we hope this makes the performance easier to read.

      Fig 5C. The number of participants should be written. Perhaps standard errors instead of standard deviation?

      We have now changed the standard deviation to standard errors of the mean and included the number of participants in the figure.

      Fig 2B - hard to understand, because there is no marking of where "perfect" memory of s1 would be.

      The perfect memory of s1 is shown in the upper panel as black bars.

      Fig 3B. dot number 9 (blue, around 0.7) - why is WM higher than stimulus?

      This trial has a long ISI (blue means 10 s). During this delay, the bump in the PPC, under the influence of adaptation, drifts far below the first stimulus (note that the previous trial also had its first stimulus in the same location, as a result of which the adaptive thresholds have built up significantly, causing the bump to move away from that location). During this delay period, neurons in the WM network receive inputs from the PPC network: if this input is strong enough, it can disrupt an existing bump; if not, this input still exerts an inhibitory influence on the existing bump via the global inhibition in the connectivity. This can cause an existing bump to slowly drift in a random direction, and finally dissipate. Note that the lines in Fig 2B represent the neuron with the maximal activity; this activity may belong to a stable bump or to an unstable bump that may soon dissipate.

      Other examples with similar dynamics include trials 43 and 54.

      L167 fewer -> smaller

      We have now corrected this.

      Fig 3C - bump can also be in between. Is this binned?

      We have not binned the length of the attractor; to produce that figure, we check whether the position of the neuron with the maximal firing rate is within a distance of ±5% of the length of the whole line attractor from the target location.
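
      The criterion described above can be written as a short check. This is a minimal sketch, assuming the line attractor is mapped onto the interval [0, 1] and that the firing rates are given as a plain list; the function name and the example bump are hypothetical and are not taken from the repository.

```python
import math

def bump_is_at_target(rates, target, tolerance=0.05):
    """True if the most active neuron lies within `tolerance` of `target`.

    Positions are expressed as a fraction of the attractor length, so a
    tolerance of 0.05 corresponds to 5% of the whole line attractor.
    """
    n = len(rates)
    peak_index = max(range(n), key=lambda i: rates[i])
    peak_position = peak_index / (n - 1)
    return abs(peak_position - target) <= tolerance

# Example: a Gaussian-shaped bump centred on neuron 40 of 101 (position 0.4).
rates = [math.exp(-((i - 40) ** 2) / (2 * 5.0 ** 2)) for i in range(101)]
print(bump_is_at_target(rates, target=0.42))
print(bump_is_at_target(rates, target=0.50))
```

      Because the check uses only the position of the maximally active neuron, no binning of the attractor is involved.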

      L221 Lapse at the boundary of attractor. This seems very different from behavior. Specifically, if it is in the boundaries, it should be stimulus dependent.

      Very sorry, we did not manage to understand the reviewer’s comment.

      L236 are -> is

      We have now corrected this.

      Fig S4 - should be mostly in main text.

      Part of this figure is in Fig 6A, but given the amount of detail, we think Supplementary Material is better suited.

      L253-254. Differences across all distributions - very minor except the bimodal case.

      That is correct; this is why we conducted the experiment with the bimodal distribution, to better differentiate the predictions of the two models.

      L273 extra comma after "This probability"

      We have now corrected this.

      ITI was only introduced in section 1.5.2. Perhaps worth mentioning the default 5s value earlier in the paper.

      We have now mentioned this in line 97-98.

      Fig S6B title: perhaps "previous stimuli"?

      We have now corrected this.

      L364 "in A given trial"

      Equation 2 - no decay term?

      Thank you for pointing out this error, we have now corrected this.

      Equation 5,6 are j^W and j^P indices of neurons in those populations?

      Yes, j^W indexes neurons in the WM network, and j^P those in the PPC. We have now added this in the text for clarity.

      Bump with adaptation - other REFs? Sandro?

      We are aware of continuous bump attractors implementing short-term synaptic plasticity in various studies (including by Sandro Romani), but not in the form we have described. Could the reviewer kindly point us towards the relevant literature?

      Free boundary - what is the connectivity for neurons 1 and N? Is it weaker than others? Is the integral still 1? Does this induce some bias on the extreme values?

      The connectivity of the network is all-to-all. However, as expressed by Eq. (3), the distance-dependent contribution to the weights, K, decreases exponentially as we move from neuron 1 onwards, and from neuron N down. The sum (or integral, in the large-N limit) of the K_ij for j on either side of neuron i is unity only when i is sufficiently far from 1 or N. We have rephrased the paragraph starting in line 516 to make this clearer.
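
      The effect of the free boundary on the summed weights can be illustrated numerically. Eq. (3) is not reproduced here, so the kernel below is an assumption: a purely exponential decay with distance (measured in neurons), normalised so that the sum over an unbounded line would equal one. The network size and length scale are likewise hypothetical.

```python
import math

def kernel_row_sum(i, n_neurons, length_scale):
    """Sum over j of an assumed kernel K_ij proportional to exp(-|i - j| / length_scale).

    The normalisation makes the sum equal to 1 on an infinite line, so any
    deficit in a finite network comes from the missing neighbours beyond the
    boundary.
    """
    q = math.exp(-1.0 / length_scale)
    norm = (1.0 - q) / (1.0 + q)
    total = sum(math.exp(-abs(i - j) / length_scale) for j in range(n_neurons))
    return norm * total

n_neurons, length_scale = 1000, 10.0
print(kernel_row_sum(500, n_neurons, length_scale))  # neuron far from both ends
print(kernel_row_sum(0, n_neurons, length_scale))    # neuron at the boundary
```

      With this assumed kernel, the summed weight is close to one for a neuron in the interior and drops to roughly half at the boundary, where neighbours exist on one side only.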

      The presence of a boundary could introduce a bias in theory, but in practice, it affects the dynamics only when the bump drifts sufficiently close to it. The smallest stimulus in the simulated task has amplitude 0.2, with width 0.05, which implies the activation of 50 neurons on either side of neuron 400. If one compares this with the width of the kernel K in stimulus space (d_0 = 0.02), which spans ~10 neurons, we can see that the bump of activity stays mostly far from the boundary. It is possible, though it is observed rarely, when several consecutive long delay intervals happen to occur, that the bump in PPC drifts beyond the location corresponding to either the minimum or maximum stimulus.

      Code availability?

      Code simulating the dynamics of the network as well as analysing the resulting data can be found in the following repository: https://github.com/vboboeva/ParametricWorkingMemory

      Code used to analyse human behavioural data and fit them with our statistical model can be found in this repository: https://github.com/vboboeva/ParametricWorkingMemory_Data

      Code used to run the auditory PWM experiments with human subjects (adapted from Akrami et al 2018) can be found here: https://github.com/vboboeva/Auditory_PWM_human

      L547 stimuli

      We have now corrected this.

      Equation 14 uses both stimuli. Was this the same for the rest of analysis in the paper (first figures for instance)?

      This equation was used for all GLM analyses (Figs 9 and S6).

      D0 is very small (0.02). Does this mean that activity is essentially discrete in the model? Fig 1A & 2B - the two examples of model activity suggest this is the case. In other words - are there cases where the continuity of the model causes drift across values? Can you show an example (similar to Fig 1A)?

      Since this point has been raised beforehand, we refer to the first comment, Fig 2B and Sect. 1.5 for the response to this question.

      Table 1 - inter trial interval 6. Text says 5

      We have now corrected this in the text.

      Reviewer #2 (Recommendations For The Authors):

      In addition to my review above, I just have a few minor comments:

      • If I understood correctly, the squares inside the purple rectangle in Figure 1B are meant to show a gradation from red to blue, but this was hard to make out in the pdf.

      Actually the squares are all on one side or the other of the diagonal, therefore they do not have any gradation.

      • line 164: "The resulting dynamics... [are]?"

      We have corrected this in the text.

      • Fig 7B legend: "The network performance is on average worse for longer ITIs" – correct?

      This was a mistake; we have replaced "worse" with "better".

      Other comments

      We realized that the colorbar reported the incorrect fraction classified in Figs 1B, 2C, 7B (new 8B), S2C, S3A, S5B. We have corrected this in the new version of the manuscript.

      We also found a minor mistake in one of our analysis codes that computed the n-trial back biases for different delay intervals. This did not change our results; it actually made the effects clearer. The figures concerned are Fig 3F and new Fig 7E.

    1. Author Response

      eLife assessment

      This study presents important findings for understanding cortical processing of color, binocular disparity, and naturalistic textures in the human visual cortex at the spatial scale of cortical layers and columns using state-of-the-art high-resolution fMRI methods at ultra-high magnetic field strength (7 T). Solid evidence supports an interesting layer-specific informational connectivity analysis to infer information flow across early visual areas for processing disparity and color signals. While the question of how the modularity of representation relates to cortical hierarchical processing is interesting and fundamental, the findings that texture does not map onto previously established columnar architecture in V2 is suggestive but would benefit from further controls. The successful application of high-resolution fMRI methods to study the functional organization along cortical columns and layers is relevant to a broad readership interested in general neuroscience.

      Thank you for your assessment of our manuscript "Mesoscale functional organization and connectivity of color, disparity, and naturalistic texture in human second visual area". We have carefully considered the public reviews and have outlined our plans for revision by providing point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      To support the finding that texture is not represented in a modular fashion, additional possibilities must be considered. These include (a) the effectiveness and specificity of the texture stimulus and control stimuli, (b) further analysis of possible structure in images that may have been missed, and (c) limitations of imaging resolution.

      Thank you for your suggestions. We will provide evidence and additional analyses to show that there was indeed a large difference in high-order statistical information between the texture and control stimuli in our study, and thus the contrast between the two stimuli should be effective in localizing the processing of high-order texture information. Compared to the previous studies, another reason for the weaker texture selectivity in the current study could be the smaller number of images used and the slower rate of image presentation. Although our fMRI result at 1-mm isotropic resolution did not show a modular processing of naturalistic texture in CO-stripe columns, this does not exclude the possibility that smaller modules exist beyond the current fMRI resolution. We will discuss these limitations in the revised manuscript.

      More in-depth analysis of subject data is needed. The apparent structure in the texture images in peripheral fields of some subjects calls for more detailed analysis. e.g. Relationship to eccentricity and the need for a 'modularity index' to quantify the degree of modularity. A possible relationship to eccentricity should also be considered.

      We will perform further analysis based on your suggestion, especially regarding the relationship between eccentricity and modulation index. We will discuss this possibility in the revised manuscript.

      Given what is known as a modular organization in V4 and V3 (e.g. for color, orientation, curvature), did images reveal these organizations? If so, connectivity analysis would be improved based on such ROIs. This would further strengthen the hierarchical scheme.

      Thank you for your suggestion. The informational connectivity analyses used highly informative voxels by feature selection, which may already represent information from the modular organizations in these higher visual areas. We will examine the functional maps for possible modular organizations.

      Reviewer #2 (Public Review):

      In lines 162-163, it is stated that no clear columnar organization exists for naturalistic texture processing in V2. In my opinion, this should be rephrased. As far as I understand, Figure 2B refers to the analysis used to support the conclusion. The left and middle bar plots only show a circular analysis since ROIs were based on the color and disparity contrast used to define thin and thick stripes. The interesting graph is the right plot, which shows no statistically significant overlap of texture processing with thin, thick, and pale stripe ROIs. It should be pointed out that this analysis does not dismiss a columnar organization per se but instead only supports the conclusion of no coincidence with the CO-stripe architecture.

      Reviewer #1 also raised a similar concern. We agree that there may be a smaller functional module of textures in area V2 at a finer spatial scale than our fMRI resolution. We will rephrase our conclusions to be more precise.

      In Figure 3, cortical depth-dependent analyses are presented for color, disparity, and texture processing. I acknowledge that the authors took care of venous effects by excluding outlier voxels. However, the GE-BOLD signal at high magnetic fields is still biased to extravascular contributions from around larger veins. Therefore, the highest color selectivity in superficial layers might also result from the bias to draining veins and might not be of neuronal origin. Furthermore, it is interesting that cortical profiles with the highest selectivity in superficial layers show overall higher selectivity across cortical depth. Could the missing increase toward the pial surface in other profiles result from the ROI definition or overall smaller signal changes (effect size) of selected voxels? At least, a more careful interpretation and discussion would be helpful for the reader.

      We will discuss the limitations of cortical depth-dependent analysis using GE-BOLD fMRI. All our stimuli produced robust activations in these visual areas, thus the flat laminar profiles of modulatory indices are unlikely to be caused by smaller signal changes. We will show the original BOLD responses in addition to the modulation index.

      I was slightly surprised that no retinotopy data was acquired. The ROI definition in the manuscript was based on a retinotopy atlas plus manual stripe segmentation of single columns. Both steps have disadvantages because they neglect individual differences and are based on subjective assessment. A few points might be worth discussing: (1) In lines 467-468, the authors state that V2 was defined based on the extent of stripes. This classical definition of area V2 was questioned by a recent publication (Nasr et al., 2016, J Neurosci, 36, 1841-1857), which showed that stripes might extend into V3. Could this have been a problem in the present analysis, e.g., in the connectivity analysis? (2) The manual segmentation depends on the chosen threshold value, which is inevitably arbitrary. Which value was used?

      The retinotopic atlas on the standard surface is usually quite accurate in defining the boundaries of early visual areas. Although some stripes may extend into V3, these patterns should be more robust in V2. In our analysis, we selected only those with clear organizations within the retinotopic atlas. Thus, the signal contribution from V3 is likely to be small and would not affect the pattern of results. In addition, the results for V3 and V2 could be very different; we will compare the pattern of results from these areas in additional analyses. The threshold for segmentation is abs(T)>2; we will clarify this in the Methods.

      The use of 1-mm isotropic voxels is relatively coarse for cortical depth-dependent analyses, especially in the early visual cortex, which is highly convoluted and has a small cortical thickness. For example, most layer-fMRI studies use a voxel size of around isotropic 0.8 mm, which has half the voxel volume of 1 mm isotropic voxels. With increasing voxel volume, partial volume effects become more pronounced. For example, partial volume with CSF might confound the analysis by introducing pulsatility effects.

      We agree that the 1-mm isotropic voxel is much larger in volume than the 0.8-mm isotropic voxel, but the difference in resolution along the cortical depth is not large. In addition to our study, other studies have also shown that fMRI at 1-mm isotropic resolution is capable of resolving cortical depth-dependent signals. Also, our fMRI slices were oriented perpendicular to the calcarine sulcus; the higher in-plane resolution will also help in resolving depth-dependent signals. We will discuss these issues about fMRI resolution in the revised manuscript.
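
      The comparison between the two voxel sizes can be checked with simple arithmetic. The snippet below is only a worked calculation for isotropic voxels with edge lengths of 1 mm and 0.8 mm; the function name is illustrative.

```python
def voxel_volume(edge_mm):
    """Volume in cubic millimetres of an isotropic voxel with the given edge."""
    return edge_mm ** 3

# Ratio of volumes: the 1-mm voxel holds roughly twice the tissue volume.
volume_ratio = voxel_volume(1.0) / voxel_volume(0.8)
# Ratio of edge lengths: sampling along any single axis, including the
# cortical depth, differs by a much smaller factor.
edge_ratio = 1.0 / 0.8

print(f"volume ratio: {volume_ratio:.2f}")
print(f"edge ratio:   {edge_ratio:.2f}")
```

      The volume differs by a factor of about 1.95, whereas the linear sampling along a single axis differs by a factor of 1.25.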

      The SVM analysis included a feature selection step stated in lines 531-533. Although this step is reasonable for the training of a machine learning classifier, it would be interesting to know if the authors think this step could have reintroduced some bias to remaining draining vein contributions.

      Several precautions have been taken in the ROI definition to reduce the influence of large draining veins. The same number of voxels were selected from each cortical depth for the SVM analysis, thus there was no bias from the superficial layers susceptible to draining veins. Also, since both feedforward and feedback connections involved the superficial voxels, the remaining influence of large draining veins should be comparable between the two connections.
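
      Selecting the same number of voxels from each cortical depth can be sketched as follows. The ranking score, the depth labels, and the function name are assumptions for illustration; the actual feature-selection criterion is the one stated in the manuscript's Methods and is not reproduced here.

```python
def select_voxels_per_depth(scores, depths, n_per_depth):
    """Return indices of the top-scoring voxels, with equal counts per depth.

    scores: one informativeness score per voxel (assumed criterion).
    depths: one depth label per voxel, e.g. "deep", "middle", "superficial".
    """
    by_depth = {}
    for index, depth in enumerate(depths):
        by_depth.setdefault(depth, []).append(index)
    selected = []
    for depth in sorted(by_depth):
        ranked = sorted(by_depth[depth], key=lambda i: scores[i], reverse=True)
        selected.extend(ranked[:n_per_depth])
    return selected

scores = [0.9, 0.1, 0.5, 0.7, 0.2, 0.8]
depths = ["deep", "deep", "deep", "superficial", "superficial", "superficial"]
print(select_voxels_per_depth(scores, depths, n_per_depth=2))
```

      Ranking within each depth separately, rather than across all voxels at once, prevents a depth with systematically larger responses from dominating the selected set.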

      Reviewer #3 (Public Review):

      The authors tend to overclaim their results.

      Thank you for your comments. We will add more control analyses to strengthen our findings, and discuss the results appropriately.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article describes a useful python-based image-analysis tool for bacteria growing in the 'mother-machine' microfluidic device. This new method for image segmentation and tracking offers a user-friendly graphical interface based on the previously developed, promising environment for image analysis 'Napari'. The authors demonstrate the usefulness of their software and its robust performance by comparing it to other methods used for the same purpose. The comparison provides solid support for the new method, although it would have been even stronger if tested using data sets from other groups. This article will be of interest for scientists who utilize the 'mother machine', not least because it also provides a short overview of how to set up this widely used device.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors aim to develop an easy-to-use image analysis tool for the mother machine that is used for single-cell time-lapse imaging. Compared with related software, they tried to make this software more user-friendly for non-experts with a design of "What You Put Is What You Get". This software is implemented as a plugin of Napari, which is an emerging microscopy image analysis platform. The users can interactively adjust the parameters in the pipeline with good visualization and interaction interface.

      Strengths:

      • Updated platform with great 2D/3D visualization and annotation support.

      • Integrated one-stop pipeline for mother machine image processing.

      • Interactive user-friendly interface.

      • The users can have a visualization of intermediate results and adjust the parameters.

      We thank the reviewer for their positive comments.

      Weaknesses:

      • Based on the presentation of the manuscript, it is not clear that the goals are fully achieved.

      • Although there is great potential, there is little evidence that this tool has been adopted by other labs.

      • The comparison of Otsu and U-Net results does not make much sense to me. The systematic bias could be adjusted by threshold change. The U-Net output is a probability map with floating point numbers. This output is probably thresholded to get a binary mask, which is not mentioned in the manuscript. This threshold could also be adjusted. Actually, Otsu is a segmentation method and U-Net is an image transformation method and they should not be compared together. U-Net output could also be segmented using Otsu.

      We agree that the comparison of the classical and U-Net results may be misleading. As the reviewer points out, the issue ultimately comes down to thresholding. Indeed, the threshold of both the Otsu and U-Net outputs could be adjusted to bring them into line with each other. The comparison between the Otsu pipeline and U-Net pipeline is meant to illustrate that any pipeline (making use of a variety of methods) may be highly susceptible to the value of a user-input (or hard-coded) threshold.

      We have clarified the discussion to emphasize that the comparison is not specifically between U-Net and Otsu but between the two pipelines (lines 238 - 257).

      We have also clarified that the U-Net probability map output was binarized with a threshold of 0.5 (lines 538-541). We note the same activation function and threshold are used in DeLTA. As the reviewer points out, Otsu’s method could indeed be applied to threshold the U-Net output as well. What we referred to as the “Otsu” MM3 method itself uses Otsu thresholding coupled with a Euclidean distance transform and a Random Walker algorithm. For clarity we now refer to it as a classical or non-learning method in the text.
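
      The sensitivity of a measured cell length to the binarization threshold can be illustrated on a synthetic one-dimensional probability profile. This sketch does not use napari-MM3, DeLTA, or any real U-Net output; the profile shape, its parameters, and the function names are assumptions chosen only to show the effect.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def synthetic_probability_profile(n_pixels=80, start=20.0, end=60.0,
                                  edge_width=2.0):
    """Assumed probability along the channel axis for one cell with soft edges."""
    return [sigmoid((x - start) / edge_width) * sigmoid((end - x) / edge_width)
            for x in range(n_pixels)]

def measured_length(profile, threshold):
    """Cell length in pixels after binarizing the profile at `threshold`."""
    return sum(1 for p in profile if p > threshold)

profile = synthetic_probability_profile()
lengths = {thr: measured_length(profile, thr) for thr in (0.3, 0.5, 0.7)}
for thr, length in lengths.items():
    print(f"threshold {thr:.1f}: length {length} px")
```

      In this toy profile, raising the threshold shrinks the measured length by a few pixels, which is the kind of systematic shift that a user-input or hard-coded threshold can introduce in any pipeline.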

      • The diversity of datasets used in this study is limited.

      We have added a section “Testing napari-MM3 on other datasets” (lines 187-196) evaluating the performance of MM3 on 4 datasets (3 E. coli, 1 Corynebacterium glutamicum) from outside our lab, demonstrating its versatility.

      • There is some ambiguity in the main point of this manuscript: the title and figures illustrate a complete pipeline, including imaging, image segmentation, and analysis, while the abstract focuses only on the software MM3. If only MM3 is the focus and contribution of this manuscript, more of the presentation should focus on this software tool. It is also not clear whether the analysis features are also integrated with MM3 or not.

      We have added a line (lines 160-162) clarifying that final analysis and plotting must be done outside of napari. MM3 itself processes raw microscopy images, segments cells and reconstructs cell lineages (Figure 2).

      • The impact of this work depends on the adoption of the software MM3. Napari is a promising platform with expanding community. With good software user experience and long-term support, there is a good chance that this tool could be widely adopted in the mother machine image analysis community.

      We thank the reviewer for their endorsement of MM3’s potential.

      • The data analysis in this manuscript is used as a demo of MM3 features, rather than scientific research.

      Reviewer #2 (Public Review):

      The authors present an image-analysis pipeline for mother-machine data, i.e., for time-lapses of single bacterial cells growing for many generations in one-dimensional microfluidic channels. The pipeline is available as a plugin of the python-based image-analysis platform Napari. The tool comes with two different previously published methods to segment cells (classical image transformation and thresholding as well as UNet-based analysis), which compare qualitatively and quantitatively well with the results of widely accessible tools developed by others (BACNET, DeLTA, Omnipose). The tool comes with a graphical user interface and example scripts, which should make it valuable for other mother-machine users, even if this has not been demonstrated yet.

      We thank the reviewer for their positive comments.

      The authors also add a practical overview of how to prepare and conduct mother-machine experiments, citing their previous work and giving more advice on how to load cells using centrifugation. However, the latter part lacks detailed instructions.

      We have added a more detailed experimental protocol, including the procedure we use for cell loading, to the lab GitHub page https://github.com/junlabucsd/mother-machine-protocols (linked in the main text).

      Finally, the authors emphasize that machine-learning methods for image segmentation reproduce average quantities of training datasets, such as the length at birth or division. Therefore, differences in training can propagate to differences in measured average quantities. This result is not surprising and is normally considered a desired property of any machine-learning algorithm, as also commented on below.

      Points for improvement:

      Different datasets: The authors demonstrate the use of their method for bacteria growing in different growth conditions in their own microscope. However, they don't provide details on whether they had to adjust image-analysis parameters for each dataset. Similarly, they say that their method also works for other organisms including yeast and C. elegans (as part of the Results section) but they don't show evidence nor do they write whether the method needs to be tuned/trained for those datasets. Finally, they don't demonstrate that their method works on data from other labs, which might be different due to differences in setup or imaging conditions.

      We have added a section “Testing napari-MM3 on other datasets” (lines 187-196) evaluating the performance of MM3 on 4 datasets (3 E. coli, 1 Corynebacterium glutamicum) from outside our lab, demonstrating its versatility. We provide details of the procedure and parameters used in the Methods section. (“Analysis of external datasets” lines 476-486).

      Bias due to training sets:

      The bias in ML-methods based on training datasets is not surprising but arguably a desired property of those methods. Similarly, threshold-based classical segmentation methods are biased by the choice of threshold values and other segmentation parameters. A point that would have profited from discussion in this regard: How to make image segmentation unbiased, that is, how to deliver physical cell boundaries? This can be done by image simulations and/or by comparison with alternative methods such as fluorescence microscopy.

      We agree this is an important point. We have revised the relevant sections (lines 238 - 270) to add context to the discussion of bias in both classical and deep learning methods. We have added a subsection (lines 401 - 410) discussing methods to this end, such as synthetic training data generation or calibrating the segmentation to fluorescence images.

      The authors stress the user-friendliness of their method in comparison to others. For example, they write: 'Unfortunately, many of these tools present a steep learning curve for most biologists, as they require familiarity with command line tools, programming, and image analysis methods.' I suggest instead emphasizing that many of the tools published in recent years are designed to be very user friendly. And as with all methods, MM3 also comes at a price, which is to install Napari followed by the installation of MM3, which, according to their own instructions, is not easy either.

      We have modified our language to acknowledge that indeed recent software such as DeLTA and BACMMAN make a point to be user-friendly and accessible (lines 52-53).

      Reviewer #1 (Recommendations For The Authors):

      -The resources, including documentation and code, are referenced and are not easy to find. It should be easier for readers to curate them in a separate Resources section.

      We have created a Resources section in the Methods (top of first page) with the documentation, code and protocols hyperlinked.

      • It would be easier to understand the usage of MM3 with a screen recording video. I found a video from the GitHub paper, but the resolution is a bit low. Attaching a high-resolution screenshot video would be helpful.

      A high-resolution tutorial video has been made more visible on the GitHub page.

      • In Table 1, AMD GPU is used which is not easy to use for Deep Learning. It is not clear whether the GPU is used for Deep Learning training and inference.

      We have clarified this point in the Table 1 caption, and linked to a reference on how to use AMD GPUs with TensorFlow on Macs.

      • Some paragraphs in the Discussion section are like blogs with general recommendations. Although the suggestions look pretty useful, it is not the focus of this manuscript. It might be more appropriate to put it in the GitHub repo or a documentation page. The discussion should still focus on the software, such as features, software maintenance, software development roadmap, and community adoption.

      • It would be easier for reviewers to add line numbers in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Software Installation: This might be something for the GitHub forum, but briefly trying to install the plugin myself, I already failed at the first line of the GitHub instructions, which is to use mamba for installation. This relates to my point above: Any program that is not stand-alone requires some user-savviness and trial-and-error, which is just hard to avoid for any method. I suggest being less critical of 'other methods' and instead focus on the advantage of the mother-machine-specific aspects of napari-mm3.

      The authors write 'Still, most labs do not have the time and resources to evaluate other tools they do not use critically, [...]'. The sentence is not very clear. Evaluating tools not used is obviously difficult/impossible.

      We have reworded this sentence to be more clear (lines 54-55).

      The authors write: 'The supervised learning method uses a convolutional neural net (CNN) with the U-Net architecture [20].' Can the authors cite previous work that has taken advantage of this approach before (e.g., DelTA)?

      We have added citations to DeLTA and other previous software (line 151).

      Cell tracking and lineage reconstruction should be described in more detail and/or with reference to previous work.

      We have added more details to the SI (lines 554 - 567) discussing the method in the context of existing mother machine analysis software.

      The authors provide a figure for a '3D printed cell loader', but as far they don't give instructions including a CAD file and the model of the fan used for spinning. The same holds for the stage inset (which, as far as I see, is not referred to in the manuscript text nor described in a figure caption).

      Thank you for pointing out this omission. The centrifuge is referenced in Box 1. We have updated the manuscript with a link to a GitHub repository containing CAD files & details of the centrifuge construction. We decided to remove the stage insert from the figure.

      Figure S3: Is the asymmetry in growth rate due to the expression of a fluorescent protein, due to strain differences, or due to imaging artifacts? Maybe this is impossible to tell based on the available datasets, but this could be discussed.

      Based on previous work (DOI 10.1099/mic.0.057240-0) it is likely due to the expression of the fluorescent protein and fluorescence imaging. We have added a brief discussion in the Figure S3 caption.

    1. Author Response

      The authors appreciate the reviewers' thoughtful and constructive feedback. We are pleased to have the opportunity to address their comments through a revised version to strengthen our work. In particular:

      (1) As suggested, we will add references/details in Methods to further help readers to establish the cohort as population-derived and clarify details about the analysis and specificity of results.

      (2) We agree that reserve, inefficiency, and compensation are complex issues needing more discussion. We will add definitions and discussion to clarify our approaches, including multivariate/univariate analyses and addressing the specificity of results. We also appreciate the suggestions for future research directions.

      A revised version addressing these valuable recommendations will improve our study's contribution towards quantitative methods for understanding reserve and compensation in healthy cognitive ageing.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, the authors have explored how treating C. albicans fungal cells with EDTA affects their growth and virulence potential. They then explore the use of EDTA-treated yeast as a whole-cell vaccine in a mouse model of systemic infection. In general, the results of the paper are unsurprising. Treating yeast cells with EDTA affects their growth and the addition of metals rescues the phenotype. Because of the significant growth defects of the cells, they don't infect mice and you see reduced virulence. Injection with these cells effectively immunises the mice, in the same way that heat-killed yeast cells would. The data is fairly sound and mostly well-presented, and the paper is easy to follow. However, I feel the data is an incremental advance at best, and the immune analysis in the paper is very basic and descriptive.

      Strengths:

      Detailed analysis of EDTA-treated yeast cells

      Weaknesses:

      • Basic immune data with little advance in knowledge.

      • No comparison between their whole-cell vaccine and others tried in the field.

      • The data is largely unsurprising and not novel.

      Thank you for appreciating our effort to generate a live whole-cell vaccine by EDTA treatment, and for your comment that the manuscript is sound and well presented. However, we are afraid that the reviewer assumed the CAET cells to be dead. CAET cells are live; they simply replicate more slowly than the wild type. Because the reviewer presumed CAET to be a dead strain similar to heat-killed cells, most of his/her comments were partly negative.

      Reviewer #2 (Public Review):

      Summary:

      Invasive fungal infections are very difficult to treat with limited drug options. With the increasing concern of drug resistance, developing an antifungal vaccine is a high priority. In this study, the authors studied the metal metabolism in Candida albicans by testing some chelators, including EDTA, to block the metal acquisition and metabolism by the fungus. Interestingly, they found EDTA-treated yeast cells grew poorly in vitro and non-pathogenic in vivo in a murine model. Mice immunized by EDTA-treated Candida (CAET) were protected against challenge with wild-type Candida cells. RNA-Seq analysis to survey the gene expression profile in response to EDTA treatment in vitro revealed upregulation of genes in metal homeostasis and downregulation of ribosome biogenesis. They also revealed an induction of both pro- and anti-inflammatory cytokines involved in Th1, Th2 and Th17 host immune response in response to CAET immunization. Overall, this is an interesting study with translational potential.

      Strengths:

      The main strength of the report is that the authors identified a potential whole-cell live vaccine strain that can provide full protection against candidiasis. Abundant data both on in vitro phenotype, gene expression profile, and host immune response have been presented.

      Weaknesses:

      A weakness is that the immune mechanism of CAET-mediated host protection remains unclear. The immune data is somewhat confusing. The authors only checked cytokines and chemokines in blood. The immune response in infected tissues and antibody response may be investigated.

      Thank you very much for appreciating our work and finding our strain to be a live whole-cell vaccine strain with translational potential. Since the current study focused on the identification and detailed characterization of a non-genetically modified live attenuated strain and its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. We are in the process of developing another manuscript where we describe both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find a vaccine solution for invasive candidiasis.

      Strengths:

      The testing of the antifungal activity of EDTA on Candida is not new as many other papers have examined this effect. The novelty here is the use of this EDTA-treated strain as a vaccine to protect against a secondary challenge with wild-type Candida.

      Weaknesses:

      However, data presented in Figure 5 and Figure 6 are not convincing and need further experimental controls and analysis as the authors do not show a time-dependent effect on the CFU of their vaccine formulation. The methodology used is also an issue. As it stands, the impact is minor.

      Thank you so much for appreciating our efforts to develop a novel vaccine against fungal infections. Although Figs. 5 and 6 are the main strength of the paper, we are afraid that the reviewer did not find them convincing.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper by Perovic and colleagues describes how important blood vessels called collaterals form during development and remodel/expand upon injury to the brain. These vessels are conduits between arteries that do not have strong blood flow physiologically but upon injury can compensate for conduit loss. Published work by others is largely descriptive and does not address the cellular sources of collaterals over time. Here elegant lineage tracing is used to better understand the source of vascular endothelial cells during embryonic development, and how these lineages contribute to remodeling upon injury. The work is ambitious and important as collateral capacity can strongly influence the trajectory of outcomes with vascular blockage. The work reveals that proliferative arterial EC is the primary contributor to the collaterals developmentally, with a small contribution from capillary/venous EC, and that this shifts to almost completely arterial contribution from birth onward. There are several aspects of the work that, if addressed, would strengthen the study and better support the interesting and novel conclusions, including analysis of non-collateral lineage contributions, more careful interpretation of fixed image data, and more careful annotation of the image panels.

      We thank the reviewer for appreciating the ambition, importance and novelty of our work, and for the constructive suggestions for improvements.

      Reviewer #2 (Public Review):

      Pial collateral vessels are anastomotic connections that cross-connect distal arterioles of the middle, anterior, and posterior cerebral arteries. With respect to ischemic stroke, good pial collateral flow positively correlates with decreased infarct volume and improved recovery; accordingly, optimizing collateral flow represents an important intervention for limiting stroke damage. The goal of this study was to determine the endothelial cell (EC) subtype(s) that contribute to the embryonic and neonatal development of pial collaterals and their expansion in response to stroke. To this end, the authors used lineage tracing methods in the mouse, labeling arterial endothelial cells (using Bmx-CreERT on switch line, R26mTmG) or venous and microvascular endothelial cells (using Vegfr3-CreERT on R26mTmG) and assessing pial collaterals via confocal microscopy. The authors convincingly demonstrate that arterial-lineage ECs comprise the majority of pial collateral ECs during development and in adulthood, with a minor contribution from pial plexus-derived microvascular ECs that decline over time. They also convincingly demonstrate that pial collateral outward remodeling after experimentally-induced stroke (distal middle cerebral artery occlusion, or dMCAO) involves, at least in part, local proliferation of arterial-lineage ECs. The latter is intriguing given that arterial ECs generally leave the cell cycle. While these conclusions are quite solid, some key details are missing that could improve analysis, and some important caveats are not addressed. Moreover, less convincing are mechanistic claims that pial collaterals form via a migratory process of "mosaic colonization" of a preexisting vessel.

      We thank the reviewer for the careful assessment and suggestions for improvements. Claiming migratory behaviour from static images is indeed always tricky and comes with caveats. Our conclusions however are based on the appearance of cells in locations where they are not found at earlier stages. Given that we could exclude persistent recombination, a sound conclusion must be that cells appear in the new location through some means of translocation. Given our experience with the morphology of migrating cells in vivo, the appearance of polarized filopodial structures coinciding with the direction of observed appearance of cells at progressive later stages, strongly suggests active migration. Moreover, these highly migrating cells also exhibit ICAM2 positivity, suggesting that they are directly lining the pre-collateral lumen. In our explanation of how the immigration might occur, we would need to consider solitary cell migration through interstitial space, or rather intercalation movement. The active participation of migrating cells in lumen formation of the nascent pre-collateral suggests intercalation, but further analysis needs to be performed (such as a detailed analysis of cell-cell junctions or sustained apico-basal polarity). The conclusion that such a process highlights mosaic colonization of preexisting vessels is tightly linked to the demonstration of continuous lumen, whilst being found in a vessel without lineage marker, but beginning expression of arterial markers such as Cx40.

      1) It is difficult to understand whether individual collaterals are truly mosaic vessels, or whether arterial or venous/microvascular lineage ECs predominate in any particular region of the pial collateral vasculature. This is due to a number of methodological reasons: arterial and venous/microvascular contributions to pial collaterals were assessed independently, only a few (and in some cases, just one) collaterals were analyzed in each mouse, and regionality/location of collaterals was not addressed. Additionally, the inefficiency and variability of EC labeling, especially with the Vegfr3-CreERT line (Fig. S1, ~6-30%), compounds this problem.

      Factual error: 6 - 22% (not 30)

      The reviewer is correct in their statement that the independent assessment of contribution makes it difficult to locally demonstrate mosaicism. However, we are not aware of a method that could trace two different populations from different sources using recombination genetics simultaneously. Mosaicism however can be concluded from two observations independently. One, we find contribution from an alternative source that at the time point of labelling does not colocalize with arterial BMX lineage cells. Second, the BMX-lineage labelling is never complete in the collaterals, at least at developmental stages. Future work using scRNA seq may shed more light onto the degree of mosaicism. However at this point, the data strongly suggest mosaicism, even if the majority of the cells are of the BMX-lineage. The comment on inefficiency or variability of labelling in particular with the Vegfr3-CreERT line is interesting. At this point, we cannot rule out that the observed variability is due to intrinsic variability in expression, rather than inefficient recombination, or variability thereof. With our current tools we cannot easily distinguish between the two. Again, we hope that future studies with scRNA seq will be able to shed more light onto this interesting biology. Finally, we have not carefully assessed regionality, but have not seen obvious correlations with the degree of mosaicism. It is however important to note that in no case did we just examine one collateral per hemisphere. Each data point is an average of all collaterals from a part of a given collateral zone (imaging region). Usually, it is possible to image 2-4 collateral regions in each embryo. We always imaged multiple collaterals per animal, but sometimes only one region was imaged (due to technical issues).

      2) The identification of "pre-collateral" vessels requires further support. The authors define these vessels by their connection to the feeding artery, their (often) larger diameter, and their more pronounced ICAM2 expression. While most of these criteria are demonstrated in Figure S3, it is not apparent how these vessels were defined in Figure 4, which lacks specific annotation of each of these identifying criteria. As the identification of these novel vessels is one of the key findings of this paper, a more robust method of unambiguously defining them is warranted.

      We agree that it would be fabulous to have a unique marker at hand that identifies pre-collaterals. Our careful analysis of the distribution of the markers we tested, firmly established that the levels of ICAM2 expression nicely highlight structures that become colonized by these BMX lineage cells. Cx40 staining also confirmed this impression. We will attempt better annotation based on these markers to help the reader appreciate these findings. The combination of anatomical location and connection pattern with the stronger ICAM2 staining in our hands is a highly reliable and unambiguous identifier of what we called “pre-collaterals”.

      3) The conclusion that collateral-forming ECs migrate in the direction of flow into preexisting vessels is not well supported. The authors state that the presence of filopodial projections (Figure 4) supports this conclusion. However, filopodia number and directional polarization/orientation were not quantified, and "intercalation movements"/migration, per se, cannot be inferred from these static images.

      The reviewer is correct that claiming migration from static images is always difficult. As stated above, we base our conclusions on the progressive appearance of cells exhibiting migratory behavior, as well as the morphology including filopodia. Although we indeed didn’t quantify filopodia, these structures are in our experience not found on endothelial cells that do not engage in migration. Their consistent presence and directionality are strongly suggestive of movement. We will attempt to clarify this better in the text and the figures.

      4) In Figure 5, the simplest explanation for relative Cx40 expression in different vessels is the absence (low expression) or presence (high expression) of flow. This figure provides little mechanistic insight beyond this already-known relationship, and it is unclear how many times this experiment was performed (there is no N, no quantification or correlation).

      Flow is indeed one component of what regulates Cx40. However, a key point of this figure is to show that Cx40 expression can precede the recruitment of BMX lineage cells. This is important to distinguish whether arterial identity is only achieved by recruitment of BMX lineage cells, or exists in certain vessels (for example because they may have more flow) already before this colonization event. It suggests that the BMX population may rather serve to consolidate the arterial state, as other structures that may have been Cx40-positive before, but do not become colonized, lose arterial identity. We disagree that this finding does not contribute important information. If only BMX-lineage cells expressed Cx40, the conclusion would be very different. This is not a question of how much, but of whether arterialization requires the recruitment of particular cells, or is induced in vessels that adopt arterial identity. This is not a singular observation and we will add the N number to the figure legend.

      5) There is no statistical analysis in this work. This is justified by the authors by their admission that the study is of a "descriptive nature and...exploratory design."

      This is correct.

      Reviewer #3 (Public Review):

      Summary:

      These studies focus on a very interesting, understudied phenomenon in vascular development - the formation of pial collaterals between cerebral arteries. Understanding the mechanism(s) that regulates this process during normal development could provide important insights for the treatment of adult stroke patients, for which repair is highly dependent on collateral formation. Insights may also be relevant to other collateral-dependent diseases, such as heart disease and chronic peripheral ischemia.

      Strengths:

      The investigators use lineage tracing and 3D imaging to show that, in mouse embryos, endothelial cells (ECs) predominantly from Bmx+ arteries and some from the Vegfr3+ microvasculature, invade pre-existing pre-collateral vascular structures in a process they termed "mosaic colonization", and arterialization of the vessel segments is said to occur concurrently with colonization, although details about EC phenotypes are lacking. Growth of the collaterals in response to ischemic injury relies on local replication of the ECs within the collaterals and not further recruitment from veins and the microvasculature. Although detailed molecular mechanisms are not provided, demonstration of the "cellular mechanism" of pial collateral vascularization is novel.

      Weaknesses:

      Nonetheless, there are some issues that should be addressed, particularly to clarify the phenotype of the ECs forming the collaterals and expanding in response to injury; only their "origin" was traced and not their identity/growth after labeling in Bmx+ vessels.

      We thank the reviewer for pointing out the importance and novelty of our findings, and for the constructive suggestions for improvements. We indeed focussed here on origin and an attempt to distinguish how the cells arrive in their location rather than on their phenotype. We have performed detailed phenotypic analysis including EM analysis of collaterals but without the ability to connect these to the traced lineages. We therefore chose to leave these data for a separate manuscript. Future work will attempt to fully characterize these populations including their transcriptome using scRNA seq. However, isolating collateral ECs to faithfully characterize them is very challenging, and will not be a part of this manuscript. We have performed stainings for various arterial markers, with variable success. Nevertheless, a full functional study will be part of future work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: The authors study the appearance of oscillations in motifs of linear threshold systems, coupled in specific topologies. They derive analytical conditions for the appearance of oscillations, in the context of excitatory and inhibitory links. They also emphasize the higher importance of the topology, compared to the strength of the links. Finally, the results are confirmed with WC oscillators, which are also linear. The findings are to some extent confirmed with spiking neurons, though here results are less clear, and they are not even mentioned in the Discussion.

      Overall, the results are sound from a theoretical perspective, but I still find it hard to believe that they are of significant relevance for biological networks, or in particular for the oscillations of BG-thalamus-cortex loop in PD. I find motifs in general to be too simplistic for multiscale and generally large networks as is the case in the brain. Moreover, the division of regions is more or less arbitrary by definition, and having such a strong dependence on an odd/even number of inhibitory links is far from reality. Another limitation is the fact that the cortex is considered a single node. Similarly, decomposing even such a coarse network in all possible (238 in this case) motifs doesn't seem of much relevance, when I assume that the emergence of pathological rhythms is more of an emergent phenomenon.

      Strengths:

      From the point of view of nonlinear dynamics, the results are solid, and the intuition behind the proofs of the theorems is well explained.

      Weaknesses:

      As stated in the summary, I find the work to be too theoretical without a real application in biological systems or the brain, where the networks are generally very large.

      We respectfully disagree with the reviewer here. The second half of the paper is all about explaining a biological problem. We have shown the validity of our theoretical results (which indeed were obtained in idealized settings) to explain emergence of oscillations in the basal ganglia. We clearly show that our theoretical results hold both in a rate-based model and in a network model with spiking neurons. The model with spiking neurons is one of the most complete network models of the basal ganglia available in the literature. So we emphasize that we have provided a clear application of our results for the brain networks.

      It is not the problem in the simplicity of the model or of the topology, it is often the case that the phenomena are explained by very reduced systems, but the problem is that the applicability of the finding cannot be extended. E.g. the Kuramoto model uses all-to-all coupling, or similar with QIF neurons which also need to follow a Lorentzian distribution in order to derive a mean field.

      We do not understand this comment. There is no need to extend these results to a network of Kuramoto models because in that setting we already assume that individual nodes/populations are oscillating – there is no problem of emergence of oscillations. Here, we are specifically considering a setting in which nodes themselves are not oscillators. We agree that we, at this point, have no insight into how to extend our analytical proof to a situation where individual nodes are spiking.

      But in those cases, relaxing the strict conditions that were necessary for the derivations, still conserves the main findings of the analysis, which I don't see being the case here. The odd/even number rule is too strict, and talking about a fixed and definite number of cycles in the actual brain seems too simplistic.

      We have clearly relaxed most of our assumptions when we considered a network model of basal ganglia in which each subpopulation is a collection of spiking neurons. And as we have shown our results still hold (see Figure 5). Again our model is about oscillations in a network of networks i.e. network of brain regions.

      At meso-scale it is not unreasonable to find such cycles and even-odd number rules. We have shown this for the case of a cortico-basal ganglia model. We can also extend this to cortico-thalamic networks and so on. We have already emphasized this point in the introduction to avoid any confusion: see lines 62-66 – “We prove this conjecture for the threshold-linear network (TLN) model without delays which can closely capture the dynamics of neural populations. Therefore, it is implicit that our results do not hold at the neuronal level but rather at the level of neuron populations/brain regions e.g. the basal ganglia (BG) network which can be described a network of different nuclei.” and lines 69-70 – “Within the framework of the odd-cycle theory, distinct nuclei are associated with either excitatory or inhibitory nodes.”

      Being linear is another strong assumption, and it is not clear how much of the results are preserved for spiking neurons, even though there is such an analysis, or maybe for other nonlinear types of neuronal masses.

      Clearly our results hold in a network of spiking neurons (see Figure 5). It is of course interesting to ask whether our results hold in a network where individual spiking neurons have more complex spiking behavior like AdEx or Quadratic IF. But that kind of analysis deserves a full manuscript on its own.

      Delays are also mentioned, and their impact on the oscillatory networks is as expected: it reduces the amplitude, but there is no link to the literature, where this is an established phenomenon during synchronization. Finally, the authors should also discuss the time-delays as a known phenomenon to cause or amplify oscillations at different frequencies in a network of coupled oscillators, e.g. Petkoski & Jirsa Network Neuroscience 2022, Tewarie et al. NeuroImage 2019, Davis et al. Nat Commun 2021.

      This is indeed a weakness of our model. But as the reviewer already knows, dynamical systems with delays are very difficult to analyze analytically. We have mentioned this in the limitations of the model and the analysis. In our simulations we have considered delays and when the delays are within reasonable limits our results hold.

      Reviewer #2 (Public Review):

      Summary:

      The authors present here a mathematical and computational study of the topological/graph theory requirements to obtain sustained oscillations in neural network models. A first approach mathematically demonstrates that a given network of interconnected neural populations (understood in the sense of dynamical systems) requires an odd number of inhibitory populations to sustain oscillations. The authors extend this result via numerical simulations of (i) a simplified set of Wilson-Cowan networks, (ii) a simplified circuit of the cortico-basal ganglia network, and (iii) a more complex, spike-based neural network of basal ganglia network, which provides insight on experimental findings regarding abnormal synchrony levels in Parkinson's Disease (PD).

      Strengths:

      The work elegantly and effectively combines solid mathematical proof with careful numerical simulations at different levels of description, which is uncommon and provides additional layers of confidence to the study. Furthermore, the authors included detailed sections to provide intuition about the mathematical proof, which will be helpful for readers less inclined to the perusal of mathematical derivations. Its insightful and well-informed connection with a practical neuroscience problem, the presence of strong beta rhythms in PD, elevates the potential influence of the study and provides testable predictions.

      Weaknesses:

      In its current form, the study lacks a more careful consideration of the role of delays in the emergence of oscillations. Although they are addressed at certain points during the second part of the study, there are sections in which this could have been done more carefully, perhaps with additional simulations to solidify the authors' claims. Furthermore, there are several results reported in the main figures which are not explained in the main text. From what I can infer, these are interesting and relevant results and should be covered. Finally, the text would significantly benefit from a revision of the grammar, to improve the general readability at certain sections. I consider that all these issues are solvable and this would make the study more complete.

      This point has been made by the first reviewer as well. So we repeat our answer:

      This is indeed a weakness of our model. But as the reviewer already knows, dynamical systems with delays are very difficult to analyze analytically. We have mentioned this in the limitations of the model and the analysis. In our simulations we have considered delays and when the delays are within reasonable limits our results hold.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in my comments above, I think that the work is already quite solid and relevant but would significantly improve if some issues were addressed:

      We would like to thank the reviewer for valuable comments and constructive feedback, which have helped us greatly improve the manuscript.

      1) While the authors acknowledge early on the limitations of this study in terms of not considering plasticity or neuron biophysics (line 72), I think that the absence of propagation delays should be explicitly included here. This absence leads to inaccuracies --for example, the sentence "Consider a small network of two nodes. If we connect them mutually with excitatory synapses, intuitively we can say that the two-population network will not oscillate" (line 74) is only correct if the delays (or signal latencies) are zero. With a proper delay, two excitatory neurons can engage in oscillations with a period given by two times the value of the delay.

      A similar situation happens for inhibitory neurons, where the winner-take-all dynamics described in line 77 is only valid for zero delay. It is known that a homogeneous population of inhibitory spiking neurons with delayed synapses can lead to fast oscillations (Brunel and Hakim 1999), something which is also valid for the equivalent inhibitory single node with delayed self-inhibition. Indeed, a circuit of two inhibitory populations with delayed self- and cross-inhibition can generate oscillations, contradicting the main conclusion of the odd number of inhibitory nodes needed for oscillations.

      Because of these considerations, I think the authors should be more careful when explaining the effects of delays, and state that their main results on the link between oscillations and having an odd number of inhibitory nodes are not valid when delays are considered. They could modify the sentences in lines 72-77 above and include a supplementary figure right after their simulation study for the Wilson-Cowan (to explain the examples above, and also the one in the next point).

      The reviewer has brought up a critical point regarding the impact of propagation delays, and we completely concur with this assessment. In our study, we indeed did not comprehensively consider the effects of propagation delays in cycles with even inhibition, which may introduce inaccuracies in our conclusions.

      We note that in the Wilson-Cowan model with delays, certain cycles with an even number of inhibitory links can also generate oscillations with a period equal to twice the delay value. However, in our hands such oscillations were transient and dissipated quickly.
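
      The relation between the delay and the oscillation period mentioned above can be illustrated with a deliberately minimal toy. It is not the Wilson-Cowan model used in the manuscript: two nodes excite each other through a pure transmission delay, in discrete time, with a rectified-linear transfer. The function name `delayed_pair`, the gain value, and the single-pulse initial condition are all assumptions made for illustration. With a loop gain below one, a pulse bounces between the two nodes, so each node is active once every two delays while the amplitude decays, which is one simple way such oscillations can be transient.

```python
def delayed_pair(delay_steps, gain, n_steps):
    """Toy two-node excitatory loop with a pure transmission delay.

    Hypothetical illustration only: each node's activity at time t is the
    rectified, scaled activity of the other node delay_steps earlier.
    """
    x1 = [0.0] * n_steps
    x2 = [0.0] * n_steps
    x1[0] = 1.0  # brief kick to node 1 at t = 0 (assumed initial condition)
    for t in range(delay_steps, n_steps):
        # Threshold-linear transfer: negative drive is clipped to zero.
        x1[t] = max(0.0, gain * x2[t - delay_steps])
        x2[t] = max(0.0, gain * x1[t - delay_steps])
    return x1, x2


if __name__ == "__main__":
    x1, x2 = delayed_pair(delay_steps=5, gain=0.8, n_steps=31)
    # Time steps at which node 1 is active; they are spaced 2 * delay_steps apart.
    print([t for t, v in enumerate(x1) if v > 0])
    # Corresponding amplitudes shrink by gain**2 on every round trip.
    print([round(v, 4) for v in x1 if v > 0])
```

      Setting `gain` to exactly one in this toy gives a pulse that never decays, which loosely parallels the remark that such oscillations persist only under precise tuning; the toy says nothing about the actual parameter ranges in the authors' simulations.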

      To better reflect the limitations of our research, we have made significant modifications to the relevant sections in our manuscript.

      In line 100, we've added text to explicitly state that we considered delays in our simulations and acknowledged their potential to generate oscillations ("Given the importance of delays in biological networks such as BG, we will consider them in the simulations.").

      In line 102, we've clarified that our conclusions are based on a scenario without delays ("In the following, we give simple examples of the possibility of oscillation (or not) based on the connectivity characteristics of small networks without delays. Let us start with a network of two nodes.").

      Additionally, in line 230, we've included a reference to figure supplement 3-2 to highlight the outcomes in terms of oscillations ("EII network only resulted in transient oscillations (Fig. 3, figure supplement 3-1, figure supplement 3-2)").

      In lines 234-237, we've added a sentence discussing the role of synaptic delays in generating transient oscillations in cycles with an even number of inhibitory components, referring to figure supplement 3-2 ("In networks with an even number of inhibitory connections (e.g. EII, EEE, II), synaptic delays are the sole mechanism for initiating oscillations; however, unless delays are precisely tuned, such oscillations will remain transient (see figure supplement 3-2)").

      Moreover, in response to the reviewer’s suggestion, we have included an additional figure supplement 3-2 to illustrate how cycles with even inhibitory components generate transient oscillations when propagation delays are taken into account. This figure provides a visual representation of the phenomenon and enhances the clarity of our findings.

      2) In Figure 3, two motifs (III and EII) are explored to demonstrate the validity of the results across different parameters. Delays don't seem to play a disruptive role in these two cases, but the results seem to be different for other motifs not considered here. Aside from the examples mentioned above, I can imagine how a motif of EEE (i.e. a circle of three excitatory Wilson-Cowan neurons) would display oscillations when delays are included, as the activation would 'circulate' along the ring. However, this EEE motif has an even number of inhibitory units (or perhaps zero is considered an exception, but if so it's not mentioned in the text).

      We thank the reviewer for this observation regarding Figure 3. Indeed, the impact of delays may differ for other motifs not considered in our study. For example, as the reviewer has correctly anticipated, a motif of EEE (a circular network of three excitatory Wilson-Cowan neurons) would exhibit oscillations when delays are included, as activation could 'circulate' along the ring.

      To address this concern, we have performed new simulations (added as a new figure supplement 3-2). As illustrated in figure supplement 3-2, oscillations may indeed arise in the EEE motif when delays are introduced. However, these oscillations will eventually dissipate, at least with our settings.

      3) Figures 1b, 1c, and 4e display interesting results, but these are absent from the main text. Please include the description of those results. Particularly the case of Figs 1b and 1c seems very relevant to understanding the main results in the context of more complex networks, in which multiple loops with odd and even numbers of inhibitory units would coexist in the network. Does the number of odd-inhibitory loops in a given network affect somehow the power or frequency of the resulting network oscillations? It would be interesting to show this.

      Indeed, we did not explain Figs 1b,c and 4e properly. Now we have revised the manuscript in the following way to incorporate these results:

      In lines 124-128, we added the following text to introduce the concept: "We can generalize these results to cycles of any size, categorizing them into two types based on the count of their inhibitory connections in one direction (referred to as the odd cycle rule, as illustrated in Fig. 1b). More complex networks can also be decomposed into cycles of size 2…N (where N is the number of nodes), and predict the ability of the network to oscillate (as shown in Fig. 1c)." In line 298, we included the following text to highlight the relevant result: "Next, we removed the STN output (equivalent to inhibition of STN), the Proto-D2-Arky subnetwork generated oscillations for weak positive inputs to the D2-SPNs (Fig. 4e, bottom)."
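
      The cycle decomposition described in the quoted text can be sketched in a few lines. This is only an illustration of the parity check, under the assumption that a network is given as a signed matrix where `weights[i][j]` is the connection from node i to node j (positive for excitatory, negative for inhibitory, zero for absent). The helper names `simple_cycles` and `odd_inhibition` are hypothetical, and the sketch ignores connection strengths and delays, so an odd count here marks a cycle as a candidate oscillator rather than predicting that it will oscillate.

```python
def simple_cycles(weights):
    """Yield every directed simple cycle of length >= 2 exactly once.

    Each cycle is reported as a list of node indices starting at its
    smallest node. Self-connections are ignored.
    """
    n = len(weights)

    def extend(start, path):
        last = path[-1]
        for nxt in range(n):
            if weights[last][nxt] == 0:
                continue  # no connection from last to nxt
            if nxt == start and len(path) >= 2:
                yield list(path)  # closed the loop back to the start node
            elif nxt > start and nxt not in path:
                yield from extend(start, path + [nxt])

    for start in range(n):
        yield from extend(start, [start])


def odd_inhibition(weights, cycle):
    """Return True if the cycle has an odd number of inhibitory connections."""
    edges = zip(cycle, cycle[1:] + cycle[:1])
    n_inhibitory = sum(1 for i, j in edges if weights[i][j] < 0)
    return n_inhibitory % 2 == 1


if __name__ == "__main__":
    # Three-node rings: all inhibitory (III) and one excitatory link (EII).
    iii = [[0, -1, 0], [0, 0, -1], [-1, 0, 0]]
    eii = [[0, 1, 0], [0, 0, -1], [-1, 0, 0]]
    for name, w in (("III", iii), ("EII", eii)):
        for cycle in simple_cycles(w):
            print(name, cycle, odd_inhibition(w, cycle))
```

      Enumerating all simple cycles grows quickly with network size, so this brute-force search is only practical for networks with a handful of nodes, such as a circuit of a few nuclei.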

      How the number of odd/even loops affects the frequency is an interesting question. Intuitively, there should be a relation between the two. A complete treatment of this question is beyond the scope of the manuscript, but we think that in a network with identical node properties, more odd cycles should imply higher oscillation power.

      4) The cortico-BG model is focused on how inactivating STN could suppress (or not) beta oscillations, following experimental observations. However, besides mechanisms for extinguishing oscillations, it would be interesting to see if the progressive emergence of pathological beta oscillations could be explained by the modification of some of the nodes in the model (for example, explicitly mimicking the loss of dopaminergic neurons in the substantia nigra). This could be a very interesting additional figure in the main text.

      This is an interesting suggestion. Something similar has already been done: for example, Kumar et al. (2010) showed that a progressive increase of inhibition of GPe can lead to oscillations. Similarly, Holgado et al. (2008) showed how a progressive change in the mutual connectivity between STN and GPe can cause oscillations. More recently, Ortone et al. (PLoS Comput. Biol. 2023) and Azizpour et al. (bioRxiv 2023) have also shown the effect of progressive change in individual node properties on oscillations in the basal ganglia using numerical simulations. Our work in a way provides the theoretical backing to their work. Therefore, we think it is not necessary to show these results again in our model. Instead, we have cited these papers (lines 392-396).

      5) I observed some grammatical inconsistencies in the text, some of them are indicated below. I would suggest carefully going through the text to correct those issues or seeking help with editing.

      -line 32 "...which can closely capture the neural population dynamics". Which population dynamics? Do the authors refer to general neural dynamics?

      -line 33 "long term behavior" -> long-term behavior

      -line 68 "given the ionic channel composition" -> "given its ionic channel composition"

      We apologize for the grammatical inconsistencies in our manuscript. We have made the necessary corrections to improve the clarity and accuracy of our text.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript is useful for analytically showing that a cyclic network of threshold-linear neural populations can only oscillate if it has an odd number of inhibitory nodes with strong enough connections. Establishing this result, which holds under rather narrow assumptions, relies on standard tools from dynamical system theory. I find the strength of support for this result to be incomplete for the reasons detailed below:

      Although the mathematical arguments used appear to be correct, the manuscript lacks in rigor and clarity. For instance, the main result presented in theorem 2 is stated in a very unclear fashion: aside from the oddity of the number of inhibitory nodes, there are two conditions to check, which determines four cases. This can be explained in a much more straightforward way without introducing four relations in equations 4-7.

      We acknowledge the reviewer’s concern regarding the presentation of the main result in Theorem 2.

      We would like to emphasize that the introduction of four relations in equations 4-7 was intended to provide a detailed and transparent exposition of the conditions for the main result. While we understand that this approach may appear less straightforward, it allows for a more comprehensive understanding of the underlying logic and the multiple factors influencing the outcomes.

      However, we are open to suggestions for more concise and clear ways to express these conditions if the reviewer has specific recommendations or if there are alternative approaches that the reviewer believes would be more effective in conveying the information.

      Moreover, equation 3 in that same theorem is clearly wrong.

      We sincerely apologize for the typographical error in equation 3 within the same theorem. We thank the reviewer for noticing this. We have revised the text to rectify this mistake. The equation has now been corrected to ensure its accuracy.

      The proof of theorem 2 relies on standard linear algebra and can be improved as well: there are typos, approximations, and missing words (see line 664). The rigor of the exposition is also unsatisfactory. For instance, the proof of Lemma 1 ends with the sentence: "Similarly as before, the convergence of the dynamics driven by the left and right terms ends the proof". I don't know what this means.

      We thank the reviewer for the comments and suggestions. We have made the necessary adjustments to enhance the rigor and clarity of our mathematical reasoning in the revised manuscript.

      In line 644, we have provided clarification for the sentence you found unclear. The revised version now offers a more precise explanation that should help in understanding the proof.

      At the same time, the intuitive arguments presented in the main text are vague at best and do not really help grasping the possible generalizability of the results. For instance, I do not understand the message of panel B in Figure 2 and there seems to be no explanation about it in the main text.

      The main purpose of Figure 2B is to offer a visual representation of the concept and to serve as an aid for readers who may prefer a graphical illustration over extensive equations. While we understand that the figure may not provide a complete explanation on its own, it is intended to complement the text and mathematical content presented in the main text. In the revised version we have added the explanation of Figure 2B.

      Aside from the analytical result, most of the paper consists in simulating networks with distinct inhibitory cyclic structure to validate the theoretical argument. I do not find this approach particularly convincing due to the qualitative nature of the numerical results presented. There is little quantitative analysis of the network structure in relation to the emergence of oscillations. It is also hard to judge whether the examples discussed are cherry picked or truly representative of a large class of dynamics.

      The reviewer has a valid concern about the numerical simulations and the qualitative nature of the results. We would like to provide some perspective on our approach.

      In our paper, the primary focus is on the mathematical proof, which rigorously establishes the existence of our results. However, we understand that numerical simulations are valuable for illustrating the applicability of the theoretical framework and providing insights into the practical implications.

      If we get into the quantitative description of all the results, the manuscript will become prohibitively long. We acknowledge that there is a balance to be struck between theory and numerical examples in a research paper. We believe that, in conjunction with the mathematical proof, the numerical simulations serve the purpose of illustrating the existence of our results in specific examples. While we cannot provide an exhaustive exploration of all possible network structures, we have chosen representative cases to demonstrate the applicability of our findings. Some of these are already provided in figure supplements S3-1 and S3-3. In the absence of specific suggestions from the reviewer we would like to leave it as is.

      Moreover, the authors apply their cycle analysis to real-world networks by considering cycles of inhibitory nodes independently, whereas the same nodes can belong to several cycles. I find it hard to believe that considering these cycles independently should be enough to make predictions about the emergence of oscillations, as these cycles must interact with one another via shared nodes. I do not understand the color coding used to mark distinct cycles in supplementary figures. There is also not enough information to understand figures in the main text. For instance, I do not understand what the grids are representing in panel B, Figure 4.

      We have clarified the color coding and added more information to understand the figures. We appreciate the reviewer’s concern about our application of cycle analysis to real-world networks and the clarity of our figures. It is not a matter of belief – we have provided a mathematical proof and complemented that with illustrative examples from real-world networks i.e. cortico-basal ganglia network with both rate-based and spiking neurons. Clearly our results hold.

      Regarding the color coding in supplementary figures, we have revised the color scheme to make it more intuitive and informative in the caption of Figure 4: we use different colors to mark potential oscillators in each motif in BG, and each color means an oscillator from panel a. For more details, see figure supplements 4-1–4-6. The colors now represent distinct cycles more clearly, helping readers better interpret the figures.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the forces that shape the genomic landscape of transposable elements. By exploiting both long-read sequencing of mutation accumulation lines and in vivo transposition assays, the authors offer compelling evidence that structural variation rather than transposition largely shapes transposable element copy number evolution in budding yeast. The work will be of interest to the transposable element and genome evolution communities.

      Public Reviews:

      Reviewer #1 (Public Review):

      Henault et al build on their own previous work investigating the longstanding hypothesis that hybridization between divergent populations can activate transposable element mobilization (transposition). Previously they created crosses of increasing sequence divergence, using both intra- and inter-species hybrids, and passaged them neutrally for hundreds of generations. Their previous work showed that neither hybrids isolated from natural environments nor hybrids from their mutation accumulation lines showed consistent evidence of increased transposable element content. Here, they sequence and assemble long-read genomes of 127 of their mutation-accumulation lines and annotate all existing and de novo transposable elements. They find only a handful of de novo transposition events, and instead demonstrate that structural variation (ploidy, aneuploidy, loss of heterozygosity) plays a much larger role in the transposable element load in a given strain. They then created transposable element reporter constructs using two different Ty1 elements from S. paradoxus lineages and measured the transposition rate in a number of intraspecific crosses. They demonstrate that the transposition rate is dependent on both the Ty1 sequence and the copy number of genomic transposable elements, the latter of which is consistent with what has been observed in the literature on transposable element copy number control in Saccharomyces. To my knowledge, others have not directly tested the effect of Ty1 sequence itself (have not created diverse Ty1 reporter constructs), and so this is an interesting advance. Finally, the authors show that mitotype has a moderate effect on transposition rate, which is an intriguing finding that will be interesting to explore in future work.

      This study represents a large effort to investigate how genetic background can influence transposable element load and transposition rate. The long read sequencing, assembly, and annotation, and the creation of these reporter constructs are non-trivial. Their results are straightforward, well supported, and a nice addition to the literature.

      The authors state that the results from their current work support results taken from their previous study using short-read sequencing data of the same lines. The argument that follows is whether the authors gained anything novel from long-read sequencing. I would like to see the authors make a stronger argument for why this new work was necessary, and a more detailed view of similarities or differences from their previous study (when should others choose to do long read vs. short read of evolved lines?).

      We thank the reviewer for the suggestion. While we initially aimed to justify the relevance and novelty of the current work in relation to our previous study, we understand that this justification may not have been strong enough.

      In the second paragraph of the introduction, we explain how the multidimensional nature of TE load makes it more complex to characterize than simply reporting the abundance of a given TE family in a given genome. We added the following concluding sentence to further emphasize the importance of long reads in TE-focused genome inference:

      “As such, ongoing technological and computational advances in genome inference, including long-read sequencing, will certainly be key to getting a detailed understanding of the dynamics of TEs and the underpinning evolutionary forces.”

      In the penultimate introductory paragraph, we summarize our previous work from 2020 and highlight that the evolution of Ty contents in MA lines was inferred from aggregate measures of genomic abundance of TE families using short reads. We then make the point that combinations of multiple SVs could affect the landscape of TEs in ways that are not reflected by crude short-read measures. We added the following sentence to further emphasize this point and contrast it with the necessity of using more powerful methodologies for genome resolution:

      “Under this scenario, measuring Ty family abundance would yield no significant net change, and the dissection of the underlying SVs using short reads could often be challenging.”

      Relatedly, the authors should report the rates of structural variants that they observe. How are these results similar/different from other mutation-accumulation work in S. cerevisiae?

      Since this work does not attempt to provide an exhaustive report of all the SVs in the MA lines, but rather focuses on attributing an SV type to individual loci occupied by TEs, we cannot include these estimates, except for de novo transposition itself (see below). We added the following sentence to the Results section on the classification of Ty loci by SV types:

      “We note that the current methodology does not aim at providing an exhaustive quantification of all SVs in the MA lines, as previously done for some SV types (Marsit et al., 2021), but focuses solely on loci containing Ty elements.”

      We added estimates of the average retrotransposition rate in the MA experiment based on the number of de novo insertions detected in the MA lines genomes.

      Figure 4:

      “The average retrotransposition rates estimated from the counts of de novo insertions (per line per generation per element) are the following: CC1, 1.0×10^-5; CC2, 4.9×10^-6; CC3, 7.6×10^-6; BB1, 1.5×10^-5; BC2, 1.7×10^-5; BA1, 6.5×10^-6; BA2, 2.2×10^-5; BSc1, 3.6×10^-5.”
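
      One plausible way to read a rate expressed per line per generation per element is as a count of de novo insertions divided by the product of the number of lines, the number of generations, and the number of elements able to transpose. The exact estimator used by the authors is not given in this response, so the function below, its name, and all of its inputs are hypothetical and serve only to make the units concrete.

```python
def retrotransposition_rate(n_insertions, n_lines, n_generations, n_elements):
    """Insertions per line, per generation, per element.

    Hypothetical estimator: assumes every line was propagated for the same
    number of generations and carried the same number of source elements.
    """
    denominator = n_lines * n_generations * n_elements
    if denominator <= 0:
        raise ValueError("lines, generations and elements must all be positive")
    return n_insertions / denominator


if __name__ == "__main__":
    # Made-up inputs, not values from the study.
    rate = retrotransposition_rate(
        n_insertions=10, n_lines=20, n_generations=500, n_elements=10
    )
    print(f"{rate:.1e}")
```

      In practice the number of elements and the number of generations can differ between lines, in which case the denominator would be summed line by line rather than taken as a single product.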

      We added the following paragraph in the Discussion section to specifically discuss these estimates in relation to the in vivo measurements.

      “We note that while the CC crosses tend to have the lowest retrotransposition rates as estimated from the de novo insertions (~1×10^-5 per line per generation per element; Figure 4), these values are several orders of magnitude higher than the in vivo measures in SpC backgrounds. The discrepancy between these estimates could be due to uncharacterized biases inherent to each method. They could also be linked to differences between the parental genotypes used to generate the MA crosses and the fluctuation assays. One major difference is the use of ade2 genotypes in the MA parents, a strategy that was initially adopted to provide a marker for the loss of mitochondrial respiration (Joseph and Hall, 2004; Lynch et al., 2008). It has been shown that the induction of adenine starvation through minimal adenine concentration in the medium and deletion of ADE2, which inactivates the adenine de novo biosynthesis pathway, increases Ty1 transcript levels (Todeschini et al., 2005), resulting in higher transposition rates. Rich complex medium like the one that was used for the MA experiment (YPD) can exhibit substantial variation in adenine concentration (VanDusen et al., 1997), and adenine can quickly become the limiting nutrient for ade2 strains (Kokina et al., 2014). Thus, we cannot exclude that the choice of initial ade2 genotypes could have inflated the transposition rates in the MA experiment.”

      Since the authors show a small, but consistent influence of mitotype on transposition rates, adding further evidence for the role of mtDNA in regulating transposition, I'm curious what the transposition rate of a ρ0 strain is. I think including these results could make this observation more compelling.

      We agree that measuring in vivo transposition rates in ρ0 backgrounds would be an interesting avenue. However, there is a large distinction between having non-functional mitochondrial respiration in ρ0 strains and inheriting diverse functional mtDNA haplotypes. The effects we show are all linked to the reciprocal inheritance of intact mtDNAs, producing ρ+ strains that are all respiration-competent, as shown by our growth confirmations on non-fermentable carbon sources for all the diploid backgrounds generated. While potentially interesting, adding transposition rate measurements for the ρ0 backgrounds seems hard to justify in the context of our results.

      Reviewer #2 (Public Review):

      This is an interesting follow-up study that uses long-read sequencing to examine previously constructed mutation accumulation lines between wild populations of S. cerevisiae and S. paradoxus. They also complement this work with reporter assays in hybrid backgrounds. The authors are attempting to test the hypothesis that hybridization leads to genome shock and unrestrained transposition. The paper largely confirms previous results (suggesting hybridization does not increase transposition) that are well cited and discussed in the paper, both from this group and from the Smukowski Heil/Dunham group but extends them to a new set of species/hybrids and with some additional resolution via the long read sequencing. The paper is well written and clear and I have no serious complaints.

      In the abstract, the authors make three primary claims:

      Structural variation plays a strong role in TE load.

      Transposition plays only a minor role in shaping the TE landscape in MA lines.

      Transposition rates are not increased by hybridization but are affected by genotype-specific factors.

      I found all three claims supported, albeit with some minor questions below:

      Structural variation plays a strong role in TE load.

      Convinced of this result. However:

      Line 185-187/Figure 3C: I'm curious given that the changes in Ty count are so often linked to changes in gross DNA sequence whether the count per total DNA sequence is actually changing on average in these genomes. I.e., does hybridization tend to increase TE count via CNV or does hybridization tend to increase DNA content in the MA lines and TEs come along for the ride?

      The Ty content definitely “rides along” with the rest of the genome that is affected by retrotransposition-unrelated SVs. To further highlight this point, we added a panel (E) to Figure 3 in which we correlate the net Ty copy number change (same as panel D, formerly C) to the corresponding genome size, which reflects the amount of DNA lost/gained by all SV types. We added the following to the results section:

      “The distributions of net Ty CN change per MA line showed that most crosses had significant gains (Figure 3D), suggesting that Ty load can often increase as a result of random genetic drift. Some (but not all) of these crosses also exhibited significant increases in genome size after evolution (Supplemental Figure S7A). The net Ty CN changes per MA line subgenome were globally correlated to the corresponding changes in subgenome size (Figure 3E). Even after excluding polyploid lines (which have the largest changes in both Ty CN and genome size), we found a significant relationship between the two variables (mixed linear model with random intercepts and slopes for MA crosses, P-value=3.71×10^-9; Supplemental Figure S7B), indicating that SVs affecting large portions of the genome have a substantial impact on the Ty landscape.”

      One question about ploidy (lines 175-177):

      Both aneuploidy and triploidy seem easy to call from this data. A 3:1 tetraploidy as well. However, in Figure 2B there are tetraploids that are around the 1:1 line. How are the authors calling ploidy for these strains? This was not clear to me from the text.

      This detail was indeed missing from the manuscript. The ploidy level of all MA lines was previously measured by DNA staining and flow cytometry, and the ploidy level of the subgenomes of each polyploid MA line was previously inferred from short-read sequencing. We modified the figure captions and the main text to include this along with the corresponding references:

      Figure 2:

      “The ploidy level of each line was previously determined by DNA staining and flow cytometry (Charron et al., 2019; Marsit et al., 2021).”

      Main text:

      “The ratio of classified bases per subgenome was consistent with the corresponding ploidy levels: triploid BC lines had two copies of the SpC subgenome, while tetraploid lines had both SpC subgenomes duplicated (Charron et al., 2019; Marsit et al., 2021) (Figure 2B).”

      “Finally, we used the ploidy level of each MA line subgenome as previously measured by flow cytometry and short-read sequencing (Charron et al., 2019; Marsit et al., 2021).”

      Reviewer #3 (Public Review):

      Henault et al. address the important open question of whether hybridization could trigger TE mobilization. To do this they analysed MA lines derived from crosses of Saccharomyces paradoxus and Saccharomyces cerevisiae using long-read sequencing. These MA lines were already analysed in a previous publication using Illumina short-read data but the novelty of this work is the long-read sequencing data, which may reveal previously missed information. It is an interesting message of this study that hybridization between the two species did not lead to much TE activity. Due to this low activity, the authors performed an additional TE activity assay in vivo to measure transposition rates in hybrid backgrounds. The study is well written and I cannot spot any major problems. The study provides some important messages (like the influence of the genotype and mitochondrial DNA on transposition rates).

      Major comments

      • What I miss the most in this work is the perspective of the host defence against TEs in Saccharomyces. Based on such a mechanistic perspective, why do the authors think that hybridization could lead to a TE reactivation? For example, in Drosophila, small RNAs important for the defence against a TE are solely maternally transmitted. Hybrid offspring will thus solely have small RNAs complementary to the TEs of the mother but not to the TEs of the father; therefore, a reactivation of the paternal TEs may be expected. I was thus wondering what the situation is in yeast. Why would we expect an upregulation of TEs? Without such a mechanistic explanation, the hypothesis that TEs should be upregulated in hybrids is a bit vague, based on a hunch.

      We agree with the reviewer that in the first version of the manuscript, the justification for the investigation of the reactivation hypothesis in the first place was not self-sufficient and relied too much on our previous work, upon which this article builds. We extensively remodeled the introduction to better justify the investigation of this hypothesis in the context of the current knowledge on the regulation of Ty elements in Saccharomyces.  

      Reviewer #1 (Recommendations For The Authors):

      It's interesting that the net change in transposable element copy number in mutation accumulation lines is either insignificant or gain, and never a significant loss. I think this could make a nice discussion point regarding the roles of drift and selection on TE load.

      We thank the reviewer for the suggestion and agree that this is an interesting perspective that we did not explore in the first version of the manuscript. We thus included a short discussion point in the Results:

      “The distributions of net Ty CN change per MA line showed that most crosses had significant gains (Figure 3D), suggesting that Ty load can often increase as a result of random genetic drift.”

      We also added the following paragraph to the discussion section:

      “Our experiments illustrate how under weakened natural selection efficiency, TE load can increase in hybrid genomes by the action of transposition-unrelated SVs. This offers a nuanced perspective on the classical interpretation of the transposition-selection balance model (Charlesworth et al., 1994; Charlesworth and Langley, 1989), in which increased TE load would be predominantly driven by the relaxation of purifying selection against TE insertions generated by de novo transposition. Our results suggest that SVs arising in the context of hybridization can act as a significant source of TE insertion polymorphisms which natural selection can purge more or less efficiently, depending on the population genetic context. This is closely related to the idea that sexual reproduction could favor the spread of TE families, contributing to their evolutionary success (Hickey, 1982; Zeyl et al., 1996). Since the insertion polymorphisms that contribute to increase TE load mostly originate from standing genetic variation, they could be less deleterious and thus harder for natural selection to purge efficiently.”

      The point about the role of LOH in TE load is cool!

      We thank the reviewer for their enthusiasm, it is one of our favorite results as well.

      Figure 1: Add a figure component of the green box and label it Ty1 or TE.

      We modified Figure 1 accordingly.

      Figure 2C: what is the assembly size ratio?

      We added the following sentence to the figure caption to clarify what we define as assembly size ratio:

      “Assembly size ratio refers to the ratio of subgenome assembly size to the corresponding parental assembly size.”

      Something cut off in the N50 plot axis

      Unfortunately, we can’t seem to understand what the reviewer meant with this comment, nothing seems cut out of the figure panel 2C in any of our versions of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      These are all minor comments/suggestions that the authors can take or leave.

      Line 42: "fuels" should be "fuel".

      Since the verb refers to “source” and not “variants”, we believe it should be in the third person singular.

      Line 43: unclear what the authors mean by "regroup".

      We understand how this phrasing may sound strange. We modified the sentence accordingly:

      “Structural variation is a term that encompasses a broad variety of large-scale sequence alterations”

      Line 51-52: There are a couple of really nice papers that could be cited here from Anna Selmecki's group (Todd et al. 2020, Todd and Selmecki 2019, both in eLife).

      We thank the reviewer for the suggestions, we included some of these references in the manuscript.

      Figure 1: This is a nice cartoon! I'd suggest spelling out LOH here for a truly naive reader.

      We modified Figure 1 accordingly.

      Figure 3A: One thing that is slightly lost here in the presentation is the relative frequency of the different events because of the changing scales across 3A. I can see why you want to do it this way, but would consider whether there may be a way to present this that makes it more obvious how much more frequent polyploidy is than excision for example.

      We agree with the reviewer that the focus of this visualization is to compare crosses and individual MA lines within SV types, and fails to display the relative importance of each SV type. We solved this by including an additional panel (new 3A) that shows how the number of Ty loci affected by each SV type scales in comparison to others.

      Figure 5: I'm not a fan of the gray bars highlighting the individual strains. This made the graph less intuitively readable for me.

      We tend to agree with the reviewer and rolled back to a previous version of Figure 5 that was lighter on annotations.

      One thing I would like to see in the future from this data (definitely not in this paper) is genome rearrangements within these hybrid MA lines. How often are there structural changes and how often are those changes mediated by repeats including TEs?

      We completely agree with the reviewer that this would be a very interesting avenue, with a distinct (and likely higher) set of challenges at the analysis level compared to simply focusing on TE sequences like we did here. We hope to be able to tackle this goal in the future of this project.

      Reviewer #3 (Recommendations For The Authors):

      • I'm not from the yeast field. But why this focus on the Ty-load? Are Ty's the only active TEs in yeast? Provide some background on the TE landscape in yeast and a justification for focusing on Ty's.

      We agree with the reviewer that this point was only implicit in the introduction. We modified the introductory segment on Saccharomyces yeasts to mention that Ty retrotransposons are the only TEs found in these genomes, thus explaining the exclusive focus on them. It now reads as follows:

      “In the case of Saccharomyces cerevisiae, the only TEs found are five families of long terminal repeat (LTR) retrotransposons named Ty1-Ty5 (Kim et al., 1998).”

      • 56 I would argue that Petrov et al 2003 is not the best citation for arguing that TEs can lead to genomic rearrangement through ectopic recombination. Petrov solely showed that some long TE families are at lower population frequency than short TE families. This could be due to many reasons (e.g. recent activity of long TEs - mostly LTRs) but Petrov interpreted the data as being due to ectopic recombination. Petrov, therefore, did not demonstrate any direct evidence for the involvement of ectopic recombination.

      We agree with the reviewer that this reference is not the best choice to simply support the role of TEs in generating ectopic recombination events and modified the references accordingly.

      • For the assembly the authors used two steps 1) separate the reads based on similarity to a subgenome 2) and assembly the reads from the resulting two sets separately. This is probably the only viable approach, but I'm wondering if this step can lead to some biases (many reads may not be assigned to one sub-genome or assigned to the wrong sub-genome). An alternative, possibly less biased approach, would be to use one of the emerging assemblers that promise to assemble sub-genomes. Maybe discuss why this approach was not pursued.

      We completely agree that our method has some level of bias. We adopted it because it seemed the most appropriate to answer our question, which required resolving individual TE insertions at the level of single haplotype sequences. One specific challenge of this dataset is that we have a relatively wide range of nucleotide divergence between parental subgenomes in the different MA crosses, from <1% to ~15%. The efficiency of haplotype separation from tools that are not necessarily designed to be tunable with respect to the level of nucleotide divergence seemed uncertain, which is why we opted for a custom methodology. Although read non-classification remains a problem that is hard to solve (and would remain so using orthogonal strategies), we believe that read misclassification is minimized by our stringent criteria for read classification. The goal of this study was neither to develop a tool nor to benchmark our approach against existing diploid assembly tools. Our approach yielded phased genome representations that were of sufficient completeness and contiguity to confidently answer our questions, and we believe that pushing the discussion towards technical considerations would fall outside of our main objective.
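
The trade-off described in this response, between leaving reads unclassified and misclassifying them, can be illustrated with a minimal sketch. It assumes each read has already been aligned to both parental references and summarized as two identity values. The function name, the thresholds and the labels "A" and "B" are hypothetical and are not taken from the manuscript.

```python
from typing import Optional


def classify_read(identity_a: float, identity_b: float,
                  min_identity: float = 0.85,
                  min_margin: float = 0.02) -> Optional[str]:
    """Assign a read to parental subgenome "A" or "B", or to neither.

    identity_a / identity_b: fraction of matching bases when the read is
    aligned to each parental reference (hypothetical inputs).
    min_identity / min_margin: illustrative thresholds, not the authors'.
    """
    best = max(identity_a, identity_b)
    # Reject reads that align poorly to both parents.
    if best < min_identity:
        return None
    # Reject ambiguous reads: the two parents explain the read about
    # equally well, as expected when subgenome divergence is very low.
    if abs(identity_a - identity_b) < min_margin:
        return None
    return "A" if identity_a > identity_b else "B"


# A read from a cross with <1% divergence is typically left unclassified,
# whereas a read from a ~15% divergence cross is assigned with confidence.
low_divergence_read = classify_read(0.99, 0.985)
high_divergence_read = classify_read(0.98, 0.85)
```

In such a scheme, raising the margin lowers the risk of misclassification at the cost of more unclassified reads, which is the compromise the response refers to.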

      • The authors used a decision tree to classify Ty loci. What were the training data? How were the trees validated? Decision tree is a technical term for a classifier in machine learning. I do not think the authors used machine learning in this work, but rather an "an ad-hoc set of rules". The term decision tree in this study is misleading.

      We believe that the term “decision tree” can simply refer to a hierarchy of conditional rules implemented as a classification algorithm. As the reviewer pointed out, it is clear from the manuscript that none of the analyses performed include any form of training or fitting of a machine learning classifier. However, we agree that its specific reference to the machine learning classifier can create unnecessary confusion. We thus removed this term from the manuscript and replaced all its instances with “a hierarchy of binary rules”.
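
To make the distinction concrete, a hierarchy of binary rules is simply a fixed sequence of nested conditions, with no training data and no fitting step. The sketch below is purely illustrative: the fields, the rules and the category labels are hypothetical and do not reproduce the rules used in the manuscript.

```python
from dataclasses import dataclass


@dataclass
class TyLocus:
    # All fields are hypothetical, chosen only to illustrate the idea.
    in_parent: bool     # locus present in the parental assembly
    in_evolved: bool    # locus present in the evolved MA line assembly
    in_sv_region: bool  # locus overlaps a region affected by a large SV


def classify(locus: TyLocus) -> str:
    """Apply fixed, hand-written binary rules in a set order."""
    if locus.in_parent and locus.in_evolved:
        return "conserved"
    if locus.in_parent:  # present before evolution, absent afterwards
        return "loss via SV" if locus.in_sv_region else "excision"
    if locus.in_evolved:  # absent before evolution, present afterwards
        return "gain via SV" if locus.in_sv_region else "de novo insertion"
    return "absent"


example = classify(TyLocus(in_parent=False, in_evolved=True, in_sv_region=False))
```

Because every rule is written by hand, nothing here requires validation against a training set, which is the point of the terminology change.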

      • 272: as it is the CNC explanation does not make a lot of sense to me; some information is missing, is p22 expression increasing with copy numbers?

      Yes, p22 expression correlates positively with the CN of p22-expressing Ty1 elements.

      Why are the two alternative downstream codons important?

      We thought it would be useful to mention the two start codons at this point because later in the discussion, we bring the conservation of the first start codon as an observation consistent with the putative expression of p22 in S. paradoxus. We also thought that it helped clarify the mechanism by which the N-truncated version of the protein is expressed.

      p22 interferes with assembly viral particles when in high copy numbers, but what happens when at low copy numbers, is it essential for retroviral activity? Is it even necessary for the virus or just some garbage product (they mention N-truncated).

      To our knowledge, these questions regarding the potential molecular functions of p22 outside of a retrotransposition restriction factor are still open. We added details to the background on CNC in the Introduction and Results section to help clarify some of the points raised:

      Introduction:

      “The best known regulation mechanism in yeast is termed copy number control (CNC) and was characterized in the Ty1 family of S. cerevisiae. This mechanism is a potent copy-number dependent negative feedback loop by which increasing the CN of Ty1 elements strengthens their repression (Czaja et al., 2020; Garfinkel et al., 2003; Saha et al., 2015).”

      Results:

      “The mechanism of negative copy-number dependent self-regulation of retrotransposition (CNC) was characterized in the Ty1 family of S. cerevisiae (Garfinkel et al., 2016). This mechanism relies on the expression of an N-truncated variant of the Ty1 capsid/nucleocapsid Gag protein (p22) from two downstream alternative start codons (Nishida et al., 2015; Saha et al., 2015). p22 expression scales up with the CN of Ty1 elements that encode it (Tucker et al., 2015), which gradually interferes with the assembly of the viral-like particles essential for Ty1 replication (Cottee et al., 2021; Saha et al., 2015). Thus, CNC yields a steep negative relationship between the retrotransposition rate measured with a tester element and the number of Ty1 copies in the genome (Garfinkel et al., 2003; Tucker et al., 2015).”

      • mtDNA influences transposition, is anything known about the mechanism?

      When presenting this result, we make it clear that this finding is not new and was previously observed in S. cerevisiae x S. uvarum hybrids by Smukowski-Heil et al. (2021). In this reference, the authors discuss multiple mechanisms by which mitochondrial biology and mito-nuclear interplay may affect transposition rate, although their data cannot support one specific hypothesis. Our data do not allow us to further dissect the mechanistic basis of the mtDNA effect, any more than that of the effect of distinct Ty1 natural variants. Since we simply provide new independent evidence for the mtDNA effect, it seems to us that repeating the discussion on putative mechanisms while bringing no support to any given hypothesis would be of limited relevance.

      • During the first reading, I got quite confused about what CN means (copy number as it turned out). I suggest using abbreviations only if absolutely necessary, and I'm not entirely convinced it is necessary here. But I leave this to the discretion of the authors.

      We agree that the excessive use of abbreviations in manuscripts is annoying. However, in this case, “copy number” is used so extensively that its abbreviation seemed to improve the reading experience. Thus, we would prefer to keep it unchanged.

      • Fig 3D: Wilcoxon Rank sum test. It is not clear to me what was tested here? Which data were used?

      We confirm that the statistical test employed is the Wilcoxon signed-rank test, and not the Wilcoxon rank-sum test (also known as Mann-Whitney U-test). The Wilcoxon signed-rank test is used here as a non-parametric one-sample test against the null hypothesis that the distribution is centered around zero.
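
As an illustration of this use of the test, the following is a minimal standard-library sketch of an exact two-sided Wilcoxon signed-rank test against a center of zero. The input values are made up and stand in for net Ty CN changes in one cross. The software actually used by the authors is not stated in this response, so this sketch should not be read as their implementation.

```python
def signed_rank_test(values):
    """Exact two-sided Wilcoxon signed-rank test of H0: centered on zero.

    Returns (W_plus, p_value). Zeros are discarded and tied absolute
    values receive average ranks. Ranks are stored doubled so that
    average ranks remain integers.
    """
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all observations are zero")

    # Rank the absolute values, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # twice the average of ranks i+1..j+1
        i = j + 1

    w_plus2 = sum(r for r, v in zip(ranks2, nonzero) if v > 0)
    total2 = sum(ranks2)

    # Null distribution: each rank is positive or negative with equal
    # probability. counts[s] = number of sign assignments with 2*W+ == s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided p-value: assignments at least as far from the null mean.
    deviation = abs(2 * w_plus2 - total2)
    extreme = sum(c for s, c in enumerate(counts)
                  if abs(2 * s - total2) >= deviation)
    return w_plus2 / 2, extreme / 2 ** n


# Made-up net copy number changes for the MA lines of one cross.
net_changes = [3, 1, 0, 2, 5, -1, 4, 2, 6, 1]
w_plus, p_value = signed_rank_test(net_changes)
```

Unlike the rank-sum test, which compares two independent samples, this test takes a single sample and asks whether its signed ranks are balanced around zero.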

      • de novo -> italics

      We choose to follow the recommendation of the general style conventions of the ACS guide for scholarly communications not to italicize common Latin terms like “de novo”, “e.g.” and “i.e.”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The reviewers make some suggestions aimed towards increasing the clarity of the manuscript, and I suggest that the authors examine those carefully. In particular, the figure is difficult to read and could contain additional information to help the reader's interpretation. For example, Reviewer 1 suggests including sample age estimates alongside depth, while Reviewer 3 also notes that there is missing information in the figure. Apart from the figure, Reviewer 1 suggests two additional analyses to help explain the amount of mammoth DNA recovered, which they observe is much higher than previous similar investigations. This would seem to be an important issue to address, given the surprising nature of the findings. In addition to this larger issue, the Reviewer makes a few important suggestions for supplementary material that may be needed to support the authors' statements.

      Some additional recommended edits -- in particular to the text and included references to related studies -- are suggested by Reviewers 2 and 3, and both commented on the lack of a publicly-available data repository. The authors may also wish to comment on or revisit their differential treatment of woolly mammoth vs. woolly rhinoceros samples, though I suspect this has more to do with low read numbers for the rhinos.

      Thank you very much for the positive assessment of our manuscript and clear suggestions for revision. We address these points below.

      Reviewer #1 (Recommendations For The Authors):

      I have a few suggestions that might further improve the manuscript:

      It is difficult for the reader to follow which core slices exactly have been sampled and sequenced. The authors mention 23 samples were taken from core LK-001 and 16 samples from core LK-007. From the text it remains unclear to me what the exact age of each of these samples is. Figure 1 shows the depth at which the LK-001 core was sampled, maybe sample age estimates could be included here.

      Thanks for pointing this out. We have added approximate ages to Figure 1, added the depth range to the text (“from 1.5 to 80 cm”; l. 73-74, caption Figure 1), and reworked the table of the sampling depths in the supplement.

      Line 84-87. The authors mention the retrieval of DNA from several expected Arctic taxa, however no further data regarding these findings is given in the manuscript. It would be useful to report the same numbers for these species as the ones given for the Mammuthus and woolly rhinoceros, which would allow for a comparison of the relative abundance of the DNA between these species. Are the expected Arctic species for instance at much higher (DNA) abundance in the samples? It would also be interesting to know if the authors discovered DNA from extant species that are unlikely to have occurred in the geographic region. A (supplementary)table listing the number of mapped reads to each of the respective mitogenomes for each sequence library would be useful for the reader.

      We added a supplementary table (S8) indicating the numbers of reads assigned to mammals.

      Line 90: I am somewhat amazed by the amount of mammoth DNA the authors recovered from these cores. A total depth of over 400X of the mitogenome is quite extraordinary and I am not aware of any ancient sediment study to date that has retrieved a similar amount of data. For instance, the Wang et al. 2021 paper, which the authors cite, sequenced over 400 samples and did not find any mammoth DNA in 70% of those. For the 30% of samples showing signs of mammoth DNA they retrieved on average 530 sequence reads. In this study the authors find on average ~20,000 reads, in 22 out of the 23 sequence libraries. This makes me wonder if the way the mapping was performed has been too lenient, resulting in possible spurious mappings? To really confirm the authenticity of the mammoth (and woolly rhino) data, I would suggest two additional analyses:

      1) Mapping all the sequence libraries to a reference consisting of the complete Asian-elephant genome (for instance https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_024166365.1/), the complete human genome (+mitogenome) and the Asian elephant mitogenome. This could possibly reduce spurious mappings as conserved regions between the genomes are filtered out and could also reduce the possible mapping of NUMTS. If the authors could show that after such a mapping approach a significant number of reads are still assigned to the Asian elephant part (including the mitogenome) of the reference, the reported findings would be strengthened.

      2) I also suggest constructing a mitochondrial haplotype network from the obtained DNA, while also including previously published Asian and African elephants as well as previously published mammoth mitogenomes. If the obtained haplotypes indeed show that they cluster within the known haplotype diversity of mammoth, that would be strong support for the authenticity of the data.

      The same analysis could be considered for the woolly rhino data, although the lower read numbers might make this analysis challenging.

      We agree that the amount of mammoth DNA is surprising, which is why we opted for further laboratory experiments for confirmation of the hybridization capture results of the first core, i.e., 1) DNA extraction from a second core of a different lake, 2) a quantitative PCR approach (ddPCR), and 3) metabarcoding. Our results of the highly specific ddPCR and metabarcoding assays confirmed considerable amounts of mammoth DNA in two sediment cores of different lakes, thus we have no doubts regarding the authenticity of the data. Considering the large amount of mammoth DNA, the high number of reads, and particularly the high mitogenome coverage, we argue that the effect of some spurious mapping is negligible and does not affect the main outcome and conclusions of our study. Although we agree that a haplotype network would be interesting, such analyses would stretch beyond the focus of this publication.

      Line 91: The authors mention negative controls (extraction and library blanks) did not produce any reads assigned to mammals. This is quite remarkable, as in my experience low levels of (human)contamination are almost always present in the blanks. Could the authors comment on why they think the blanks did not show any signal of mammalian DNA?

      The hybridization capture enrichment and the filtration and mapping procedures likely eliminated human contamination. Also, the data were mapped against Arctic mammal mitogenomes, which did not include human reference sequences. However, six of the sediment samples contained human sequences (now shown in supplementary table S8), albeit at low read counts (mean = 65).

      Line 97: "mapping suggested that the sequences throughout the core originated from multiple individuals" The authors do not provide any supporting data showing this. I think that an analysis (for instance based on allele frequencies) has to be included in manuscript to support this claim.

      We agree that this claim was not sufficiently supported. We performed further analyses including genomic data of previously retrieved mammoth remains and assigned our data to these haplogroups; the results were added to the main text and are shown as a figure (Fig. 2).

      Line 98: "Signatures of post-mortem DNA decay were comparably minor."

      Do the authors know if the used hybridisation enrichment method can distort the measurement of post-mortem damage? Are for instance reads with C-T substitutions less likely to be captured by the baits?

      To our knowledge, there is no study suggesting that damaged sites are less likely to be captured. In general, the hybridization capture procedure is not overly specific, and studies report that DNA is readily and preferentially captured as long as the difference between baits and DNA is not above 10%.

      Line 100: "The proportions of bases did not suggest a substantial deviation from those in the reference genomes or in the closest extant relative of Mammuthus, the Asian elephant (Elephas maximus)."

      It is not clear to me what the authors mean by this. Could the authors explain how this was measured and what their interpretation of this result is?

      We realize that the sentence was unclear. We meant that the nucleotide composition was similar to that of the reference genomes or the closest extant relative. However, as we do not consider this important for the argument, we have removed this sentence from the manuscript.

      Given the high number of recovered mammoth reads in the samples, it would be interesting to know how much mammoth reads are present in the sample before enrichment capture with the baits. Shotgun sequencing the raw extract of one of the samples with the highest number of mammoth reads might allow for a rough estimate of mammoth DNA abundance compared to the other extant species (e.g. reindeer, Arctic lemming and hare) found in the sample(s). This could give further clarification about the extent of stratigraphy disturbance and its overall effect on the DNA based community reconstruction. However, this is just a suggested additional analysis and not something I believe crucial for supporting the overall findings in this manuscript.

      We fully agree that this would be a highly interesting and informative additional analysis to perform. It was, however, not possible to perform this additional analysis in the course of the current experiments.

      Finally, I could not find a public link to the (sequence)data produced in this study. I strongly encourage the authors to make their data publicly available.

      Thank you for pointing this out. We have added a Data Availability paragraph, including the respective reference.

      Reviewer #2 (Recommendations For The Authors):

      In the Discussion it is mentioned that the reasons for Mammoth extinction are not entirely clear but are largely attributed to sudden climate warming (and add some relevant citations). However, there is also abundant literature that suggest humans also played a role in their extinction (for instance, a recent one, Damien et al. (2022) at Ecology Letters 25: 127-137).

      We agree with the reviewer and have added the recent citation highlighting the possible influence of humans.

      One possibility to add further interest to this paper would be to conduct a phylogenetic tree with the Mammoth mitogenome(s) retrieved and a reference dataset; it could be interesting to know where do they fall in the phylogeny -already abundant with tens of individuals- and maybe it could be even possible to roughly estimate their date. There are some papers that report many Mammoth mitogenomes, including of course some from Siberia; for instance Chang et al. (2017) at Sci Reports and also Fellow Yates et al. (2017) also at Sci Reports (the latter mainly from Central Europe).

      We are well aware of the number of mt genomes available for mammoth, and such an analysis would be an interesting addition, potentially also offering the possibility to date the DNA. However, the analysis was hampered and would be less reliable for this dataset, as our sequences display considerable variation among each other, suggesting that we have a mix of multiple mt genomes that we cannot readily distinguish. We thus refrain from this, also because we instead provide multiple lines of evidence for the existence of the mammoth DNA in the surface sediment core (metabarcoding, ddPCR).

      Minor points:

      -Correct wooly to woolly

      Revised.

      -In the sampling description it is not totally clear if the samples were taken at 1 cm each (it is mentioned that core LK-001 is sliced in the field at 1-cm steps for radiometric dating and later it is explained that 23 samples were analyzed from this core, but it is unclear if they represent 23 cm of core)

      -Maybe the authors could briefly define some terms such as "talik"

      Revised.

      Reviewer #3 (Recommendations For The Authors):

      Maybe I missed this but I could not find a data availability statement or the location of the repository

      We have added a Data Availability paragraph, including the respective reference.

      It would be good to see some additional analysis on the distribution of the woolly rhinoceros DNA through the sediment core - like the figure for the mammoth i.e read numbers vs depth.

      We have added to the supplements a table showing the numbers of assigned mammal reads over the core depths (Table S8). However, as rhinoceros reads are considerably rarer in our results, we did not produce a figure.

      Would it be possible to be more explicit about the multiple mammoth individuals, could you calculate a minimum number or haplotypes for example.

      We agree that this claim was not sufficiently supported and added results from additional analyses (incl. Fig. 2). Please see our response above.

      Based on the aim stated in the introduction, the analysis of the Arctic biodiversity of this area is missing. It would be nice to see these results added, or maybe the focus needs to be changed for clarity.

      We now explicitly state that this objective pertains to a different study, which is currently still in preparation for publication.

      The single main figure needs a bit more consideration. For example in panel A - there was no information on the transformation performed or what the general trend line refers to. Do the results in panel B refer to all 22 libraries? What is the x-axis in Panel C and what do the coloured lines refer to? Additionally, I think the figure needs to be in higher resolution with increased text size on all axes.

      We revised the figure and the caption for clarity and readability.

      Finally, this might be an accidental typo, but when referring to the sample aged at around 8,677 years, the text states that this is the 36.5 cm sample (lines 130 and 192), whereas the supplementary says this is the 51 cm sample (Table S6). This could affect potential conclusions. Would you be able to clarify this?

      Thank you for noting this error, we revised it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Answers to reviewers’ comments

      Peer Reviewers 2 and 3 criticized the name of the antibody – hvCADab - and the lack of proof that it recognized a classic cadherin. These criticisms were justified and in the intervening months the issue has been resolved. hvCADab does not recognize the cadherin protein, although it was made to an 18 amino acid sequence from the intracellular domain of the H. vulgaris cadherin protein. Newly available genome sequences from two other species, Hydra oligactis and Hydra viridissima, now show that the 18 amino acid antigen sequence is not present in these species.

      Nonetheless, the nerve net in both species is strongly stained by the antibody. Hence we have renamed the antibody PNab (pan-neuronal antibody). The antigen is currently not known. Nevertheless the antibody is an excellent reagent for imaging the nerve net in Hydra.

      We have revised the section on antibody preparation in Materials and Methods to state explicitly that PNab does not recognize classic cadherin. To support this conclusion we have added a sequence comparison (Suppl Fig 3) of the intracellular domains of classic cadherins from H. vulgaris, H. oligactis and H. viridissima, which show that the 18aa antigen sequence is only present in the H. vulgaris classic cadherin and not in the cadherin sequences from H. oligactis and H. viridissima. All three sequences have highly conserved p120/delta-catenin and beta-catenin binding domains. The sequence between these domains is highly variable and the 18aa antigen sequence used for antibody production is clearly not present in the H. oligactis and H. viridissima sequences.

      Both reviewers also criticized our evidence for pan-neuronal staining as inadequate. Hence we have now included additional data. We have stained a transgenic strain expressing NeonGreen under the control of a pan-neuronal alpha-tubulin promoter (Primak et al 2023). 684/684 transgenic nerve cells were stained with PNab. We consider this convincing evidence, in addition to the evidence presented previously, that PNab stains all nerve cells in Hydra. The first paragraph of Results has been revised to include these data.

      Reviewer 2 suggested moving gap junction/innexin data (Suppl Fig 3 and 4) from the Discussion to Results. These are indeed new results and we have followed this suggestion. Fig 12 (new) clearly shows gap junctions between neurites in bundles. It also shows that nerve cells in bundles express cell type specific innexins and hence can form cell type specific gap junctions. We have also added new images (Fig 11) of a transgenic Hym176B strain stained with PNab. These show that neurite bundles in the ectoderm contain neurites from different nerve cell types = neural circuits and hence that neurite links must be specific, e.g. gap junctions.

      As suggested by Reviewer 2 we have now provided a 3D interactive version of the block face SEM reconstruction (Suppl Fig 4). This shows that connections between neurites in bundles consist of thin overlapping fingers rather than “conventional” terminal contacts. It also shows that the purple neurite extends past the green nerve cell body and does not end on it.

      Reviewer 2 suggested deleting discussion of possible functions for the endodermal nerve net (Discussion). We disagree with this suggestion. Our imaging results showed no connections between ectodermal and endodermal nerve nets. We also presented quantitative data for the absence of contact between the nerve nets in the gastric region. Consistent with our observations, Dupre and Yuste (2017) found no functional connection between the ectodermal and endodermal nerve nets based on neural activity measurements. Nevertheless, Giez et al (2023) in a recent preprint have described contact between specific endodermal and ectodermal nerve cells in the hypostome involved in the mouth opening response to glutathione. Both their observation and ours may be correct. The issue is not resolved. Hence we have included a discussion of possible functions for ectodermal and endodermal nerve nets. Importantly, our conclusions incorporate the difference in connectivity between muscle processes and nerve cells in the two nerve nets.

      Specific comments / Recommendations

      Reviewer 2

      Novelty: two preprints (Giez et al 2023) became available after the submission of our preprint. These include the results cited by the reviewer. These were not available to us at the time of submission.

      hvCADab has been re-named (see above). The differentiating nerve cell in Fig 11B is indeed stained by PNab. We have adjusted the intensities of red and green channels to show this more clearly.

      We consider the very clear black space between ectoderm and endoderm e.g. Fig 2B or Fig 4A to be an adequate marker for mesoglea. Use of an anti-mesoglea antibody would reduce the clarity of the image.

      It is always possible to look at more parts of Hydra tissue for possible nerve connections between ectoderm/endoderm. Nevertheless we provide the first quantitative data on the lack of contacts between 133 nerve cells (57 ectodermal and 76 endodermal) in the body column. Such data has not been previously available. And the EM result (Westfall 1973) cited by the reviewer is anecdotal at best. In later serial sectioning results on the hypostome/tentacle region from the Westfall lab no mention is made of nerve connections between the ectoderm and the endoderm. However, based on the results in the cited preprints (Giez et al) a closer examination of the hypostome/tentacle region in particular is warranted.

      To strengthen our conclusion that there are no contacts between the ectodermal and endodermal nerve nets, we now explicitly cite results from Dupre and Yuste (2017) on a calcium reporter strain demonstrating the absence of any crosscorrelation between the firing patterns of ectodermal RP1 network and the endodermal RP2 network. There was also no correlation between the activity of the second ectodermal nerve net CB and the endodermal RP2 network. These results demonstrate the absence of functional contacts between ectodermal and endodermal nerve nets.

      The reviewer criticizes the absence of trans-mesoglea links between ectodermal and endodermal epithelial cells in our EM images, e.g. Fig 9A. We can assure the reviewer that such links are frequently observed, although not in the image we chose for Fig 9A. This image, however, clearly documents two neurite bundles next to ectodermal muscle fibers.

      We agree with the reviewer that neurite bundles are an important discovery. And they raise the question of synaptic connections between neurites in bundles. Unfortunately, it is not possible to scan along the block face reconstruction (Fig 10) and count synapses. The resolution is not sufficient. Although scattered dense core vesicles (DCV) are observed in neurites, clustered DCV described by Westfall et al (1971) as synapses were not observed. We did, however, observe gap junctions between neurites in bundles (noted in Suppl Fig 3). These data have now been moved to the main body of the paper as Fig 12 together with the scRNAseq results on innexin gene expression in nerve cells. These results make it clear that neurites in bundles are connected via gap junctions and that these gap junctions are specific for neural circuits.

      The reviewer suggests that neurite bundles are an artifact of their interaction with muscle processes at the base of epithelial cells. We disagree with this statement. Muscle processes are temporary structures. They are withdrawn and reformed during every epithelial cell division, which occurs approximately every three days. Bundles are almost certainly more stable structures. Furthermore, neurite bundles in the endoderm are distant from endodermal muscle fibers (Fig 4B and Fig 9D) and their polygonal pattern (Fig 2D) is completely different from the circumferential bands of endodermal muscle fibers.

      Reviewer 3

      Specific comments and suggestions have been answered above. Importantly, we show that the PNab antibody does not recognize cadherin and that it clearly stains all nerve cells in Hydra.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Dubicka and co-workers on calcification in miliolid foraminifera presents an interesting piece of work. The study uses confocal and electron microscopy to show that the traditional picture of calcification in porcelaneous foraminifera is incorrect.

      Strengths:

      The authors present high-quality images and an original approach to a relatively solid (so I thought) model of calcification.

      Weaknesses:

      There are several major shortcomings. Despite the interesting subject and the wonderful images, the conclusions of this manuscript are simply not supported at all by the results. The fluorescent images may not have any relation to the process of calcification and should therefore not be part of this manuscript. The SEM images, however, do point to an outdated idea of miliolid calcification. I think the manuscript would be much stronger with the focus on the SEM images and with the speculation of the physiological processes greatly reduced.

      Reply: We are grateful for all of the highly valuable comments. Prior to our study, we were also convinced that the calcification model of Miliolid (porcelaneous) foraminifera was relatively solid. Nevertheless, our SEM imaging results surprisingly contradicted the old model. The main difference is the in situ biomineralization of calcitic needles that precipitate within the chamber wall after deposition of ACC-bearing vesicles. We agree that our fluorescence studies presented in the paper are not conclusive evidence for the calcification model used by the studied Miliolid species. However, our fluorescent results show that “the old model” (sensu Hemleben et al., 1986) is not completely outdated. Most of the fluorescent imaging data show a vesicular transport of substrates necessary for calcification. This transport is shown by Calcein labelling experiments (Movie 1), which reveal a high number of dynamic endocytic vesicles of sea water circulating within the cytoplasm. These very fine Calcein-labelled vesicles are most likely responsible for transport and deposition of Ca2+ ions. This is partly consistent with the model presented by Hemleben et al. (1986). We may speculate that calcite nucleation is already occurring within the transported vesicles, but at this stage of research we have no evidence for this phenomenon.

      Further live imaging fluorescence data show autofluorescence of vesicles upon excitation at 405 nm (emission 420–480 nm) associated with acidic vesicles marked by pH-sensitive LysoGlow84, which may hint at an association of ACC-bearing vesicles with acidic vesicles. Such spatial association of these vesicles may indicate a mechanism of pH elevation in the vesicles transporting Ca2+-rich gel to the calcifying wall of the new chamber.

      We will do our best to limit the physiological interpretation presented based on fluorescence studies in the revised version of the manuscript. We are convinced that our fluorescent live imaging experiments provide important observations in biomineralizing Miliolid foraminifera, which are still missing in the existing literature. It should be stressed that all the fluorescent experiments and SEM observations were based on specimens constructing and biomineralizing new chambers. All of them belong to the same species and come from the same culture. For these reasons, it is worthwhile presenting these complementary results of our study. In the future they may be helpful in further exploration and understanding of all aspects of calcification in foraminifera.

      Reviewer #2 (Public Review):

      Summary:

      Dubicka et al. in their paper entitled "Biocalcification in porcelaneous foraminifera" suggest that in contrast to the traditionally claimed two different modes of test calcification by rotaliid and porcelaneous miliolid foraminifera, both groups produce calcareous tests via the intravesicular mineral precursors (Mg-rich amorphous calcium carbonate). These precursors are proposed to be supplied by endocytosed seawater and deposited in situ as mesocrystals formed at the site of new wall formation within the organic matrix. The authors did not observe the calcification of the needles within the transported vesicles, which challenges the previous model of miliolid mineralization. Although the authors argue that these two groups of foraminifera utilize the same calcification mechanism, they also suggest that these calcification pathways evolved independently in the Paleozoic.

      Reply: We would like to acknowledge the review and all valuable comments. We do not argue that Miliolida and Rotaliida utilise an identical calcification mechanism, but that both groups utilise less divergent crystallization pathways, where mesocrystalline chamber walls are created by accumulating and assembling particles of pre-formed liquid amorphous mineral phase.

      Strengths:

      The authors document various unknown aspects of calcification of Pseudolachlanella eburnea and elucidate some poorly explained phenomena (e.g., translucent properties of the freshly formed test) however there are several problematic observations/interpretations which in my opinion should be carefully addressed.

      Weaknesses:

      1) The authors (line 122) suggest that "characteristic autofluorescence indicates the carbonate content of the vesicles (Fig. S2), which are considered to be Mg-ACCs (amorphous MgCaCO3) (Fig. 2, Movies S4 and S5)". Figure S2 which the authors refer to shows only broken sections of organic sheath at different stages of mineralization. Movie S4 shows that only in a few regions some vesicles exhibit red autofluorescence interpreted as Mg-ACC (S5 is missing but probably the authors were referring to S3). In their previous paper (Dubicka et al 2023: Heliyon), the authors used exactly the same methodology to suggest that these are intracellularly formed Mg-rich amorphous calcium carbonate particles that transform into a stable mineral phase in rotaliid Aphistegina lessonii. However, in Figure 1D (Dubicka et al 2023) the apparently carbonate-loaded vesicles show the same red autofluorescence as the test, whereas in their current paper, no evidence of autofluorescence of Mg-ACC grains accumulated within the "gel-like" organic matrix is given. The S3 and S4 movies show circulation of various fluorescing components, but no initial phase of test formation is observable (numerous mineral grains embedded within the organic matrix - Figures 3A and B - should be clearly observed also as autofluorescence of the whole layer). Thus the crucial argument supporting the calcification model (Figure 5) is missing. There is no support for the following interpretation (lines 199-203) "The existence of intracellular, vesicular intermediate amorphous phase (Mg-ACC pools), which supply successive doses of carbonate material to shell production, was supported by autofluorescence (excitation at 405 nm; Fig. 2; Movies S3 and S4; see Dubicka et al., 2023) and a high content of Ca and Mg quantified from the area of cytoplasm by SEM-EDS analysis (Fig. S6)."

      Reply: We used laser line 405nm and multiphoton excitation to detect ACCs. These wavelengths (partly) permeate the shell to excite ACCs autofluorescence. The autofluorescence of the shells is present as well, but it is not clearly visible in movieS4 as the fluorescence of ACCs is stronger. This may be related to the plane/section of the cell which is shown. The laser permeates the shell above the ACCs (short distance), but to excite the shell CaCO3 around foraminifera in the same three-dimensional section where ACCs are shown, the light must pass a thick CaCO3 area due to the three-dimensional structure of the foraminifera shell. Therefore, the laser light intensity is reduced. In a revised version a movie/image with reduced threshold will be shown.

      2) The authors suggest that "no organic matter was detected between the needles of the porcelain structures (Figures 3E; 3E; S4C, and S5A)". Such a suggestion, which is highly unusual considering that biogenic minerals almost by definition contain various organic components, was made based only on FE-SEM observation. The authors should either provide clearcut evidence of the lack of organic matter (unlikely) or may suggest that intense calcium carbonate precipitation within organic matrix gel ultimately results in a decrease of the amount of the organic phase (but not its complete elimination), alike the pure calcium carbonate crystals are separated from the remaining liquid with impurities ("mother liquor"). On the other hand, if (249-250) "organic matrix involved in the biomineralization of foraminiferal shells may contain collagen-like networks", such "laminar" organization of the organic matrix may partly explain the arrangement of carbonate fibers parallel to the surface as observed in Fig. 3E1.

      Reply: We agree with the reviewer that biogenic minerals should, by definition, contain some organic components. We wrote that "no organic matter was detected between the needles of the porcelain structures” as we did not detect any organic structures based only on our FE-SEM observations. We are convinced that the shell incorporates a limited amount of organic matrix. We will rephrase this part of the text to avoid further confusion.

      3) The author's observations indeed do not show the formation of individual skeletal crystallites within intracellular vesicles, however, do not explain either what is the structure of individual skeletal crystallites and how they are formed. Especially, what are the structures observed in polarized light (and interpreted as calcite crystallites) by De Nooijer et al. 2009? The author's explanation of the process (lines 213-216) is not particularly convincing "we suspect that the OM was removed from the test wall and recycled by the cell itself".

      Reply: Thank you for this comment. We will do our best to supplement our explanations. We are aware of the structures observed in polarized light by De Nooijer et al. (2009). However, Goleń et al. (2022, Protist, https://doi.org/10.1016/j.protis.2022.125886) showed that organic polymers may also exhibit light polarization. Additional experimental studies are needed to distinguish these types of polarization. We will aim to investigate this issue in our future research.

      4) The following passage (lines 296-304) which deals with the concept of mesocrystals is not supported by the authors' methodology or observations. The authors state that miliolid needles "assembled with calcite nanoparticles, are unique examples of biogenic mesocrystals (see Cölfen and Antonietti, 2005), forming distinct geometric shapes limited by planar crystalline faces" (later in the same passage the authors say that "mesocrystals are common biogenic components in the skeletons of marine organisms" (are they thus unique or are they common)? It is my suggestion to completely eliminate this concept here until various crystallographic details of the miliolid test formation are well documented.

      Reply: Our intention was to express that mesocrystals are common biogenic components in the skeletons of marine organisms; however, Miliolid needles that form distinct geometric shapes limited by planar crystalline faces are a unique type of mesocrystal.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their valuable feedback and comments. Below we have addressed all points carefully and have, when needed, revised the manuscript accordingly.

      Note that we have taken the opportunity to correct minor typos and unclear text in the revised manuscript.

      Of importance to the editors and reviewers, we detected a few minor factual errors in the method section, which we have now corrected. The first error was that we wrongly stated that our final dataset had 6358 unique TCRs, whereas it in fact had 6353 unique TCRs. The second error was that we stated that the maximum length of CDR1ꞵ was 5, whereas it was in fact 6. The last error was that we stated that we used a Levenshtein distance of at least 3 to discard similar peptides when swapping the TCRs to generate negatives. This should have been a Levenshtein distance greater than 3, to match the script we used to generate negatives (though no peptides had a Levenshtein distance of exactly 3).
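
      To make the corrected threshold concrete, the following is a minimal sketch of a "greater than 3" Levenshtein filter. It is not the script referred to above; the function names `levenshtein` and `can_swap` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def can_swap(peptide_a: str, peptide_b: str, threshold: int = 3) -> bool:
    """Hypothetical helper: peptides may exchange TCRs only if their
    distance is strictly greater than the threshold."""
    return levenshtein(peptide_a, peptide_b) > threshold
```

      With a strict inequality, a pair at distance exactly 3 would be excluded from swapping, whereas "at least 3" would have allowed it. Since no peptide pair in the dataset sits at exactly 3, the two readings select the same pairs here.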

      eLife assessment

      This important study reports on an improved deep-learning-based method for predicting TCR specificity. The evidence supporting the overall method is compelling, although the inclusion of real-world applications and clear comparisons with the previous version would have further strengthened the study. This work will be of broad interest to immunologists and computational biologists.

      It is not fully clear to us what is meant by “clear comparisons with the previous version”. In the manuscript we consistently compare the performance of each novel approach introduced to that of the ancestor NetTCR-2.1. Further, we concluded the manuscript with a performance comparison to a large set of current state-of-the-art methods by training and evaluating the novel modeling framework on the IMMREP22 benchmark data.

      We agree that the manuscript can be improved by including a brief discussion of real-life applications of models for prediction of TCR specificity, and have included a brief text in the introduction.

      Reviewer #1 (Recommendations For The Authors):

      It was a great pleasure to read this article. All the concepts and motivations are clearly defined. I have just a few questions.

      What was the motivation behind employing a 1:5 positive-negative ratio? Could it be the cause of worse performance in the case of outliers?

      The ratio 1:5 is based on results from earlier work [36561755]. In this work, negatives were constructed as a mix of swapped and true (i.e. measured) negatives with a ratio 1:5 for each. This work demonstrated a slight gain when including both types of negatives compared to only using swapped. In a subsequent publication [https://doi.org/10.1016/j.immuno.2023.100024], it was demonstrated that optimal performance was obtained when only including swapped negatives (again in a ratio 1:5). Given this, we maintained this approach in the current work. It is clear that this choice is somewhat arbitrary, and that further work is needed to fully address this issue and the general issue of how to best generate negatives for ML of TCR specificity. Such work is in our view however beyond the scope of the current manuscript.
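
      For readers unfamiliar with swapped negatives, the sketch below illustrates one way a 1:5 positive-to-negative ratio could be produced. It is an assumption-laden illustration rather than the NetTCR pipeline: the function `swapped_negatives`, its parameters, and the default `dissimilar` predicate are hypothetical, each TCR is assumed to be listed with a single peptide, and duplicate negatives are not removed.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (tcr_id, peptide)


def swapped_negatives(
    positives: List[Pair],
    ratio: int = 5,
    dissimilar: Callable[[str, str], bool] = lambda p, q: p != q,
    seed: int = 0,
) -> List[Pair]:
    """For every positive pair, borrow up to `ratio` TCRs that are
    positives for sufficiently different peptides and pair them with
    this peptide as negatives."""
    rng = random.Random(seed)
    negatives: List[Pair] = []
    for _, peptide in positives:
        # Donor TCRs come only from peptides judged dissimilar to this one.
        donors = [tcr for tcr, other in positives if dissimilar(peptide, other)]
        k = min(ratio, len(donors))
        negatives.extend((tcr, peptide) for tcr in rng.sample(donors, k))
    return negatives
```

      In practice the `dissimilar` predicate would be where a peptide-similarity filter (such as an edit-distance cutoff) is plugged in.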

      Why is the patience of 200 epochs for peptide-specific models and 100 epochs for pan-specific and pre-trained models used in the context of the early stopping mechanism?

      We observed that the loss curve was overall very stable in the case of pan-specific training, likely due to the large amount of data included in this training. Therefore, these models were less likely to become stuck in a local minimum during training, meaning that a lower patience for early stopping would not prevent the model from learning optimally. In contrast, we found for some peptides that the loss curve was very erratic, and would sometimes become stuck in a local minimum for an extended time. To resolve this, the patience was increased from 100 to 200, which resulted in a better chance to escape these minima, as well as a better overall performance.
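
      The effect of patience described above can be sketched with a toy validation-loss curve. This is a generic illustration of early stopping, not the training code used for the models; the function name `early_stopping_epochs` is hypothetical.

```python
from typing import Sequence, Tuple


def early_stopping_epochs(validation_losses: Sequence[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a recorded validation-loss curve.

    Training stops once `patience` epochs have passed without a new best loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # patience exhausted
    return best_epoch, len(validation_losses) - 1  # ran to the end
```

      On an erratic curve such as `[1.0, 0.9, 0.95, 0.96, 0.97, 0.5]`, a patience of 2 stops at epoch 3 and keeps the model from epoch 1, while a patience of 4 survives the plateau and reaches the better minimum at epoch 5.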

      Why is weight 3.8 used in the weighted loss function in the pan-specific model?

      The weighted loss was scaled with a division factor (c) of 3.8, in order to get an overall loss that was comparable to training without sample weights. This was primarily done to better compare the two approaches (scaling and no scaling) in terms of loss, and not so much to improve the training itself, as we already use a relatively conservative sample weight scaling based on log2. We have added a brief sentence to clarify this in the manuscript.
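
      As a rough illustration of the scaling described above, the sketch below divides a sample-weighted binary cross-entropy by a constant c. Only the division factor of 3.8 comes from the text; the function name `scaled_weighted_bce`, the use of a mean over samples, and the way the weights are supplied are assumptions, and the log2-based weight formula itself is not reproduced here.

```python
import math
from typing import Sequence


def scaled_weighted_bce(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    weights: Sequence[float],
    c: float = 3.8,
    eps: float = 1e-7,
) -> float:
    """Mean sample-weighted binary cross-entropy, divided by a constant c."""
    total = 0.0
    for y, p, w in zip(y_true, y_pred, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += w * -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / (len(y_true) * c)
```

      Dividing by c changes only the scale of the loss, not the location of its minimum, which is consistent with the stated purpose of making the weighted and unweighted losses comparable.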

      Reviewer #2 (Recommendations For The Authors):

      This work is the evolution of previous studies that developed the NetTCR platform, and in a previous paper cited in this study, the authors explore the paired dataset approach with "paired α/β TCR sequence data". In this manuscript, the authors should make clear what advances were made when compared to the previous study. This is not clear, although extensive reference is made to NetTCR 2.0 and 2.1. Differences are scattered throughout the manuscript, so I would suggest a section or paragraph clearly delineating the advances in model architecture and training when compared to previous versions recently published.

      It is not clear to us what the reviewer is referring to when stating “the authors should make clear what advances were made when compared to the previous study”. Throughout the manuscript we consistently compare the performance of each novel approach introduced to that of the ancestor NetTCR-2.1. In addition, we briefly discuss all of the changes to the architecture and training at the start of the discussion section. Further, we concluded the manuscript with a performance comparison to a large set of current state-of-the-art methods by training and evaluating the novel modeling framework on the IMMREP22 benchmark data. It is correct that the advances are described progressively by introducing each novel approach one by one (i.e. refining the machine learning model architecture and training setup, data denoising in terms of outlier identification in the training data, new model architectures combining the properties of a pan- and peptide-specific model, and integration of a similarity-based approach to boost model performance). We believe this helps better justify the relevance of each of the novel approaches introduced.

      In Figure 3, the colors have labels, but they are not explained in the legend or in the text. This makes it very difficult to understand the data in the various columns. Also, since it represents the Mean AUC, the data would be best displayed with a boxplot or a mean and bars for variance.

      We agree, and have changed Figure 3 and its corresponding AUC 0.1 figure (Supplementary Figure 1) into a boxplot. We also further clarified what the different models were in the figure text.

      Given the potential impact of this work on bioengineering and biotechnology, I would suggest adding a paragraph or section to the discussion where potential applications of the current model, or examples of applications of previous (or competing) models have been used to further biological research.

      We agree and have added a brief sentence in the introduction to outline biotechnological applications of models for prediction of TCR specificity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Trenker et al. report cryo-EM structures of HER4/HER2 heterodimers and HER4 homodimers bound to Neuregulin-1b (Nrg1b) and Betacellulin (BTC). As observed for prior cryo-EM structures of full-length or near full-length HER-family receptors only the extracellular regions are visualized, presumably owing to flexibility in the relative orientation of extra- and intra-cellular regions. The authors observe no appreciable differences between Nrg1b and BTC bound heterodimers, both ligands, in this case being high-affinity ligands, and modest "scissor-like" differences in the subunit relationships in HER4 homodimers with Nrg1b and BTC bound.

      The authors also show that, as they showed for HER3, the HER4 dimerization arm is not indispensable for forming heterodimers with HER2 despite the HER4 dimerization arm forming a more canonical interaction with HER2. Perhaps most interestingly, the authors observe glycan interactions that appear to stabilize intra- and inter-subunit interactions in HER4 homodimers but that inter-subunit glycans are not present in HER2/HER4 heterodimers. The authors speculate that these glycan interactions may contribute to the apparent propensity of HER4 to homodimerize vs. heterodimerize with HER2.

      I realize that an important role of reviewers is to provide authors with informed and critical comments, but I found this manuscript a well-written, thoughtful, and important contribution. My only note is that I am not an electron microscopist so have assumed the microscopy has been carried out expertly and rely on other reviewers to vet structure determinations.

      We thank the reviewer for sharing our enthusiasm and the positive assessment of our manuscript. We have carefully reviewed all the microscopy-related concerns while responding to the assessment of reviewer #2.

      Reviewer #2 (Public Review):

      With the data presented in this manuscript, the authors help complete the set of high-resolution HER2-associated complex heterodimer structures as well as HER4 homodimer structures in the presence of NRG1b and BTC. Purification of HER2-HER4 heterodimers appears to be inherently challenging due to the propensity of HER4 to form homodimers. The authors have used an effective scheme to isolate these HER2-HER4 heterodimers and have employed graphene-oxide grid chemistry to presumably overcome the issues of low sample yield for solving cryo-EM structures of these complexes. The authors conclude HER2-HER4 heterodimers with either ligand are conformationally homogeneous relative to the HER4 homodimers. The HER2-HER4 heterodimers also appear to be better stabilized compared to other published HER2 heterodimers. The ability to model glycans in the context of HER4 homodimers is exciting to see and provides a strong rationale for the stability of these structures. Overall, the work is of great interest and the methods described in this work would benefit a wide variety of structural biology projects.

      We thank the reviewer for their positive assessment of our manuscript.

      Major comments:

      1) The HER2-HER4 heterodimer with BTC appears to be the lowest resolution of the reported structures. Although the authors claim the overall structure is similar to the HER2-HER4 heterodimer with NRG1b, it is therefore unclear whether the lower resolution of the BTC is due to challenging data collection conditions, sample preparation, or conformational dynamics not discernible due to the lower resolution. The authors should minimally clarify where they see the possible issues arising for the lower resolution as this is a key aspect of the work.

      The most likely reason for the lower resolution of the HER2/HER4/BTC reconstruction is not the underlying fundamental biology but a certain degree of preferred orientations in the sample, as can be seen from the directional FSC curves in the supplemental materials (Figure S3). We would like to note that while the overall resolution of the HER2/HER4/BTC reconstruction may be comparatively lower than other reconstructions presented in the manuscript, it remains of sufficiently high quality to substantiate our key claims. Specifically, our analysis indicates a close resemblance between the HER2/HER4/BTC reconstruction and the HER2/HER4/NRG reconstruction. For example, individual beta strands can still be well resolved allowing their accurate placement. There may be differences in features at higher resolution than 4.5Å between these two reconstructions which we cannot observe due to the lower resolution of the HER2/HER4/BTC map, but these would amount to side chain motions rather than larger secondary structure movement. In the manuscript, we only draw comparisons between domain movements in different heterodimer structures and do not see any conformational variability in the final reconstructions, nor in their 3D classification analyses. Thus, we do not attribute the lower resolution of the HER2/HER4/BTC reconstruction to increased dynamics at resolution scales that are discussed in the manuscript. What is more likely is that variability in data quality, which we commonly observe between different GO grids, contributes to differences in resolution between different samples and potentially to the different orientation distributions. To comment on these possibilities, we added the following text to the manuscript (italic, underlined):

      Page 8 top paragraph:

      “Despite the diverse sequences of the NRG1β and BTC ligands, the larger-scale domain conformation of the HER2/HER4 heterodimers stabilized by each ligand is identical with only small differences in the ligand binding pockets (Figure 1d). Due to the lower resolution of the HER2/HER4/BTC complex, we cannot exclude the possibility of differences in side-chain arrangements between the two structures. However, we attribute the lower resolution to variability in data collection on GO grids, which we frequently observe, rather than differences in conformational heterogeneity of HER2/HER4/BTC.”

      Page 10, second paragraph:

      “Our cryo-EM structures of the full-length HER2/HER4 complexes bound to either NRG1β or BTC, did not reveal discernible differences at the receptor dimerization interface and larger-scale domain arrangements (Figure 1d).”

      2) For all maps, authors should display Euler angle plots from their final refinements to assess the degree of preferred orientation. Judging by the sphericity, it appears all the structures, except HER2-HER4-BTC, have well-sampled projection distributions. However, a formal clarification would be useful to the reader.

      We thank the reviewer for pointing this out. We regarded the 3DFSC curves included in our original submission as a sufficient measure of projection distributions. In the revised manuscript, we now also include Euler angle plots from the respective CryoSPARC refinements in the supplemental Figures.

      3) The authors should also include map-model FSCs to ascertain the quality of the map with respect to model building, as this is currently missing in the submission.

      We included map-model FSCs from Phenix validation runs in our supplemental material.

      Minor comments:

      1) With respect to complex formation, is there a reason why HER2 expression is dramatically lower than HER4?

      The expression of HER2 and HER4 in Expi293F cells, and consequently the amount of HER2 and HER4 receptors at the beginning of our first purification step, which is the NRG1b-mediated pulldown of HER4, is not noticeably different. After this initial purification step, a significant portion of HER2 is lost because HER2/HER4 complexes constitute only a small fraction of the total HER complexes, as HER4 homodimers form preferentially. This is the reason why HER4 levels after the first purification step shown on the gel in Figure S1b are significantly higher than those of HER2. In the revised manuscript, in Figure S1d, we now show that both receptors are expressed at comparable levels at the beginning of purification. In this experiment, levels of HER2-MBP-TS and HER4-TS purified separately from equivalent volumes of transfected Expi293F cell culture via their shared TS-tags (MBP=Maltose Binding Protein, TS=Twin-Strep) are evaluated on a Coomassie-stained gel. When equal volumes of these elutions are then mixed and subjected either to HER4-directed pulldown using NRG1b-coated Flag-resin (lane 3, Figure S1d of the revised manuscript) or to HER2-MBP-directed pulldown using amylose resin in the presence of NRG1b (lane 4, Figure S1d of the revised manuscript), neither pulldown reveals substantial HER2/HER4 heterodimerization, indicating that HER4 homodimerization is favored.

      2) Figures S1e authors should clarify if HER2 substitutions are VR alone or do these include GD substitutions as well. These should be suitably clarified in the main text.

      The HER2 constructs used in all cellular assays do not include the G778D mutation. We clarified this in Figure S1e, in the Materials and Methods section and in the main text on page 6.

      3) The validation reports for all 4 reported structures suggest the user-provided FSC-derived resolutions are different from those calculated by the deposition server. Are the masks deposited significantly different compared to the ones generated within cryoSPARC?

      The user-provided FSC-derived resolutions are different from those calculated by the server because the server only calculates resolution of unmasked curves from half maps while we provide the resolution derived from masked FSCs. These were all calculated using masks generated within the respective refinement job in cryoSPARC. However, we did notice that our author-provided FSC curves were from unmasked maps and we replaced the provided unmasked FSCs with masked FSCs as generated in cryoSPARC. These FSC plots in the validation reports now reflect the author-provided resolution in our validation reports and the plots generated by cryoSPARC shown in Figures S2, S3, S9 and S10.

      4) For interpretation regarding activation through phosphorylation in Figure 2e, have the authors considered HER4 could homodimerize as well? It appears from the data presented in Figure 4 and S12 that the propensity to form homodimers is greater for HER4 than to heterodimerize with HER2, despite the VR/IQ substitutions. This also appears to be supported by the reasonable amount of signal for pERK in lanes with HER4-IQ alone in the presence of NRG1b. It is recommended that the authors comment on this possibility.

      The IQ mutation, originally engineered to disrupt the receiver interface in EGFR, has been shown to have residual activity, which is greater than that of the mutation on the opposite side of the asymmetric dimer interface (VR) (PMID:16777603). This might be because this mutation partially destabilizes an inactive state of HER kinases by disrupting the hydrophobic interactions, which are important both for kinase inhibition and for stabilization of the active dimer. While the IQ mutation is significantly inhibitory, as evidenced by the fact that we do not detect NRG1b-dependent HER4 phosphorylation in cells expressing HER4-IQ alone, it is possible that undetectable levels of phosphorylated HER4 cause the small increase in pERK signal. To acknowledge this possibility, we added the following sentence to the appropriate paragraph on page 10 in the main text:

      “Small increases in pERK levels in cells expressing the HER4-IQ construct are consistent with previous observations that the IQ mutation in HER kinase domains has small residual activity through homodimerization (PMID:16777603).”

      5) In the following line, "NRG1b-induced phosphorylation of HER2, HER4, ERK and AKT was not notably affected by substitution of the HER4 dimerization arm to a GS-arm relative to wild type receptors", it is unclear what the authors mean by wild-type receptors? There is presently no wildtype HER2 and/or HER4 tested in this blot.

      We thank the reviewer for pointing this out. Wild type receptors here refer to WT dimerization arm sequences in contrast to GS-arm mutants. We corrected the language in the appropriate place in the main text:

      “NRG1b-induced phosphorylation of HER2, HER4, ERK and AKT was not notably affected by substitution of the HER4 dimerization arm to a GS-arm relative to receptors featuring wild type dimerization arm sequences, indicating that the HER4 dimerization arm is not required for assembly and activation of HER2/HER4 heterodimers (Figure 2e).”

      6) Considering the asparagine residues can potentially mediate stabilization of HER2-HER4 dimers through glycosylation, the authors should include western blot data for receptor-activation for mutants where glycosylation can be disrupted. This could minimally instruct the reader on how functionally relevant the identified interactions like N576-N358 are.

      We agree with the Reviewer that this is a very interesting and important point, and it is the subject of our future investigations. The different spectra of glycosylation that we observe between HER4 homodimers and HER2/HER4 heterodimers suggest that glycans will modulate these interactions differently. We speculate that glycans will likely be more important for HER4 homodimerization, where glycosylation is more pronounced in our reconstructions. Investigating how these interactions change in the absence of single glycan modifications or their combinations will also require taking into consideration how glycan mutations alter the equilibrium between HER4 homodimerization and HER2/HER4 heterodimerization. Such studies will require months of mutagenesis and optimization of controlled expression of such mutants, ideally the generation of stable cell lines, and likely structural follow-up studies. We respectfully argue that this undertaking is beyond the main scope of the current manuscript, and conceptually constitutes a separate, very important question that we are working on.

      Reviewer #1 (Recommendations For The Authors):

      The structural coordinates should be deposited in the RCSB.

      The coordinates will be released upon publication of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Figure S1b authors should ideally include a silver stain gel to assess the purity of the heterodimer-ligand complex. Although HER subunits are discernible, there is no clear band for NRG1b.

      Given its small size (9.7 kDa) our NRG1b construct is typically difficult to detect in our samples, but we would like to respectfully argue that the fact that we can resolve it at high resolution in our cryo-EM reconstructions provides sufficient evidence that it is present. Likewise, we argue that the Coomassie-stained gel we present in the manuscript is sufficient. It demonstrates that our purifications yield a stoichiometric complex of enough purity to obtain a high resolution cryo-EM reconstruction. Since we are not making any other claims about these preparations, we respectfully argue that providing a silver stain gel is not necessary to support conclusions of our study.

      We thank the reviewer for pointing this out. To best reflect what we wanted to convey, we changed it to: “and is the same as observed in structures of an isolated HER2 ectodomain.”

      3) Page 8 first paragraph line 3, although one can deduce where the ligand binding pocket is, it would be clearer if this is marked in Figure 1d.

      We added arrows in the figure to indicate the ligand-binding pocket.

      4) Figure 2b inset A needs to be labeled 'A'.

      The inset was already labelled but in a different corner. We rearranged the label to make it clearer.

      5) Figure S5c will benefit from inset images zooming into the dimerization arm. It is hard to visualize the subtleties of the structural changes in the current format.

      Figure S5c predominantly shows side views of various heterodimer overlays to highlight subtle differences in larger-scale assembly that correlate with differences in dimerization arm engagement. This side orientation is not suitable for zooming into the dimerization arm regions, which can only be effectively visualized in front views (the view of the heart-shaped dimer illustrated in Figure 1a). We show a zoomed-in view of this representation in main Figure 2c, which is what we understand the Reviewer is requesting.

      6) Fig 3e is it A102 or A202 in the bottom-most panel.

      This is now corrected, thank you.

      7) Fig S9 revisit the color code for NRG1b, it appears there is no blue subunit of NRG1b. Also revisit the RMSD in the figure legend, since the text appears to suggest a different set of RMSDs for the 3 overlays.

      We fixed the color code in the Figure, thank you.

      In reference to Figure S9 (Figure S11 in the revised manuscript) we discuss two types of RMSDs:

      1) RMSDs between our cryo-EM homodimers and the crystal structure homodimers. The structure overlays are shown in Figure S9a and RMSD values were mentioned in the Figure legends. However, in the original manuscript we did not explicitly mention these values in the main text but have now added them to the main text of the revised version of the manuscript.

      2) RMSDs between monomers within our cryo-EM structures and within monomers of the crystal structure. Figure S11b and Figure S11c of the revised manuscript show these overlays for the cryo-EM structures only and the values are present in the Figure legend. We do not show the respective overlay for the crystal structures, which is why the values are not mentioned in the Figure legends, but we discuss the values in the main text.

      We recognize that this is confusing and added RMSD values for 1. to the main text and discuss this more carefully:

      “Our cryo-EM structures of the HER4/NRG1b homodimer differs slightly from the three HER4/NRG1b homodimers per asymmetric unit in the 3U7U crystal structure in which each monomer adopts a different orientation of the domain IV relative to the rest of the ectodomain (Figure S9a, RMSD: 5.438 Å, 5.435 Å and 3.662 Å). Notably, our two cryo-EM HER4 homodimer structures are more symmetric than the crystal structures of the HER4/NRG1β ectodomain homodimer. RMSDs for monomers within our cryo-EM structures are 1.42 Å in the cryo-EM HER4/NRG1b homodimer and 1.58 Å in the HER4/BTC homodimer (Figure S9b+c) compared to the monomers in the crystal structures which align with RMSDs of 1.67 Å, 5.76 Å and 2.38 Å”

      8) Page 12 paragraph 2 last line, expand on the abbreviation NAG.

      It is now expanded.

      9) What is the slit width used for the energy filter during data collection?

      The slit width was 20 eV. We added this information to the Methods section.

      10) The crosslinking conditions of 0.2% glutaraldehyde for 40 min on ice, with no quenching seems rather harsh. Have the authors attempted other crosslinking conditions? Do milder conditions or GraFix not help with complex stabilization?

      We thank the Reviewer for pointing this out. The reaction was quenched after 40 min by addition of 40 µl of 1M Tris pH 7.4 buffer. This information is now included in the Methods section. We have screened ideal crosslinking conditions for HER4 homodimers, and previously for HER2/HER3 heterodimers, and found that these crosslinking conditions were the mildest conditions that achieved complete crosslinking as assessed by SDS-PAGE.

      11) Have the authors used default parameters for all their data processing steps? Were additional steps like local per-particle CTF refinement and global defocus refinement employed during refinement?

      We did not perform any per-particle CTF refinement, as we have previously not observed any improvement from running such refinement on particles of this size on top of per-patch CTF estimation, which already takes into account local CTF differences within each micrograph. To make the manuscript clearer in this regard, we added the following statement to the Methods section: “Unless specifically mentioned here or in the processing workflow, default parameters in CryoSPARC were used for each processing step.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary:

      In this interesting work, the authors investigated an important topical question: when we see travelling waves in cortical activity, is this due to true wave-like spread, or due to sequentially activated sources? In simulations, it is shown that sequential brain module activation can show up as a travelling wave - even in improved methods such as phase delay maps - and a variety of parameters is investigated. Then, in ex-vivo turtle eye-brain preparations, the authors show that visual cortex waves observable in local field potentials are in fact often better explained as areas D1 and D2 being sequentially activated. This has implications for how we think about travelling wave methodology and relevant analytical tools.

      Strengths:

      I enjoyed reading the discussion. The authors are careful in their claims, and point out that some phenomena may still indeed be genuine travelling waves, but we should have a higher evidence bar to claim this for a particular process in light of this paper and Zhigalov & Jensen (2023) (ref 44). Given this careful discussion, the claims made are well-supported by the experimental results. The discussion also gives a nice overview of potential options in light of this and future directions.

      The illustration of different gaussian covariances leading to very different latency maps was interesting to see.

      Furthermore, the methods are detailed and clearly structured and the Supplementary Figures, particularly single trial results, are useful and convincing.

      We are glad the reviewer found our manuscript “interesting”, the questions we raise “important”, our claims “well-supported by the experimental results”, and our methods “detailed and clearly structured”.

      The details of the sequentially activated Gaussian simulations give some useful results, but the fundamental idea still appears to be "sequential activation is often indistinguishable from a travelling wave", an idea advanced e.g. by Zhigalov & Jensen (2023). It takes a while until the (in my opinion) more intriguing experimental results.

      To emphasize the experimental results, we swapped the order of the analytical and experimental results. Correspondingly, Figure 2 now illustrates the more intriguing experimental results and Figure 3 the analytical results. In addition, we added subtitles to the different sections of the results to ease navigation through the paper and to let readers access the different sections more easily.

      One of the key claims is that the spikes are more consistent with two sequentially activated modules rather than a continuous wave (with Fig 3k and 3l key to support this). Whilst this is more consistent, it is worth mentioning that there seems to be stochasticity to this and between-trial variability, especially for spikes.

      In the revised manuscript we added the reviewer’s comment about stochasticity, and we discuss its possible origins:

      "The transition was also not clear when examining spiking responses in some of the trials (as indicated by high DIP scores, Figure 2K). However, the observation that temporal grouping became more pronounced when using ALSA (a more robust estimate of local excitability) (Figure 2L,N), suggests that high DIP values may result from variability in the spike times of single neurons, and not necessarily from the lack of modular activation. Such issues can be resolved by denser sampling of spiking activity in the tissue."

      Recommendations For The Authors:

      The eye-cortex turtle preparation is not the most common. I would add more context about how specific the results are to this preparation vs how comparable it is to human data.

      We added a sentence explaining the relevance of our preparation: “Finally, while the layered organization of turtle cortex is different than that of mammalian cortex, the basic excitability features of both tissues are similar (Connors and Kriegstein, 1986; Hemberger et al., 2019; Kriegstein and Connors, 1986; Larkum et al., 2008; Shein-Idelson et al., 2017b), and substantial differences in the manner by which field potentials and spikes spread through the tissue are not to be expected.”

      Philosophical question: when does a 'module' become small enough for it to count as a travelling wave? More on this could be added to the discussion. I think we are in the very early days for a true understanding of travelling waves, and I wonder if these sequentially activated modules will functionally correspond to the known cortical segregation, or if it varies by area/task.

      We agree with the reviewer that macroscopic waves could be composed of smaller modules (or single neurons at the smallest scale). Our results suggest that modular patterns can be classified as wave patterns both at large scales (of brain areas) and smaller scales of local neural circuits. Therefore, we believe it is necessary to make this distinction across different scales. We sharpened this point in the first paragraph of the discussion:

      "…We showed that LFP measurements indicative of waves propagating across turtle cortex are underlined by discrete and consecutively activated neuronal populations, and not by a continuously propagating wavefront of spikes (Figure 2). Similarly, activation profiles that resemble continuous travelling waves in EEG simulations can be underlined by consecutive activation of two discrete cortical regions (Figure 1). We replicated these results using an analytical model and demonstrated that a simple scenario of sequentially activated Gaussians can exhibit WLPs with a rich diversity of spatiotemporal profiles (Figure 3). Our results offer insight into the scenarios and conditions for WLP detection by identifying failure points that should be considered when identifying travelling waves and therefore suggest caution when interpreting continuous phase latency maps as microscopically propagating wave patterns. Such failure points may exist both when examining activity at the scale of brain regions (Figure 1) and smaller neural circuits (Figure 2). Therefore, our results suggest that the discrepancy between modular and wave activation should be examined across spatial scales. Specifically, it is not necessarily the case that at the fine grained (single neuron) scale activation patterns are modular, but, following coarse graining, smooth wave patterns emerge. Rather, modular activation may hierarchically exist across scales (Kaiser and Hilgetag, 2010; Meunier et al., 2010) and may be masked by smeared spatial supra-threshold excitability boundaries. Below we discuss these limitations across techniques and their implications.”
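      The point about sequentially activated Gaussians can be illustrated with a minimal sketch. It assumes two stationary sources, each with a Gaussian spatial footprint and a Gaussian temporal envelope, and it takes the time of the signal peak at each position as the latency. All function names and parameter values below are hypothetical choices for illustration and are not taken from the manuscript or its analytical model.

```python
import math


def gaussian(u, center, width):
    """Unnormalized Gaussian bump evaluated at u."""
    return math.exp(-0.5 * ((u - center) / width) ** 2)


def peak_latency(x, times, sources):
    """Return the sampled time at which the summed signal at position x peaks.

    Each source is a tuple (x_center, x_width, t_center, t_width).
    The sources never move; only their amplitudes change in time.
    """
    def signal(t):
        return sum(
            gaussian(x, xc, xw) * gaussian(t, tc, tw)
            for (xc, xw, tc, tw) in sources
        )
    return max(times, key=signal)


def latency_map(positions, times, sources):
    """Peak latency at every recording position."""
    return [peak_latency(x, times, sources) for x in positions]


# Illustrative parameters (assumptions): two overlapping sources, the second
# activated one time unit after the first.
SOURCES = [(-1.0, 1.0, 0.0, 1.0), (1.0, 1.0, 1.0, 1.0)]
POSITIONS = [i * 0.25 for i in range(-12, 13)]
TIMES = [i * 0.01 for i in range(-300, 401)]

if __name__ == "__main__":
    for x, latency in zip(POSITIONS, latency_map(POSITIONS, TIMES, SOURCES)):
        print(f"x = {x:+.2f}  peak latency = {latency:.2f}")
```

      With these assumed parameters the latency increases smoothly with position between the two activation times, so a latency map alone would look like a propagating wave even though only two fixed sources are active.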

      I would advise the authors to focus on the experimental data, perhaps by putting the simulations second, and by putting some of the equation details that are in Methods into the Supplementary Information. Whilst the simulation parameter space is well-explored, the fundamental idea of spreading Gaussians is relatively simple, and the current manuscript organization detracted from the main message for me a little bit.

      Following the referee’s suggestion, we swapped the order of the section with experimental data and the one with the analytic model (see response to comment 1). In addition, to ease the reading of the methods, we moved the mathematical derivation and related equations to Appendix 1.

      Things I thought about that you may also enjoy thinking about: Could we tell something about sequential sources vs travelling waves by the nature of the wave - e.g. shape or dispersion? If some wave properties are conserved whilst travelling, this could be evidence for travelling vs two sources.

      This is a wonderful suggestion. We are currently working on a follow up publication with a new approach to do exactly that! We think that this new body of work is outside the scope of this paper.

      Could synaptic potentials spread like waves, but spikes more in modular bursts? This would also explain the LFP vs spikes difference - maybe travelling waves of EPSPs are there priming the network, 'looking' for suitable modules to activate, which then activate sequentially. The current discussion is quite spike-focused - could some information be in synaptic potentials after all?

      This is an interesting idea with intriguing functional implications. We added this idea to our discussion (see paragraph below). In addition, to emphasize our discussion on synaptic potentials, we reorganized the paragraphs in the discussion to separate between our discussion on sub-threshold excitability (which is mostly synaptic) and supra-threshold excitability which is the focus of the second part of the discussion.

      “Variability in responses may also be explained by differences in propagation mechanisms (Ermentrout and Kleinfeld, 2001; Muller et al., 2018; Wu et al., 2008). Several reports suggest that waves are underlined by propagation along axonal collaterals (Muller et al., 2018, 2014). Both the transmembrane voltage-gated currents excited during action potentials as well as the post-synaptic currents along axonal boutons can potentially contribute to measured signals. However, such waves travel at high propagation speeds and are not compatible with the wide diversity of wave velocities and mechanisms of local neuronal interactions (Ermentrout and Kleinfeld, 2001; Feller et al., 1996). An intriguing possibility is that such axonal waves prime neuronal excitability by sub-threshold inputs that later result in modular supra-threshold activation. The ability to experimentally discriminate between axonal inputs and local spiking excitability (e.g. by reporters with different wavelengths) can potentially resolve such discrepancies.

      Our turtle cortex results (Figure 2) exemplify how contrasting sub-threshold LFP measurements with supra-threshold spiking measurements can yield different conclusions about the nature of activity spread….”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      […] While this does not rule out criticality in the brain, it decidedly weakens the evidence for it, which was based on the following logic: critical systems give rise to power law behavior; power law behavior is observed in cortical networks; therefore, cortical networks operate near a critical point. Given, as shown in this paper, that power laws can arise from noncritical processes, the logic breaks. Moreover, the authors show that criticality does not imply optimal information transmission (one of its proposed functions). This highlights the necessity for more rigorous analyses to affirm criticality in the brain. In particular, it suggests that attention should be focused on the question "does the brain implement a dynamical latent variable model?".

      These authors are not the first to show that slowly varying firing rates can give rise to power law behavior (see, for example, Touboul and Destexhe, 2017; Priesemann and Shriki, 2018). However, to our knowledge they are the first to show crackling, and to compute information transmission in the critical state.

      We thank the reviewers for their thoughtful assessment of our paper.

      We would push back on the assessment that our model ‘has nothing to do with criticality,’ and that we observed ‘signatures of criticality [that] emerge through fundamentally non-critical mechanisms.’ This assessment partially stems from the definition of criticality provided in the Public Comment, that ‘criticality is a very specific set of phenomena in physics in which fundamentally local interactions produce unexpected long-range behavior.’

      Our disagreement is largely focused on this definition, which we do not think is a standard definition. Taking the favorite textbook example, the Ising model, criticality is characterized by a set of power-law divergences in thermodynamic quantities (e.g., susceptibility, specific heat, magnetization) at the critical temperature, with exponents of these power laws governed by scaling laws. It is not defined by local interactions. The all-to-all Ising model is generally viewed as showing critical behavior at a certain temperature, even though interactions there are manifestly non-local. It is possible that, by “local” in the definition, the Public Comment meant that interactions are “collective” and among microscopic degrees of freedom. However, that same all-to-all Ising model is mathematically equivalent to the mean-field model, where criticality is achieved through large fluctuations of the mean field, but not through microscopic interactions.

      More commonly, criticality is defined by power laws and scaling relationships that emerge at a critical value of a parameter(s) of the system. That is, criticality is defined by its signatures. What is crucial in all such definitions is that this atypical, critical state requires fine tuning. For example, in the textbook example of the Ising model, a parameter (the temperature) must be tuned to a critical value for critical behavior to appear. In the branching process model that generates avalanche criticality, criticality requires tuning m=1. The key result of our paper is that all signatures expected for avalanche criticality (power laws, crackling, and, as shown below, estimates of the branching rate m), and hence the criticality itself, appear without fine-tuning.
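      The fine-tuning requirement in the textbook branching process can be made concrete with a minimal sketch. It assumes that each active unit has two potential descendants, each activated independently with probability m/2, so that the mean offspring number is m. The function names, the binomial offspring rule, and the cap on avalanche size are illustrative assumptions and are not part of the model in the manuscript.

```python
import random


def avalanche_size(m, rng, max_size=5000):
    """Total number of activations in one avalanche started by a single unit.

    Assumption: every active unit has two potential descendants, each
    activated with probability m / 2 (mean offspring number m).
    Avalanches are truncated at max_size so the loop always terminates.
    """
    active = 1
    size = 1
    while active > 0 and size < max_size:
        children = sum(rng.random() < m / 2 for _ in range(2 * active))
        size += children
        active = children
    return min(size, max_size)


def size_distribution(m, n_avalanches=5000, seed=0):
    """Sample avalanche sizes for branching ratio m."""
    rng = random.Random(seed)
    return [avalanche_size(m, rng) for _ in range(n_avalanches)]


def fraction_large(sizes, threshold=100):
    """Fraction of avalanches whose size reaches the threshold."""
    return sum(s >= threshold for s in sizes) / len(sizes)


if __name__ == "__main__":
    for m in (0.8, 1.0):
        sizes = size_distribution(m)
        print(f"m = {m}: fraction of avalanches with size >= 100: "
              f"{fraction_large(sizes):.4f}")
```

      In this toy process, large avalanches are common only when m is set to 1 and become rare once m is lowered to 0.8, which is the sense in which avalanche criticality in a branching process requires tuning m.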

      As we discussed in our introduction, there are a few other instances of signatures of criticality (and hence of criticality itself) emerging without fine-tuning. The first we are aware of was the demonstration of Zipf’s Law (by Schwab, et al. 2014, and Aitchison et al. 2016), a power-law relationship between rank and frequency of states, which was shown to emerge generically in systems driven by a broadly distributed latent variable. A second example, arising from applications of coarse-graining analysis to neural data (cf., Meshulam et al. 2019; also, Morales et al., 2023), was demonstrated in our earlier paper (Morrell et al. 2021). Thus, here we have a third example: the model in this paper generates signatures of criticality in the statistics of avalanches of activity, and it does so without fine-tuning (cf., Fig. 2-3).

      The rate at which these ‘criticality without fine-tuning' examples are piling up may inspire revisiting the requirement of fine-tuning in the definition of criticality, and our ongoing work (Ngampruetikorn et al. 2023) suggests that criticality may be more accurately defined through large fluctuations (variance > 1/N) rather than through fine-tuning or scaling relations.

      References:

      • Schwab DJ, Nemenman I, Mehta P. “Zipf’s Law and Criticality in Multivariate Data without Fine-Tuning.” Phys Rev Lett. 2014 Aug; doi: 10.1103/PhysRevLett.113.068102.

      • Aitchison L, Corradi N, Latham PE. “Zipf’s Law Arising Naturally When There Are Underlying, Unobserved Variables.” PLOS Computational Biology. 2016 12; 12(12):1-32. doi: 10.1371/journal.pcbi.1005110.

      • Meshulam L, Gauthier JL, Brody CD, Tank DW, Bialek W. “Coarse Graining, Fixed Points, and Scaling in a Large Population of Neurons.” Phys Rev Lett. 2019 Oct; doi: 10.1103/PhysRevLett.123.178103.

      • Morales GB, di Santo S, Muñoz MA. “Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics.” Proceedings of the National Academy of Sciences. 2023; 120(9):e2208998120.

      • Morrell MC, Sederberg AJ, Nemenman I. “Latent Dynamical Variables Produce Signatures of Spatiotemporal Criticality in Large Biological Systems.” Phys Rev Lett. 2021 Mar; doi: 10.1103/PhysRevLett.126.118302.

      • Ngampruetikorn V, Nemenman I, Schwab D. “Extrinsic vs Intrinsic Criticality in Systems with Many Components.” arXiv:2309.13898 [physics.bio-ph].

      Major comments:

      1) For many readers, the essential messages of the paper may not be immediately clear. For example, is the paper criticizing the criticality hypothesis of cortical networks, or does the criticism extend deeper, to the theoretical predictions of "crackling" relationships in physical systems as they can emerge without criticality? Statements like "We show that a system coupled to one or many dynamical latent variables can generate avalanche criticality ..." could be misinterpreted as affirming criticality. A more accurate language is needed; for instance, the paper could state that the model generates relationships observed in critical systems. The paper should provide a clearer conclusion and interpretation of the findings in the context of the criticality hypothesis of cortical dynamics.

      Please see the response to the Public Review, above. To clarify the essential message that the dynamical latent variable model produces avalanche criticality without fine-tuning, we have made revisions to the abstract and introduction. This point was already made in the discussion (first sentence).

      Key sentences changed in the abstract:

      "… We find that populations coupled to multiple latent variables produce critical behavior across a broader parameter range than those coupled to a single, quasi-static latent variable, but in both cases, avalanche criticality is observed without fine-tuning of model parameters. … Our results suggest that avalanche criticality arises in neural systems in which activity is effectively modeled as a population driven by a few dynamical variables and these variables can be inferred from the population activity."

      In the introduction, we changed the final sentence to read:

      "These results demonstrate how criticality in neural recordings can arise from latent dynamics in neural activity, without need for fine-tuning of network parameters."

      2) On lines 97-99, the authors state that "We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from recurrent dynamics locally". This idea is also repeated at the beginning of the Summary section. Perhaps being agnostic isn't such a good idea: it's possible that the recurrent dynamics is in a critical regime, which would just push the problem upstream. Presumably you're thinking of recurrent dynamics with slow timescales that's not critical? Or are you happy if it's in the critical regime? This should be clarified.

      We have amended this sentence to clarify that any latent dynamics with large fluctuations would suffice:

      “We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from large fluctuations in local recurrent dynamics.”

      3) Even though the model in Equation 2 has been described in a previous publication and the Methods section, more details regarding the origin and justification of this model in the context of cortical networks would be helpful in the Results section. Was it chosen just for simplicity, or was there a deeper reason?

      This model was chosen for its simplicity: there are no direct interactions between neurons, coupling between neurons and latent variables is random, and simulation is straightforward. More complex latent dynamics or non-random structure in the coupling matrices could have been used, but our aim was to explore this model in the simplest setting possible.

      We have revised the Results (“Avalanche scaling in a dynamical latent variable model,” first paragraph) to justify the choice of the model:

      "We study a model of a population of neurons that are not coupled to each other directly but are driven by a small number of dynamical latent variables -- that is, slowly changing inputs that are not themselves measured (Fig.~\ref{fig:fig1}A). We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from large fluctuations in local recurrent dynamics. The model was chosen for its simplicity, and because we have previously shown that this model with at least about five latent variables can produce power laws under the coarse-graining analysis \citep{Morrell2021}."

      We have added the following to the beginning of the Methods section expanding on the reasons for this choice:

      "We study a model from Morrell 2021, originally constructed as a model of large populations of neurons in mouse hippocampus. Neurons are non-interacting, receiving inputs reflective of place-field selectivity as well as input current arising from a random projection from a small number of dynamical latent variables, representing inputs shared across the population of neurons that are not directly measured or controlled. In the current paper, we incorporate only the latent variables (no place variables), and we assume that every cell is coupled to every latent variable with some randomly drawn coupling strength."

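The model described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the logistic activation, the unit-variance Gaussian couplings, the discrete-time Ornstein-Uhlenbeck update, and all parameter values are assumptions chosen for readability, and the function names `simulate_population` and `avalanches` are hypothetical.

```python
import math
import random


def simulate_population(n_neurons=64, n_latent=5, tau=500.0, eps=4.0, eta=2.0,
                        steps=2000, seed=0):
    """Population counts per time bin for non-interacting binary neurons.

    Assumed form: P(neuron i active) = 1 / (1 + exp(eps + eta * sum_j J_ij h_j)),
    so a larger eps lowers the activity level.
    """
    rng = random.Random(seed)
    # Every neuron is coupled to every latent variable with a random strength.
    coupling = [[rng.gauss(0.0, 1.0) for _ in range(n_latent)]
                for _ in range(n_neurons)]
    h = [0.0] * n_latent
    decay = math.exp(-1.0 / tau)          # tau is measured in time bins
    noise = math.sqrt(1.0 - decay ** 2)   # stationary variance of each latent is 1
    counts = []
    for _ in range(steps):
        # Discrete-time Ornstein-Uhlenbeck update of each latent variable.
        h = [decay * x + noise * rng.gauss(0.0, 1.0) for x in h]
        n_active = 0
        for row in coupling:
            drive = eta * sum(j * x for j, x in zip(row, h))
            p = 1.0 / (1.0 + math.exp(eps + drive))
            n_active += rng.random() < p
        counts.append(n_active)
    return counts


def avalanches(counts):
    """Split a count series into avalanches: runs of bins with nonzero activity.

    Returns (sizes, durations). A run still open at the end of the series is dropped.
    """
    sizes, durations = [], []
    size = duration = 0
    for c in counts:
        if c > 0:
            size += c
            duration += 1
        elif duration:
            sizes.append(size)
            durations.append(duration)
            size = duration = 0
    return sizes, durations


if __name__ == "__main__":
    sizes, durations = avalanches(simulate_population())
    print(f"{len(sizes)} avalanches found")
```

Size and duration distributions would then be built from the two returned lists; fitting exponents requires far longer simulations than this default.
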
      4) The Methods section (paragraph starting on line 340) connects the time scale to actual time scales in neuronal systems, stating that "The timescales of latent variables examined range from about 3 seconds to 3000 seconds, assuming 3-ms bins". While bins of 3 ms are relevant for electrophysiological data from LFPs or high-density EEG/MEG, time scales above 10 seconds are difficult to generate through biophysically clear processes like ionic channels and synaptic transmission. The paper suggests that slow time scales of the latent variables are crucial for obtaining power law behavior resembling criticality. Yet, one way to generate such slow time scales is via critical slowing down, implying that some brain areas providing input to the network under study may operate near criticality. This pushes the problem toward explaining the criticality of those external networks. Hence, discussing potential sources for slow time scales in latent variables is crucial. One possibility you might want to consider is sources external to the organism, which could easily have time scales in the 1-24 hour range.

      As the reviewers note, it is a possibility that slow timescales arise from some other brain area in which dynamics are slow due to critical dynamics, but many other plausible sources exist. These include slowly varying sensory stimuli or external sources, as suggested by the reviewers. It is also possible to generate “effective” slow dynamics from non-critical internal sources. One example, from recordings in awake mice, is the slow change in the level of arousal that occurs on the scale of many seconds to minutes. These changes arise from release of neuromodulators that have broad effects on neural populations and correlations in activity (for a focused review, see Poulet and Crochet, 2019).

      We have added the following sentence to the Methods section where timescales of latent variables were discussed:

      "The timescales of latent variables examined range from about $3$ seconds to $3000$ seconds, assuming $3$-ms bins. Inputs with such timescales may arise from external sources, such as sensory stimuli, or from internal sources, such as changes in physiological state."

      5) It is common in neuronal avalanche analysis to calculate the branching parameter using the ratio of events in consecutive bins. Near-critical systems should display values close to 1, especially in simulations without subsampling. Including the estimated values of the branching parameter for the different cases investigated in this study could provide more comprehensive data. While the paper acknowledges that the obtained exponents in the model differ from those in a critical branching process, it would still be beneficial to offer the branching parameter of the observed avalanches for comparison.

      The reviewers requested that the branching parameter be computed in our model. We point out that, for the quasi-stationary latent variables (as in Fig. 3), a branching parameter of 1 is expected because the summed activity at time t+k is, on average, equal to the summed activity at time t, regardless of k. Numerics are consistent with this expectation. Following the methodology for an unbiased estimate of the branching parameter from Wilting and Priesemann (2018), we checked an example set of parameters (epsilon = 8, eta = 3) for quasi-stationary latent fields. We found that the naïve (biased) estimate of the branching parameter was 0.94, and that the unbiased estimator was exp(−1.4 × 10^(−8)) ≈ 0.999999986.

      For faster time scales, it is no longer true that summed activity is constant over time, as the temporal correlations in activity decay exponentially. Using the five-field simulation from Figure 2, we calculated the branching parameter for several values of tau. The biased estimates of m are 0.76 (𝜏=50), 0.79 (𝜏=500), and 0.79 (𝜏=5000). The corrected estimates are 0.98 (𝜏=50), 0.998 (𝜏=500), and 0.9998 (𝜏=5000).
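
The difference between the naïve and corrected estimates can be illustrated with a short sketch. It is not the analysis code used for the numbers above. It assumes the naïve estimate is the regression slope of activity at t+1 on activity at t, and it replaces the multi-step regression fit of Wilting and Priesemann (2018) with a simpler straight-line fit in log space. The toy subsampled branching process and all function names are hypothetical.

```python
import math
import random


def lag_slope(x, k):
    """Least-squares slope of x[t+k] regressed on x[t]."""
    a, b = x[:-k], x[k:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var = sum((u - ma) ** 2 for u in a)
    return cov / var


def fit_geometric(slopes):
    """Fit slopes[k-1] ~ b * m**k with a straight line in log space; return m."""
    pts = [(k, math.log(r)) for k, r in enumerate(slopes, start=1) if r > 0]
    n = len(pts)
    mk = sum(k for k, _ in pts) / n
    ml = sum(v for _, v in pts) / n
    num = sum((k - mk) * (v - ml) for k, v in pts)
    den = sum((k - mk) ** 2 for k, _ in pts)
    return math.exp(num / den)


def branching_estimates(x, k_max=10):
    """Return (naive one-step estimate, multi-step estimate)."""
    slopes = [lag_slope(x, k) for k in range(1, k_max + 1)]
    return slopes[0], fit_geometric(slopes)


def simulate_subsampled(m, steps, p_obs, seed=0):
    """Toy driven branching process, observed through random subsampling."""
    rng = random.Random(seed)
    active, observed = 100, []
    for _ in range(steps):
        # Each active unit gets two chances to trigger a descendant (mean m per unit).
        offspring = sum(rng.random() < m / 2 for _ in range(2 * active))
        drive = sum(rng.random() < 0.5 for _ in range(20))  # external input, mean 10
        active = offspring + drive
        # Only a fraction p_obs of the active units is seen by the observer.
        observed.append(sum(rng.random() < p_obs for _ in range(active)))
    return observed


if __name__ == "__main__":
    obs = simulate_subsampled(m=0.9, steps=10000, p_obs=0.2)
    naive, corrected = branching_estimates(obs)
    print(f"naive: {naive:.2f}, multi-step: {corrected:.2f}")
```

Under subsampling, the one-step slope is scaled down by a constant factor that is the same at every lag, so the ratio between successive lags still carries m. That is why the multi-step fit is less biased than the one-step slope.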

      6) In the Discussion (l 269), the paper suggests potential differences between networks cultured in vitro and in vivo. While significant differences indeed exist, it's worth noting that exponents consistent with a critical branching process have also been observed in vivo (Petermann et al 2009; Hahn et al. 2010), as well as in large-scale human data.

      We thank the reviewers for pointing out these studies, and we have added the missing one (Hahn et al. 2010) to our reference list. The following was added to the discussion, in the section “Explaining Experimental Exponents:”

      "A subset of the in vivo recordings analyzed from anesthetized cat (Hahn et al. 2010) and macaque monkeys (Petermann et al. 2009) exhibited a size distribution exponent close to 1.5."

      Along these lines, we noted two additional studies of high relevance that have been published since our initial submission (Capek et al. 2023, Lombardi et al. 2023), and we have added these references to the discussion of experimental exponents.

      Minor comments:

      1) The term 'latent variable' should be rigorously explained, as it is likely to be unfamiliar to some readers.

      Sentences and clauses have been added to the Introduction, Results and the Methods to clarify the term:

      Intro: “Numerous studies have reported relatively low-dimensional structure in the activity of large populations of neurons [refs], which can be modeled by a population of neurons that are broadly and heterogeneously coupled to multiple dynamical latent (i.e., unobserved) variables.”

      Results: “We studied a population of neurons that are not coupled to each other directly but are driven by a small number of dynamical latent variables -- that is, slowly changing inputs that are not themselves measured.”

      Methods: “Neurons are non-interacting, receiving inputs reflective of place-field selectivity as well as input current reflecting a random projection from a small number of dynamical latent variables, representing inputs shared across the population of neurons that are not directly measured.”

      2) There's a relatively important typo in the equations: Eq. 2 and Eq. 6 differ by a minus sign in the exponent. Eqs. 3 and 4 use the plus sign, but epsilon_0 on line 198 uses the minus sign. All very confusing until we figured out what was going on. But easy to fix.

      Thank you for catching this. We have made the following corrections:

      1) Figures adopted the sign convention that epsilon > 0, with larger values of epsilon decreasing the activity level. Signs in Eqs. 3 and 4 have been corrected to match.

      2) Equation 5 was missing a minus sign in front of the Hamiltonian. Restoring this minus sign fixed the discrepancy between 2 and 6.

      3) In Eq. 7, the left hand side is zeta'/zeta', which is equal to 1. Maybe it should be zeta'/zeta?

      Fixed, thank you.

      Additional comments:

      The authors are free to ignore these; they are meant to improve the paper.

      We are extremely grateful for the close reading of our paper and note the actions taken below.

      1) We personally would not use the abbreviation DLV; we find abbreviations extremely hard to remember. And DLV is not used that often.

      Done, thank you for the suggestion.

      2) l 198: epsilon_0 = -log(2^{1/N}-1) was kind of hard to picture -- we had to do a little algebra to make sense of it. Why not write e^{-epsilon_0} = 2^{1/N}-1 \approx log(2)/N, which in turn implies that epsilon_0 ~ log(N)?

      Thank you, good point. We have added a sentence now to better explain:

      "...which is maximized at $\epsilon_0 = - \log (2^{1/N} - 1)$, independent of $J_i$ and $\eta$. After some algebra, we find that $\epsilon_0 \sim \log N$ for large $N$."

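For reference, the algebra behind the large-$N$ statement can be written out as follows. This is a sketch of the expansion the reviewers suggest, using only the expression for $\epsilon_0$ quoted above, with $\ln$ denoting the natural logarithm:

```latex
\begin{align}
e^{-\epsilon_0} &= 2^{1/N} - 1 = e^{(\ln 2)/N} - 1
                 = \frac{\ln 2}{N} + O\!\left(N^{-2}\right), \\
\epsilon_0 &= \ln N - \ln(\ln 2) + O\!\left(N^{-1}\right)
            \approx \ln N + 0.37 .
\end{align}
```
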
      3) Typo on l 202: "We plot P_ava as a function of epsilon in Fig. 4B". 4B --> 4D.

      Done

      4) It would be easier on the reader if the tables were all in one place. It would be even nicer to put the parameters in the figure captions. Or at least N; that one is kind of important.

      Table placement was a Latex issue, which we have now fixed. We also have included links between tables and relevant figures and indicated network size.

      5) What's x_i in Eqs. 7 and 8?

      We added a sentence of explanation. These are the individual observations of avalanche sizes or durations, depending on what is being fit.

      6) The latent variables evolve according to an Ornstein-Uhlenbeck process. But we might equally expect oscillations or non-normal behavior coupling dynamical modes, and these are likely to give different behavior with respect to avalanches. It might be worth commenting on this.

      7) The model assumes a normal distribution of the coupling strengths between the latent variables and the binary units. Discussing the potential effects of different types of random coupling could provide interesting insights.

      Both 6 and 7 are interesting questions. At this point, we could speculate that the main results would be qualitatively unchanged, provided dynamics are sufficiently slow and that the distribution of coupling strengths is sufficiently broad (that is, there is variance in the coupling matrix across individual neurons). Further studies would be needed to make these statements more precise.

      8) In Fig 1, tau_f = 1E4 whereas in Fig 2 tau_f = 5E3. Why the difference?

      For Figure 1, we chose a set of parameters that gave clear scaling. In Figure 2, we saw some value in showing more than one example of scaling, hence different parameters for the examples in Fig 2 than Fig 1. Note that the Fig 1 simulations are represented in Fig. 2 G-J, as the 5-field simulation with tau_F = 1e4.

  2. Jan 2024
    1. Author Response

      eLife assessment

      This study presents a valuable finding on a new role of Foxp3+ regulatory T cells in sensory perception, which may have an impact on our understanding of somatosensory perception. The authors identified a previously unappreciated action of enkephalins released by immune cells in the resolution of pain and several upstream signals that can regulate the expression of the proenkephalin gene PENK in Foxp3+ Tregs. However, whereas the generation of transgenic mice with conditional deletion of PENK in Foxp3+ cells and PENK fate-mapping is novel and generates compelling data, the analysis of Tregs in the control and transgenic mice is incomplete, and the study lacks proper tamoxifen controls and an examination of the role of PENK+ skin T cells that would further support the hypothesis. Nonetheless, the study would be of interest to biologists working in the field of neuroimmunology and inflammation.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors explore mechanisms through which T-regs attenuate acute pain using a heat sensitivity paradigm. Analysis of available transcriptomic data revealed expression of the proenkephalin (Penk) gene in T-regs. The authors explore the contribution of T-reg Penk in the resolution of heat sensitivity.

      Strengths:

      Investigating the potential role of T-reg Penk in the resolution of acute pain is a strength.

      Weaknesses:

      The overall experimental design is superficial and lacks sufficient rigor to draw any meaningful conclusions.

      For instance:

      1) There were no TAM controls. What is the evidence that TAM does not alter heat-sensitive receptors?

      Author response: By comparing panels A and C, it appears that heat sensitivity in controls (blue dots) is slightly different before and after TMX administration, suggesting that heat-sensitive receptors are moderately altered by TMX per se. However, heat sensitivity is increased twofold in KO animals. Thus, a possible effect of TAM on heat receptors is not responsible for the heat hyperalgesia seen in KO, as shown in Figures 4 and S3.

      2) There are no controls demonstrating that recombination actually occurred. How do the authors know a single dose of TAM is sufficient?

      Author response: These experiments are in progress. Specificity of the deletion will be presented in an updated version of the manuscript in the near future.

      3) Why was only heat sensitivity assessed? The behavioral tests are inadequate to derive any meaningful conclusions. Further, why wasn't the behavioral data plotted longitudinally?

      Author response: We respectfully point the reviewer to Figure S3, where the longitudinal data are presented. New behavioral tests are being performed. The results will be presented in a revised version.

      Reviewer #2 (Public Review):

      Summary:

      The present study addresses the role of enkephalins, which are specifically expressed by regulatory T cells (Treg), in sensory perception in mice. The authors used a combination of transcriptomic databases available online to characterize the molecular signature of Treg. The proenkephalin gene Penk is among the most enriched transcripts, suggesting that Treg plays an analgesic role through the release of endogenous opioids. In addition, in silico analysis suggests that Penk is regulated by the TNFR superfamily; this being experimentally confirmed. Using flow cytometry analysis, the authors then show that Penk is mostly expressed in Treg of the skin and colon, compared to other immune cells. Finally, genetic conditional excision of Penk, selectively in Treg, results in heat hypersensitivity, as assessed by behavior analysis.

      Strengths:

      The manuscript is clear and reveals a previously unappreciated role of enkephalins, as released by immune cells, in sensory perception. The rationale in this manuscript is easy to follow, and conclusions are well supported by data.

      Weaknesses:

      The sensory deficit of Penk cKO appears to be quite limited compared to control littermates.

      Reviewer #3 (Public Review):

      Summary:

      Aubert et al investigated the role of PENK in regulatory T cells. Through the mining of publicly available transcriptome data, the authors confirmed that PENK expression is selectively enriched in regulatory but not conventional T cells. Further data mining suggested that OX40, 4-1BB as well as BATF, can regulate PENK expression in Tregs. The authors generated fate-mapping mice to confirm selective PENK expression in Tregs and activated effector T cells in the colon and spleen. Interestingly, transgenic mice with conditional deletion of PENK in Tregs resulted in hypersensitivity to heat, which the authors attributed to heat hyperalgesia.

      Strengths:

      The generation of transgenic mice with conditional deletion of PENK in foxp3 and PENK fate-mapping is novel and can potentially yield significant findings. The identification of upstream signals that regulate PENK is interesting but unlikely to be the main reason why PENK is predominantly expressed in Tregs as both BATF and TNFR are expressed in effector T cells.

      Weaknesses:

      There is a lack of direct evidence and detailed analysis of Tregs in the control and transgenic mice to support the authors' hypothesis. PENK was previously reported to be expressed in skin Tregs and play a significant role in regulating skin homeostasis: this should be considered as an alternative mechanism that may explain the changed sensitivity to heat observed in the paper.

      Author response: Supplementary figures are being prepared and new results are being collected to show that the KO does not perturb immune and/or skin homeostasis at the time of the experiments. These will be presented in a revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The present study provides a phylogenetic analysis of the size of prefrontal areas in primates, aiming to investigate whether the relative size of the rostral prefrontal cortex (frontal pole) and dorsolateral prefrontal cortex volume vary according to known ecological or social variables.

      I am very much in favor of the general approach taken in this study. Neuroimaging now allows us to obtain more detailed anatomical data in a much larger range of species than ever before and this study shows the questions that can be asked using these types of data. In general, the study is conducted with care, focusing on anatomical precision in definition of the cortical areas and using appropriate statistical techniques, such as PGLS. That said, there are some points where I feel the authors could have taken their care a bit further and, as a result, inform the community even more about what is in their data.

      We thank the reviewer for this globally positive evaluation of our work, and we appreciate the advice to improve our manuscript.

      The introduction sets up the contrast of 'ecological' (mostly foraging) and social variables of a primate's life that can be reflected in the relative size of brain regions. This debate is for a large part a relic of the literature and the authors themselves state in a number of places that perhaps the contrast is a bit artificial. I feel that they could go further in this. Social behavior could easily be a solution to foraging problems, making them variables that are not in competition, but simply different levels of explanation. This point has been made in some of the recent work by Robin Dunbar and Susanne Shultz.

      Thank you for this constructive comment. We acknowledge that the contrast between the social and the ecological brain is relatively marginal here. Based also on the first remark by reviewer 3, we have reformulated the introduction to emphasize what we think is actually more critical: the link between cognitive functions as defined in laboratory conditions and socio-ecological variables measured in natural conditions. Here, we use brain measures as a potential tool to relate these laboratory and natural variables through a common scenario. Also, we were already mentioning the potential interaction between social and foraging processes in the discussion, but we are happy to add a reference to recent studies by S. Shultz and R. Dunbar (2022), which is indeed directly relevant. We thank the reviewer for pointing out this literature.

      In a similar vein, the hypotheses of relating frontal pole to 'meta-cognition' and dorsolateral PFC to 'working memory' is a dramatic oversimplification of the complexity of cognitive function and does a disservice to the careful approach of the rest of the manuscript.

      We agree that the formulation of which functions we were attributing to the distinct brain regions might not have been clear enough, but the functional relations between frontal pole and metacognition on the one hand, and DLPFC and working memory on the other hand, have been firmly established in the literature, both through laboratory studies and through clinical data. Clearly, no single brain region is necessary and sufficient for any cognitive operation, but decades of neuropsychology have demonstrated the differential implication of distinct brain regions in distinct functions, which is all we mean here. We have made a specific point on that topic in the discussion (cf p. 16). We have also reformulated the introduction to clarify that, even if the relation between these regions and their functions (FP/metacognition; DLPFC/working memory) was clear in laboratory conditions, it was not clear whether this mapping could be used for real life conditions. Whether that simplification was somehow justified beyond the lab (and the clinic), and whether these neuro-cognitive concepts could be applied to natural conditions, are therefore critical questions that we wanted to address. The central goal of the present study was precisely to evaluate the extent to which this brain/cognition relation could be used to understand more natural behaviors and functions, and we hope that it appears more clearly now.

      One can also question the predicted relationship between frontal pole meta-cognition and social abilities versus foraging, as Passingham and Wise show in their 2012 book that it is frontal pole size that correlates with learning ability-an argument that they used to relate this part of the brain to foraging abilities. I would strongly suggest the authors refrain from using such descriptive terms. Why not simply use the names of the variables actually showing significant correlations with relative size of the areas?

      We basically agree with the reviewer, and we acknowledge the lack of clarity in the introduction of the previous manuscript. There was indeed a lot of ambiguity in what we were referring to as ‘function’, associated with a given brain region. “Function” referred to way too many things! We have reformulated the introduction not only to clarify the different types of functions that were attributed to distinct brain regions in the literature but also to clarify how this study was addressing the question: by trying to articulate concepts from neuroscience laboratory studies with concepts from behavioral ecology and evolution using intuitive scenarios. We hope that the present version of the introduction makes that point clearer.

      The major methodological judgements in this paper are of course in the delineation of the frontal pole and dorsolateral prefrontal cortex. As I said above, I appreciate how carefully the authors describe their anatomical procedure, allowing researchers to replicate and extend their work. They are also careful not to relate their regions of interest to precise cytoarchitectonic areas, as such a claim would be impossible to make without more evidence. That said, there is a judgement call made in using the principal sulcus as a boundary defining landmark for FP in monkeys and the superior frontal sulcus in apes. I do not believe that these sulci are homologous. Indeed, the authors themselves go on to argue that dorsolateral prefrontal cortex, where studied using cytoarchitecture, stretches to the fundus of principal sulcus in monkeys, but all the way to the inferior frontal sulcus in apes. That means that using the fundus of PS is not a good landmark.

      We thank the reviewer for his kind remarks on our careful descriptions. However, it is not clear that our choice of the principal sulcus as a boundary for FP in monkeys, versus the superior frontal sulcus in apes, is actually a judgement call. First and foremost, there is no clear and unambiguous definition of what the boundaries of the FP should be. Cytoarchitectonic maps would provide one, but this is clearly out of reach here. In humans and great apes we used Bludau et al 2014 (i.e. superior frontal sulcus), and in monkeys, we chose a conservative landmark that eliminated area 9, which is traditionally associated with the DLPFC (Petrides, 2005; Petrides et al, 2012; Semendeferi et al, 2001).

      Of course, any definition will attract criticism, so the best solution might be to run the analysis multiple times, using different definitions for the areas, and see how this affects results.

      Indeed, functional maps indicate that the dorsal part of the anterior PFC in monkeys is functionally part of FP. But again, cytoarchitectonic maps also indicate that this part of the brain includes BA 9, which is traditionally associated with DLPFC (Petrides, 2005; Petrides et al, 2012). As already pointed out in the discussion, there is a functional continuum between FP and DLPFC and our goal when using PS as the dorsal border was to be very conservative and to exclude the ambiguous area. But we agree with the reviewer that given that this decision is arbitrary, it was worth exploring other definitions of the FP volume. So, we completed a new analysis with a less conservative definition of the FP, to include this ambiguous dorsal area, and it is now included in the supplementary material. Maybe as expected, including the ambiguous area in the FP volume shifted the relation with socio-ecological variables towards the pattern displayed by the DLPFC (i.e., the influence of population density decreased). The most parsimonious interpretation of this result is that when extending the border of the FP region to cover a part of the brain which might belong to the DLPFC, or which might be somehow functionally intermediate between the two, the specific relation of the FP with socio-ecological variables decreases. Thus, even if we agree that it was important to conduct this analysis, we believe that it only confirms the difficulty of identifying a clear boundary between FP and DLPFC. Again, we have clearly explained throughout the manuscript that we admit the lack of precision in our definitions of the functional brain regions. In that frame, the conservative option seems more appropriate and for the sake of clarity, the results of the additional analysis of a FP volume that includes the ambiguous area are only included in the supplementary material.

      If I understand correctly, the PGLS was run separately for the three brain measures (whole brain, FP, DLPFC). However, given that the measures are so highly correlated, is there an argument for an analysis that allows testing on residuals? In other words, to test effects of relative size of FP and DLPFC over and above brain size?

      Generally, using residuals as “data” (or pseudo-data) is not recommended in statistical analyses. Two widely cited references from the ecological literature are:

      Garcia-Berthou E. (2001) On the Misuse of Residuals in Ecology: Testing Regression Residuals vs. the Analysis of Covariance. Journal of Animal Ecology, 70 (4): 708-711.

      Freckleton RP. (2002). On the misuse of residuals in ecology: regression of residuals vs. multiple regression. Journal of Animal Ecology 71: 542–545. https://doi.org/10.1046/j.1365-2656.2002.00618.x.

      The main reason for this recommendation is that residuals are dependent on the fitted model, and thus on the particular sample under consideration and the eventual significant effects that can be inferred.
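
A small numerical sketch can make the concern concrete. The data below are invented for illustration and are not from the study, and the example uses ordinary least squares, so it ignores the phylogenetic covariance that PGLS models. The variable names (`brain`, `trait`, `region`) are hypothetical. Because `region` is built as exactly `brain + trait`, the true slope for `trait` is 1.

```python
def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Sum of cross-products of deviations (unnormalised covariance)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slope(y, x):
    """Slope of the simple regression of y on x."""
    return cov(x, y) / cov(x, x)


def residuals(y, x):
    """Residuals of the simple regression of y on x."""
    b = slope(y, x)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for yi, xi in zip(y, x)]


def multiple_regression(y, x1, x2):
    """Slopes (b1, b2) of y ~ x1 + x2, solved from the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Invented data: two correlated predictors and a response that is their exact sum.
brain = [1, 2, 3, 4, 5, 6]
trait = [2, 1, 4, 3, 6, 5]
region = [b + t for b, t in zip(brain, trait)]

if __name__ == "__main__":
    two_step = slope(residuals(region, brain), trait)
    _, joint = multiple_regression(region, brain, trait)
    print(f"residual-based slope for trait: {two_step:.3f}")
    print(f"multiple-regression slope for trait: {joint:.3f}")
```

In this construction the two-step procedure shrinks the slope by a factor of one minus the squared correlation between the predictors, whereas fitting both predictors jointly recovers the slope used to build the data.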

      In the discussion and introduction, the authors discuss how size of the area is a proxy for number of neurons. However, as shown by Herculano-Houzel, this assumption does not hold across species. Across monkeys and apes, for instance, there is a difference in how many neurons can be packed per volume of brain. There is even earlier work from Semendeferi showing how frontal pole especially shows distinct neuron-to-volume ratios.

      We appreciate the reviewer’s comment, but the references to Herculano-Houzel that we have in mind do indicate that the assumption is legitimate within primates.

      Herculano-Houzel et al (2007) show that the neuronal density of the cortex is well conserved across primate species (but only monkeys were studied). The conclusion of that study is that using volumes as a proxy for number of neurons, as a measure of computational capacity, should be avoided between rodents and primates (and as they showed later, even more so with birds, for which neuronal density is higher). BUT within primates, since neuronal densities are conserved, volume is a good predictor of number of neurons. Gabi et al (2016) provide evidence that the neuronal density of the PFC is well conserved between humans and non-human primates, which implies that including humans and great apes in the comparison is legitimate. In addition, the brain regions included in the analysis presumably include very similar architectonic regions (e.g. BA 10 for FP, BA 9/46 for DLPFC), which also suggests that the neuronal density should be relatively well conserved across species. Altogether, we believe that there is sufficient evidence to support the idea that the volume of a PFC region in primates is a good proxy for the number of neurons in that region, and therefore of its computational capacity.

      Semendeferi and colleagues (2001) pointed out some differences in cytoarchitectonic properties across parts of the FP and discussed how these properties could 1) be used to identify area 10 across species 2) be associated with distinct computational properties, with the idea that thicker ‘cell body free’ layers would leave more space for establishing connections (across dendrites and axons). This pioneering work, together with more recent imaging studies on functional connectivity (e.g. Sallet et al, 2013), emphasizes the critical contribution of connectivity patterns as a tool for comparative anatomy. But unfortunately, as pointed out in the discussion already, this is currently out of reach for us.

      We acknowledge the limitations, and to be fair, the notion of computational capacity itself is hard to define operationally. Based on the work of Herculano-Houzel et al, average density is conserved enough across primates (including humans) to justify our approximation. We have tried to define our regions of interest using both anatomical and functional maps and, thanks to the reviewer’s suggestions, we even tried several ways to segment these regions. Functional maps in macaques and humans do not exactly match cytoarchitectonic maps, presumably because functions rely not only upon the cytoarchitectonics but also on connectivity patterns (e.g. Sallet et al, 2013).

      In sum, we appreciate the reviewer’s point but feel that, given the current understanding of brain functions and the relative conservation of neuronal density across primate PFC regions, the volume of a PFC region seems to be a reasonable proxy for its number of neurons, and therefore its computational capacity. We have added these points to the discussion, and we hope that the reader will be able to get a fair sense of how legitimate that position is, given the literature.

      Overall, I think this is a very valuable approach and the study demonstrates what can now be achieved in evolutionary neuroscience. I do believe that the authors can be even more thorough and precise in their measurements and claims.

      Reviewer #2 (Public Review):

      In the manuscript entitled "Linking the evolution of two prefrontal brain regions to social and foraging challenges in primates" the authors measure the volume of the frontal pole (FP, related to metacognition) and the dorsolateral prefrontal cortex (DLPFC, related to working memory) in 16 primate species to evaluate the influence of socio-ecological factors on the size of these cortical regions. The authors select 11 socio-ecological variables and use a phylogenetic generalized least squares (PGLS) approach to evaluate the joint influence of these socio-ecological variables on the neuro-anatomical variability of FP and DLPFC across the 16 selected primate species; in this way, the authors take into account the phylogenetic relations across primate species in their attempt to discover the influence of socio-ecological variables on FP and DLPFC evolution.

      The authors ran their studies on brains collected from 1920 to 1970 and preserved in formalin solution. Also, they obtained data from the Muséum national d'Histoire naturelle in Paris and from the Allen Brain Institute in California. The main findings consist in showing that the volume of the FP, the DLPFC, and the Rest of the Brain (ROB) across the 16 selected primate species is related to three socio-ecological variables: body mass, daily traveled distance, and population density. The authors conclude that metacognition and working memory are critical for foraging in primates and that FP volume is more sensitive to social constraints than DLPFC volume.

      The topic addressed in the present manuscript is relevant for understanding human brain evolution from the point of view of primate research, which, unfortunately, is a shrinking field in neuroscience.

      We must not have been clear enough in our manuscript, because our goal is precisely not to separate humans from other primates. This is why, in contrast to other studies, we have included human and non-human primates in the same models. If our goal had been to study human evolution, we would have included fossil data (endocasts) from the human lineage.

      But the experimental design has two major weak points: the absence of lissencephalic primates among the selected species and the delimitation of FP and DLPFC. Also, a general theoretical and experimental frame linking evolution (phylogeny) and development (ontogeny) is lacking.

      We admit that lissencephalic species could not be included in this study because we use sulci as key landmarks. We believe that including lissencephalic primates would have introduced a bias and noise in our comparisons, as the delimitations and landmarks would have been different for gyrencephalic and lissencephalic primates. Concerning development, it is simply beyond the scope of our study.

      Major comments.

      1) Is the brain modular? Is there modularity in brain evolution?: The entire manuscript is organized around the idea that the brain is a mosaic of units that have separate evolutionary trajectories:

      "In terms of evolution, the functional heterogeneity of distinct brain regions is captured by the notion of 'mosaic brain', where distinct brain regions could show a specific relation with various socio-ecological challenges, and therefore have relatively separate evolutionary trajectories".

      This hypothesis is problematic for several reasons. One of them is that each evolutionary module of the brain mosaic should originate in embryological development from a defined progenitor (or progenitors) domain [see García-Calero and Puelles (2020)]. Also, each evolutionary module should comprise connections with other modules; in the present case, FP and DLPFC have not evolved alone but in concert with, at least, their corresponding thalamic nuclei and striatal sector. Did those nuclei and sectors also expand across the selected primate species? Can the authors relate FP and DLPFC expansion to a shared progenitor domain across the analyzed species? This would be key to proposing homology hypotheses for FP and DLPFC across the selected species. The authors use the comparative approach throughout but never explicitly state their criteria for defining homology of the cerebral cortex sectors analyzed.

      We do not understand what the referee is referring to with the word ‘module’, and why it relates to development. Same thing for the anatomical relation with subcortical structures. Yes, the identity of distinct functional cortical regions relies upon subcortical inputs during development, but clearly this is neither technically feasible, nor relevant here anyways.

      We acknowledge, however, that our definition of functional regions was not precise enough, and we have updated the introduction to clarify that point. In short, we clearly do not want to make a strong case for the functional borders that we chose for the regions of interest here (FP and DLPFC), but rather use those regions as proxies for their corresponding functions as defined in laboratory conditions for a couple of species (rhesus macaques and humans, essentially).

      Contemporary developmental biology has shown that the selection of morphological brain features happens within severe developmental constraints. Thus, the authors need a hypothesis linking the evolutionary expansion of FP and DLPFC to development. Otherwise, the claims for the mosaic brain and modularity lack fundamental support.

      Once again, we do not think that our definition of modules matches what the reviewer has in mind, i.e. modules defined by populations of neurons that developed together (e.g. visual thalamic neurons innervating visual cortices, themselves innervating visual thalamic neurons). Rather, the notion of mosaic brain refers to the fact that different parts of the brain are susceptible to distinct (but not necessarily exclusive) sources of selective pressures. The extent to which these ‘developmental’ modules are related to ‘evolutionary’ modules is clearly beyond the scope of this paper.

      Our goal here was to evaluate the extent to which modules that were defined based on cognitive operations identified in laboratory conditions could be related (across species) to socio-ecological factors as measured in wild animals. Again, we agree that the way these modules/functional maps were defined in the paper was confusing, and we hope that the new version of the manuscript makes this point clearer.

      Also, the authors refer most of the time to brain regions, which is confusing because they are analyzing cerebral cortex regions.

      We do not understand why the term ‘brain’ is more confusing than ‘cerebral cortex’, especially for a wide audience.

      2) Definition and delimitation of FP and DLPFC: The preceding questions are also related to the definition and parcellation of FP and DLPFC. How are homologous cortical sectors defined across primate species? And then, how are those sectors parcellated?

      The authors delimited the FP:

      "...according to different criteria: it should match the functional anatomy for known species (macaques and humans, essentially) and be reliable enough to be applied to other species using macroscopic neuroanatomical landmarks".

      There is an implicit homology criterion here: two cortical regions in two primate species are homologs if these regions have similar functional anatomy based on cortico-cortical connections. Also, macroscopic neuroanatomical landmarks serve to limit the homologs across species.

      This is highly problematic. First, because similar function means analogy and not necessarily homology [for further explanation see Puelles et al. (2019); García-Cabezas et al. (2022)].

      We are not sure we follow the Reviewer’s point here. First, it is not clear what would be the evolutionary scenario implied by this comment (evolutionary divergence followed by reversion leading to convergence?). Second, based on the literature, both the DLPFC and the FP display strong similarities between macaques and humans, in terms of connectivity patterns (Sallet et al, 2013), in terms of lesion-induced deficits and in terms of task-related activity (Mansouri et al, 2017). These criteria are usually sufficient to call 2 regions functionally equivalent. We do not see how this explanation is "highly problematic" as it is clearly the most parsimonious based on our current knowledge.

      Second, because there are several lissencephalic primate species; in these primates, like marmosets and squirrel monkeys, the whole approach of the authors could not have been implemented. Should we suppose that lissencephalic primates lack FP or DLPFC?

      We understand neither the reviewer’s logic, nor the tone. We understand that the reviewer is concerned by the debate on whether some laboratory species are more relevant than others for studying the human prefrontal cortex, but this is clearly not the objective of our work. As explained in the manuscript, we identified FP and DLPFC based on functional maps in humans and laboratory monkeys (macaques), and we used specific gyri as landmarks that could be reliably used in other species. And, as rightfully pointed out by reviewer 1, this is in and of itself not so trivial. Of course, lissencephalic animals could not be studied because we could not find these landmarks, but why would it mean that they do not have a prefrontal cortex? The reviewer implies that species that we did not study do not have a prefrontal cortex, which makes little sense. Standards in the field of comparative anatomy of the PFC, especially when it involves rodents (lissencephalic also), include cytoarchitectonic and connectivity criteria, but obviously we are not in a position to address it here. We have, however, included references to the seminal work of Angela Roberts and collaborators in the discussion on marmoset prefrontal functions, to reinforce the idea that the functional organization is relatively well conserved across all primates (with or without gyri on their brain) (Dias et al, 1996; Roberts et al, 2007).

      Do these primates have significantly more simplistic ways of life than gyrencephalic primates? Marmosets and squirrel monkeys have quite small brains; does it imply that they have not experienced the influence of socio-ecological factors on the size of FP, DLPFC, and the rest of the brain?

      Again, none of this is relevant here, because we could not draw conclusions on species that we cannot study for methodological reasons. The reviewer seems to believe that an absence of evidence is equivalent to evidence of absence, but we do not.

      The authors state that:

      "the strong development of executive functions in species with larger prefrontal cortices is related to an absolute increase in number of neurons, rather than in an increase in the ratio between the number of neurons in the PFC vs the rest of the brain".

      How does it apply to marmosets and squirrel monkeys?

      Again, we do not understand the reviewer’s point, since it is widely admitted that lissencephalic monkeys display both a prefrontal cortex and executive functions (again, see the work of Angela Roberts cited above). Our goal here was certainly not to get into the debate of what is the prefrontal cortex in a handful of laboratory species, but to evaluate the relevance of laboratory based neuro-cognitive concepts for understanding primates in general, and in their natural environment.

      References:

      García-Cabezas MA, Hacker JL, Zikopoulos B (2022) Homology of neocortical areas in rats and primates based on cortical type analysis: an update of the Hypothesis on the Dual Origin of the Neocortex. Brain structure & function Online ahead of print. doi:10.1007/s00429-022-02548-0

      García-Calero E, Puelles L (2020) Histogenetic radial models as aids to understanding complex brain structures: The amygdalar radial model as a recent example. Front Neuroanat 14:590011. doi:10.3389/fnana.2020.590011

      Nieuwenhuys R, Puelles L (2016) Towards a New Neuromorphology. doi:10.1007/978-3-319-25693-1

      Puelles L, Alonso A, Garcia-Calero E, Martinez-de-la-Torre M (2019) Concentric ring topology of mammalian cortical sectors and relevance for patterning studies. J Comp Neurol 527 (10):1731-1752. doi:10.1002/cne.24650

      Reviewer #3 (Public Review):

      This is an interesting manuscript that addresses a longstanding debate in evolutionary biology - whether social or ecological factors are primarily responsible for the evolution of the large human brain. To address this, the authors examine the relationship between the size of two prefrontal regions involved in metacognition and working memory (DLPFC and FP) and socioecological variables across 16 primate species. I recommend major revisions to this manuscript due to: 1) a lack of clarity surrounding model construction; and 2) an inappropriate treatment of the relative importance of different predictors (due to a lack of scaling/normalization of predictor variables prior to analysis). My comments are organized by section below:

      We thank the reviewer for the globally positive evaluation and for the constructive remarks.

      Introduction:

      • Well written and thorough, but the questions presented could use restructuring.

      Again, we thank the reviewer, and we believe that this is coherent with some of the remarks of reviewer 1. We have extensively revised the introduction, toning down the social vs ecological brain issue to focus more on what is the objective of the work (evaluating the relevance of lab based neuro-cognitive concepts for understanding natural behavior in primates).

      Methods:

      • It is unclear which combinations of models were compared or why only population density and distance travelled appear to have been included.

      The details of the model comparison analysis were presented as a table in the supplementary material (#3, details of the model comparison data), but we understand that this was not clear enough. We have provided more explanation both in the main manuscript and in the supplements. All variables were considered a priori; however, we first carried out exploratory analyses, which led us to exclude some variables because of their lack of resolution (not enough categories for qualitative variables) or strong cross-correlations with other quantitative variables. Many more than three variables were included in the models, but the combination of these 3 (body mass, daily traveled distance and population density) best predicted the size of the brain regions (i.e. had the smallest AIC). We provide additional information about these exploratory analyses in the supplementary material, sections 2 and 3.
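
The model-comparison logic described above (fit every candidate combination of predictors and retain the one with the smallest AIC) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the manuscript: it uses ordinary least squares rather than PGLS, so phylogenetic relatedness is ignored; the data are randomly generated; and the variable names (including `group_size`) are hypothetical placeholders.

```python
import itertools
import numpy as np

def ols_aic(X, y):
    """AIC of a Gaussian ordinary least-squares fit (up to an additive constant)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept + predictors
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients + residual variance
    return n * np.log(rss / n) + 2 * k

def rank_models(predictors, y):
    """Fit every non-empty subset of predictors; return (AIC, subset) sorted by AIC."""
    names = list(predictors)
    results = []
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(names, size):
            X = np.column_stack([predictors[name] for name in subset])
            results.append((ols_aic(X, y), subset))
    return sorted(results)

# Synthetic data for 16 hypothetical species (NOT the study's data).
rng = np.random.default_rng(0)
n_species = 16
predictors = {
    name: rng.normal(size=n_species)
    for name in ["body_mass", "daily_distance", "population_density", "group_size"]
}
# Simulated regional volume driven by three of the four predictors.
y = (1.0 * predictors["body_mass"]
     + 0.5 * predictors["daily_distance"]
     - 0.5 * predictors["population_density"]
     + rng.normal(scale=0.1, size=n_species))

ranking = rank_models(predictors, y)
for aic, subset in ranking[:3]:
    print(f"AIC = {aic:8.2f}  predictors = {subset}")
```

With four candidate predictors there are 15 non-empty subsets. A phylogenetically informed version would replace `ols_aic` with the likelihood of a PGLS fit, which this sketch does not attempt.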

      • Brain size (vs. body size) should be used as a predictor in the models.

      We do not understand the theoretical reason for replacing body size by brain size in the models. Brain size is not a socio-ecological variable. And of course, that would be impossible for modeling brain size itself. Or is the reviewer suggesting that brain size be used as a covariate, to evaluate the effects of other variables in the model over and above the effect on brain size? But what is the theoretical basis for this?

      • It is not appropriate to compare the impact of different predictors using their coefficients if the variables were not scaled prior to analysis.

      We thank the Reviewer for this comment; however, standardized coefficients are not unproblematic because their calculations are based on the estimated standard-deviations of the variables which are likely to be affected by sampling (in effect more than the means). We note that the methods of standardized coefficients have attracted several criticisms in the literature (see the References section in https://en.wikipedia.org/wiki/Standardized_coefficient). Nevertheless, we now provide a table with these coefficients which makes an easy comparison for the present study. We also updated tables 1, 2 and 3 to include standardized beta values.
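
The relation between raw and standardized coefficients discussed above can be made concrete with a short sketch. It assumes an ordinary least-squares fit on synthetic data (not the study's data or its PGLS models), and the two predictors are hypothetical variables measured on very different scales. It shows that each standardized coefficient equals the raw slope multiplied by sd(x)/sd(y), which is why these coefficients depend on the sample standard deviations.

```python
import numpy as np

def ols_slopes(X, y):
    """Slopes (without the intercept) of an ordinary least-squares fit."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

def zscore(a):
    """Center each column and scale it to unit sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

# Synthetic example: two hypothetical predictors in very different units.
rng = np.random.default_rng(1)
n = 16
X = np.column_stack([rng.normal(50.0, 20.0, n), rng.normal(2.0, 0.5, n)])
y = 0.02 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.2, size=n)

raw = ols_slopes(X, y)                           # depends on measurement units
standardized = ols_slopes(zscore(X), zscore(y))  # unit-free, comparable
rescaled = raw * X.std(axis=0, ddof=1) / y.std(ddof=1)

print("raw slopes:         ", raw)
print("standardized slopes:", standardized)
print("raw * sd(x) / sd(y):", rescaled)
```

Because `standardized` and `rescaled` coincide, any sampling error in the estimated standard deviations propagates directly into the standardized values.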

      Reviewer #1 (Recommendations For The Authors):

      N/A

      Reviewer #2 (Recommendations For The Authors):

      Contemporary developmental biology has showed that the brain of all mammals, including primates, develops out of a bauplan (or blueprint) made of several fundamental morphological units that have invariant topological relations across species (Nieuwenhuys and Puelles 2016).

      At some point in the discussion the authors acknowledge that:

      "Our aim here was clearly not to provide a clear identification of anatomical boundaries across brain regions in individual species, as others have done using much finer neuroanatomical methods. Such a fine neuroanatomical characterization appears impossible to carry on for a sample size of species compatible with PGLS".

      I do not think it would be impossible to carry such neuroanatomical characterization. It would take time and effort, but it is feasible. Such characterization, if performed within the framework of contemporary developmental biology, would allow for well-founded definition and delineation of cortical sectors across primate species, including lissencephalic ones, and would allow for meaningful homologies and interspecies comparisons.

      We do not see how our work would benefit from developmental biology at that point, because it is concerned with evolution, and these are very distinct biological phenomena. We do not understand the reviewer’s focus on lissencephalic species, because they are not so prevalent across primates, and it is unlikely that adding a couple of lissencephalic species would change the conclusions much.

      Minor points:

      • Please, format references according to the instructions of the journal.

      Ok - done

      • The authors could use the same color code across Figures 1, 2, and 3.

      Ok – done

      • The authors say that group hunting "only occurs in a few primate species", but it also occurs in wolves, whales, and other mammalian species.

      We focus on primates here, these other species are irrelevant. Again, this is beside the point.

      Reviewer #3 (Recommendations For The Authors):

      My comments are organized by section below:

      Introduction:

      • Well written and thorough

      • The two questions presented towards the end of the intro are not clear and do not guide the structure of the methods/results sections. I believe it would be more appropriate to ask if: 1) the relative proportions of the FP and DLPFC (relative to ROB) are consistent across primates; and 2) if the relative size of these regions is best predicted by social and/or ecological variables. Then, the results sections could be organized according to these questions (current results section 1 = 1; current results sections 2, 3, 4 = 2.1, 2.2, 2.3)

      As explained above, we agree with the reviewer that the introduction was somewhat misleading and we have edited it extensively. We do not, however, agree with the reviewer regarding the relative (vs absolute) measure. We have discussed this in our response to reviewer 1 regarding the comparison of regional volumes as proxies for number of neurons. The best predictor of the computing capacity of a brain region is its number of neurons, but there is no reason to believe that this capacity should decrease if the rest of the brain increases, as implied by the relative measure that the reviewer proposes. That debate is probably critical in the field of comparative neuroanatomy, and confronting different perspectives would surely be both interesting and insightful, but we feel that it is beyond the scope of the present article.

      Methods:

      • While the methods are straightforward and generally well described, it is unclear which combinations of models were compared or why only population density and distance travelled appear to have been included (in e.g., Fig SI 3.1) even though many more variables were collected.

      We agree that this was not clear enough, and we have tried to improve the description of our model comparison approach, both in the main text and in the supplementary material.

      • Why was body mass rather than ROB used as a predictor in the models? The authors should instead/also include analyses using ROB (so the analysis is of FP and DLPFC size relative to brain size). Using body mass confounds the analyses since they will be impacted by differences in brain size relative to body size.

      Again, we have addressed this issue above. First, body size is a socio-ecological variable (if anything, it especially predicts energetic needs and energy expenditure), but ROB is clearly not. We do not see the theoretical relevance of ROB in a socio-ecological model. Second, from a neurobiological point of view, since within primates the volume of a given brain region is directly related to its number of neurons (again, see work of Herculano-Houzel), which is a good proxy for its computing capacity, we do not see the theoretical reason for considering ROB.

      • It is not appropriate to compare the impact of different predictors using their coefficients if the variables were not scaled prior to analysis. The authors need to implement this in their approach to make such claims.

      We thank the reviewer again for pointing that out. We have addressed this question above.

      • Differences across primates in terms of frontal lobe networks throughout the brain should be acknowledged (e.g., Barrett et al. 2020, J Neurosci).

      We have added that reference to the discussion, together with other references showing that the difference between human and non-human primates is significant, but essentially quantitative, rather than qualitative (the building blocks are relatively well conserved, but their relative weight differs a lot). Thank you for pointing it out.

      I hope the authors find my comments helpful in revising their manuscript.

      And we again thank the reviewer for the helpful and constructive comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study identifies the homeodomain transcription factor and suspected autism-candidate gene Meis2 as a transcriptional regulator of maturation and end-organ innervation of low-threshold mechanoreceptors (LTMRs) in the dorsal root ganglia (DRG) of mice. For a few years, the view on autism spectrum disorders (ASD) has shifted from a disorder that exclusively affects the brain to a condition that also includes the peripheral somatosensory system, even though our knowledge about the genes involved is incomplete. The study by Desiderio and colleagues is therefore not only scientifically interesting but may also have clinical relevance. The work is convincing, with appropriate and validated methodology in line with current state-of-the-art and the findings contribute both to understanding and potential application.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work examined transcription factor Meis2 in the development of mouse and chick DRG neurons, using a combination of techniques, such as the generation of a new conditional mutant strain of Meis2, behavioral assays, in situ hybridization, transcriptomic study, immunohistochemistry, and electrophysiological (ex vivo skin-nerve preparation) recordings. The authors found that Meis2 was selectively expressed in A fiber LTMRs and that its disruption affects the A-LTMRs' end-organ innervation, transcriptome, electrophysiological properties, and light touch-sensation.

      Strengths:

      1) The authors utilized a well-designed mouse genetics strategy to generate a mouse model where the Meis2 is selectively ablated from pre- and post-mitotic mouse DRG neurons. They used a combination of readouts, such as in situ hybridization, immunohistochemistry, transcriptomic analysis, skin-nerve preparation, electrophysiological recordings, and behavioral assays to determine the role of Meis2 in mouse DRG afferents.

      2) They observed a similar preferential expression of Meis2 in large-diameter DRG neurons during development in chicken, suggesting evolutionarily conserved functions of this transcription factor.

      3) Conducted several behavioral assays to probe the reduction of light-touch sensitivity in mouse glabrous and hairy skin. Their behavioral findings support the idea that the function of Meis2 is essential for the development and/or maturation of LTMRs.

      4) RNAseq data provide potential molecular pathways through which Meis2 regulates embryonic target-field innervation.

      5) Well-performed electrophysiological study using skin-nerve preparation and recordings from saphenous and tibial nerves to investigate physiological deficits of Meis2 mutant sensory afferents.

      6) Nice whole-mount IHC of the hairy skin, convincingly showing morphological deficits of Meis2 mutant SA- and RA- LTMRs.

      Overall, this manuscript is well-written. The experimental design and data quality are good, and the conclusion from the experimental results is logical.

      Weaknesses:

      1) Although the authors justify this study for the involvement of Meis2 in Autism and Autism associated disorders, no experiments really investigated Autism-like specific behavior in the Meis2 ablated mice.

      Indeed, in the first version of the manuscript, we used the current understanding of ASD in mouse models and associated sensory defects to frame our introduction and discussion. As noted by reviewer 1, none of our experiments really investigated ASD. To avoid over-interpretation of the data, we have now removed sentences mentioning ASD and related references throughout the manuscript.

      2) For mechanical force sensing-related behavioral assays, the authors performed VFH and dynamic cotton swabs for the glabrous skin, and sticky tape on the back for the hairy skin. A few additional experiments involving glabrous skin plantar surfaces, such as sticky tape or flow texture discrimination, would make the conclusion stronger.

      We fully agree that performing more behavioral analyses, investigating in more detail the primary sensory defects as well as some ASD-related behavior, would reinforce our conclusions. Our behavioral analysis clearly showed a loss of sensitivity in response to mechanical stimuli within the light touch range but not for higher range mechanical or noxious thermal stimuli. While the experiments suggested by the reviewer are interesting and would strengthen our conclusions, they are far from trivial and require large cohorts. Given the current laboratory conditions as stated at the outset, these unfortunately are not within reach.

      3) The authors considered von Frey filaments (1 and 1.4 g) as noxious mechanical stimuli (Figure 1E and statement on lines 181-183), which is questionable. Alligator clips or pinpricks are more certain to activate mechanical nociceptors.

      To avoid misinterpretation of the higher von Frey filament tests, we deleted the following two statements on page 7: “In the von Frey test, the thresholds for paw withdrawal were similar between all genotypes when using filaments exerting forces ranging from 1 to 1.4g, which likely reflects the activation of mechanical nociception suggesting that Meis2 gene inactivation did not affect nociceptor function.” The sentence “… while sparing other somatosensory behaviors” was also deleted.

      4) There are disconnections and inconsistencies among findings from morphological characterization, physiological recordings, and behavior assays. For example, Meis2 mutant SA-LTMRs show a deficiency in Merkel cell innervation in the glabrous skin but not in hairy skin. With no clear justification, the authors pooled recordings of SA-LTMRs from both glabrous and hairy skin and found a significant increase in mean vibration threshold. Will the results be significantly different if the data are analyzed separately? In addition, whole-mount IHC of Meissner's corpuscles showed morphological changes, but electrophysiological recordings didn't find significant alteration of RAI LTMRs. What does the morphological change mean then? Since the authors found that Meis2 mice are less sensitive to a dynamic cotton swab, which is usually considered as an RA-LTMR mediated behavior, is the SAI-LTMR deficit here responsible for this behavior? Connections among results from different methods are not clear, and the inconsistency should be discussed.

      We thank Reviewer 1 for the careful review of our data and fully agree with the weaknesses identified, which we were ourselves aware of at the time of submission, in particular the lack of stronger connections between histological and electrophysiological data. Electrophysiological studies were conducted on a first cohort of mice in which we focused mostly on WT and Meis2 mutant mice. The goal was to describe differences in electrophysiological properties of identified mechanoreceptors from these two genotypes. Because substantial differences between WT and Islet1-Cre mice were not expected, only very few mice with this genotype were examined at that time to confirm this assumption. We fully agree with reviewer 1 that confirming differences in SA-LTMR responses in hairy and glabrous skin at the electrophysiological level would be interesting and worthwhile. It is assumed that the physiological properties of SA-LTMRs are equivalent in glabrous and hairy skin. Indeed, direct comparisons between glabrous and hairy skin SA-LTMRs have revealed that they have equivalent receptor properties (see Walcher et al J Physiol quoted in the manuscript). We had not recorded from a sufficient number of hairy and glabrous skin SA-LTMRs to make any statistically meaningful comparison. When we noticed the dramatic differences in the innervation patterns of Merkel cell complexes between glabrous and hairy skin, we immediately planned a second cohort of mice, but as explained at the outset of the Public Review, this cohort was sacrificed due to the pandemic lockdown. However, the obtained dataset clearly shows that in Meis2 mutant mice many SA-LTMRs had similar vibration thresholds to those of wild types.

      For Meissner corpuscles, histological analysis evidenced clear morphological differences that could of course be investigated at the level of the dual innervation previously reported by Neubarth et al. It is uncertain whether differences in their electrophysiological responses would be revealed by increasing the number of recorded fibers. For this reason, we clearly stated this limitation in the results section page 7: “There was a tendency for RA-LTMRs in Isl1Cre/+::Meis2LoxP/LoxP mutant mice to fire fewer action potentials to sinusoids and to the ramp phase of a series 2 second duration ramp and hold stimuli, but these differences were not statistically significant (Figure 5B).” Nevertheless, it is important to point out that an electrical search strategy revealed that many Aβ-fibers did not have mechanosensitive receptive fields. Thus, by focusing on LTMRs with a mechanosensitive receptive field, we ignore the fact that fewer fibers are mechanosensitive. This is now more extensively discussed in the discussion section of the manuscript page 13:

      “Indeed, the electrophysiology methods used here can only identify sensory afferents that have a mechanosensitive receptive field. Primary afferents that have an axon in the skin but no mechanosensitivity can only be identified with a so-called electrical search protocol (45, 46) which was not used here. It is therefore quite likely that many primary afferents that failed to form endings would not be recorded in these experiments e.g. SA-LTMRs and RA-LTMRs that fail to innervate end-organs (Fig.4-6).”

      “From our data, we could not conclude whether SA-LTMR electrophysiological responses are differentially affected in the glabrous versus hairy skin of Meis2 mutant as suggested by histological analysis. Further electrophysiological analysis focused on SA-LTMR selectively innervating the glabrous or hairy skin would be necessary to answer this question. Similarly, the decreased sensitivity of Meis2 mutant mice in the cotton swab assay and the morphological defects of Meissner corpuscles evidenced in histological analysis do not correlate with RA-LTMR electrophysiological responses for which a tendency to decreased responses were however measured. The latter might result from an insufficient number of fibers recorded, whereas the former may be due to pooling SA-LTMR from both the hairy and glabrous skin.”

      Reviewer #2 (Public Review):

      Summary:

      Desiderio and colleagues investigated the role of the TALE (three amino acid loop extension) homeodomain transcription factor Meis2 during maturation and target innervation of mechanoreceptors and their sensation to touch. They start with a series of careful in situ hybridizations to examine Meis2 transcript expression in mouse and chick DRGs of different embryonic stages. By this approach, they identify Meis2+ neurons as slowly- and rapidly adapting A-beta LTMRs, respectively. Retrograde tracing experiments in newborn mice confirmed that Meis2-expressing sensory neurons project to the skin, while unilateral limb bud ablations in chick embryos in ovo showed that these neurons require target-derived signals for survival. The authors further generated a conditional knock-out (cKO) mouse model in which Meis2 is selectively lost in Islet1-expressing, postmitotic neurons in the DRG (IsletCre/+::Meis2flox/flox, abbreviated below as cKO). WT and Islet1Cre/+ littermates served as controls. cKO mice did not exhibit any obvious alteration in volume or cellular composition of the DRGs but showed significantly reduced sensitivity to touch stimuli and various innervation defects to different end-organ targets. RNA-sequencing experiments of E18.5 DRGs taken from WT, Islet1Cre/+, and cKO mice reveal extensive gene expression differences between cKO cells and the two controls, including synaptic proteins and components of the GABAergic signaling system. Gene expression also differed considerably between WT and heterozygous Islet1Cre/+ mice while several of the other parameters tested did not. These findings suggest that Islet1 heterozygosity affects gene expression in sensory neurons but not sensory neuron functionality. However, only some of the parameters tested were assessed for all three genotypes. Histological analysis and electrophysiological recordings shed light on the physiological defects resulting from the loss of Meis2. By immunohistochemical approaches, the authors describe distinct innervation defects in glabrous and hairy skin (reduced innervation of Merkel cells by SA1-LTMRs in glabrous but not hairy skin, reduced complexity of A-beta RA1-LTMRs innervating Meissner's corpuscles in glabrous skin, reduced branching and innervation of A-beta RA1-LTMRs in hairy skin). Electrophysiological recordings from ex vivo skin nerve preparations found that several, but not all of these histological defects are matched by altered responses to external stimuli, indicating that compensation may play a considerable role in this system.

      Strengths:

      This is a well-conducted study that combines different experimental approaches to convincingly show that the transcription factor Meis2 plays an important role in the perception of light touch. The authors describe a new mouse model for compromised touch sensation and identify a number of genes whose expression depends on Meis2 in mouse DRGs. Given that dysbalanced MEIS2 expression in humans has been linked to autism and that autism seems to involve an inappropriate response to light touch, the present study makes a novel and important link between this gene and ASD.

      Weaknesses:

      The authors make use of different experimental approaches to investigate the role of Meis2 in touch sensation, but the results obtained by these techniques could be connected better. For instance, the authors identify several genes involved in synapse formation, synaptic transmission, neuronal projections, or axon and dendrite maturation that are up- or downregulated upon targeted Meis2 deletion, but it is unresolved whether these changes can in any way explain the histological, electrophysiological, or behavioral deficits observed in cKO animals. The use of two different controls (WT and Islet1Cre/+) is unsatisfactory and it is not clear why some parameters were studied in all three genotypes (WT, Islet1Cre/+ and cKO) and others only in WT and cKO. In addition, Meis2 mutant mice apparently are less responsive to touch, whereas in humans, mutation or genomic deletion involving the MEIS2 gene locus is associated with ASD, a condition that, if anything, is associated with an elevated sensitivity to touch. It would be interesting to know how the authors reconcile these two findings. A minor weakness, the first manuscript suffers from some ambiguities and errors, but these can be easily corrected.

      We thank the reviewer for the insightful comments and suggestions.

      The use of two different controls (WT and Islet1Cre/+) is unsatisfactory and it is not clear why some parameters were studied in all three genotypes (WT, Islet1Cre/+ and cKO) and others only in WT and cKO.

      First, we identified a labelling mistake in Figures 4D, 5A and 6A, where the controls shown are from Islet1+/Cre mice and not from WT mice as reported in the first version. We apologize for this mistake, which has now been corrected. This labelling error does not in any way affect our conclusions; on the contrary, it shows that the innervation defects are not the consequence of Islet1 heterozygosity.

      The reviewer wonders why both control genotypes are presented for some data and only one for others. It is quite possible that gene expression changes occur due to a synergistic effect of both heterozygous Meis2 deletion and heterozygous Islet1 deletion. However, we found no evidence that this led to defects in target-field innervation or to changes in the physiological properties of sensory neurons.

      While it is conceivable that the expression of some genes is modified by a synergistic effect of heterozygous Meis2 deletion and heterozygous Islet1 deletion, several lines of evidence support that the defects in target-field innervation and electrophysiological responses are exclusively due to Meis2 deletion. Previous work on Islet1-specific deletion in DRG sensory neurons raises the possibility that some of the phenotypes we report here are in part due to Islet1 heterozygous deletion or to a synergistic effect with Meis2 homozygous deletion.

      1) When Islet1 is conditionally deleted in mice using the Wnt1-Cre strain or at later stages using a tamoxifen-inducible Cre, homozygous pups die a few hours after birth. Early Islet1 deletion results in increased apoptosis in the DRG, a massive loss of DRG sensory neurons and sensory defects associated mostly with nociceptors and some touch neurons, while proprioceptive neurons are spared (Sun et al., 2008, now included in the revised version of the manuscript). There was a decrease in the number of Ntrk1+ and Ntrk2+ neurons, whereas the number of Ntrk3+ neurons appeared normal. When Islet1 is inactivated later in development, the numbers of Ntrk1+ and Ntrk2+ neurons were normal and only the expression of nociceptor-specific markers was decreased. Since neither the DRG volume nor the number of Ntrk1+, Ntrk2+ and Ntrk3+ neurons is changed in Meis2 cKO using the Islet1-Cre strain, an early significant effect of Islet1 heterozygous deletion is very unlikely.

      2) For distal innervation defects, it is clear from the Wnt1-Cre::Meis2 data (Figure 3E) that the distal innervation phenotype occurs when Meis2 is inactivated, independently of Islet1 expression.

      3) Finally, the lack of differences between WT and Islet1+/Cre mice in behavioral assays and in the electrophysiological characterization of RA-LTMRs of the hairy skin (Figure 6C) and SA-LTMRs (Figure 4B and C) argues for a lack of significant consequences of Islet1 heterozygous deletion on these parameters.

      4) For bulk RNAseq studies, all datasets have now been re-analyzed following Reviewer 2's specific comments (see below). To avoid misinterpretation of the data, the results are now presented differently (see pages 8 and 9) and discussed more critically (see pages 14 and 15). In particular, we now include and discuss references on Islet1 cKO mice.

      We also agree with Reviewer 2 that our RNAseq study only provides cues on potential gene expression changes that could impact distal innervation and electrophysiological responses. However, proving which of those genes are fully responsible for the morphological and electrophysiological defects would require extensive mouse genetic investigations, such as restoring their normal expression level in a Meis2 mutant context, which is beyond the scope of the present study.

      Finally, the reviewer questioned how we could reconcile the lower touch sensitivity in Meis2 mutant mice with the exacerbated touch sensitivity found in ASD patients and mouse models of ASD. As suggested by Reviewer 1, our study did not really investigate ASD specifically. Therefore, to avoid overinterpretation of the data and to follow Reviewer 1's recommendation, we have removed all references to ASD in the revised version of the manuscript. Indeed, to our knowledge, none of the case reports on Meis2 mutant patients investigated sensory function in general and light touch in particular, maybe because of the severe intellectual disability characterizing these patients.

      Reviewer #1 (Recommendations For The Authors):

      In addition to the aforesaid suggestions in the section 2, there are some minor issues:

      We thank the reviewer for the careful reading and for identifying all these typos. All of them have been corrected in the revised version of the manuscript.

      1) There should not be a full stop mark in the title of the article.

      This has been corrected in the new version of the manuscript.

      2) Figure 1C, 1D, please correct the typo "controlateral' to "contralateral".

      This has been corrected in the new version of the manuscript.

      3) Figure 1D, lower graph, Y-axis, please correct the typo 'umber' to "number".

      This has been corrected in the new version of the manuscript.

      4) To make it easy for readers, add the names of the behavioral tests on top of the graphs in Fig 1E-H.

      The name of behavioral tests is now added to the figure.

      5) It would be easier to read the markers' names in IHC and ISH images if they were written outside of image panels. The blue staining color in image 1B could be easily mixed with the background. Suggest change colors.

      Markers for IHC and ISH images are now written outside the image panels, or colors have been changed, in Figures 1 and 2 for better clarity.

      6) The font size of Genes' name in Figure 3B is too small and not readable.

      Figure 3 has now been changed following Reviewer 2 recommendation. The small font size in Figure 3B is no longer present in the figure.

      7) Quantification of Fig 3E (number of fibers innervating each dermal papilla or footpad, for example).

      Unfortunately, we did not keep the Wnt1Cre::Meis2LoxP/LoxP strain, which prevents further analysis (see the onset of our answer to the Public Review).

      8) In Figure 4, please arrange IHC images and their quantification results adjacent to each other.

      The figure has been reorganized and the results section and figure legends have been changed accordingly.

      9) For consistency, please use either LTMR or LTM (See Figure 4F, 5A, 6C), but not both.

      This has been homogenized throughout the manuscript.

      10) Add arrows/heads to mark the overlaps in Figure 4D.

      Arrows are now added in Figure 4D to point at the overlap between Nefh and CK8 staining.

      11) Figure 5A, 6A, Lines 236, 240, 247, 258, 305, 308, 313, 347, and many more in Figure legends: please check in entire manuscript and make the mouse genotype nomenclature (+/Cre?) consistent. In some places, Cre is written in all upper case (Line 657).

      This has been homogenized throughout the manuscript.

      12) Figure 4G: Histogram color could be darker for better contrast.

      The color of the histograms has been changed in Figures 5 and 6 for better clarity.

      13) Please add the figure number to the Figure 6.

      The figure number is now indicated on the figure.

      14) Figure 6B: Y-axis typo, correct "Nfeh" to "Nefh".

      This typo is now corrected.

      15) Either explain Figure 2B information before that of Figure 2C (In lines 204-207) in the text or change the figure panel sequence to keep the consistent flow of contents.

      The figure has been modified and the panel sequence now follows that of the main text.

      16) Line 213 has a typo: change "form" to "from".

      This typo is now corrected.

      17) Line 423 has a typo. Correct "al" to "all".

      This typo is now corrected.

      18) Line 625 has a typo. Correct "fo" to "of".

      This typo is now corrected.

      19) Line 669 has a typo. Correct "Alexa Fluo" to "Fluor".

      This typo is now corrected.

      20) Line 744: To be consistent in the entire manuscript, write "Nfh" as "Nefh".

      This typo is now corrected.

      21) 740-749: Please add host names for all primary antibodies, as some are given but some are not for the current version.

      We have now indicated the host species for all primary antibodies used in the study.

      22) Line 751 has a typo: change "a" to "as".

      This typo is now corrected.

      23) Line 754: what is for 20'?

      This typo is now corrected.

      24) Line 832: change "day test" to "testing day".

      The change has been made.

      25) Please mention for how many seconds the VFH was administered on the plantar surface in the method.

      A new sentence has been added to the “Von Frey withdrawal test” Methods section (page 30): “During each application, bend filament was maintained for approximately four to five seconds”.

      26) For the sticky tape test, in lieu of hind paw attending bouts, wet-dog shake behavior, the authors also found some scratching behaviors. Did they separately quantify these behaviors? It would be interesting to see exactly which behavior significantly reduced after Meis2 inactivation.

      Unfortunately, at the time the sticky tape test was designed, we did not consider separating the behaviors counted as “positive” reactions. As these experiments were not video recorded, we are not able to extract this kind of information without generating a new cohort of mice and repeating the experiment.

      27) Line 344-345: consider rephrasing the sentence.

      This sentence has been removed.

      Reviewer #2 (Recommendations For The Authors):

      This is a beautiful and well-conducted study with all the strengths listed in the paragraphs above. Nevertheless, there are still some open questions, ambiguities in the presentation, and minor errors that I would recommend addressing.

      Major Points:

      1) The authors performed RNA-seq analysis from E18.5 mouse total DRGs from three different genotypes, WT, Islet1Cre/+ and cKO. Although this approach identified several interesting Meis2-dependent candidate genes, the presentation of the results is confusing, and the publication would gain impact if the RNA-seq results were better connected to the histological, behavioral, and electrophysiological data. Specific concerns:

      1.1) The gene expression profiles of WT and Islet1Cre/+ samples are remarkably divergent. According to Yang Development 2006, Islet1-Cre was generated by knocking in Cre into the endogenous Islet1 locus and replacing the Isl1 ATG, hence resulting in a heterozygous null for Islet1. When purely technical derivations can be excluded, the RNAseq results presented here suggest that heterozygous loss of Islet1 causes considerable gene expression changes in the postnatal DRG. For analysis of the RNAseq results, the authors focus on genes that are differentially expressed between one experimental condition (Islet1Cre/+::Meis2flox/flox) and either one of two controls (WT or Islet1Cre/+). Hence, they pool the genes that are differently expressed between cKO and Islet1Cre/+ with the genes that are different between cKO and WT. This approach mixes gene expression differences that result from two different genetic alterations, heterozygosity of Islet1 and targeted deletion of Meis2, respectively. It seems much more logical to compare the results pairwise.

      We agree with Reviewer 2 that heterozygous deletion of Islet1 causes a significant change in gene expression that seems to correlate very little with any of the phenotypes we investigated in the study. When Islet1 is conditionally deleted in mice using the Wnt1-Cre strain, pups die a few hours after birth and display increased apoptosis in the DRG, a massive loss of DRG sensory neurons and sensory defects associated mostly with nociceptors and some touch neurons, while proprioceptive neurons are spared (Sun et al., 2008, now included in the revised version of the manuscript). The numbers of Ntrk1+ and Ntrk2+ neurons are decreased, whereas the number of Ntrk3+ neurons appears normal. Later Isl1 inactivation does not change the number of neurons or the expression of Ntrk1 and Ntrk2. As explained in the answer to the public reviews, the bulk RNAseq data have now been re-analyzed following the reviewer's suggestions and are presented accordingly in the related figures.

      In the study by Sun et al., the authors also reported DEGs following Islet1 homozygous deletion, but data on Islet1 heterozygous deletion are not included. However, out of the 60 most dysregulated genes identified in their study, only 6 were differentially expressed in our datasets. Importantly, DEGs in their study were identified using microarrays. In another study, the same group showed that Brn3a (another transcription factor important for DRG neuron differentiation) and Islet1 exhibit negative epistasis on sensory gene expression (Dykes et al., 2011, now included in the revised version of the manuscript). Thus, we cannot rule out that similar rules apply for Islet1 and Meis2. However, given the high diversity of DRG sensory neurons, interpreting our bulk RNAseq analysis in such a direction might lead to misinterpretation.

      1.2) Along the same line, gene expression changes in Islet1Cre/+ DRGs seem to have little functional consequences, at least in the cases where all three genotypes were analyzed (target dependency (Fig. 1E), behavior (Fig. 1F), innervation (Fig. 4F, 6C)). Why were some parameters measured in all three genotypes and others only for WT and cKO? The authors probably reason that parameters that do not differ between WT and cKO animals will likely also not differ between WT and Islet1Cre/+. But what about parameters that do differ? Considering that the innervation of Merkel cells (Fig. 4E) and Meissner corpuscles (Fig. 5A) differ profoundly between WT and cKO, it would be interesting to know what this innervation looks like in Islet1Cre/+ DRGs. NEFH staining together with CK8 or S100beta from existing tissue sections should easily answer this question.

      As explained in the answer to the public reviews, there was a mistake in the annotation of the control in Figure 4D and E and in Figure 5, which has now been corrected. Concerning target dependency, those experiments were conducted in chick embryos and therefore have no associated genotype.

      1.3) Was a minimum cut-off for gene expression applied? The up-and downregulated genes in Fig. 3B list a number of pseudogenes and predicted genes. A quick (and incomplete) check for their expression in Fig2 Supple Table 1 shows that only a few reads were detected for most of them. With such low expression, even small changes will show up as significant differences.

      In our first analysis, a cut-off of 10 reads was applied. As Reviewer 2 mentioned, this cut-off included several pseudogenes and predicted genes with low expression for which small changes were significant. We have now re-analyzed the dataset using a cut-off of 100 reads. This excluded most of the previous predicted genes and pseudogenes from the analysis and resulted in a much smaller number of DEGs for each dataset. As recommended by Reviewer 2, we have also now performed the DAVID analysis separately. These results are now presented in Figure 3 and the corresponding supplementary figures.
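
      The effect of raising a minimum-read cut-off can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the gene names and read counts are invented, and the assumption that the cut-off applies to the summed reads per gene across samples is made here for illustration only.

```python
# Hypothetical illustration of a minimum-read cut-off before differential
# expression analysis. Gene names and counts are invented; whether the cut-off
# is applied per sample or to summed reads is an assumption of this sketch.

def filter_low_counts(counts, min_reads=100):
    """Keep genes whose total read count across samples reaches min_reads."""
    return {gene: reads for gene, reads in counts.items() if sum(reads) >= min_reads}

counts = {
    "GeneA": [250, 310, 275],      # well expressed
    "PseudogeneB": [3, 9, 1],      # very low expression
    "GeneC": [40, 35, 30],         # moderate expression
}

kept_strict = filter_low_counts(counts, min_reads=100)
kept_lenient = filter_low_counts(counts, min_reads=10)

print(sorted(kept_strict))   # genes retained with the stricter cut-off
print(sorted(kept_lenient))  # genes retained with the lenient cut-off
```

      In this toy example the lenient cut-off retains the low-count pseudogene, for which a change of a few reads would appear as a large fold change, whereas the stricter cut-off removes it before any comparison is made.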

      1.4) Given that bulk RNAseq from whole embryonic DRGs was performed, it would be interesting to know what cell type(s) express the Meis2-dependent transcripts. To address this question, the authors resort to published scRNAseq data by Usoskin Nat Neurosci 2015. They correlate the expression of all 488 DEGs (different between cKO and either WT or Islet1Cre/+) with the expression of Meis2 in the sensory neuron subtypes that were classified in the Usoskin paper. From that they conclude that many Meis2-dependent genes were expressed in the same sensory neuron classes as Meis2 itself. This is not apparent from Fig. 3 Supplementary 2. Neither do the 488 DEGs seem to be in any way enriched in the MEIS2-expressing cell clusters NF2/3/4/5, nor is cluster PEP1 particularly high in Meis2 expression. Immunostaining for MEIS2 together with a few selected DEGs would be a better way to assess co-expression.

      We agree with Reviewer 2 that the correlation between DEGs and the expression of Meis2 in the sensory neuron subtypes was far from striking. In our opinion, the new analysis now shows a more robust correlation. However, it has to be kept in mind that not all DEGs are expected to be direct Meis2 target genes and therefore to be enriched in the same Meis2-expressing population. This also holds true for genes that could be de-repressed or induced following Meis2 inactivation. In addition, the scRNAseq by Usoskin et al. was performed on adult sensory neurons whereas our bulk RNAseq was performed on E18.5 embryos. Thus, because gene expression in developing sensory neurons is well known to be highly dynamic, the transcriptional signature of sensory neuron subclasses in E18.5 embryos is not expected to match perfectly the transcriptional signature of adult subclasses. Finally, we agree that immunostaining for Meis2 together with a few selected DEGs would give a better answer on whether they co-localize or not, but our lack of experience with those antibodies, together with the lack of financial support for the proposal, precludes addressing this pertinent point.

      1.5) The authors identify Gabra1 and Gabra4 as upregulated and Gabrr1 as downregulated genes in MEIS2 cKO animals. Does this reflect a change in GABA-receptor subunit composition in LTMRs?

      This is an interesting point. First, in our new analysis, increasing the cut-off to 100 reads excluded Gabrr1 from the DEGs. Based on our results, we cannot conclude whether Gabra1 and Gabra4 up-regulation reflects a change in GABA receptor composition. However, in the GEO term associated with GABAergic synapse, Gabra1 and Gabra4 were up-regulated whereas the ionotropic glutamate receptor Grid1 was downregulated, which rather argues for an imbalanced GABA/glutamate transmission. Finally, the increased GABAR expression in the LTMRs might be expected to increase pre-synaptic inhibition on the LTMR synapses onto target neurons in the dorsal horn, thus decreasing synaptic transmission from these neurons into spinal circuits.

      2) The authors assessed SA-LTMR innervating Merkel cells in glabrous and hairy skin by IFC staining for neurofilament H and electrophysiological recordings. Due to the small sample size, they pooled recordings, reasoning that nerves that do not successfully innervate Merkel cells (i.e. cKO glabrous skin) do not evoke electrophysiological responses following a touch stimulus.

      2.1) It is undoubtedly true that non-innervating nerves will likely not show electrophysiological responses. However, by pooling the recordings of SA-LTMRs from glabrous and hairy skin, the data obtained from the 20% successful recordings of SA-LTMRs from glabrous cKO skin (according to Fig. 4E, upper panel) will be overrepresented and hence lead to a systematic bias. How many recordings were made from the glabrous and hairy skin of each genotype? In case the number of recordings from cKO/glabrous skin is the limiting factor, does the observed difference in vibration threshold hold true when only recordings from hairy skin are compared?

      As explained in the text and in our answers to Reviewer 1, data for hairy and glabrous SAMs were initially pooled as no differences between them were expected, and the electrophysiological experiments planned next were compromised by the COVID-19 pandemic. We are sorry that, at this point, we cannot provide additional experiments to clarify this important point. In addition, as mention

      3) From the IFC images shown in Fig. 6A, it is not clear how the authors quantified branch points and innervated hair follicles.

      Branch points correspond to each point at which a nerve splits into two or more branches. Innervated follicles correspond to follicles that are encircled by circumferential and/or lanceolate Nefh+ endings.

      4) The quality of the data is very high, but there are several ambiguities and errors in their presentation.

      We apologize for this mistake. Figure 1 Supplementary 1 that reports data from Cat walk analysis is now appropriately included in the files.

      4.2) Fig. 3A is confusing and the figure legend just repeats what is already said in the text. What do yellow, blue, and pink represent?

      Figure 3 has now been fully remade. The legend is now better indicated in Figure 3A. We hope it is now clearer.

      4.3) What genotype do the black, grey, and white boxplots in Fig. 6C Fig. 3 Supplementary 1B correspond to?

      The legends were missing for Figure 6C and Figure 3 supplementary 1B. They are now appropriately included.

      4.4) Up- and downregulated genes are assigned differently in Fig. 3 and Fig. 3 Supplementary 2. The figure legend of Fig. 3 Suppl 2 lists panel B as up-regulated genes but the same genes are labeled down-regulated in Fig. 3.

      We apologize for this previous mistake. Figure 3 and corresponding supplementary figures have been redone in the new version.

      4.5) Fig. 3E would benefit from a more detailed description. One can easily appreciate that the neurofilament H staining in the cKO sample is different from that of the WT sample but what exactly can be seen here?

      We added the following sentence in the results section: “In WT newborn mice, numerous Nefh+ sensory fibers surround all dermal papillae of the hairy skin and footpad of the glabrous skin, whereas in Wnt1Cre::Meis2LoxP/LoxP littermates, very few Nefh+ sensory fibers are present and they poorly innervate the dermal papillae and footpads.“.

      4.6) The figure legend to Fig. 4A is unclear. Does the graph show the sum of all recordings performed? From the text, one would guess that the bars correspond to the cKO samples, but this is not specified. Do the controls correspond to WT, Islet1Cre/+ or a mixture of both? In addition, the graph in the lower panel is labeled % Ab fibers, the figure legend reads % of tap units among Ab fibers.

      The graphs show the number of tap units identified among all recorded Aβ-fibers. Numbers show the number of tap units over the number of recorded fibers. This has now been reformulated in the latest version of the manuscript.

      4.7) The abbreviation SAM in figure legends 4F, G is not introduced.

      This is now indicated in the figure legend.

      4.8) Readers who are not familiar with the traces above the graphs in 4F and 4G will find a more detailed description helpful.

      This is now indicated in the figure legend.

      4.9) Lines 274-275: Does the statement "Finally, consistent with the lack of neuronal loss in Isl1Cre/+::Meis2LoxP/LoxP, the number of recorded fibers were identical in WT and Isl1Cre/+::Meis2LoxP/LoxP." refer to Fig. 4G? This is not specified in the text.

      These data were not included in the first version of the manuscript as we thought they were not significantly informative. They just indicate the overall number of fibers that were recorded in the electrophysiological experiments. The sentence has now been removed from the latest version of the manuscript to avoid misunderstanding.

      4.10) There is no Fig. 6 supplementary 1.

      The typo is now corrected. The corresponding data were in fact in Figure 5 Supplementary 1.

      Minor points:

      • Gangfuß et al. report that a patient previously diagnosed with a range of neurological deficits including the diagnosis of severe infantile autism is heterozygous mutant for MEIS2. Although this study links MEIS2 gene function to ASD in the wider sense, adding a few additional references will make the link stronger. Examples are Shimojima et al., Hum Genome Var 2017 or Bae et al., Science 2022.

      These two references have now been included in the introduction section of the manuscript.

      • In some figures (e.g. Fig. 4) the numbering of the panels does not follow the order in which the respective data are mentioned in the text.

      Figure 4 is now re-organized so that panels follow the same order as in the results section.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Nitrogen metabolism is of fundamental importance to biology. However, the metabolism and biochemistry of guanidine and guanidine containing compounds, including arginine and homoarginine, have been understudied over the last few decades. Very few guanidine forming enzymes have been identified. Funck et al define a new type of guanidine forming enzyme. It was previously known that 2-oxoglutarate oxygenase catalysis in bacteria can produce guanidine via oxidation of arginine. Interestingly, the same enzyme that produces guanidine from arginine also oxidises 2-oxoglutarate to give the plant signalling molecule ethylene. Funck et al show that a mechanistically related oxygenase enzyme from plants can also produce guanidine, but instead of using arginine as a substrate, it uses homoarginine. The work will stimulate interest in the cellular roles of homoarginine, a metabolite present in plants and other organisms including humans and, more generally, in the biochemistry and metabolism of guanidines.

      1) Significance

      Studies on the metabolism and biochemistry of the small nitrogen rich molecule guanidine and related compounds including arginine have been largely ignored over the last few decades. Very few guanidine forming enzymes have been identified. Funck et al define a new guanidine forming enzyme that works by oxidation of homoarginine, a metabolite present in organisms ranging from plants to humans. The new enzyme requires oxygen and 2-oxoglutarate as cosubstrates and is related to, but distinct from, a known enzyme that oxidises arginine to produce guanidine, but which can also oxidise 2-oxoglutarate to produce the plant signalling molecule ethylene.

      Overall, I thought this was an exceptionally well written and interesting manuscript. Although a 2-oxoglutarate-dependent guanidine-forming enzyme is known (EFE), the discovery that a related enzyme oxidises homoarginine is really interesting, especially given the presence of homoarginine in plant seeds. There is more work to be done in terms of functional assignment, but this can be the subject of future studies. I also fully endorse the authors' view that guanidine and related compounds have been massively understudied in recent times. I would like to see the possibility that the new enzyme makes ethylene explored. Congratulations to the authors on a very nice study.

      Response: We thank the reviewer for the positive evaluation of our manuscript. In the revised version, we have emphasized more clearly that we found no evidence for ethylene production by the recombinant enzymes. The other suggestions of the reviewer are also considered in the revised version as detailed below.

      Reviewer #2 (Public Review):

      In this study, Dietmar Funck and colleagues have made a significant breakthrough by identifying three isoforms of plant 2-oxoglutarate-dependent dioxygenases (2-ODD-C23) as homo/arginine-6-hydroxylases, catalyzing the degradation of 6-hydroxyhomoarginine into 2-aminoadipate-6-semialdehyde (AASA) and guanidine. This discovery marks the very first confirmation of plant or eukaryotic enzymes capable of guanidine production.

      The authors selected three plant 2-ODD-C23 enzymes with the highest sequence similarity to bacterial guanidine-producing (EFE) enzymes. They proceeded to clone and express the recombinant enzymes in E. coli, demonstrating the capacity of all three Arabidopsis isoforms to produce guanidine. Additionally, through precise biochemical experiments, the authors established these three 2-ODD-C23 enzymes as homoarginine-6-hydroxylases (and arginine-hydroxylase for one of them). Furthermore, the authors utilized transgenic plants expressing GFP fusion proteins to show the cytoplasmic localization of all three 2-ODD-C23 enzymes. Most notably, using T-DNA mutant lines and CRISPR/Cas9-generated lines, along with combinations of them, they demonstrate the guanidine-producing capacity of each enzyme isoform in planta. These results provide robust evidence that these three 2-ODD-C23 Arabidopsis isoforms are indeed homoarginine-6-hydroxylases responsible for guanidine generation.

      The findings presented in this manuscript are a significant contribution to our understanding of plant biology, particularly given that this work is the first demonstration of enzymatic guanidine production in eukaryotic cells. However, there are a couple of concerns and potential avenues for further investigation that the authors should consider incorporating.

      Firstly, the observation of cytoplasmic and nuclear GFP signals in the transgenic plants may also indicate cleaved GFP from the fusion proteins. Thus, the authors should perform Western blot analysis to confirm the correct size of the 2-ODD-C23 fusion proteins in the transgenic protoplasts.

      Secondly, it may be worth measuring pipecolate (and proline?) levels under biotic stress conditions (particularly those that induce transcript changes of these enzymes, Fig S8). Given the results suggesting a potential regulation of the pathway by biotic stress conditions (e.g., MeJA), these experiments could provide valuable insights into the physiological role of guanidine-producing enzymes in plants. This additional analysis may clarify the significance of these enzymes in plant defense mechanisms.

      Response: We also thank reviewer 2 for the positive evaluation and useful suggestions. We performed the proposed GFP Western blot, which indeed indicated the presence of both full-length fusion proteins and free GFP, which can explain the partial nuclear localization. We fully agree that further experiments with biotic and abiotic stress will be required to determine the physiological function of the 2-ODD-C23 enzymes. However, the list of potential experiments is long and they are beyond the scope of the present manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Specific points

      Overall, I thought this was a very interesting study, comprising biochemical, cellular, and in vivo studies. Of course more could be done on each of these, and likely will be, but I think the assignment of biochemical function is very strong, across all three approaches. The one new experiment I would like to see is a clear demonstration of whether ethylene is produced - unlikely but should be tested.

      We had mentioned our failure to detect ethylene production by the plant enzymes in the previous version and have made it more prominent and reliable by including ethylene production as positive control in the new supplementary figure S5.

      Abstract

      Delete 'hitherto overlooked' - this is implicit 'but is more likely' to 'is likely'?

      Agreed and modified

      Introduction

      Second sentence - what about relevant small molecule primary metabolites including precursors of proteins/nucleic acids.

      We modified the sentence accordingly.

      Paragraph 2 - maybe also note EFE produces glutamate semi aldehyde, via arginine C-5 oxidation.

      Paragraph 2 has been re-phrased according to your suggestion.

      Overall, I thought the introduction was exceptionally well written.

      Perhaps either in the introduction, or later, note there are other 2OG oxygenases that oxidise arginine/arginine derivatives in various ways, e.g. clavaminate synthase/arginine hydroxylases/desaturases.

      We added a sentence mentioning the arginine hydroxylases VioC and OrfP to the introduction and included VioC into the sequence comparison in supplementary figure 2 to show that these enzymes, as well as NapI, are very different from EFE and the plant hydroxylases.

      Results

      Paragraph 1 - qualify similarity and refer to/give a structurally informed sequence alignment, including EFE

      A new supplemental figure S2 was added with sequence identity values and a structurally informed alignment. The text has been modified accordingly.

      Paragraph 2 - briefly state method of guanidine analysis

      We included a reference to the M&M section and mentioned LC-MS in paragraph 2.

      Figure 1 - trivial point - proteins are not expressed/genes are

      We have modified the legend to figure 1. However, we would like to point out that terms like “recombinant protein expression” are widely used in the field. A quick search with Google Ngram Viewer shows that “protein expression” started to appear in the mid-1980s and its use has stayed constant at about 1/8th of that of “gene expression”.

      Define errors clearly in all figure legends, clearly defining biological/technical repeats

      Page 6 - was the His-tag cleared to ensure no issues with Ni contamination?

      We treat individual plants or independent bacterial cultures as biological replicates. Only in the case of enzyme activity assays with NAD(P)H, technical replicates were used and this has been indicated in the legend of figure 6.

      Lower case 'p' in pentafluorobenzyl corrected

      In Figure 2 make clear the hydroxylated intermediates are not observed

      We now use grey color for the intermediates and have put them in brackets. Additionally we state in the figure legend that these intermediates were not detected.

      Pages 6-7 - I may have missed this but it's important to investigate what happens to the 2OG. Is succinate the only product or is ethylene also produced? This possibility should also be considered in the plant studies, i.e. is there any evidence for responses related to perturbed ethylene metabolism. The authors consider a signalling role relating to AASA/P6C, but seem to ignore a potential ethylene connection.

      As stated above, we checked for ethylene production with negative result. EFE produced 6 times more guanidine than the plant enzymes under the same condition, but even 100-fold lower ethylene production would have been clearly detected.

      Page 12 - 'plants have been shown to....' Perhaps note how hydroxy guanidine is made?

      We now mention the canavanine-γ-lyase that cleaves canavanine into hydroxyguanidine and homoserine.

      Overall, I thought the discussion was good, but perhaps a bit long/too speculative on pages 12/13 and this detracted from the biochemical assignment of the enzyme. I'd suggest shortening the discussion somewhat - the precise roles of the enzyme can be the subject of future work. As indicated above, some discussion on potential links to ethylene would be appreciated.

      Since reviewer 2 wanted more (speculative) discussion on the role of the 2-ODD-C23 enzymes and there was no detectable ethylene production, we took the liberty to leave the discussion largely unaltered.

      I'd also like to see some more consideration/metabolic analyses of guanidine related metabolism in the genetically modified plants.

      Such analyses will certainly be included in future experiments once we get an idea about the physiological role of the 2-ODD-C23 enzymes.

      Page 16 - mass spectrometry

      Corrected.

      Please add a structurally informed sequence alignment with EFE and other 2OG oxygenases acting on arginine/derivatives.

      An excerpt of the alignment is now presented in supplementary figure S2.

      Reviewer #2 (Recommendations For The Authors):

      I would like to see more discussion in the manuscript about the possible interconnection/roles between 2-ODD-C23 guanidine-producing, lysine- ALD1-Pipecolate producing, and proline metabolism pathways during both biotic and abiotic stresses.

      Since we were unable to detect pipecolate in any of our plant samples and also our preliminary results with biotic stress did not produce any evidence for a function of the 2-ODD-C23 enzymes in the tested defense responses, we would like to postpone such extended discussion until we find a condition where the physiological function of these enzymes is evident.

      Fig. 4: Authors should change colors for Col-0, 0.2 HoArg and ctrl? They look too similar in my pdf file.

      We changed the colors in figure 4 and hope that the enhanced contrast is maintained during the production of the final version of our article.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      The single-mutant and double-mutant crp/rpoB strains were made by co-transduction with a nearby gene deletion (kanR-marked). I couldn't tell from the methods section whether these mutants, e.g., crp-H22N delta-chiA, were compared to wild-type cells or deletion mutants, e.g., delta chiA, in the proteomics experiments. I encourage the authors to explain this more clearly in the methods section, and to briefly mention in the Results section and relevant figure legends that the crp/rpoB mutant strains (and possibly the "wild-type" strains) also have gene deletions. If the comparison "wild-type" strains are fully wild-type (i.e., not deleted for chiA/yjaH), it is especially important to mention this in the Results section and the figure legends since the phenotypic changes could be due to the gene deletions rather than the mutations in crp/rpoB

      We appreciate and agree with the editor's suggestion to clarify this point.

      Accordingly, we have made the following changes to the text:

      p11 L30-34 in the main text:

      "The second experiment similarly compared an engineered BW25113 (BW) strain, containing the two regulatory mutations from the compact set (i.e., crp H22N and rpoB A1245V) together with the deletions used to insert them (see methods and DataS1 file), to a “wild type” BW strain (a corresponding knockout strain without the mutations, see methods)."

      p28 under Chemostat proteomics experiment L13-16 in methods:

      "The starting volume of each bioreactor was 150 ml M9 media supplemented with either 30 mM and 10mM D-xylose for the evolved and ancestor samples or only 10mM D-xylose for BW including compact set mutations and/or the deletions used for their insertions (DataS1 file). The minimal media also included trace elements and vitamin B1 was omitted."

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Sender et al describe a model to estimate what fraction of DNA becomes cell-free DNA in plasma. This is of great interest to the community, as the amount of DNA from a certain tissue (for example, a tumor) that becomes available for detection in the blood has important implications for disease detection.

      However, the authors' methods do not consider important variables related to cell-free DNA shedding and storage, and their results may thus be inaccurate. At this stage of the paper, the methods section lacks important detail. Thus, it is difficult to fully assess the manuscript and its results.

      Strengths:

      The question asked by the authors has potentially important implications for disease diagnosis. Understanding how genomic DNA degrades in the human circulation can guide towards ways to enrich for DNA of interest or may lead to unexpected methods of conserving cell-free DNA. Thus, the question "how much genomic DNA becomes cfDNA" is of great interest to the scientific and medical community. Once the weaknesses of the manuscript are addressed, I believe this manuscript has the potential to be a widely used resource.

      Weaknesses:

      There are two major weaknesses in how the analysis is presented. First, the methods lack detail. Second, the analysis does not consider key variables in their model.

      Issues pertaining to the methods section.

      The current manuscript builds a flux model, mostly taking values and results from three previous studies: 1) The amount of cellular turnover by cell type, taken from Sender & Milo, 2021

      2) The fractions of various tissues that contribute DNA to the plasma, taken from Moss et al, 2018 and Loyfer et al, 2023

      My expertise lies in cell-free DNA, and so I will limit my comments to the manuscripts in (2). Paper by Loyfer et al (additional context):

      Loyfer et al is a recent landmark paper that presents a computational method for deconvoluting tissues of origin based on methylation profiles of flow-sorted cell types. Thus, the manuscript provides a well-curated methylation dataset of sorted cell-types. The majority of this manuscript describes the methylation patterns and features of the reference methylomes (bulk, sorted cell types), with a smaller portion devoted to cell-free DNA tissue of origin deconvolution.

      I believe the data the authors are retrieving from the Loyfer study are from the 23 healthy plasma cfDNA methylomes analyzed in the study, and not the re-analysis of the 52 COVID-19 samples from Cheng et al (MED 2021).

      Paper by Moss et al (additional context):

      Moss et al is another landmark paper that predates the Loyfer et al manuscript. The technology used in this study (methylation arrays) is outdated but is an incredible resource for the community. This paper evaluates cfDNA tissues of origin in health and different disease scenarios. Again, I assume the current manuscript only pulled data from healthy patients, although I cannot be sure as it is not described in the methods section.

      This manuscript:

      The current manuscript takes (I think) the total cfDNA concentration from males and females from the Moss et al manuscript (pooled cfDNA; 2 young male groups, 2 old male groups, 2 young female groups, 2 old female groups, Supplementary Dataset; "total_cfDNA_conc" tab). I believe this is the data used as total cfDNA concentration. It would be beneficial for all readers if the authors clarified this point.

      The tissues of origin, in the supplemental dataset ("fraction" tab), present the data from 8 cell types (erythrocytes, monocytes/macrophages, megakaryocytes, granulocytes, hepatocytes, endothelial cells, lymphocytes, other). The fractions in the spreadsheet do not match the Loyfer or Moss manuscripts for healthy individuals. Thus, I do not know what values the supplementary dataset represents. I also don't know which deconvolution values are used for the flux model.

      The integration of these two methods lacks detail. Are the authors here using yields (i.e., cfDNA concentrations) from Moss et al, and tissue fractions from Loyfer et al? If so, why? There are more samples in the Loyfer manuscript, so why are the samples from Moss et al. being used? The authors are also selectively ignoring cell-types that are present in healthy individuals (Neurons from Moss et al, 2018). Why?

      Appraisal:

      At this stage of the manuscript, I think additional evidence and analysis are required to confirm the results in the manuscript.

      Impact:

      Once the authors present additional analysis to substantiate their results, this manuscript will be highly impactful on the community. The field of liquid biopsies (non-invasive diagnostics) has the potential to revolutionize the medical field (and has already in certain areas, such as prenatal diagnostics). Yet, there is a lack of basic science questions in the field. This manuscript is an important step forward in asking more "basic science" questions that seek to answer a fundamental biological question.

      We thank the reviewer for the valuable comments on our analysis. In response to the feedback, we have updated the analysis to address all critical points as described below and revised the text to enhance the clarity of our methodology. One notable improvement to our analysis involved ensuring better alignment between the cohort data for cfDNA plasma concentration and cell turnover estimates. To achieve this, we utilized the total plasma concentration of cfDNA from a study conducted by Meddeb et al. 2019, taking into account the influence of age and sex on these concentrations and specifically focusing on a cohort of relatively young and healthy individuals. Additionally, we considered expected variations related to sex, age, and other pertinent factors, as outlined in the studies by Meddeb et al. 2019 and Madsen et al. 2019.

      In addition, we have addressed concerns regarding the technical aspects of cfDNA analysis, providing detailed explanations of their limited impact on our analysis and the resulting conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Cell-free DNA (cfDNA) are short DNA fragments released into the circulation when cells die. Plasma cfDNA level is thought to reflect the degree of cell-death or tissue injury. Indeed, plasma cfDNA is a reliable diagnostic biomarker for multiple diseases, providing insights into disease severity and outcomes. In this manuscript, Dr. Sender and colleagues address a fundamental question: What fraction of DNA released from cell death is detectable as plasma cfDNA? The authors use public data to estimate the amount of DNA produced from dying cells. They also utilize public data to estimate plasma cfDNA levels. Their calculations showed that <10% of DNA released is detectable as plasma cfDNA, the fraction of detectable cfDNA varying by tissue sources. The study demonstrates new and fundamental principles that could improve disease diagnosis and treatment via cfDNA.

      Strengths:

      1) The experimental approach is resource-mindful taking advantage of publicly available data to estimate the fraction of detectable cfDNA in physiological states. The authors did not assess if the fraction of detectable cfDNA changes in disease conditions. Nonetheless, their pioneering study lays the foundation and provides the methods needed for a similar assessment in disease states.

      2) The findings of this study potentially explain discrepancies in measured versus expected tissue-specific cfDNA from some tissues. For example, the gastrointestinal tract is subject to high cell turnover and release of DNA. Yet, only a small fraction of that DNA ends up in plasma as gastrointestinal cfDNA.

      3) The study proposes potential mechanisms that could account for the low fraction of detectable cfDNA in plasma relative to DNA released. This includes intracellular or tissue machinery that could "chew up" DNA released from dying cells, allowing only a small fraction to escape into plasma as cfDNA. Could this explain why the gastrointestinal tract, with an elaborate phagosome machinery, contributes a small fraction of plasma cfDNA? Given the role of cfDNA as a damage-associated molecular pattern in some diseases, targeting such machinery may provide novel therapeutic opportunities.

      Weaknesses:

      In vitro and in vivo studies are needed to validate these findings and define tissue machinery that contributes to cfDNA production. The validation studies should address the following limitations of the study design:

      1) Align the cohorts to estimate DNA production and plasma cfDNA levels. Cellular turnover rate and plasma cfDNA levels vary with age, sex, circadian clock, and other factors (Madsen AT et al, EBioMedicine, 2019). This study estimated DNA production using data abstracted from a homogeneous group of healthy control males (Sender & Milo, Nat Med 2021). On the other hand, plasma cfDNA levels were obtained from datasets of a more diverse cohort of healthy males and females with a wide range of ages (Loyfer et al. Nature, 2023 and Moss et al., Nat Commun, 2018).

      2) "cfDNA fragments are not created equal". Recent studies demonstrate that cfDNA composition varies with disease state. For example, cfDNA GC content, fraction of short fragments, and composition of some genomic elements increase in heart transplant rejection compared to no-rejection state (Agbor-Enoh, Circulation, 2021). The genomic location and disease state may therefore be important factors to consider in these analyses.

      3) Alternative sources of DNA production should be considered. Aside from cell death, DNA can be released from cells via active secretion. This and other additional sources of DNA should be considered in future studies. The distinct characteristics of mitochondrial DNA compared with genomic DNA should also be considered.

      We appreciate the reviewer's comments on our analysis. In response to the feedback, we have updated the analysis to address key points and revised the text accordingly.

      1) We have incorporated several enhancements to improve the coherence of our analysis. In our revised examination, we drew upon the total plasma concentration of cfDNA, as documented in a study conducted by Meddeb et al. 2019, while considering the influence of age and sex on these concentrations. To ensure the cohorts' alignment, we focused on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      There was no specific estimate for a cohort of young males in either Meddeb et al. or Loyfer et al.; however, we factored in the expected variations stemming from sex, age, and other relevant factors, as elucidated in the literature (Meddeb et al. 2019; Madsen et al. 2019). Thus, we demonstrate that sex and age have a small effect on the cfDNA concentrations and thus are unlikely to alter our conclusions substantially when considering a healthy population. We summarize the changes in the first paragraph, replacing the “Tissue-specific cfDNA concentration” subsection of the methods, and the fourth paragraph added to the discussion.

      2) In this study, we addressed the total amount of cfDNA in healthy individuals without regard to GC content, representation of different genomic regions, or fragment length, as the goal was to understand if cell death rates are fully accounted for by cfDNA concentration. We agree that it will be interesting to study the relative representation of the genome in cfDNA and the processes that determine cfDNA concentration in pathologies beyond the rate of cell death. These topics for future research fall beyond this study's scope.

      3) We know only a few specific cases whereby DNA is released from cells that are not dying. These include the release of DNA from erythroblasts and megakaryocytes to generate anucleated erythrocytes and platelets (Moss et al. 2022, cited in our paper) and the release of NETs from neutrophils.

      The presence of cfDNA fragments originating from megakaryocytes and erythroblasts indicates the elimination of megakaryocytes and erythroblasts and the birth of erythrocytes and platelets. However, the considerations in the rest of the paper still apply: the concentration of cfDNA from these sources is far lower than expected from the cell turnover rate.

      Concerning NETosis: the presence of cfDNA originating in neutrophils that have not died would reduce the concentration of cfDNA from dying neutrophils and thus further increase the discrepancy, which is the topic of our study (under-representation of DNA from dying cells in plasma).

      We neglected mitochondrial DNA, as it is not measured in methylation cell-of-origin analysis. Similarly to the argument above, if some of the total DNA measured in plasma is, in fact, mitochondrial, this would mean that genomic cfDNA concentration is actually lower than the estimates, meaning that an even smaller fraction of DNA from dying cells is measured in plasma.
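
      The direction of the two arguments above (NETosis and mitochondrial DNA) can be illustrated with a minimal sketch. It is not the authors' model: the function name, the parameter `non_target_fraction`, and the numbers in the usage lines are hypothetical, chosen only to show that attributing part of the measured cfDNA to sources other than nuclear DNA from dying cells can only lower the estimated fraction of that DNA reaching plasma.

```python
def fraction_reaching_plasma(measured_cfdna_flux, dna_flux_from_dying_cells,
                             non_target_fraction=0.0):
    """Ratio of plasma cfDNA flux to the DNA flux released by dying cells.

    non_target_fraction is a hypothetical share of the measured cfDNA that
    does not come from the nuclear genome of dying cells (for example
    mitochondrial DNA, or DNA released by live neutrophils as NETs).
    Both fluxes must be given in the same units (e.g. grams per day).
    """
    if not 0.0 <= non_target_fraction <= 1.0:
        raise ValueError("non_target_fraction must be between 0 and 1")
    if dna_flux_from_dying_cells <= 0:
        raise ValueError("dna_flux_from_dying_cells must be positive")
    # Only the remaining share of the measurement is attributed to dying cells.
    genomic_cfdna_flux = measured_cfdna_flux * (1.0 - non_target_fraction)
    return genomic_cfdna_flux / dna_flux_from_dying_cells


# Illustrative placeholder values only (arbitrary units, not data from the study).
baseline = fraction_reaching_plasma(1.0, 100.0)
with_non_target = fraction_reaching_plasma(1.0, 100.0, non_target_fraction=0.2)
print(baseline, with_non_target)
```

      Under these assumptions, any non-zero `non_target_fraction` reduces the ratio, which is the sense in which both effects would widen rather than narrow the reported discrepancy.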

      Recommendations For The Authors

      Reviewer #1 (Recommendations For The Authors):

      I think readers would appreciate the authors commenting or addressing the following points, in addition to addressing the concerns I raised about the methods section in the public review:

      What variables and considerations did the authors omit in this study?

      1) Cell-free DNA is found in virtually every biofluid.

      Thus, the fact that cell-free DNA is not present in the plasma does not mean it cannot be detected elsewhere. This also implies that phagocytosis may not be the only factor related to cfDNA not being present in the blood. One example (of many, many others) is neutrophil-derived cell-free DNA, which is present in the urine.

      Indeed, dying cells and their DNA can be consumed locally, released into the blood, or shed outside the body. The latter is a function of tissue topology. For example, intestinal epithelial cell turnover releases material to the lumen of the gut (i.e., stool); kidney and bladder cell turnover releases material to urine; and lung epithelium releases material to the air spaces. In these cases, the absence of cfDNA in plasma is expected. However, in cases where tissue topology dictates release to blood, low representation in cfDNA indicates local consumption or a related mechanism. In Figure 1 of the manuscript, we distinguish between tissues according to their topology, denoting organs that shed material to the outside by open circles.

      Neutrophil-derived DNA in urine likely represents a local process in the kidney (neutrophils that penetrate the epithelium and fall into the urine). Neutrophils that die elsewhere in the body must release cfDNA to the blood before it can reach the urine. Hence, quantifying plasma cfDNA is a legitimate approach for assessing the relationship between cell death and cfDNA. The revised text clarifies this point. We made revisions to the initial paragraph in the results section and a paragraph within the discussion to provide clarity on this topic:

      “Based on atlases of human cell type-specific methylation signatures, Moss et al. and Loyfer et al. analyzed the main cell types contributing to plasma cfDNA. They found the primary sources of plasma cfDNA to be blood cells: granulocytes, megakaryocytes, macrophages, and/or monocytes (the signature could not differentiate between the last two), lymphocytes, and erythrocyte progenitors. Other cells that had detectable contributions are endothelial cells and hepatocytes. Qualitatively, these cells represent most of the leading cell types in cellular turnover, as shown in Sender & Milo 2021. Epithelial cells of the gastrointestinal tract, lung, kidney, bladder, and skin are other cell types that significantly contribute to cellular turnover. Dying cells in these tissues are shed into the gut lumen, the air spaces, the urine, or out of the skin (note that while DNA from gut, lung, and kidney epithelial cells can be found in stool, bronchoalveolar lavage, and urine, the fate of DNA from skin cells is not known). This arrangement may explain why DNA from these cell types is not represented in plasma cfDNA in healthy conditions. Therefore, it appears that cells with high cfDNA plasma levels are those with relatively high turnover that are not being shed out of the body.”

      “A comparison between the different types of cells shows a trend in which less DNA flux from cells with higher turnover gets to the bloodstream. In particular, a tiny fraction (1 in 3 × 10^4) of DNA from erythroid progenitors arrives at the plasma, indicating an extreme efficiency of the DNA recovery mechanism. Erythroid progenitors are arranged in erythroblastic islands. Up to a few tens of erythroid progenitors surround a single macrophage that collects the nuclei extruded during the erythrocyte maturation process (pyrenocytes) (Chasis and Mohandas 2008). The amount of DNA discarded through the maturation of over 200 billion erythrocytes per day (Sender and Milo 2021) exceeds all other sources of homeostatic discarded DNA. Our findings indicate that the organization of dedicated erythroblastic islands functions highly efficiently regarding DNA utilization. Neutrophils are another high-turnover cell type with a low level of cfDNA. When contemplating the process of NETosis (Vorobjeva and Chernyak 2020), the existence of cfDNA originating from live neutrophils would potentially diminish the concentration of cfDNA released by dying neutrophils, thereby amplifying the observed ratio for this particular cell type. The overall trend of higher turnover resulting in a lower cfDNA to DNA flux ratio may indicate similar design principles, in which the utilization of DNA is better in tissues with higher turnover. However, our analysis is limited to only several cell types (due to cfDNA test and deconvolution sensitivities), and extrapolation to cells with lower cell turnover is problematic.”
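
      As a rough, back-of-the-envelope illustration of the scale involved, the sketch below converts the erythrocyte production rate quoted above into a daily mass of discarded DNA and applies the quoted fraction of roughly 1 in 30,000. The value of about 6.5 pg of DNA per diploid human nucleus is a commonly used approximation and is an assumption here, not a figure taken from the manuscript; the names are illustrative.

```python
# Figures taken from the quoted paragraph.
ERYTHROCYTES_PER_DAY = 2e11          # "over 200 billion erythrocytes per day" (lower bound)
FRACTION_REACHING_PLASMA = 1 / 3e4   # "1 in 3 x 10^4"

# Assumption, not from the manuscript: approximate DNA mass of one diploid human nucleus.
DNA_PER_NUCLEUS_PG = 6.5


def discarded_dna_grams_per_day(cells_per_day, dna_per_nucleus_pg):
    """Mass of nuclear DNA discarded per day, in grams (1 pg = 1e-12 g)."""
    return cells_per_day * dna_per_nucleus_pg * 1e-12


def plasma_dna_micrograms_per_day(discarded_g_per_day, fraction):
    """Mass of that DNA expected to reach plasma per day, in micrograms."""
    return discarded_g_per_day * fraction * 1e6


discarded = discarded_dna_grams_per_day(ERYTHROCYTES_PER_DAY, DNA_PER_NUCLEUS_PG)
to_plasma = plasma_dna_micrograms_per_day(discarded, FRACTION_REACHING_PLASMA)
print(f"discarded: {discarded:.2f} g/day, reaching plasma: {to_plasma:.1f} ug/day")
```

      Under these assumptions, on the order of a gram of erythroid DNA is discarded each day, of which only tens of micrograms would reach the plasma.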

      2) Effect of biofluid storage.

      Cell-free DNA continues to degrade after it is extracted via blood draw. This is not expected to change tissue of origin predictions (although that remains to be shown in the literature), but definitely affects extraction yield. This is not accounted for (or even discussed) in the manuscript. It would be important to understand how this was done for the data presented here.

      The paper integrates data from multiple recent studies that adhered to state-of-the-art procedures requiring rapid processing of blood samples. In fact, earlier studies that were not careful to isolate plasma quickly typically reported very high concentrations due to the lysis of leukocytes and artifactual release of genomic DNA. Rapid plasma isolation and DNA extraction typically yield 5 ng/ml in healthy donors, as stated in the paper (last paragraph of Results).

      3) Batch effects

      Batch effects are not discussed here and can affect cfDNA yields.

      Our analysis relies on data reported by multiple studies from different groups, which independently arrived at similar key findings (total concentration of cfDNA and the relative contribution of different tissues). Thus, batch effects are unlikely to affect the calculations markedly.

      4) Cell-free DNA extraction kits

      Different kits and methods extract cell-free DNA at different quantities. Importantly, much recent research has shown that most kits are not sensitive to ultrashort cell-free DNA (of lengths ~50bp). This may represent most of the DNA present in plasma. This raises an important question: are the yields that are being used in Moss et al (where I presume the total concentration is taken from) accurate? Is there more cell-free DNA that was missed? While the importance of this ultrashort cfDNA has yet to be shown, it is in the blood. Thus, the authors' model may underestimate ratios by not accounting for this. This is mentioned in the discussion, but it is not evident why it was not added into the model.

      The Qiagen cfDNA extraction kit can detect 50bp fragments. As shown in the specification sheets of the kit (https://www.qiagen.com/us/products/diagnostics-and-clinical-research/solutions-for-laboratory-developed-tests/qiasymphony-dsp-circulating-dna-kit), urine DNA contains abundant DNA fragments that peak at 50bp. In contrast, plasma cfDNA does not contain such fragments at appreciable concentrations. This suggests that small fragments, 50-150bp long, are not a major component of cfDNA, and thus, our measurements of the total concentration of cfDNA are not dramatically underestimated.

      The convention regarding the size distribution of cfDNA fragments is based on extensive evidence using multiple approaches. For example, a study that profiled the DNA released by multiple cell lines in vitro (Aucamp et al. 2017) used another kit for DNA isolation – the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel, Düren, Germany). This kit does extract fragments that are 50bp long (nucleospin-gel-and-pcr-clean-up-mini). Indeed, the DNA released from cultured cells did contain a peak at 50bp, but it was minor compared with the nucleosome-size peak.

      More recently, several studies did suggest the presence of ultra-short cfDNA fragments, 50 bp long on average, and concluded that such fragments might be present at a molar concentration that is comparable to that of nucleosome-protected DNA (for example, (Hisano et al. 2021)).

      Thus, our model estimates can be off by up to 2-fold (that is, actual cfDNA concentration measured in most studies overlooks the small fragments and thus underestimates the actual concentration of cfDNA by 2-fold). This is incorporated into the revised manuscript.

      We note that we cannot exclude the presence of abundant ultra-short DNA fragments (e.g., 10bp long). However, such fragments are not measurable in cfDNA analysis. Thus, we can refine our conclusion and state that only a small fraction of DNA of dying cells appears as measured cfDNA. We included a section in the methods detailing the integration of a potential factor for the short fragments and revised the discussion:

      “The overall plasma cfDNA concentration was multiplied by a factor of 1.5 to accommodate for the presence of small fragments of approximately 50 base pairs of cfDNA in the plasma. These fragments are suggested to contribute comparable molar concentrations (Hisano, Ito, and Miura 2021). Despite having approximately one-third of the mass, it is reasonable to presume that these fragments represent a similar number of genomes. This assumption is based on the idea that their source is a broken nucleosome unit, and the fragments represent the portion that was not degraded. Given the restricted data and its interpretation, we consider factors spanning the range of 1 (negligible effect) and 2 (doubling of the amount). The chosen factor, 1.5, is selected as the midpoint within this range of uncertainty.”
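      The correction described in the quoted methods text amounts to a simple scaling, which can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `adjust_for_short_fragments` is hypothetical, and the 5 ng/ml input is the typical healthy-donor yield mentioned earlier in this response, used here purely as an example value rather than the cohort median used in the revised analysis.

```python
def adjust_for_short_fragments(measured_ng_per_ml, factor=1.5, factor_range=(1.0, 2.0)):
    """Scale a measured plasma cfDNA concentration by a short-fragment factor.

    The default factor (1.5) and its range (1 = negligible effect,
    2 = doubling) follow the quoted methods text; everything else in
    this function is a hypothetical illustration.
    """
    low, high = factor_range
    if not low <= factor <= high:
        raise ValueError("factor must lie within factor_range")
    return {
        "low": measured_ng_per_ml * low,        # short fragments negligible
        "central": measured_ng_per_ml * factor,  # midpoint of the range
        "high": measured_ng_per_ml * high,      # short fragments double the total
    }


# Example value only: 5 ng/ml is the typical yield cited above.
example = adjust_for_short_fragments(5.0)
print(example)
```

      Because the factor spans at most a 2-fold range, it cannot close a gap of several orders of magnitude between measured cfDNA and the potential DNA flux, which is the point the response makes.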

      “In this study, we report a surprising, dramatic discrepancy between the measured levels of cfDNA in the plasma and the potential DNA flux from dying cells. One hypothetical explanation for that discrepancy is the limited sensitivity of typical cfDNA assays to short DNA fragments, which may contribute a significant fraction of the overall cfDNA mass. Regular cfDNA analysis shows a size distribution concentrated around a length of 165 base pairs (bp). The sizes in ctDNA vary more, but most are longer than 100 bp (Alcaide et al. 2020; Udomruk et al. 2021). Recent studies suggested a significant fraction of single-strand ultrashort fragments (length of 25-60 bp) (Cheng et al. 2022; Hisano, Ito, and Miura 2021). However, the total amount of DNA contained in these fragments is less than or comparable to that of the longer “regular” nucleosome-protected cfDNA fragments (Cheng et al. 2022; Hisano, Ito, and Miura 2021), arguing against ultrashort fragments as a dominant explanation for the “missing” cfDNA material. We integrated the estimate provided by Hisano et al. into our analysis as a modifying factor for both the total concentration and uncertainty of plasma cfDNA. Importantly, this incorporation did not alter the overall conclusions, as the discrepancy between the cfDNA plasma concentration and potential DNA flux remains on the same order of magnitude. We note that we cannot exclude the presence of abundant DNA fragments that are even shorter (e.g., 10bp long) and are not measurable in cfDNA analysis. Thus, our formal conclusion is that only a small fraction of the DNA of dying cells appears as measurable cfDNA.”

      5) Health status of samples analyzed.

      Health, sex and physical activity affect cfDNA yields. This is not accounted for or discussed in the manuscript.

      We incorporated several enhancements to improve our analysis in response to the provided feedback. In our revised examination, we drew upon the total plasma concentration of cfDNA, as documented in a study conducted by (Meddeb et al. 2019), while considering the influence of age and sex on these concentrations. To ensure the cohort's alignment, we focus on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      Furthermore, we factored in the expected variations stemming from sex, age, and other relevant factors, as elucidated in the works of (Meddeb et al. 2019; Madsen et al. 2019). Our intent in doing so was to demonstrate that these factors are unlikely to alter our conclusions substantially when considering a healthy population. We summarize the changes in the first paragraph, replacing the “Tissue-specific cfDNA concentration” subsection of the method, and the fourth paragraph added to the discussion:

      “Our estimates for total plasma cfDNA concentration were derived from the median concentration observed in individuals below 47 years of age (n=52), as reported by (Meddeb et al. 2019). To complement this, we integrated our total concentration estimates with data on the proportion of cfDNA originating from specific cell types, leveraging a plasma methylome deconvolution method described by (Loyfer et al. 2023), which did not provide absolute quantities of cfDNA. To quantify the uncertainty associated with our cfDNA concentration estimates, we employed a methodology that considered several sources of variation. First, we incorporated the confidence interval of the median concentration reported by Meddeb et al. as a measure of uncertainty. Additionally, we accounted for individual-specific and analytic variations based on the study by (Madsen et al. 2019), encompassing factors such as the precise timing of measurements and assay precision. These sources of uncertainty were combined using the approach outlined below.”

      “Our current analysis focused on estimating plasma cfDNA concentration and cellular turnover in a cohort of healthy, relatively young individuals. The total plasma cfDNA concentrations were sourced from healthy individuals below 47 years, as reported by (Meddeb et al. 2019). We use data analyzed based on plasma samples from healthy individuals to estimate the proportion of cfDNA originating from specific cell types (Loyfer et al. 2023). These values were then compared to the potential DNA flux resulting from homeostatic cellular turnover, estimated for reference healthy males aged between 20 and 30 (Sender and Milo 2021). In our analysis, we considered various sources of uncertainty, including inter-individual variation, variability in the timing of sample collection, and analytical precision (Madsen et al. 2019; Meddeb et al. 2019). These factors collectively contributed to an uncertainty factor of less than 3. Importantly, this level of uncertainty does not alter our conclusion regarding the relatively small fraction of DNA present in plasma as cfDNA. Furthermore, we acknowledge that age and sex can impact total cfDNA concentration, as demonstrated by (Meddeb et al. 2019), with potential variations of up to 30%. However, as the results of our analysis present a much larger difference, these effects do not change the conclusions drawn from our analysis. Nevertheless, age and health status may influence the proportion of cfDNA originating from specific cell types and their corresponding cellular turnover rates. Consequently, the ratios themselves may vary in the elderly population or individuals with underlying health conditions.”

      Reviewer #2 (Recommendations For The Authors):

      1) Align the cohorts to estimate DNA production and plasma cfDNA levels. Cellular turnover rate and plasma cfDNA levels vary with age, sex, circadian clock, and other factors (Madsen AT et al, EBioMedicine, 2019). This study estimated DNA production using data abstracted from a homogenous group of healthy control males (Sender & Milo, Nat Med 2021). On the other hand, plasma cfDNA levels were obtained from datasets of more diverse cohort of healthy males and females with a wide range of ages (Loyfer et al. Nature, 2023 and Moss et al., Nat Commun, 2018).

      We have incorporated several enhancements to improve the coherence of our analysis. In our revised examination, we drew upon the total plasma concentration of cfDNA, as documented in a study conducted by (Meddeb et al. 2019), while considering the influence of age and sex on these concentrations. To ensure the cohort's alignment, we focus on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      There was no specific estimate for a cohort of young males in both Meddeb et al. and Loyfer et al.; however, we factored in the expected variations stemming from sex, age, and other relevant factors, as elucidated in literature (Meddeb et al. 2019; Madsen et al. 2019). Thus, we demonstrate that sex and age have a small effect on the cfDNA concentrations and thus are unlikely to alter our conclusions substantially when considering a healthy population.

      We summarize the changes in the first paragraph, replacing the “Tissue-specific cfDNA concentration” subsection of the method, and the fourth paragraph added to the discussion.

      “Our estimates for total plasma cfDNA concentration were derived from the median concentration observed in individuals below 47 years of age (n=52), as reported by (Meddeb et al. 2019). To complement this, we integrated our total concentration estimates with data on the proportion of cfDNA originating from specific cell types, leveraging a plasma methylome deconvolution method described by (Loyfer et al. 2023), which did not provide absolute quantities of cfDNA. To quantify the uncertainty associated with our cfDNA concentration estimates, we employed a methodology that considered several sources of variation. First, we incorporated the confidence interval of the median concentration reported by Meddeb et al. as a measure of uncertainty. Additionally, we accounted for individual-specific and analytic variations based on the study by (Madsen et al. 2019), encompassing factors such as the precise timing of measurements and assay precision. These sources of uncertainty were combined using the approach outlined below.”

      “Our current analysis focused on estimating plasma cfDNA concentration and cellular turnover in a cohort of healthy, relatively young individuals. The total plasma cfDNA concentrations were sourced from healthy individuals below 47 years, as reported by (Meddeb et al. 2019). We use data analyzed based on plasma samples from healthy individuals to estimate the proportion of cfDNA originating from specific cell types (Loyfer et al. 2023). These values were then compared to the potential DNA flux resulting from homeostatic cellular turnover, estimated for reference healthy males aged between 20 and 30 (Sender and Milo 2021). In our analysis, we considered various sources of uncertainty, including inter-individual variation, variability in the timing of sample collection, and analytical precision (Madsen et al. 2019; Meddeb et al. 2019). These factors collectively contributed to an uncertainty factor of less than 3. Importantly, this level of uncertainty does not alter our conclusion regarding the relatively small fraction of DNA present in plasma as cfDNA. Furthermore, we acknowledge that age and sex can impact total cfDNA concentration, as demonstrated by (Meddeb et al. 2019), with potential variations of up to 30%. However, as the results of our analysis present a much larger difference, these effects do not change the conclusions drawn from our analysis. Nevertheless, age and health status may influence the proportion of cfDNA originating from specific cell types and their corresponding cellular turnover rates. Consequently, the ratios themselves may vary in the elderly population or individuals with underlying health conditions.”

      2) "cfDNA fragments are not created equal". Recent studies demonstrate that cfDNA composition vary with disease state. For example, cfDNA GC content, fraction of short fragments, and composition of some genomic elements increase in heart transplant rejection compared to no-rejection state (Agbor-Enoh, Circulation, 2021). The genomic location and disease state may therefore be important factors to consider in these analyses.

      In this study, we addressed the total amount of cfDNA in healthy individuals without regard to GC content, representation of different genomic regions, or fragment length, as the goal was to understand if cell death rates are fully accounted for by cfDNA concentration. We agree that it will be interesting to study the relative representation of the genome in cfDNA and the processes that determine cfDNA concentration in pathologies beyond the rate of cell death. These topics for future research fall beyond this study's scope.

      3) Alternative sources of DNA production should be considered. Aside from cell death, DNA can be released from cells via active secretion. This and other additional sources of DNA should be considered in future studies. The distinct characteristics of mitochondrial DNA to genomic DNA should also be considered.

      We know only a few specific cases whereby DNA is released from cells that are not dying. These include the release of DNA from erythroblasts and megakaryocytes to generate anucleated erythrocytes and platelets (Moss et al. 2022, cited in our paper) and the release of NETs from neutrophils.

      The presence of cfDNA fragments originating from megakaryocytes and erythroblasts indicates the elimination of megakaryocytes and erythroblasts and the birth of erythrocytes and platelets. However, the considerations in the rest of the paper still apply: the concentration of cfDNA from these sources is far lower than expected from the cell turnover rate.

      Concerning NETosis: the presence of cfDNA originating in neutrophils that have not died would reduce the concentration of cfDNA from dying neutrophils and thus further increase the discrepancy, which is the topic of our study (under-representation of DNA from dying cells in plasma).

      We updated a paragraph in the discussion regarding this issue:

      “A comparison between the different types of cells shows a trend in which less DNA flux from cells with higher turnover gets to the bloodstream. In particular, a tiny fraction (1 in 3x10^4) of DNA from erythroid progenitors arrives at the plasma, indicating an extreme efficiency of the DNA recovery mechanism. Erythroid progenitors are arranged in erythroblastic islands. Up to a few tens of erythroid progenitors surround a single macrophage that collects the nuclei extruded during the erythrocyte maturation process (pyrenocytes) (Chasis and Mohandas 2008). The amount of DNA discarded through the maturation of over 200 billion erythrocytes per day (Sender and Milo 2021) exceeds all other sources of homeostatic discarded DNA. Our findings indicate that the organization of dedicated erythroblastic islands functions highly efficiently regarding DNA utilization. Neutrophils are another high-turnover cell type with a low level of cfDNA. When contemplating the process of NETosis (Vorobjeva and Chernyak 2020), the existence of cfDNA originating from live neutrophils would potentially diminish the concentration of cfDNA released by dying neutrophils, thereby amplifying the observed ratio for this particular cell type. The overall trend of higher turnover resulting in a lower cfDNA to DNA flux ratio may indicate similar design principles, in which the utilization of DNA is better in tissues with higher turnover. However, our analysis is limited to only several cell types (due to cfDNA test and deconvolution sensitivities), and extrapolation to cells with lower cell turnover is problematic.”

      We neglected mitochondrial DNA, as it is not measured in methylation cell-of-origin analysis. Similarly to the argument above, if some of the total DNA measured in plasma is in fact mitochondrial, this would mean that genomic cfDNA concentration is actually lower than the estimates, meaning that an even smaller fraction of DNA from dying cells is measured in plasma.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We would firstly like to thank all reviewers for their comments and support of this manuscript.

      Reviewer #1 (Recommendations For The Authors):

      No further recommendations.

      Reviewer #2 (Recommendations For The Authors):

      All of my comments have been sufficiently addressed.

      Reviewer #3 (Recommendations For The Authors):

      Thanks for responding to my former recommendations constructively. I believe these points have been fully addressed in this new version.

      However, I have not seen any comments on the points I raised in my former public review concerning the I-2 dependence of the FonSIX4 cell death. Do you know whether FonSIX4 would trigger cell death in tissues not expressing any I-2?

      We are a little confused concerning this comment. I-2 is a different class of resistance protein (NLR) that recognises Avr2 and this is likely to be intracellular. From the previous public review, we believe reviewer 3 may have been asking us to clarify the dependence of I (MM or M82) on FonSIX4 cell death. We have performed these controls by expressing FonSIX4 and associated FonSIX4/Avr1 chimeras in N. benthamiana (with the PR-1 signal peptide for efficient secretion of effectors) and it does not cause cell death in the absence of the I receptor – see S11F Fig. This was not explicitly conveyed in text so we have included the following in text: “Using the N. benthamiana assay we show FonSIX4 is recognised by I receptors from both cultivars (IM82 and iMoneymaker) and cell death is dependent on the presence of IM82 or iMoneymaker (Fig 5B, S11 Fig).”

      I still recommend discussing whether the Avr1 residues crucial for Avr activity are in the same structural regions of the C-terminal domain where previous work has identified residues under diversifying selection in symbiotic fungal FOLD proteins.

      The region important for recognition does encompass some residues within the structural region previously reported to be under diversifying selection in FOLD effectors from Rhizophagus irregularis (two residues within one beta-strand). However, we also see residues that do not overlap with this region. We also note that the mycFOLD proteins analysed in symbiotic fungi are heavily skewed towards strong structural similarity with FolSIX6 (similar cysteine spacing within both N and C-domains and structural orientation of the N and C-domains) rather than Avr1. We are under the impression that Avr1 was not included in the analysis of diversifying selection in symbiotic fungal FOLD proteins, and it is also unclear to us whether close Avr1 homologues are present. With this in mind, and considering our already lengthy discussion (as previously highlighted during review), we have decided not to include further discussion concerning this point.


      The following is the authors’ response to the original reviews.

      We would like to thank the editor(s) and reviewers for their work concerning our manuscript. Most of the suggested changes were related to text changes which we have incorporated into the revised version. Please find our response to reviewers below.

      Reviewer #1 (Recommendations For The Authors):

      I only have very minor suggestions for the authors. The first one comes from reading the manuscript and finding it very dense with so many acronyms. This will limit the audience that will read the study and appreciate its impact. This is more noticeable in the Results, with many passages that I would suggest moving to Methodology.

      We thank reviewer 1 for their very positive review. We understand that due to the nature of this study, which includes many protein alleles/mutations that were expressed with different boundaries etc., it is difficult to achieve this. Reviewer 2 asked for more details to be provided. We hope we have achieved a nice balance in the revised manuscript.

      Something else that would facilitate the reading of the manuscript is the effectors name. The authors use the SIX name or the Avr name for some effectors and it makes it difficult to follow up.

      We have tried to make this consistent for Avr1 (SIX4), Avr2 (SIX3) and Avr3 (SIX1). Other SIX effectors are not known Avrs so the SIX names were used.

      Reading the manuscript and seeing how in most of the sections the authors used a computational approach followed by an experimental approach, I wonder why Alphafold2-multimer was not used to investigate the interaction between the effector and the receptor?

      This is a great suggestion, and we have certainly investigated this. However, to date there is no experimental evidence to support a direct interaction between I and Avr1. Post review, we spent some time trying to capture an interaction using a co-immunoprecipitation approach, but so far we have not been able to obtain robust data that support this. We are currently looking to study this utilising protein biophysics/biochemistry, but this work will take some time.

      Reviewer #2 (Recommendations For The Authors):

      We thank reviewer 2 for the very thorough editing and recommendations. We have incorporated all minor text edits below into the manuscript.

      Line 43: perhaps "Effector recognition" instead of "Effector detection", to be consistent with line 51?

      Line 60: Change to "leads".

      Line 79: Italicise Avr2.

      Line 94: Add the acronym ETI in parentheses after "effector-triggered immunity".

      Line 106: "(Leptosphaeria Avirulence-Supressing)" should be "(Leptosphaeria Avirulence and Supressing)".

      Line 112: Change "defined" to "define".

      Line 119: Spell out the species name on first use.

      Line 205: Glomeromycota is a division rather than a genus. Consistent with Fig 2, it also does not need to be italicized.

      Line 207: Change "basidiomycete" to "Division Basidiomycota", consistent with Fig 2.

      Line 214: Change "alignment of Avr1, Avr3, SIX6 and SIX13" to "alignment of the mature Avr1, Avr3, SIX6 and SIX13 sequences".

      Line 324: Change "solved structures" to "solved protein structures".

      Line 335: Spell out acronyms like "MS" on first use in figure legends. Also dpi in other figure legends.

      Line 341: replace "effector-triggered immunity (ETI)" with "(ETI)" - see comment on Line 94.

      Line 370: Change "domains" to "domain".

      Line 374: In the title, change "C-terminus" to "C-domain", consistent with the rest of the figure legend.

      Line 404: Change "(basidiomycetes and ascomycetes)" to "(Basidiomycota and Ascomycota fungi)", consistent with Fig 2C.

      Line 416: Change "in" to "by".

      Line 427: un-italicize the parentheses.

      Line 519: First mention of NLR. Spell out the acronym on first use in main text. S5 and S11 figure titles should be bolded.

      Line 852: Replace "@" with "at".

      S4 Table: Gene names should be italicised.

      S5 Table: Needs to be indicated that the primer sequences are in the 5´-3´ orientation.

      With regards to the Agrobacterium tumefaciens-mediated transient expression assays involving co-expression of the Avr1 effector and I immune receptor, the authors need to make clear how many biological replicates were performed as this information is only provided for the ion leakage assay.

      We have added these data to the figure legend.

      Line 57: For me, the text "Fol secretes a limited number of structurally related effectors" reads as Fol secretes structurally related effectors, but very few of them are structurally related. Perhaps it would be better to say that the effector repertoire of Fol is made up of proteins that adopt a limited number of structural folds, or that the effector repertoire can be classified into a reduced set of structural families?

      This edit has been incorporated.

      Lines 66-67: Subtle re-wording required for "The best-characterized pathosystem is F. oxysporum f. sp. lycopersici (Fol)", as a pathosystem is made up of a pathogen and its host. Perhaps "The best-characterized pathosystem involves F. oxysporum f. sp. lycopersici (Fol) and tomato".

      Sentence has been reworded.

      Line 113 and throughout: Stick with one of "resistance protein", "receptor", "immune receptor" and "immunity receptor" throughout the manuscript.

      We have decided to use both receptor and immunity receptor as not all receptors investigated in the manuscript provide immunity.

      Lines 149-150: The title does not fully represent what is shown in the figure. The text "that is unique among fungal effectors" can be deleted as there is nothing in Fig 1 that shows that the fold is unique to fungal effectors.

      Figure title has been changed.

      Line 173: The RMSD of Avr3 is stated as being 3.7 Å, but in S3 Fig it is stated as being 3.6 Å.

      This was a mistake in the main text and has been corrected.

      Lines 202-204: This sentence needs to be reworded, as the way that it is written implies that the Diversispora and Rhizophagus genera are in the Ascomycota division. Also, "Ascomycetes" should be changed to "Ascomycota fungi", consistent with Fig 2.

      Sentence has been reworded.

      Line 233: "Scores above 8". What type of scores? Z-scores?

      These are Z-scores. This has been added in text.

      Lines 242-246: It is stated that SIX9 and SIX11 share structural similarity to various RNA-binding proteins, but no scores used to make these assessments is given. The scores should be provided in the text.

      Z-scores have been added.

      Fig 4A: SIX3 should be Avr2, consistent with line 292. The gene names should be italicised in Fig 4A.

      SIX3 was changed to Avr2. Gene names have been italicised.

      Line 356: Subtle rewording required, as "co-infiltrated with both IM82 and iMoneymaker" implies that you infiltrated with protein rather than Agrobacterium strains.

      Sentence has been reworded.

      Fig 5A, Fig 5C and Line 380: Light blue is used, but this looks grey. Perhaps change colour, as grey is already used to show the pro-domain in Fig 5A (or simply change the colour used to highlight the pro-domain)?

      Colour depicting the C-domain was changed.

      Lines 530-531: This text is no longer correct. Rlm4 and Rlm3 are now known to be alleles of Rlm9. See: Haddadi, P., Larkan, N. J., Van deWouw, A., Zhang, Y., Neik, T. X., Beynon, E., ... & Borhan, M. H. (2022). Brassica napus genes Rlm4 and Rlm7, conferring resistance to Leptosphaeria maculans, are alleles of the Rlm9 wall‐associated kinase‐like resistance locus. Plant Biotechnology Journal, 20(7), 1229.

      We thank the reviewer for picking this up. This text has been updated.

      Line 553: Provide more information on what the PR1 signal peptide is.

      More information about the PR1 signal peptide has been added.

      Lines 767-781: Descriptions and naming conventions of proteins throughout the figure legend need to be consistent and better reflect their makeup. For example, I think it would be best to put the sequence range after each protein mentioned - e.g. Avr118-242 or Avr159-242 instead of Avr1, PSL1_C37S18-111 instead of PSL1_C37S, etc. Furthermore, it is often stated that a protein is full-length when it lacks a signal peptide - my thought is that if a protein lacks its signal peptide, it is not full-length. The acronym "PD" also needs to be spelled out as "pro-domain (PD)" in the figure legend.

      We have incorporated sequence range for proteins that were produced upon first use. Sequence ranges that were modelled in AlphaFold2 were not added in text because they can be found in Supplementary Table 3.

      Lines 853-845: It is stated the sizes of proteins are indicated above the chromatogram in S10 Fig, but this is not the case. It is also not clear from S10B Fig that the faint peaks correspond to the peaks in the Fig 4B chromatogram. In S10D Fig, the stick of C58S is difficult to see. Perhaps change the colour or use an arrow/asterisk?

      Protein size estimates have been added above the chromatogram. Added text to indicate that the faint peaks correspond to peaks in Fig 4B. Added an asterisk in S10D Fig to identify the location of C58.

      S14 Fig is not mentioned/referenced in the main text of the manuscript.

      This was a mistake and has been added.

      The reference list needs to be updated to accommodate those referenced bioRxiv preprints that have now been published in peer-reviewed journals.

      The reference list has been updated.

      Reviewer #3 (Recommendations For The Authors):

      It would be good to discuss whether the pro-domains affect virulence or avirulence activity.

      Kex2, the protease that cleaves the pro-domain, functions in the Golgi. We therefore suspect that the pro-domain is removed prior to secretion. For recombinant protein production in E. coli we find that these pro-domains are necessary to obtain soluble protein (doi: 10.1111/nph.17516). As we require the pro-domain for protein production and cannot completely remove it from our preps, we cannot perform experiments to test this or comment further. In a paper that identified SIX effectors in tomato using a proteomics approach (https://bsppjournals.onlinelibrary.wiley.com/doi/10.1111/j.1364-3703.2007.00384.x), it appears that the pro-domains were not captured in this analysis. This supports the conclusion that they are not associated with the mature/secreted protein.

      The authors stated that the C-terminal domain of SIX6 has a single disulfide bond unique to SIX6. Please clarify in which context is it unique: in Fusarium or across all FOLD proteins?

      This is in direct comparison to Avr1 and Avr3. The disulfide in the C-domain of SIX6 is unique compared to Avr1 and Avr3. This has been made clear in text.

      The structural similarity of FOLD proteins to other known structures has been discussed (lines 460ff), but it is not clear whether all structures and models identified in this work would yield cysteine inhibitor and tumor necrosis factors as best structural matches in the database or whether this is specific to a single FOLD protein. Please consider discussing recently published findings by others (Teulet et al. 2023, New Phytologist) on this aspect.

      This analysis was performed for Avr1; we obtained relatively low-similarity hits for Avr3/Six6. We have updated this text accordingly: “Unfortunately, the FOLD effectors share little overall structural similarity with known structures in the PDB outside of the similarity with each other. At a domain level, the N-domain of the FOLD effector Avr1 has some structural similarities with cystatin cysteine protease inhibitors (PDB code: 4N6V, PDB code: 5ZC1) [60, 61], and the C-domain with tumour necrosis factors (PDB code: 6X83) [62] and carbohydrate-binding lectins (PDB code: 2WQ4) [63]. Relatively weak hits were observed for Avr3/Six6.”

      It might be useful to clearly point out that the ToxA fold and the C-terminus of the FOLD fold are different.

      We have secondary structural topology maps of the FOLD and ToxA-like families in S8 Fig which highlight the differences in topology between these two families.

      Please add information to Fig.S8 listing the approach to generate the secondary structure topology maps.

      We have added this information in the figure caption.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors found that nifuroxazide has the potential to augment the efficacy of radiotherapy in HCC by reducing PD-L1 expression. This effect may be attributed to increased degradation of PD-L1 through the ubiquitination-proteasome pathway. The paper provides new ideas and insights to improve treatment effectiveness, however, there are additional points that could be addressed.

      • The paper highlights that the combination of nifuroxazide increases tumor cell apoptosis. A discussion regarding the potential crosstalk or regulatory mechanisms between apoptotic pathways and PD-L1 expression would be valuable.

      Response: Thank you very much for your suggestion. Research has shown that regulating the STAT3/PD-L1 pathway can effectively increase apoptosis in lung cancer cells (1). Our study confirmed that nifuroxazide can effectively inhibit the expression of p-STAT3 and PD-L1 in liver cancer cells, which may be the reason for the increased apoptosis of these cells. We have added relevant descriptions in the discussion.

      • The benefits and advantages of nifuroxazide combination could be compared to the current clinical treatment options.

      Response: Thank you greatly for your insightful feedback. The primary objective of this study is to explore whether nifuroxazide can effectively enhance the degradation of PD-L1, thereby increasing the radiosensitivity of HCC. Our research reveals that compared to radiation therapy alone, combination therapy involving nifuroxazide and radiation significantly inhibits tumor growth in mice and boosts the anti-tumor immune response. This finding could potentially provide a valuable strategy for patients who exhibit resistance to radiation therapy in clinical practice. Moreover, clinical trial investigations have demonstrated that nivolumab, a PD-1 monoclonal antibody, when combined with radiation therapy for HCC, exhibits promising safety and efficacy (2). This evidence supports the future application of nifuroxazide in the treatment of HCC. However, to reach this objective, we must continue to conduct extensive research, including comparing nifuroxazide with existing therapies in clinical practice. We believe that nifuroxazide not only significantly inhibits the expression of PD-L1 protein in HCC cells but also functions as a PD-L1 inhibitor. Furthermore, it effectively curbs the proliferation and migration of HCC cells, induces tumor cell apoptosis, and may exhibit enhanced anti-tumor effects, making it a promising candidate for clinical use. We have incorporated relevant discussion content in the article to address these points.

      Reviewer #2 (Public Review):

      Summary:

      Zhao et al. aimed to explore an important question - how to overcome the resistance of hepatocellular carcinoma cells to radiotherapy? Given that the immune-suppressive microenvironment is a major mechanism underlying resistance to radiotherapy, they reasoned that a drug that blocks the PD-1/PD-L1 pathway could improve the efficacy of radiation therapy and chose to investigate the effect of Nifuroxazide, an inhibitor of stat3 activation, on radiotherapy efficacy in treating hepatocellular carcinoma cells. From in vitro experiments, they find combination treatment (Nifuroxazide+ radiotherapy) increases apoptosis and reduces proliferation and migration, in comparison to radiotherapy alone. From in vivo experiments, they demonstrate that combined treatment reduces the size and weight of tumors in vivo and enhances mice survival. These data indicate a better efficacy of combination therapy compared to radiotherapy alone. Moreover, they also determined the effect of combination therapy on tumor microenvironment as well as peripheral immune response. They find that combination therapy increases infiltration of CD4+ and CD8+ cells as well as M1 macrophages in the tumor microenvironment. Interestingly, they find that the ratio of Treg cells in spleen is increased by radiotherapy but decreased by Nifuroxazide. Considering the immune-suppressive role of Treg cells, this finding is consistent with reduced tumor growth by combination therapy. However, it is unclear whether the combined therapy affects the ratio of Treg cells in the tumors or not. The most intriguing part of the study is the determination of the effect of Nifuroxazide on PD-L1 expression in the context of radiotherapy. Considering Nifuroxazide is a stat3 activation inhibitor and stat3 inhibition leads to reduced expression of PD-L1, one would expect Nifuroxazide decreases PD-L1 expression through stat3. However, they found that the effect of Nifuroxazide on PD-L1 is dependent on GSK3 mediated Proteasome pathways and independent of stat3, in the given experimental context. To determine the relevance to human hepatocellular carcinoma, they also measured the PD-L1 expression in human tumor tissues of HCC patients pre- and post-radiotherapy. The increased PD-L1 expression level in HCC after radiotherapy is impressive. However, it is unclear whether the patients being selected in the study had resistant disease to radiotherapy or not.

      Overall, the data are convincing and supportive to the conclusions.

      Strengths:

      1) Novel finding: Identified novel mechanism underlying the effect of Nifuroxazide on PD-L1 expression in hepatocellular carcinoma cells in the context of radiotherapy.

      2) Comprehensive experimental approaches: using different approaches to prove the same finding. For example, in Fig 4, both IHC and WB were used. In Fig 5, both IF and WB were used.

      3) Human disease relevance: Compared observations in mice with human tumor samples.

      The question in the summary, “However, it is unclear whether the combined therapy affects the ratio of Treg cells in the tumors or not”.

      Response: Thank you very much for your valuable feedback. We have included additional flow cytometry results regarding the expression of relevant Treg cells (CD4+CD25+Foxp3+ T lymphocytes) in tumor tissues (Supplementary Fig 2). Our findings indicate that the number of Treg cells in tumor tissues significantly decreased following combination therapy with nifuroxazide and radiotherapy.

      The question in the summary, “However, it is unclear whether the patients being selected in the study had resistant disease to radiotherapy or not”.

      Response: Thank you very much for your valuable feedback. All the HCC patients selected in this study experienced recurrence after radiation treatment.

      Weaknesses:

      1) It is hard to tell whether the observed phenotype and mechanism are generic or specific to the limited cell lines used in the study. The in vitro experiments were performed in one human cell line and the in vivo experiments were performed in one mouse cell line.

      Response: Thank you very much for your feedback. We have included additional experimental data from another human cell line Huh7 (Supplementary Fig 3).

      2) The study did not distinguish the effect of increased radiosensitivity by nifuroxazide from combined anti-tumor effects by two different treatments.

      Response: Thank you greatly for your insightful feedback. In this study, we primarily compared the antitumor effects of nifuroxazide combined with radiotherapy versus either nifuroxazide or radiotherapy alone, and confirmed that the combined treatment demonstrated a more potent anti-hepatocellular carcinoma effect compared to single therapy. Furthermore, to achieve the goal of utilizing nifuroxazide for the treatment of clinical hepatocellular carcinoma, additional research is necessary, including comparisons with other clinically established therapies. We have also incorporated relevant discussions in our analysis.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors embarked on an exploration of how nifuroxazide could enhance the responsiveness to radiotherapy by employing both an in vitro cell culture system and an in vivo mouse tumor model.

      Strengths:

      The researchers conducted an array of experiments aimed at revealing the function of nifuroxazide in aiding the radiotherapy-induced reduction of proliferation, migration, and invasion of HepG2 cells.

      Weaknesses:

      The authors did not provide the molecular mechanism through which nifuroxazide collaborates with radiotherapy to effectively curtail the proliferation, migration, and invasion of HCC cells. Moreover, the evidence supporting the assertion that nifuroxazide contributes to the degradation of radiotherapy-induced upregulation of PD-L1 via the ubiquitin-proteasome pathway appears to be insufficient. Importantly, further validation of this discovery should involve the utilization of an additional syngeneic mouse HCC tumor model or an orthotopic HCC tumor model.

      Response: Thank you very much for your insightful comments. Nifuroxazide has been demonstrated to inhibit the expression of p-STAT3, thereby suppressing tumor cell proliferation and migration (3, 4). In our study, we observed that after 48 hours of treatment with Nifuroxazide, the expression of p-STAT3 in irradiated cells was significantly inhibited. Furthermore, compared to radiation alone, combined Nifuroxazide and radiotherapy resulted in a more pronounced decrease in PCNA expression. Simultaneously, we performed additional detection of migration-related protein MMP2 expression (revised Fig 2B), confirming that combined Nifuroxazide and radiotherapy led to a more significant inhibition of MMP2 expression. These findings suggest that the combined treatment may be responsible for the synergistic suppression of HCC cell proliferation and migration. We have included relevant discussions in our manuscript.

      Our initial results indicate that Nifuroxazide inhibits the expression of PD-L1 at the protein level, but does not affect its mRNA level. Interestingly, upon treatment with a proteasome inhibitor MG132, the inhibitory effect of Nifuroxazide on PD-L1 was eliminated, suggesting that Nifuroxazide may enhance the degradation of PD-L1 protein. Our experiments have demonstrated the inhibitory effect of Nifuroxazide on PD-L1 in both human and mouse cell lines. However, to translate these findings into clinical application for the treatment of hepatocellular carcinoma, additional research is necessary, including validation in genetically engineered mouse models of HCC. We have addressed these points in the discussion section of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) Please improve the quality of Figure 3E. It is hard to figure out the bar and details.

      Response: Thank you for your valuable feedback. We have meticulously revised the figures to enhance their clarity and presentation (revised Fig 3E).

      2) In Figure 7E, please elucidate the methods used for calculating the amount of PD-L1 mRNA level. Please adjust the picture angle and label the marker size on the left as well.

      Response: Thank you for your feedback. We have incorporated a method for calculating PD-L1 mRNA levels and revised the corresponding figures accordingly (revised Fig 7E).

      Reviewer #2 (Recommendations For The Authors):

      Questions:

      1) What is the advantage of using a combination of nifuroxazide and radiotherapy in comparison to using a combination of anti-PD1/PDL1 and radiotherapy?

      Response: Thank you very much for your insightful comments. We believe that the advantage of nifuroxazide over PD-1 or PD-L1 antibodies lies in its ability not only to effectively inhibit PD-L1 expression but also to suppress tumor cell proliferation, migration, and promote cell apoptosis (Supplementary Fig 1). We have also expanded on these aspects in the discussion section of the manuscript.

      2) For the characterization of tumor microenvironment and immune cells in the spleen, were the same cell populations being investigated? What about NK and Treg cells in tumors? What about M1 macrophages in spleen?

      Response: Thank you very much for your insightful suggestion. We have measured the infiltration of NK and Treg cells in tumor tissues (Supplementary Fig 2), as well as the abundance of M1 macrophages (revised Fig 6) in the spleen, and provided additional relevant data to strengthen our study.

      Other comments:

      1) The data in Fig 1 is solid. However, it is hard to distinguish the effect of increased radiosensitivity by nifuroxazide from combined anti-tumor effects by two different treatments. The anti-tumor role of Nifuroxazide has been reported in melanoma, colorectal carcinoma, and hepatocellular carcinoma previously (PMID: 26830149; 28055016, 26154152). Therefore, the increased apoptosis and decreased proliferation and migration could be caused by nifuroxazide and not related to the sensitivity of cells to radiation therapy.

      Response: Thank you very much for your constructive feedback. As you suggested, the anti-tumor role of nifuroxazide has been reported. However, the innovation of our study does not lie in confirming its antitumor effects but rather in demonstrating how nifuroxazide can enhance radiotherapy's efficacy in treating hepatocellular carcinoma by inhibiting PD-L1 levels.

      We compared the efficacy of combined therapy versus radiotherapy and found that compared to radiation alone, combined therapy more significantly inhibited hepatocellular carcinoma cell proliferation and migration. In our animal model, we compared the therapeutic effects of combined therapy, nifuroxazide, and radiotherapy on hepatocellular carcinoma-bearing mice. We observed that compared to individual treatment groups, combined therapy more profoundly suppressed tumor growth and enhanced the antitumor effects in the mice.

      In response to your feedback, we have expanded the discussion on the impact of combined therapy versus nifuroxazide or radiotherapy on hepatocellular carcinoma cell proliferation, migration, and apoptosis (Supplementary Fig 1). The data show that compared to either individual therapy, combined therapy further inhibited cell proliferation and migration while promoting apoptosis.

      2) There is no direct evidence to show the improved efficacy of radiation therapy by nifuroxazide through the degradation of PD-L1.

      Response: Thank you very much for your valuable suggestions. In our cell experiments, we found that nifuroxazide inhibits the increased expression of PD-L1 in cells induced by radiation therapy, and this inhibitory effect is counteracted when using the proteasome inhibitor MG132. Therefore, we speculate that nifuroxazide may inhibit PD-L1 expression through a proteasome-dependent mechanism. To better reflect this, we have revised the title of our manuscript to "Nifuroxazide Suppresses PD-L1 Expression and Enhances the Efficacy of Radiotherapy in Hepatocellular Carcinoma."

      3) "The oncogene Stat3.....was effectively inhibited by radiotherapy in cells" - this sentence may be rephrased to make the point clear. The authors might mean to say "activation of the oncogene stat3...."

      "The results demonstrated that the combination therapy increased the expression of PARP," the authors might mean to say "expression of c-PARP"

      Response: Thank you very much for your feedback. We have revised the relevant sentence descriptions to improve clarity and accuracy.

      4) "histomorphology significantly improved after the treatment with nifuroxazide and radiation therapy (Fig 3E)." How to define "improved histomorphology"? The authors may want to provide more details to clarify "improved".

      Response: Thank you very much for your feedback. We have revised the relevant sentence descriptions to improve clarity and accuracy.

      5) In addition to normalizing protein expression by tubulin, the authors may consider normalizing p-stat3 expression level by stat3.

      Response: Thank you very much for your feedback. We have conducted a quantitative analysis of the expression levels of p-STAT3 and STAT3 (revised Fig 2A).

      6) Figure 3C and D, using a different color to represent each group might help the readers to better differentiate each group.

      Response: Thank you very much for your feedback. Following your suggestion, we have revised the figures accordingly (revised Fig 3C and 3D).

      Reviewer #3 (Recommendations For The Authors):

      In this study, the authors revealed the pivotal role of nifuroxazide in augmenting the efficacy of radiotherapy. This was evidenced by its synergistic effect in suppressing the proliferation and migratory capabilities of HCC cells, alongside its capacity to induce apoptosis in these cells. Furthermore, their findings underscored the substantial synergy between nifuroxazide and radiotherapy in retarding tumor growth, thereby extending survival rates in a tumor-bearing murine model. Moreover, the authors observed that nifuroxazide combined with radiotherapy significantly increases the tumor-infiltrating CD4+ T cells, CD8+ T cells, and M1 macrophages. Finally, the authors found that nifuroxazide countered the radiotherapy-induced upregulation of PD-L1 through the ubiquitin-proteasome pathway. However, the main claims are only partially supported by the evidence. The following are my concerns and suggestions.

      1) In Figures 1 and 2, the authors convincingly demonstrate the synergistic impact of nifuroxazide and radiotherapy on curtailing the proliferation, colony formation, and migratory capabilities of HCC cells, while also instigating apoptosis in these cells. However, the underlying molecular mechanism remains elusive. A recent study highlighted nifuroxazide's potential to impede the proliferation of glioblastoma cells and induce apoptosis via the MAP3K1/JAK2/STAT3 pathway (Wang X., et al., Int Immunopharmacol. 2023 May;118:109987. doi: 10.1016/j.intimp.2023.109987). It would be valuable for the authors to investigate whether nifuroxazide employs a similar molecular mechanism to regulate proliferation and apoptosis in the context of HCC. This could offer deeper insights into the mechanisms at play in their observed effects.

      Response: Thank you very much for your insightful comments. As you pointed out, previous studies have reported that nifuroxazide exerts antitumor effects by inhibiting the STAT3 pathway. However, in our experiments, we observed that radiation therapy significantly increased the expression of PD-L1, but showed a trend of decreased p-STAT3 expression. Therefore, we believe that nifuroxazide does not inhibit PD-L1 expression through the STAT3 pathway. Subsequently, our further research revealed that the inhibitory effect of nifuroxazide on PD-L1 can be counteracted by a proteasome inhibitor. Thus, we propose that nifuroxazide inhibits PD-L1 expression through a proteasome-dependent mechanism, thereby enhancing the efficacy of radiation therapy in hepatocellular carcinoma.

      2) Figures 1 and 2 solely rely on the HepG2 cell line to establish their conclusions. To validate these findings robustly, it is recommended that another HCC cell line be included in the study. This additional cell line will contribute to the generalizability and reliability of the results, enhancing the overall credibility of the study's conclusions.

      Response: Thank you very much for your suggestion. We have included additional experimental results with the relevant cell line Huh7 (supplementary Fig 3).

      3) Figure 3 demonstrates the use of only one syngeneic mouse H22 tumor model. To ensure the robustness and validity of this finding, it would be advisable to incorporate at least one more syngeneic mouse HCC tumor model or even an orthotopic mouse tumor model. The inclusion of additional models would bolster the significance and reliability of the observed results, contributing to a more comprehensive understanding of the phenomenon under investigation.

      Response: Thank you for your valuable suggestion. In the H22 mouse tumor model, we conducted relevant assessments of survival rate and tumor growth. The results confirm that the combination of nifuroxazide and radiation therapy exhibits a promising synergistic antitumor effect. However, to achieve the goal of applying nifuroxazide combined with radiation therapy for the treatment of clinical hepatocellular carcinoma, we still need to undertake extensive research, including validation in additional syngeneic mouse HCC tumor models. We have also added relevant discussion to the manuscript.

      4) In Figure 5, employing an alternative method, such as the flow cytometry assay, to analyze and corroborate the tumor-infiltrating immune cell profiling following various treatments would enhance the rigor of the study. This additional approach would provide a complementary perspective and validate the findings, strengthening the overall reliability and impact of the results presented.

      Response: Thank you for your insightful suggestion. We have included additional experimental data to strengthen our study (supplementary Fig 2).

      5) In Figure 7, the conclusion drawn regarding nifuroxazide's impact on PD-L1 expression through ubiquitination-proteasome mechanisms seems to lack the robust evidence needed to firmly establish nifuroxazide's role in regulating PD-L1 ubiquitination. To reinforce this aspect of the study, the authors may conduct comprehensive in vitro and in vivo ubiquitination assays. Performing these assays would offer direct insights into whether nifuroxazide genuinely influences PD-L1 ubiquitination, thus fortifying the credibility and importance of the reported findings.

      Response: Thank you for your valuable feedback. Our initial findings suggest that nifuroxazide inhibits the expression of PD-L1 protein levels, but does not affect the mRNA levels. Moreover, upon treatment with the proteasome inhibitor MG132, the inhibitory effect of nifuroxazide on PD-L1 was found to be abolished. Concurrently, we observed that nifuroxazide significantly enhances GSK-3β expression in both cell and animal experiments. Consequently, we propose that nifuroxazide augments the degradation of PD-L1 protein.

      6) Statistical methods should be included in the captions of all the figures with statistical graphs. The size of the scale should be supplemented with a description in the captions.

      Response: Thank you for your valuable suggestion. We have made the appropriate modifications to our study based on your recommendations.

      7) Considering the outcomes presented in the study, it appears that the title "Nifuroxazide enhances radiotherapy efficacy against hepatocellular carcinoma by upregulating PD-L1 degradation via the ubiquitin-proteasome pathway" may not accurately reflect the findings.

      Response: Thank you for your insightful feedback. We have revised the title to read, "Inhibitory Effects of Nifuroxazide on PD-L1 Expression and Enhanced Radiotherapy Efficacy in Hepatocellular Carcinoma".

      References

      1) Xie C, Zhou X, Liang C, Li X, Ge M, Chen Y, et al. Apatinib triggers autophagic and apoptotic cell death via VEGFR2/STAT3/PD-L1 and ROS/Nrf2/p62 signaling in lung cancer. Journal of experimental & clinical cancer research : CR. 2021;40(1):266. doi: 10.1186/s13046-021-02069-4.

      2) de la Torre-Alaez M, Matilla A, Varela M, Inarrairaegui M, Reig M, Lledo JL, et al. Nivolumab after selective internal radiation therapy for the treatment of hepatocellular carcinoma: a phase 2, single-arm study. Journal for immunotherapy of cancer. 2022;10(11). doi: 10.1136/jitc-2022-005457.

      3) Yang F, Hu M, Lei Q, Xia Y, Zhu Y, Song X, et al. Nifuroxazide induces apoptosis and impairs pulmonary metastasis in breast cancer model. Cell Death Dis. 2015;6(3):e1701. doi: 10.1038/cddis.2015.63.

      4) Nelson EA, Walker SR, Kepich A, Gashin LB, Hideshima T, Ikeda H, et al. Nifuroxazide inhibits survival of multiple myeloma cells by directly inhibiting STAT3. Blood. 2008;112(13):5095-102. doi: 10.1182/blood-2007-12-129718.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript aimed at elucidating the substrate specificity of two M23 endopeptidases, Lysostaphin (LSS) and LytM, in S. aureus. Endopeptidases are known to cleave the glycine-bridges of staphylococcal cell wall peptidoglycan (PG). To address this question, various glycine-bridge peptides were synthesized as substrates, the catalytic domains of LSS and LytM were recombinantly expressed and purified, and the reactions were analyzed using solution-state NMR. The major finding is that LytM is not only a Gly-Gly endopeptidase, but also cleaves D-Ala-Gly. Technically, the advantage of using real-time NMR was emphasized in the manuscript. The study explores an interesting aspect of cell wall hydrolases in terms of substrate-level regulation. It potentially identified new enzymatic activity of LytM. However, the biological significance and relevance of the conclusions remain unclear, as the results are mostly from synthetic substrates.

      Strengths:

      The study explores an interesting aspect of cell wall hydrolases in terms of substrate-level regulation. It potentially identified new enzymatic activity of LytM.

      Weaknesses:

      1) Significance: while the current study provided a detailed analysis of various substrates, the conclusions are mainly based on synthesized peptides. One experiment used purified muropeptides (Fig. 3H); however, the results were unclear from this figure.

      We acknowledge the Reviewer for comments and concerns regarding the potential weaknesses of this study.

      Because peptidoglycan is insoluble, as such it is not amenable to solution-state NMR studies. However, soluble peptidoglycan (PG) fragments for NMR analyses can be obtained by digesting bacterial sacculi or via chemical synthesis. Whereas digestion results in mixtures of products, synthesis yields pure molecules. Analysis of NMR spectra of muropeptide-mimicking synthetic peptides before and after enzyme addition provides tools to identify peaks in the much more complex spectra of mutanolysin-treated sacculus.

      We will improve data presentation in Figure 3H in the revised version of our manuscript and emphasize the similarity of product peaks in spectra acquired from experiments using either synthetic peptides or mutanolysin-digested sacculus.

      The results from synthesized peptides may not necessarily correlate with their biological functions in vivo.

      The Reviewer refers several times to the use of synthetic peptides in this study. While it is unclear to us whether the concern is about the synthetic nature of the molecules or because the peptides are devoid of PG disaccharide units, it is true that PG fragments lack the 3D architecture present in intact sacculus, and thus cannot perfectly mimic the in vivo milieu. The fragments, as well as purified sacculus, also lack all other components present in an intact bacterial cell wall. Our largest synthetic peptide (7), however, represents a crosslinked muropeptide (stem-pentaGly-stem) which according to the structural model recently presented by Razew et al. (2023) (Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706) is large enough to cover the peptidic interaction interface between substrate and enzyme.

      Secondly, the study used only the catalytic domain of both proteins. It is known that the substrate specificity of these enzymes is regulated by their substrate-binding domains. There is no mention of other domains in the manuscript and no justification of why only the catalytic domain was studied. In short, the relevance of the results from the current study to the enzymes' actual physiological functions remains to be addressed, which attenuated the significance of the study.

      Lysostaphin catalytic domain was used for experimental simplicity and to allow direct comparison with LytM catalytic domain. Because lysostaphin cell-wall targeting (SH3b) domain interacts with the substrate with variable affinities depending on the substrate structure (Tossavainen et al., Structural and functional insights into lysostaphin-substrate interaction, Front. Mol. Biosci. 5, 60 (2018) and Gonzalez-Delgado et al., Two-site recognition of Staphylococcus aureus peptidoglycan by lysostaphin SH3b, Nat. Chem. Biol. 16, 24-30 (2020)), we would have had skewed results on kinetics because of this interaction.

      Catalytic domains were used also in the article by Razew et al. (Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706 (2023)). They showed that mature lysostaphin and lysostaphin catalytic domain hydrolysed the same Gly-Gly bonds.

      Moreover, full-length LytM is catalytically inactive. This is because the linker between its N-terminal and catalytic domains occludes the catalytic site (Odintsov et al. Latent LytM at 1.3 Å resolution. J. Mol. Biol. 335, 775 (2004)). LytM catalytic domain without its N-terminal segment is active (Odintsov et al. (2004) and Firczuk et al. Crystal structure of active LytM. J. Mol. Biol. 354, 578 (2005)).

      2) Impact and novelty:

      (1) The current study provided evidence suggesting the novel function of LytM in cleaving D-Ala-Gly. The impact of this finding is unclear. The manuscript discussed Enterococcus faecalis EnpA. But how about other M23 endopeptidases? What is the biological relevance?

      EnpA was specifically mentioned because it has been reported to also cleave the D-Ala-Gly bond. Structural similarities between the enzymes could reveal the basis for this bond specificity. Moreover, the focus of the study was not to reveal the biological function of LytM but rather to understand which amino acid substitutions lead to differences in specificities in the two structurally very similar enzymes.

      (2) A very similar study published recently showed that the activity of LSS and LytM is regulated by PG cross-linking: LSS cleaves more cross-linked PG and LytM cleaves less cross-linked PG (Razew, A., Laguri, C., Vallet, A., et al. Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706 (2023)). The results of this paper are different from the current study whereby both LSS and LytM prefer cross-linked substrates (Fig. 2JKL). Moreover, no D-Ala-Gly cleavage was observed by LytM using purified PG substrate from Razew A et al. An explanation of inconsistent results is needed here. In my opinion, the knowledge generated from the current study has not been fully settled. If the results can be validated, the contribution to the field is incremental, but not substantial.

      Another point raised by the Reviewer concerned the inconsistent results between our study and the recent paper by Razew et al. (2023) regarding LytM D-Ala-Gly cleavage. The explanation might lie in the type of NMR data acquired and its interpretation. We identified all hydrolysis products using 1H, 13C multiple bond correlation NMR spectra acquired from samples dissolved in deuterated buffers. Use of C-H signals is advantageous in that they are not prone to chemical exchange phenomena and enable unambiguous chemical shift assignment. Based on the NMR spectra shown, Razew and co-workers identified cleaved muropeptide bonds by observing product glycine peaks in 1H, 15N correlation spectra, specifically amide peaks of product C-terminal glycines appearing in the 114-117 ppm 15N region of spectra of samples treated with LytM/LSS. D-Ala-Gly cleavage, however, produces an N-terminal glycine, whose signal is not typically observed in regular N,H correlation spectra because of chemical exchange. Razew and co-workers validated their observations with UPLC-MS analysis. However, to our understanding, their data analysis was based on the assumption that LytM cleaves between Gly4-Gly5 (or Gly1-Gly2 using our numbering), and accordingly only masses corresponding to potential products containing 1 to 4 glycines anchored to the lysine side chain were considered.

      (3) The authors emphasized a few times in the text that it is superior to use NMR technology. In my opinion, NMR has certain advantages, such as measuring the efficacy of cleavage, but it is not that superior. It should be complementary to other methods such as mass spectrometry. In addition, more relevant solid-state NMR using intact PG or bacterial cells was not discussed in the study. I am of the opinion that the corresponding text should be revised.

      We value and agree with the Reviewer’s opinion that NMR spectroscopy is complementary to other methods, e.g. mass spectrometry. However, in this particular case, NMR simultaneously provided information on reaction kinetics and on scissile bonds in the substrates, which allowed us to compare rates of hydrolysis in different PG fragments and reshape the substrate specificities of LytM/LSS. We also agree that solid-state NMR is a wonderful technique. In our revised manuscript, we will edit the text accordingly.

      3) The conclusions are not fully supported by the data

      As mentioned above, the conclusions from synthesized peptide substrates may not necessarily reveal physiological functions. The conclusions need to be validated by more physiological substrates.

      As pointed out above in our response to the potential weaknesses of this study, the aim of this work was not to reveal the physiological function of LytM but to glean information on its substrate specificity that echoes its functional role at the substrate level. Hitherto LytM has been shown to cleave amide bonds between glycines without providing detailed information about the specific scissile bonds in the established PG components in the S. aureus cell wall. The same holds true for lysostaphin as well. This study concomitantly provides information on the rates of hydrolysis and scissile bonds of these two enzymes. We deduced that the substrate specificity of LytM, and especially of lysostaphin, is defined by D-Ala-Gly cross-linking, which is a structural property, whereas Razew et al. (2023) discuss “more cross-linked” and “less cross-linked PG”, which is a supramolecular asset or density.

      4) There are some issues with the presentation of the figures, text, and formatting.

      We are grateful to the Reviewer for bringing up issues in figures and text. We will address these in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This work investigates the enzymatic properties of lysostaphin (LSS) and LytM, two enzymes produced by Staphylococcus aureus and previously described as glycyl-glycyl endopeptidases. The authors use synthetic peptide substrates mimicking peptidoglycan fragments to determine the substrate specificity of both enzymes and identify the bonds they cleave.

      Strengths:

      • This work is addressing a real gap in our knowledge since very little information is available about the substrate specificity of peptidoglycan hydrolases.

      • The experimental strategy and its implementation are robust and provide a thorough analysis of LSS and LytM enzymatic activities. The results are very convincing and demonstrate that the enzymatic properties of the model enzymes studied need to be revisited.

      Weaknesses:

      • The manuscript is difficult to read in places and some figures are not always presented in a way that is easy to follow. This being said, the authors have made a good effort to present their experiments in an engaging manner. Some recommendations have been made to improve the current manuscript but these remain minor issues.

      We thank the Reviewer for providing positive feedback on our manuscript and for appreciating the systematic work behind this study, which aims to unknot the substrate specificity of two S. aureus PG hydrolyzing enzymes. We are grateful for the comments aiming to improve the presentation of the current version of the manuscript and we will take these into account while preparing the revised version.

    1. Author response

      eLife assessment

      Using a genetically controlled experimental setting, the authors find that the lack of Polycomb-dependent epigenetic programming in the oocyte and early embryo influences the developmental trajectory through gestation in the mouse. By showing a two-phase outcome of early growth restriction followed by enhancement, the authors address previous inconsistencies in the field. However, the link with placenta function and gene misregulation is not yet fully supported.

      We thank the Reviewers for their constructive comments. In response we have added significantly more data to the study and substantially rewritten the manuscript. New data include analyses of glucose, amino acid and metabolite levels in fetal and maternal blood samples, more highly resolved fetal growth analyses, a more detailed study of the hyperplastic placenta including IF analyses of labyrinth area, labyrinth to placenta and capillary to labyrinth ratios. We have also added analyses of placental DNA methylation state in offspring from oocytes lacking EED, which reveals a range of DNA methylation changes at imprinted and non-imprinted genes in HET-hom offspring compared to HET-het or WT-wt controls.

      Reviewer #1 (Public Review):

      Oberin, Petautschnig et al. investigated the developmental phenotypes that resulted from oocyte-specific loss of the EED (Embryonic Ectoderm Development) gene - a core component of the Polycomb repressive complex 2 (PRC2), which possesses histone methyltransferase activity and catalyses trimethylation of histone H3 at lysine 27 (H3K27). The PRC2 complex plays essential roles in regulating chromatin structure, being an important regulator of cellular differentiation and development during embryogenesis. As novel findings, the authors find that disruption of PRC2-dependent programming in the oocyte, via loss of the core component EED, causes placental hyperplasia and propose that the increase in transplacental flux of nutrients leads to fetal and postnatal overgrowth. At the mechanistic level, they show altered expression of genes previously implicated in placental hyperplasia phenotypes. They also establish interesting parallelism with the placental hyperplasia phenotype that is frequently observed in cloned mice.

      Strengths:

      The mouse breeding experiments are very well designed and are powerful to exclude potential confounding genetic effects on the developmental phenotypes that resulted from the loss of EED in oocytes. Another major strength is the developmental profiling across gestation, from pre-implantation to late gestation.

      Weaknesses:

      The evidence for 'oocyte' programming is restricted to phenotypic and gene expression analysis, without measurements of epigenetic dysregulation. It would be an added value if the authors could show evidence for altered H3K27me3 or DNA methylation in the placenta, for example.

      In an earlier study we identified a large number of developmentally important genes that accumulated H3K27me3 in primary-secondary stage growing oocytes and were repressed by EED (Jarred et al., 2022 Clinical Epigenetics). However, H3K27me3 was removed from all of these genes during preimplantation development, indicating that maternal inheritance of H3K27me3 at a wide range of genes is unlikely (Jarred et al., 2022 Clinical Epigenetics). Consistent with this, only a small number of genes, including Slc38a4 and C2MC, have been shown to be functionally important in H3K27me3-dependent imprinting (Matoba et al., 2022 Genes and Development). Moreover, a related study showed that deletion of Setd2 and consequent loss of H3K36me3 in oocytes led to spreading of H3K27me3 into regions that were otherwise marked by H3K36me3 and DNA methylation (Xu et al. 2019 Nature Genetics 51:844–56). Based on these studies, we proposed that loss of EED and H3K27me3 may result in the ectopic spreading of H3K36me3 and DNA methylation in oocytes and that altered DNA methylation may then be transmitted to offspring and affect developmental outcomes (Jarred et al., 2022 Clinical Epigenetics).

      Given this hypothesis, we analysed DNA methylation rather than H3K27me3 in the placenta of WT-wt, HET-het and HET-hom offspring. This revealed differentially methylated regions (DMRs) in HET-hom placentas at two H3K27me3 imprinted genes, Sfmbt2 (C2MC) and Mbnl2, at five classically imprinted genes and at 74 DMRs not associated with imprinted loci. Together, our data support the hypothesis from Jarred et al., 2022 Clinical Epigenetics that loss of EED in oocytes results in altered DNA methylation patterning at both imprinted and non-imprinted genes in offspring and that this is likely to affect offspring growth and development. However, whether these changes result from direct alteration of DNA methylation in oocytes remains unclear.

      These new data are now included in results (Lines 387-409), Figure 6I, Supplementary File H-J and Discussion Lines 569-581.

      Reviewer Comment 1. The claim that placental hyperplasia drives offspring catch-up growth is not supported by current experimental data. The authors do not address if transplacental flux is increased in the hyperplastic placentae, measure amino acids and glucose in fetal/maternal plasma, or perform tetraploid rescue experiments to ascertain the contribution of the placenta to growth phenotypes. Furthermore, it is unclear, from the current data, if the surface area for nutrient transport is actually increased in the hyperplastic placenta and the extent to which other cell populations (i.e. spongiotrophoblasts) are affected in addition to glycogen cells. In addition, one of the supporting conclusions that the placenta is a key contributor to fetal overgrowth is based on a very crude measurement - placenta efficiency - which the authors claim is increased in the homozygous mutants compared to controls. After analysing the data carefully, I find evidence for decreased placental efficiency instead. I believe that the authors mistakenly present the data as placenta to fetal weight ratios, which led to the misinterpretation of the 'efficiency' concept.

      We thank the reviewer for pointing out our error in the placental efficiency data and we have now corrected the placental efficiency graphs (fetal/placental weight ratios) and updated the text throughout the manuscript as required (Figure 3I-K). As requested and described below, we have also added significantly more data, which support the conclusion that placental function is not enhanced in HET-hom mice and is unlikely to support fetal growth recovery.

      The new data and analyses we have added include:

      1. Further analyses of glycogen-enriched and non-glycogen-enriched cell counts in the decidua and junctional zones (Figure 4F-J)

      2. Total glycogen cell counts for male and female placentas (Figure 4 – figure supplement 1F)

      3. New analyses of fetal blood glucose levels at E17.5 and E18.5 and matching data from the mothers of each litter (Figure 4M)

      4. New analyses of the circulating amino acid levels and metabolites in fetal blood of E17.5 offspring and matching data from the mothers of each litter (Figure 8)

      5. New IF analyses of CD31 (PECAM-1), combined with machine learning assisted quantitative analyses of labyrinth and capillary areas using HALO (Figure 5)

      6. Separated male and female offspring and placental weights at E14.5 and E17.5 and total areas of the placenta, decidua, junctional zone and labyrinth (Figure 3 – figure supplement 1) which provide more insight into potential sex-specific differences in HET-hom offspring and placenta

      We have significantly re-written the results and discussion to reflect our new data and interpretation.

      While we did not assess transplacental flux, our new data revealed that: 1. HET-hom fetuses had lower blood glucose levels at E18.5; 2. Circulating levels of amino acids and a wide range of metabolites did not differ between HET-hom and control offspring, or between the mothers of these offspring; 3. HET-hom placentas had lower total labyrinth area, labyrinth/placenta and capillary/labyrinth ratios based on analysis of total capillary and labyrinth areas, indicating that the surface area for nutrient transfer is not increased.

      Together these data strongly indicate that hyperplastic HET-hom placentas do not provide greater support to HET-hom fetuses than controls, and that increased placental function in HET-hom offspring is unlikely to explain the late gestation fetal growth recovery we observed in HET-hom offspring or how HET-hom offspring were able to attain normal weights by birth.

      While we have not directly counted the spongiotrophoblast populations, we have now included analyses of both the glycogen-enriched and non-glycogen cell populations in the junctional zone and the decidua (Figure 4H-K). This revealed an increased area of both glycogen-enriched and non-glycogen cells in the junctional zone and in the decidua of HET-hom placentas, consistent with the greater junctional zone/placenta ratio observed in HET-hom placentas (Figure 4D). Together with data in Figure 4C-F and Supp. Fig. 3, our observations demonstrate that the overall decidua and junctional zone areas were increased in HET-hom offspring, but there was a disproportionate expansion of the junctional zone that was caused by increased areas of both glycogen and non-glycogen-enriched cells.

      Tetraploid rescue experiments would require a very significant amount of time and investment and are technically very demanding. While creation of complementary tetraploid offspring would be informative, unfortunately these experiments are beyond the scope of this current study.

      Reviewer Comment 1 cont. The authors do not mention alternative explanations for the observed fetal catch-up and postnatal overgrowth. Why would oocyte epigenetic programming effects be restricted to the placenta, and not include fetal organs?

      Our intention was certainly not to convey a message that effects may be placenta specific. Indeed, our ongoing work beyond the scope of this study provides evidence for effects in other tissues (brain and bones) that will be published elsewhere. Our new data clearly show low placental efficiency, low fetal blood glucose, a low capillary/labyrinth ratio and no impact on circulating fetal amino acid or metabolite levels in HET-hom offspring. In light of these new data, we have reinterpreted the findings of this study and substantially updated the discussion.

      Given our observations that fetal growth rate markedly increased during late gestation, but placental efficiency was reduced, our data strongly indicate that the effects of altered epigenetic oocyte programming due to loss of Eed affect both the placenta and the fetus. While our findings are significant, the precise mechanism underlying this growth response in HET-hom fetuses remains unknown. Understanding this mechanism will require substantially more work that will be the subject of future studies.

      Reviewer #2 (Public Review):

      Consistent fetal growth trajectories are vital for survival and later life health. The authors utilise an elegant and novel animal model to tease apart the role of Eed protein in the female germline from the role of somatic Eed. The authors were able to experimentally attribute placental overgrowth - particularly of the endocrine region of the placenta - to the function of Eed protein in the oocyte. Loss of Eed protein in the oocyte was also associated with dynamic changes in fetal growth and prolonged gestation. It was not determined whether the reported catch-up growth apparent on the day of birth was due to enhanced fetal growth very late in gestation, a longer gestational time ie the P0 pups are effectively one day "older" compared to the controls, or the pups catching up after birth when consuming maternal milk.

      To understand if increased growth occurred in HET-hom fetuses prior to birth, we have now included analyses of offspring weight at E18.5 (Figure 2F), of all pups collected with a verified E19.5 birth date (Figure 2J) and of pups from similar litter sizes (5-7 pups) at E19.5 (Figure 2K). Together with our existing data, these additional analyses provide average weights for fetuses at E14.5, E17.5, E18.5 and pups born on E19.5. This confirmed that HET-hom offspring undergo enhanced growth in the last few days of pregnancy: HET-hom fetuses that were substantially growth restricted and developmentally delayed at E14.5 progressed to pups of normal weight at birth in the 40% of pregnancies that were delivered on E19.5, after a normal gestational time.

      In addition, however, gestational length was increased by one to two days in 60% of pregnancies from hom oocytes, but not in control pregnancies from het or wt oocytes. As average weights were significantly greater in all surviving HET-hom offspring at P0 (i.e. surviving pups born on E19.5-E21.5; Figure 2G), it appears that this additional gestational time contributed to the offspring overgrowth. This is logical; however, it does not explain how growth and developmentally delayed fetuses at E14.5 attained normal weight and developmental stage by E19.5 (Figure 2J-K).

      Together our data clearly show that HET-hom offspring undergo enhanced growth during the late stages of pregnancy, allowing them to resolve the developmental delay and growth insufficiency observed at E14.5 so that they were born at normal weight and stage at E19.5. In addition, increased gestational time contributes to the weight of pups delivered on E20.5 or E21.5, partly explaining the overgrowth phenotype observed in this model.

      The idea that increased milk consumption may explain the overgrowth of HET-hom offspring is interesting. It is possible that the increased growth rate of HET-hom offspring continues after birth and contributes to overgrowth. However, examining this outcome in a tightly controlled manner is complicated given that we cannot predict the day of birth of HET-hom litters, and that these litters are generally small and would need to be fostered on the day of birth alongside control litters. Given these challenges and that our primary observation is that HET-hom offspring underwent fetal growth recovery during pregnancies of normal length and via extension of gestational length, we have not examined the possibility of increased milk consumption after birth.

      We have updated the results to reflect the new analyses and have provided relevant discussion to address these data. Our description of these data can be found in Results (lines 165-197) and in Figure 2.

      Reviewer #3 (Public Review):

      My understanding of the main claims of the paper, and how they are justified by the data are discussed below:

      Overall, loss of PRC2 function in the developing oocyte and early embryo causes:

      1) Growth restriction from at least the blastocyst stage with low cell counts and midgestational developmental delay.

      Strengths:

      • Live embryo imaging added an important dimension to this study. The authors were able to confirm an unquantified finding from a previous lab (reduced time to 2-cell stage in oocyte-deletion Eed offspring, Inoue 2018, PMID: 30463900) as well as identify developmental delay and mortality at the blastocyst-hatching transition.

      • For the weight and morphological analysis the authors are careful to provide isogenic controls for most of the experiments presented. This means that any phenotypes can be attributed to the oocyte genotype rather than any confounding effects of maternal or paternal genotype.

      • Overall, there is good evidence that oocyte deletion of Eed results in early embryonic growth restriction, consistent with previous observations (Inoue 2018, PMID: 30463900).

      Reviewer 3, Comment 1: Weaknesses: Gaps in the reporting of specific features of the methodology make it difficult to interpret/understand some of the results.

      While we are unsure exactly which methods Reviewer 3 would like expanded, we have updated parts that we thought required further detail to allow more informed interpretation of the results. These include methods for placental histology (Lines 650-669) and immunohistochemistry (Lines 671-690), and new methods for CD31 immunofluorescence (Lines 692-714), glucose and metabolomics (Lines 752-769) and DNA methylation (RRBS; Lines 734-750) analyses.

      To clarify the approach taken for histology, immunohistochemical and immunofluorescent staining, sections were cut in compound series from the centre of each placenta, ensuring that we collected representative data for each sample. QuPath was used to quantify the decidual and junctional zone areas in one complete, fully intact midline section for each placenta as close to the midline as possible. This provided data from 10 placentas for each genotype. In addition, glycogen-enriched and non-glycogen-enriched cells were identified and quantified using machine learning assisted QuPath analyses of the whole placenta, decidua and junctional zone regions. We have also added quantitative analyses of the labyrinth and labyrinth capillary network using immunofluorescent CD31 staining and machine learning assisted HALO software. This new analysis of placental morphology is included in the methods section.

      Moreover, as there were no sex-specific differences in placental morphology or weight, we combined the samples from both sexes to provide greater numbers for analysis in each genotype. For example, as described for the analyses of labyrinth and capillaries using CD31 IF, 4 placentas of each sex were used for data collection. This provided data from a total of 8 placentas (4 male and 4 female) for each genotype from a total of 17 WT-wt (9 male and 8 female), 21 HET-het (9 male and 12 female) and 24 HET-hom (16 male and 8 female) sections (2-3 sections/placenta).

      Reviewer 3, Comment 2: Placental hyperplasia with disproportionate overgrowth of the junctional trophoblast especially the glycogen trophoblast (GlyT) cells.

      Strengths: • The authors provide a comprehensive description of how placental and embryo weight is affected by the oocyte-Eed deletion through mid-to-late gestation development. The case for placentomegaly is clear.

      Weaknesses:

      • The placental efficiency data presented in Figure 3G-I is incorrect. Placental efficiency is calculated as embryo mass/placental mass, and it increases over the late gestation period. For e14.5 for example (Fig3G), WT-wt embryo mass = ~0.3g, placenta mass = 0.11g (from Fig 3D) = placental efficiency 2.7; HET-hom = 0.25/0.12 = 2.1. The paper gives values: WT-wt 0.5, HET-hom 0.7. Have the authors perhaps divided placenta weight by embryo mass? This would explain why the E17.5 efficiencies are so low (WT-wt 0.11 rather than a more usual figure of 8.88). If this is the case then the authors' conclusion that placental efficiency is improved by oocyte deletion of Eed is wrong - in fact, placental efficiency is severely compromised.
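      The arithmetic in the point above can be checked with a short sketch. It assumes only the definition given there (placental efficiency = embryo mass / placental mass) and uses the approximate E14.5 masses quoted in the comment, which are read-offs from the figures rather than raw data; the function name `placental_efficiency` is hypothetical and chosen for illustration.

```python
def placental_efficiency(fetal_mass_g: float, placental_mass_g: float) -> float:
    """Grams of fetus supported per gram of placenta (fetal/placental mass)."""
    if placental_mass_g <= 0:
        raise ValueError("placental mass must be positive")
    return fetal_mass_g / placental_mass_g


# Approximate E14.5 masses quoted in the comment (illustrative read-offs only).
wt_efficiency = placental_efficiency(0.30, 0.11)
hom_efficiency = placental_efficiency(0.25, 0.12)

# Inverting the ratio (placenta/fetus) reverses the direction of the comparison:
# the genotype with the lower efficiency has the higher inverted value.
wt_inverted = 1 / wt_efficiency
hom_inverted = 1 / hom_efficiency

print(f"fetal/placental: WT-wt {wt_efficiency:.1f}, HET-hom {hom_efficiency:.1f}")
print(f"placental/fetal: WT-wt {wt_inverted:.2f}, HET-hom {hom_inverted:.2f}")
```

      Because the two ratios are reciprocals, a value that looks higher in HET-hom placentas on the placental/fetal scale corresponds to lower efficiency on the fetal/placental scale. This is also consistent with the E17.5 figures quoted above, where 0.11 is approximately 1/8.88.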

      The authors have performed cell type counting on histological sections obtained from placentas to discover which cells are contributing to the placentomegaly. This data is presented as %cell type area in the main figure, though the untransformed cross-sectional area for each cell type is shown in the supplementary data. This presentation of the data, as well as the description of it, is misleading because, while it emphasises the proportional increase in the endocrine compartment of the placenta it downplays the fact that the exchange area of the mutant placentas is vastly expanded. This is important for two reasons.

      Firstly, the whole placenta is increased in size suggesting that the mechanism is not placental lineage-specific and instead acting on the whole organ. Secondly in relation to embryonic growth, generally speaking, genetic manipulations that modify labyrinthine volume tend to have a positive correlation with fetal mass whereas the relationship between junctional zone volume and embryonic mass is more complex (discussed in Watson PMID: 15888575, for example). The authors should reconsider how they present this data in light of the previous point.

      We thank the reviewer for pointing out our error in the placental efficiency analysis and apologise for this error. We have corrected the presentation and interpretation of these data and have described this in detail in our response to Reviewer 1, Comment 1.

      As discussed in our response to Reviewer 1, Comment 1, we have added a range of analyses to determine whether placental efficiency was enhanced in HET-hom offspring. These include measuring fetal and maternal circulating glucose levels (Figure 4K), individual amino acids and an extensive range of metabolites (Figure 8) and providing CD31 immunofluorescent analyses of labyrinth area, labyrinth/placental ratio and capillary/labyrinth ratio in HET-hom and control placentas (Figure 5).

      We also added analyses of glycogen enriched and non-glycogen-enriched cell counts in the decidua and junctional zones. As suggested by Reviewer 3, both glycogen-enriched and non-enriched cell populations are significantly increased in HET-hom placentas.

      Combined, these new analyses significantly expand the study and support the conclusion that placental efficiency in HET-hom offspring was either compromised or not different from controls, depending on the analysis. We find no evidence that placental efficiency was increased in HET-hom offspring and have reworked our results and discussion sections to reflect these new data and interpretation.

      Reviewer 3, Comment 2 cont: Again, some of the methods are not clearly reported making interpretation difficult - especially how they have estimated their GlyT number.

      As outlined in our response to Reviewer 3, Comment 1, we have added further detail to the methods section on how we counted glycogen-enriched and non-enriched cells in the decidua and junctional zone regions of sections from the middle of WT-wt, WT-het, HET-het and HET-hom placentas (Lines 650-669).

      Reviewer 3, Comment 3: Perinatal embryonic/pup overgrowth.

      Strengths:

      • The overgrowth exhibited by the oocyte-Eed-deleted pups is striking and confirms the previous work by this group (Prokopuk, 2018). This is an important finding, especially in the context of understanding how PRC2-group gene mutations in humans cause overgrowth syndromes. It is also intriguing because it indicates that genetic/environmental insults in the mother that affect her gamete development can have long-term consequences on offspring physiology.

      Weaknesses:

      • Is the overgrowth intrauterine or is it caused by the increase in gestation length? The way the data is reported makes it impossible to work this out. The authors show that gestation time is consistently lengthened for mothers incubating oocyte-Eed-deleted pups by 1-2 days. In the supplementary material, the mutant embryos are not larger than WT at e19.5, the usual day of birth. Postnatal data is presented as day post-parturition. It would probably be clearer to present the embryonic and postnatal data as days post coitum. In this way, it will be obvious in which period the growth enhancement is taking place. This information is really important to determine whether the increased growth of the mutants is due to a direct effect of the intrauterine environment, or perhaps a more persistent hormonal change in the mother that can continue to promote growth beyond the gestation period.

      We have used embryonic day (E) to denote embryo and fetal age throughout the study – this is the same as using DPC (i.e. E19.5 is equivalent to 19.5 DPC). As described in the Methods section “Collection of post-implantation embryos, placenta and postnatal offspring”, mice were time mated for two to four nights, with females plug checked daily. Positive plugs were noted as day E0.5.

      To make the data presentation clearer, we have shown the data for surviving HET-hom pups born on E19.5 (Figure 2J) separately from all HET-hom surviving pups born on E19.5-E21.5 (Figure 2G). As discussed in our response to Reviewer 2, we have also included growth data for pregnancies at E14.5, E17.5, E18.5 (Fig. 2C-F) and E19.5 (Figure 2J,K), as well as P0 (combined data for surviving pups born E19.5-E21.5), and P3 (combined data for surviving pups born E19.5-E21.5, Figure 2G,H).

      These data clearly show that HET-hom fetuses are substantially growth and developmentally delayed at E14.5 (Figure 2D), but HET-hom pups born on E19.5 are the same weight as WT-wt, WT-het and HET-het control pups (Figure 2J). This demonstrates that weight of HET-hom fetuses is normalised in utero between E14.5 and day of birth on E19.5.

      Importantly, as requested by Reviewer 3, we have separated average weight for all surviving pups with a day of birth of E19.5-E21.5 (Figure 2G) from average weight of pups born on E19.5 only (Figure 2J). These analyses revealed that the average weight of surviving HET-hom pups born between E19.5 and E21.5 was significantly higher than for controls (Figure 2G), but the average weight of pups born on E19.5 only was not. It is therefore clear that extended gestation also contributed to increased HET-hom pup birth weight. We have updated these additional analyses in Results (Lines 165-197) and Figure 2.

      As revealed in Figure 2H, it is also likely that growth of HET-hom pups during the three days postpartum contributed to the offspring overgrowth we observed in this and our previous study (Prokopuk et al., 2018 Clinical Epigenetics). However, we cannot determine whether there is a contribution from a persistent maternal hormonal change that promotes postnatal offspring growth or whether there is an innate growth benefit in HET-hom pups. As this is very difficult to dissect, separating these possibilities is beyond the scope of our study.

      Reviewer 3, Comment 4: "fetal growth restriction followed by placental hyperplasia, .. drives catch-up growth that ultimately results in perinatal offspring overgrowth".

      Here the authors try to link their observations, suggesting that i) the increased perinatal growth rate is a consequence of placentomegaly, and ii) the placentomegaly/increased fetal growth is an adaptive consequence of the early growth restriction. This is an interesting idea and suggests that there is a degree of developmental plasticity that is operating to repair the early consequences of transient loss of Eed function.

      Strengths:

      • Discrepancies between earlier studies are reconciled. Here the authors show that in oocyte-Eed-deleted embryos growth is initially restricted and then the growth rate increases in late gestation with increased perinatal mass.

      Weaknesses:

      • Regarding the dependence of fetal growth increase on placental size increase, this link is far from clear since placental efficiency is in fact decreased in the mutants (see above).

      • "Catch-up growth" suggests that a higher growth rate is driven by an earlier growth restriction in order to restore homeostasis. There is no direct evidence for such a mechanism here. The loss of Eed expression in the oocyte and early embryo could have an independent impact on more than one phase of development.

      Firstly, there is growth restriction in the early phase of cell divisions. Potentially this could be due to depression of genes that restrain cell division on autosomes, or suppression of X-linked gene expression (as has been previously reported, Inoue, 2018 PMID: 30463900). The placentomegaly is explained by the misregulation of non-canonically imprinted genes, as the authors report (and in agreement with other studies, e.g. Inoue, 2020. PMID: 32358519).

      • Explaining the perinatal phase of growth enhancement is more difficult. I think it is unlikely to be due to placentomegaly. Multiple studies have shown that placentomegaly following somatic cell nuclear transfer (SCNT) is caused by non-canonically imprinted genes, and can be rescued by reducing their expression dosage. However, SCNT causes placentomegaly with normal or reduced embryonic mass (for example, Xie 2022, PMID: 35196486), not growth enhancement. Moreover, since (to my knowledge) single loss of imprinting models of non-canonically imprinted genes do not exist, it is not possible to understand if their increased expression dosage can drive perinatal overgrowth, and if this is preceded by growth restriction and thus constitutes 'catch up growth'.

      Reviewer 3 is correct in their assessment that placental efficiency was decreased in HET-hom offspring and we have corrected the placental efficiency analysis based on fetal/placental weight ratios (discussed in detail in our response to Reviewer 1 Comment 1). We have added substantially more data (glucose, amino acids, metabolites, labyrinth capillary area and density). These data support the conclusion that a placentally driven advantage for HET-hom fetal growth is unlikely, despite our observation that HET-hom fetuses are developmentally delayed and underweight at E14.5, but are born at normal weight after a normal gestational length (19.5 days) (discussed in our responses to Reviewer 3, Comment 3 and Reviewer 2).

      This demonstrates that HET-hom fetuses are able to attain normal birth weight despite being initially growth restricted at E14.5, and that this occurs despite low placental function. Moreover, as we compared isogenic offspring with heterozygous loss of Eed (HET-het compared to HET-hom offspring), the outcomes we observed in HET-hom offspring originate from loss of EED in the growing oocyte or loss of maternal EED in the zygote, strongly suggesting that a non-genetic mechanism is involved.

      As pointed out by Reviewer 3, the initial developmental delay in HET-hom offspring may be due to increased expression of genes that regulate cell proliferation – this could clearly explain the lower number of cells we observed in the ICM and the growth delay at later stages of embryonic and fetal development. Another possibility is that maternal PRC2 provided by the oocyte promotes cell divisions in preimplantation embryos. We have discussed these possibilities on Lines 467-476.

      In addition, Matoba et al 2022 demonstrated that deletion of maternal Xist together with Eed was able to rescue male-biased lethality in offspring from oocytes lacking Eed, revealing a clear role for X-linked genes in this phenotype (Matoba et al 2022, Genes and Development). However, deletion of maternal Xist did not properly normalise survival of offspring from Eed null oocytes (i.e. Eed/Xist double maternal null litters were smaller than litters derived from wild type oocytes), strongly suggesting that other mechanisms provide the capacity for HET-hom offspring to attain normal weight at birth. We have added further discussion of the Matoba study in the context of our study in the Discussion (Lines 544-555).

      Finally, with respect to the outcomes for SCNT-derived offspring, we extracted SCNT fetal growth and placental weight data from the supplementary data included in Matoba et al., 2018 Cell Stem Cell. 2018;23(3):343-54.e5 and compared it with data collected in our study (Figure 7). This analysis revealed that the weights of placentas and fetuses of offspring derived via SCNT were very similar to the HET-hom offspring in our study and we have discussed the similarities and potential differences between HET-hom and SCNT offspring in the Discussion (Lines 478-500).

      As pointed out by Reviewer 3, deletion of maternal non-canonically imprinted genes partially or fully rescued the placental hyperplasia phenotype in both SCNT-derived offspring and offspring from oocytes lacking EED. However, as we have discussed, the mechanisms underlying other aspects of the offspring phenotype, such as fetal growth recovery of HET-hom offspring observed in our study, remain unknown. Moreover, the comparison we provide in Figure 7 strongly indicates that HET-hom and SCNT fetuses are similarly delayed at E14.5 and undergo similar fetal growth recovery before birth, but the mechanism also remains unknown. Together, it appears that offspring derived from either Eed-null oocytes or by SCNT have an innate ability to remediate fetal growth restriction during the late stages of pregnancy without a requirement to correct maternally inherited impacts mediated by Xist or H3K27me3-dependent imprinting.

    1. Author response

      Reviewer #1 (Public Review):

      The main contribution appears to be related to functional specialization. I suggest clarifying the major novelty of the present report and to focus the introduction on it.

      We thank this reviewer for this suggestion. We have revised the introduction to emphasize the functional specialization question. The changes are extensive; we have included a tracked-changes version of the manuscript to make these edits easy to see.

      There is a growing literature on fluctuating neural firing patterns that is not considered in this report. The scholarship appears a bit impoverished with only 19 references, many of which point to work from this group of collaborators. I suggest that the authors consider the present work in the context of the wider literature more scholarly, even if not all the relations of these different lines of work can be conclusively connected at this point. For a few examples, there is work by Kienitz and colleagues on fluctuating neural patterns in V4 evoked by competing grating stimuli. Also, the work by Engel, Moore, and colleagues on 'on' and 'off' states in the context of selective attention seems relevant, or the work by Fiebelkorn and Kastner on rhythmic perception and attention.

      We agree completely with this suggestion! We have reworded the introduction to be more inclusive of other research in this area (especially Kienitz and colleagues – exciting work that we are pleased to have had brought to our attention) and we have added about 500 words in the Discussion to cover the work on on/off states (Engel et al.), rhythmic perception (Fiebelkorn & Kastner and others), and attention more generally (e.g., Treisman & Gelade’s work on serial sampling). We are particularly pleased to add these sections because these topics are very much on our minds – we have a commentary piece under review elsewhere in which we evaluate these synergistic lines of approach in a more complete fashion. In total, we’ve added about 15 additional references.

      Reviewer #2 (Public Review):

      The description of the results would benefit from a better explanation of how low spike counts may influence the outcome of the analysis. Due to a smoothing procedure used for visualization, the spike counts for the paired stimuli (AB, black lines) shown in Figure 3a-b and Figure 4a-d go below 0. However, the actual spike count on a trial can not go below 0. The symmetric smoothing procedure may hide an underlying skewed distribution of spike counts that can only be positive. The statistical analysis is not performed on the smoothed distribution but on the actual spike counts, and the validity of the result is therefore not in question. However, the paper would benefit from 1) visualization of the unsmoothed trial counts, and 2) an explanation of how assumptions of symmetric/skewed distributions may affect the outcome.

      We thank the reviewers for noting this and making these suggestions. We now include unsmoothed raw spike counts in all the example figures (Figure 3a-b and Figure 4a-d). With regard to the symmetric/skewed distributions and the analysis methods, a Poisson distribution will be skewed at low rates and become more symmetric at higher rates, so this is already incorporated into the analysis. Indeed, the utility of Poisson distributions for fitting non-negative data is one of the reasons these distributions are so commonly used in neuroscience. We now make this point explicitly at the beginning of Methods/Data analysis: “Our method centers on modeling spike counts based on Poisson distributions, a common technique for handling non-negative count data in neuroscience and other fields.” With this edit as well as the revised example figures now making clear that no spike counts are below zero, we are optimistic that readers will better understand the analysis method and how the shape of response distributions is incorporated into it.
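The relationship between rate and skew can be made concrete with a short sketch. This is an illustration only, not the analysis code from the manuscript, and the mean spike counts below are hypothetical values chosen for the example. For a Poisson distribution with mean λ, the skewness is 1/√λ, so the distribution is strongly right-skewed at low counts and close to symmetric at high counts, while never assigning probability to negative counts.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing exactly k spikes when the mean count is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def poisson_skewness(rate: float) -> float:
    """Skewness of a Poisson distribution, which equals 1 / sqrt(rate)."""
    return 1.0 / math.sqrt(rate)


# Hypothetical mean spike counts per trial, chosen only for illustration.
for rate in (0.5, 2.0, 20.0):
    print(
        f"mean={rate:5.1f}  "
        f"P(0 spikes)={poisson_pmf(0, rate):.3f}  "
        f"skewness={poisson_skewness(rate):.3f}"
    )
```

The probability mass at zero shrinks and the skewness falls as the mean count grows, which is why a symmetric smoothing kernel can be misleading mainly for low-firing conditions.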

    1. Author Response:

      We take the liberty to thank all of you for your constructive and inspiring comments, which will help us substantially improve the final version of the paper. Before our final, detailed revision, I am writing this provisional letter to provide a quick response to our reviewers’ comments.

      I first give a quick and short summary for your public reviews, then respond point-by-point.

      Editors:

      1. More discussion is needed.

      2. More discussion about eye fixation during adaptation. Discuss why increasing visual uncertainty by blurring the cursor in the present study produces the opposite findings of previous studies (Tsay et al., 2021; Makino et al., 2023).

      3. Discuss the broad impact of the current model.

      4. Share the codes and the metadata (instead of the current data format).

      Response: This is a concise summary of the major concerns listed in the public review. Given these concerns are easy to address, we are giving a quick but point-to-point response for now. The elaborate version will be put into our formal revision.

      Reviewer 1:

      1) More credit should be given to the PReMo model: a) The PReMo model also proposes that perceptual error drives implicit adaptation, as in a new publication in Tsay et al., 2023, which was not public at the time of the current writing; and b) The PReMo model can account for some datasets, e.g. Fig 4A.

      Response: We will add this new citation and point out that the new paper also uses the term perceptual error. We will also point out that the PReMo model has the potential to explain Fig 4A, though for now, it assumes an additional visual shift to explain the positive proprioceptive changes relative to the target. We will expand the discussion about the comparison between the two models.

      2) The present study produced an opposite finding of a previous finding, i.e., upregulating visual uncertainty (by cursor blurring here) decreases adaptation for large perturbations but less so for small perturbations, while previous studies have shown the opposite (by using a cursor cloud; Tsay et al., 2021; Makino et al., 2023). This needs explanation.

      Response: Using the cursor cloud (Tsay et al., 2021, Makino et al., 2023) to modulate visual uncertainty has inherent drawbacks that make it unsuitable for testing the sensory uncertainty effect for visuomotor rotation. For the error clamp paradigm, the error is defined as angular deviation. The cursor cloud consists of multiple cursors spanning over a range of angles, which affects both the sensory uncertainty (the intended outcome) AND the sensory estimate of angles (the error itself, the undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (sigma_v in our model), but it additionally affects the mean of the distribution (mu). This unnecessary confound is avoided by using cursor blurring, which is still a cursor with its center (mu) unchanged from an un-blurred cursor. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2021, the cursor cloud often overlaps with the visual target. This “target hit” would affect adaptation, possibly via a reward learning mechanism (See Kim et al., 2019 eLife). This is a second confound that accompanies the cursor cloud. We will expand our discussion to explain the discrepancy between our findings and previous findings.
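The distinction between changing a cue's uncertainty and changing its mean can be sketched with the standard textbook formula for fusing two independent Gaussian cues. This is not the model from the manuscript, which is not reproduced here; the function name and all numerical values are assumptions made for illustration. Increasing the visual standard deviation lowers the weight given to the visual cue, whereas shifting the visual mean would move the combined estimate even if the uncertainty were unchanged.

```python
def combine_gaussian_cues(mu_a, sigma_a, mu_b, sigma_b):
    """Precision-weighted fusion of two independent Gaussian cues.

    Returns the mean and standard deviation of the combined estimate.
    """
    precision_a = 1.0 / sigma_a**2
    precision_b = 1.0 / sigma_b**2
    total_precision = precision_a + precision_b
    mu = (precision_a * mu_a + precision_b * mu_b) / total_precision
    sigma = total_precision**-0.5
    return mu, sigma


# Hypothetical values in degrees: a visual cue at 30 and a second cue at 0
# with a fixed standard deviation of 5.
for sigma_v in (2.0, 5.0, 10.0):
    mu, sigma = combine_gaussian_cues(30.0, sigma_v, 0.0, 5.0)
    print(f"sigma_v={sigma_v:4.1f}  combined mean={mu:6.2f}  combined sd={sigma:5.2f}")
```

In this sketch a manipulation that alters only `sigma_a` isolates the uncertainty effect, which is the property the response attributes to cursor blurring.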

      3) The estimation of visual uncertainty (our exp1) required people to fixate on the target, while this might not reflect the actual scenario during adaptation where people are free to look wherever they want.

      Response: Our data show otherwise: in a typical error-clamp setting, people fixate on the target for the majority of the time. For our Exp1, the fixation on the straight line between the starting position and the target is 86%-95% (as shown in Figure S1). We also collected eye-tracking data in our Exp4, which is a typical error-clamp experiment. More than 95% of gaze falls within +/- 50 pixels around the center of the screen, even slightly higher than Exp1. We will provide this part of the data in the revision. In fact, we designed our Exp1 to mimic the eye-tracking pattern as in typical error-clamp learning with carefully executed pilot experiments.

      This high percentage of fixating on the target is not surprising: the error-clamp task requires participants to use their hands to move towards the target and to ignore the cursor. In fact, we would also like to point out that the high percentage of fixation on the aiming target is also true for conventional visuomotor rotation, which involves strategic re-aiming (shown in de Brouwer et al. 2018; Bromberg et al. 2019; we have an upcoming paper to show this). This is one reason that our new theory would also apply to other types of motor adaptation.

      4) More methodology details are needed. E.g., a figure showing the visual blurring, a figure showing individual data, a table showing data from individual sessions, code sharing, and a possible new correlational analysis.

      Response: All of this additional methodological and analysis information will be provided. We limited ourselves to writing a short paper, but the revision will be extended to include all of these details.

      Reviewer 2:

      1) More discussions are needed since the focus of this study is narrowly confined to visuomotor rotation. “A general computational principle, and its contributions to other motor learning paradigms remain to be explored”.

      Response: This is a great suggestion since we also think our original Discussion has not elaborated on the possible broad impact of our theory. Our model is not limited to the error-clamp adaptation, where the participants were explicitly told to ignore the rotated cursor. The error-clamp paradigm is a rare example in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is implicitly processed and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating multimodal feedback and motor-related cues (motor prediction cue in our model) when forming procedural knowledge for action control.

      We will propose that the same two principles should be applied to various kinds of motor adaptation and motor skill learning, which constitutes motor learning in general. Most of our knowledge about motor adaptation is from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three types all involve localizing one’s effector under the influence of perturbed sensory feedback, and they also have implicit learning. We believe they can be modeled by variants of our model, or at least we should consider using the two principles above to think of their computational nature. For skill learning, especially for de novo learning, the area still lacks a fundamental computational model that accounts for the skill acquisition process on the level of relevant movement cues. Our model suggests a promising route, i.e., repetitive movements with a Bayesian cue combination of movement-related cues might underlie the implicit process of motor skills.

      We will add more discussion on the possible broad implications of our model in the revision.

      Reviewer 3:

      1) Similar to Reviewer 1, this reviewer raised the concern about whether people’s fixation in typical motor adaptation settings is similar to the fixation that we instructed in our Exp1.

      Response: see above.

      2) Similar to Reviewer 2, the concern was raised about whether our new theory is applicable to a broad context. Especially, error clamp appears to be a strange experimental manipulation that has no real-life appeal, “(i)Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world”.

      Response: About the broad impact of our model, please see responses to Reviewer 2 above. We agree that ignoring errors (and thus “trying” to suppress adaptation) should not be a movement strategy for real-world intentional tasks. However, even in real life, we constantly attend to one thing while doing another; that’s when implicit motor processes are in charge. Furthermore, it is this exact “ignoring” instruction that elicits the implicit adaptation that we can work on. In this sense, the error-clamp paradigm is a great vehicle to isolate implicit adaptation and allows us to unpack its cognitive mechanism.

      3) In Exp1, the 1s delay between the movement end and the presentation of the reference cursor might inflate the actual visual uncertainty.

      Response: The 1s delay of the reference cursor would not inflate the estimate of visual uncertainty. Our Exp1 used a paradigm similar to one from vision science (e.g., White, Levi, and Aitsebaomo, Vision Research, 1992), which shows that delay does not lead to an obvious increase in visual uncertainty over a broad range of values (from 0.2s to >1s, see their Figure 5-6). We will add more methodological justification in our revision.

      4) Our Fig4A used Tsay et al., 2021 data, which, in the reviewer’s view, is not an appropriate measure of proprioceptive bias. The reason is that in this dataset, “participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to.”

      Response: We agree that the Tsay et al., 2021 study used an unconventional way to measure the influence of implicit adaptation on proprioception. Their observed “proprioceptive changes” should not be called “proprioceptive bias”, which is conventionally a reserved term for the difference between the estimated hand location and the actual hand location (ideally for a passively moved hand). However, we think their dataset is still subject to the same Bayesian cue combination principle and thus can be modeled. Our modeling of this dataset includes all relevant cues: the implicitly perceived hand position and the proprioceptive cue (given that the hand stays at the movement end). Both cues are in extrinsic coordinates, which happen to set the target position as zero. But where to set the zero (whether it is the target or the actual hand location) does not matter for the model fitting. Note that our Exp4 is also based on PEA modeling of proprioceptive bias, and this time the data is presented relative to the actual location.

      In the revision, we will keep the current Fig4A and refer to the data as proprioceptive change, as opposed to proprioceptive bias, to follow the convention.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In no particular order:

      1. In Figs S3 and S4, can they also show gamma fit? (or rather corrected fit accounting for abundance conditioning?) The shapes look different, especially for the microbial mat.

      Author response: We have added gamma distribution fits to the rescaled AFD plots (Figs. S3, S4).

      1. Lines 170-176 seem like they should come before lines 164-166.

      Author response: In lines 166-170 we discuss empirical patterns in the data that motivate the introduction of the SLM as a model in lines 170-175. We have clarified these points in the revision.

      1. The wiggles in the gamma predictions in the occupancy-abundance plots are because occupancy depends not only on abundance but also on the shape parameter, right? Probably good to write a sentence or two explaining what's going on here.

      Author response: We agree with the reviewer that the variation in the prediction could be in-part driven by variation in the shape parameter across community members. We now include this observation in our revision (lines 209-211).

      1. In the predicted vs observed occupancy plots, it would be nice to add curves showing predicted standard deviation or similar to give a sense of how well the model is predicting the variability.

      Author response: In the revised manuscript we now include predictions for the variance of occupancy using the gamma distribution under both taxonomic and phylogenetic coarse-graining (Fig. S9; S10; lines 211-214).

      1. Covariance between sister groups: Figs S9 and S10 look very nice, but it's hard to see much because they're log-log plots over multiple decades, while even a several-fold difference from y = x would indicate a strong effect of correlations. It would be clearer if the y-axis showed the ratio of the coarse-grained variance to the sum of OTU variances and we were looking at how well it fit y = 1.

      Author response: We have included these plots in the revision (Fig. S14, S15).
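The ratio requested by the reviewer follows from the identity that the variance of a sum equals the sum of the variances plus twice the sum of the pairwise covariances, so the ratio is 1 when member abundances are uncorrelated and exceeds 1 when they covary positively. The sketch below illustrates this with simulated data. It is not the code used for Figs. S14 and S15; the function name and the simulated abundances are assumptions made for illustration.

```python
import numpy as np


def variance_ratio(abundances: np.ndarray) -> float:
    """Ratio of the coarse-grained variance to the sum of member variances.

    `abundances` is a (samples x OTUs) array holding the OTUs that are merged
    into one coarse-grained group.
    """
    coarse = abundances.sum(axis=1)  # coarse-grained abundance per sample
    member_variances = abundances.var(axis=0, ddof=1)
    return float(coarse.var(ddof=1) / member_variances.sum())


rng = np.random.default_rng(0)
# Hypothetical abundances for three OTUs, drawn independently.
independent = rng.gamma(shape=2.0, scale=1.0, size=(5000, 3))
# Adding a term shared by all three OTUs induces positive covariance.
shared = rng.gamma(shape=2.0, scale=1.0, size=(5000, 1))
correlated = independent + shared

print("independent OTUs:", round(variance_ratio(independent), 2))
print("correlated OTUs: ", round(variance_ratio(correlated), 2))
```

On a linear axis centred on 1, departures of this ratio from independence are easier to judge than on a log-log plot spanning several decades.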

      1. If the sum of gammas can be well-approximated by a gamma, does that mean that the gamma is just a fairly flexible distribution and we shouldn't take the quality of the gamma fits in general as a very specific indication of what's going on?

      Author response: While the sum of random variables that are drawn from gamma distributions with different parameters is often well-approximated by another gamma, this does not tell us why the gamma distribution holds for microbial communities at the finest-grain level (i.e., OTUs/ASVs). At present, the best explanation is that the gamma is a stationary distribution for certain stochastic differential equations which have ecological interpretations (Grilli, 2020; Shoemaker et al., 2023). Furthermore, alternative two-parameter distributions have been tested alongside the gamma and have done a comparatively poor job capturing observed macroecological patterns (Grilli, 2020). These results suggest that the utility of the gamma distribution is not simply an outcome of its flexible nature; it succeeds because it has captured core ecological properties of microbial communities. In the case of the SLM, gamma-like distributions arise when a community member is subject to self-limiting growth and environmental noise. On the other hand, the stability of the gamma distribution might explain why it can be detected as the shape of the AFD, as it does not fade out across coarse-graining levels.

      1. What's going on with the variance of diversity in Fig S12? Does this suggest that some of the problem in Figure 4 could be with the analytic approximation rather than the model? I had a hard time understanding the part of the Methods explaining the simulation details (lines 587-597). It would be worth expanding this. Is there some way to explain how the correlations were simulated in terms of the SLM, e.g., correlations in the noise term across OTUs?

      Author response: We believe that deviations in the variance of diversity in Fig. S16g,h are driven by small deviations in our predictions of the second moment $$\langle (x \ln x)^{2} \mid N_{m}, \bar{x}_{i}, \beta_{i}^{2} \rangle$$ (Eq. S16). Alone these deviations are slight, but their effects become noticeable when summed over hundreds or thousands of taxa. We have included this observation in the revised manuscript (lines 268-271). However, this deviation pales in comparison with the magnitude of covariance in the empirical data, suggesting that our inability to predict the variance of richness and diversity is primarily driven by our assumption of statistical independence.

      Regarding the source of the correlations, under the SLM correlations in abundances can be introduced either by adding deterministic interaction terms or through correlated environmental noise. Determining which of these two options drives empirical correlations is an active area of research (e.g., Camacho-Mateu et al., 2023). For the purpose of this study, we remain agnostic on the cause of the correlations, opting instead to emphasize that the inclusion of correlations is necessary to reproduce observed slopes of the fine vs. coarse-grained relationship for diversity.

      1. In Figure 5ab, is the idea that the correlation in richness is primarily driven by the number of samples from the environment? Line 390 seems to say so, but it would be good to make this explicit and put it right in that section of the Results.

      Author response: Our results suggest that sampling effort (# reads) plays a larger role in determining the correlations between fine and coarse-grained measures of richness. We now clarify this point in the revised manuscript (lines 429-435).

      1. I don't totally understand the contrast in lines 369-372. If fine-scale diversity within one group begets coarse-grained diversity in another group, couldn't that show up as correlations in the AFDs? Or is the argument that only including within-group correlations in AFDs is enough to reproduce the pattern? I'm not sure I see how that could be.

      Author response: The term “begets” implies both causation and direction. If we see a positive relationship between diversity estimates at two different scales of observation, the causal mechanism cannot be determined solely from correlations between samples obtained once from different sites. So, mechanisms consistent with niche construction/"DBD" can produce correlations, though the existence of correlations does not necessarily imply DBD.

      1. The discussion of niche construction on 429-431 doesn't match very well with 440-441. Basically, niche construction is a very broad concept, not a specific one, right?

      Author response: In lines 472-576 (formerly 429-431) we discuss how the existence of correlations between fine and coarse-grained scales does not point to a single ecological mechanism. Alternatively stated, observing a non-zero slope does not mean that niche construction is driving the relationship.

      In lines 476-487 (formerly 440-441) we discuss how the mechanism of cross-feeding has been shown to generate a positive relationship between fine and coarse-grained measures of diversity. This mechanism can be interpreted as a form of “niche construction”, so it is an instance of a tested ecological mechanism that aligns with the interpretation given in Madi et al. (2020).

      1. Isn't (8) just the negative binomial distribution?

      Author response: The convolution of the stationary solution of the SLM (i.e., a gamma distribution) and the Poisson limit of a multinomial sampling distribution returns a negative binomial distribution of read counts across hosts if samples have identical sampling depths. We now include this detail in the revision (lines 593-595). Note, however, that if different samples have different sampling depths, the distribution of reads across samples is not a negative binomial.
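The equal-depth case can be checked numerically with a minimal sketch. This is an illustration under stated assumptions, not the manuscript's code: the function names, the parameter values, and the parameterization of the gamma by its mean and a shape parameter `beta` are all choices made for this example. If relative abundance is gamma distributed with mean x̄ and shape β, and reads are Poisson with rate x·N, then read counts follow a negative binomial with size β and success probability β/(β + x̄N).

```python
import math

import numpy as np


def negative_binomial_pmf(n: int, mean_abundance: float, beta: float, depth: float) -> float:
    """Probability of n reads under a gamma-Poisson mixture with fixed depth."""
    p = beta / (beta + mean_abundance * depth)
    log_pmf = (
        math.lgamma(n + beta) - math.lgamma(n + 1) - math.lgamma(beta)
        + beta * math.log(p) + n * math.log1p(-p)
    )
    return math.exp(log_pmf)


def simulate_read_counts(mean_abundance, beta, depth, size, seed=0):
    """Draw gamma abundances, then Poisson read counts at one sampling depth."""
    rng = np.random.default_rng(seed)
    abundances = rng.gamma(shape=beta, scale=mean_abundance / beta, size=size)
    return rng.poisson(abundances * depth)


# Hypothetical parameters: mean relative abundance 0.01, shape 2, depth 1000.
counts = simulate_read_counts(0.01, 2.0, 1000, size=200_000)
for n in (0, 5, 20):
    simulated = float(np.mean(counts == n))
    analytic = negative_binomial_pmf(n, 0.01, 2.0, 1000)
    print(f"n={n:2d}  simulated={simulated:.4f}  negative binomial={analytic:.4f}")
```

Replacing the single `depth` with a different value for each sample would make the pooled counts a mixture of negative binomials, which is the caveat noted in the response.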

      1. Missing 1/M in (9).

      Author response: We have fixed this omission in the revision.

      1. Schematic figures illustrating what the different statistics are intuitively capturing would really help this work be understandable to a broader audience, but they'd also be a ton of work.

      Author response: Richness and diversity are used in ecology to such an extent that we do not see the benefit of a conceptual diagram. Furthermore, we have included a conceptual diagram about our pipeline in our revision at the request of Reviewer 2 (Fig. S20).

      Reviewer #2:

      Major Recommendations

      If I were reviewing this manuscript for a regular journal, I believe the following issues would be important to address prior to publication.

      1. From my reading, the main points of this advance are that

      a. SLM models AFDs well at all levels of coarse-graining.

      b. This makes SLM a better null-model than UNTB for macroecological relationships.

      c. Using SLM on the EMP data, the richness slopes are well explained by SLM but not the diversity slopes. Therefore, any theory that hopes to explain the diversity slopes must include interactions. Argument B appears to be one of the key points yet is missing from the abstract, and should be made clearer. If these aren't the main points the authors intended, then other main points need to be highlighted more.

      Author response: In the revision we now explicitly mention argument b in the Abstract.

      1. The title should be more specific, so as to better reflect the content. (E.g. "UNTB is not a good null model for macroecological patterns" would seem more appropriate.)

      Author response: We would prefer to focus on the success of the SLM rather than the limitations of the UNTB in the title of this work. Therefore, we have modified our title as follows: “Investigating macroecological patterns in coarse-grained microbial communities using the stochastic logistic model of growth”.

      1. The manuscript would benefit from a clearer description of exactly what information the SLM retains about the data (perhaps even a cartoon panel in one of the figures). In particular, it is important to be explicit about the number of model parameters.

      Author response: The number of model parameters for the gamma AFD are now explicitly stated in the revision (Lines 579-580).

      1. The main point of Figures 2-4 seems to be that SLM is good at describing the data (and when it fails it is due to interactions) while UNTB fails to reproduce this behavior, in support of Argument B. This is not clear from the figure descriptions or titles, which focus on SLM's "predictive" power.

      Author response: Fig. 2a demonstrates that the gamma distribution predicted by the SLM explains the empirical distribution of abundances. This result provides motivation to predict the fraction of sites harboring a given community member (i.e., occupancy, Fig. 2c) as well as general measures of community composition including mean richness (Fig. 3a,c) and mean diversity (Fig. 3b,d) using parameters estimated from the data (not free parameters).

      This success led us to consider whether the gamma distribution could predict the variance of richness and diversity, which it could not because it does not capture covariance between community members (Fig. 4).

      In the revision we have identified opportunities to make these points clear throughout the Results. Furthermore, we have added additional detail to the legends of Figs. 2-4.

      1. The manuscript would benefit from clarifying the use of "prediction" related to the SLM. Since the gamma distributions predicted by SLM were fit to empirical data, it seems like the agreement between analytic means and empirical means (Fig. 3) is a statement on gamma distributions being a good fit for the AFD's more than SLM predicting richness and diversity. For example, from my reading, it seems like this analysis could be done numerically by shuffling species abundances across environments and seeing whether this changed the mean richness/diversity. I would not call this shuffling test a prediction, since it is more a statement on the relevance of interactions. SLM predicts gamma-distributed AFD's, but those distributions recovering the data they were trained on doesn't seem like a prediction.

      Author response: In this manuscript we identified the gamma distribution as an appropriate probability distribution to describe the distribution of relative abundances across samples over a range of coarse-grained scales. Motivated by this result, we performed a separate analysis where at each scale we estimated the mean and variance of relative abundance across sites for each community member. We then used these parameters to obtain the expected value of a community-level measure using an equation we derived by assuming that the gamma distribution was appropriate (e.g., richness, Eq. 13). We then compared the expected value of richness to the mean value from empirical data and assessed the similarity between the two values.

      The outcome of this procedure constitutes a prediction. While the mean and variance are parameters, estimating them from the empirical data has no connection with the operation of training a distribution on empirical data. We could have derived predictions such as Eq. 13 using any other probability distribution that can be parameterized using the mean and variance (e.g., Gaussian). Such a prediction would likely do a poor job even though it used the same means and variances used for our gamma predictions. This is because the choice of distribution would not have been a good descriptor of the distribution of abundances across hosts.

      To better explain this last -- perhaps the most significant -- issue, I'd like to ask the authors if the following recasting would be an accurate reflection of their conclusions, or if something is missing.

      1. "Focusing on the empirical relationship observed between diversity slopes by Madi 2020, we ask the question: does explaining these relationships require accounting for species-species correlations? Or could it be reproduced in a noninteracting model?" To address this question, one can perform a randomization test, shuffling abundances to preserve all single-OTU statistics but breaking any correlations. My reading of the authors' results is that (new result 1) the richness relationships would be preserved, while diversity relationships would not be preserved. [Note that this result 1 need not mention either SLM or UNTB.]

      Author response: The question of whether correlations between species are necessary to explain the observed slope of the fine vs. coarse-grained relationship was only one component of our research goals. Our first question was whether the SLM would prove to be a more appropriate null for evaluating the novelty of observed slopes. We believe that our results support the conclusion that the SLM is an appropriate null for this question, as it was able to capture observed slopes of the fine vs. coarse-grained relationship for estimates of richness, determining that correlations and the interactions that are ultimately responsible are not necessary to explain this result.

      We then find that the SLM as a null model fails to capture observed slopes of the fine vs. coarse-grained relationship for estimates of diversity and simulate the SLM with correlations to return reasonable estimates of the slope. However, here the question about correlations is a direct follow-up from our question about a null model that excludes interactions, so it is unclear how a randomization test would relate to this result.
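
      For concreteness, the randomization test described in the reviewer's comment can be sketched as follows. This is a minimal illustration on a toy read-count table, not the analysis performed in the manuscript; the function names and the toy data are hypothetical. Each community member's counts are shuffled across sites independently, which preserves every single-member statistic (including occupancy, and therefore mean richness) while breaking correlations between members.

```python
import random
from statistics import mean, pvariance


def richness_per_site(table):
    """Count the members with at least one read at each site.

    table[i][j] holds the reads of member i at site j.
    """
    n_sites = len(table[0])
    return [sum(1 for row in table if row[j] > 0) for j in range(n_sites)]


def shuffle_within_members(table, rng):
    """Permute each member's counts across sites independently.

    Per-member statistics are unchanged; between-member correlations are broken.
    """
    shuffled = []
    for row in table:
        new_row = list(row)
        rng.shuffle(new_row)
        shuffled.append(new_row)
    return shuffled


# Toy data (hypothetical): two members that always co-occur.
table = [[5, 0, 3, 0],
         [2, 0, 7, 0]]
rng = random.Random(0)
shuffled = shuffle_within_members(table, rng)

observed = richness_per_site(table)
null = richness_per_site(shuffled)

# Mean richness is identical by construction; the variance of richness
# is the quantity that can change once correlations are removed.
print(mean(observed), mean(null))
print(pvariance(observed), pvariance(null))
```

      Because mean richness is a sum of occupancies, it cannot distinguish the shuffled table from the original one; only statistics that depend on co-occurrence, such as the variance of richness, are sensitive to the shuffle.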

      1. Instead of doing a randomization test (resampling the empirical distribution), one might insist on instead fitting a model to the AFD distributions, and sampling from that distribution rather than the empirical one.

      a. If doing it this way, one should of course ensure that the distribution being fit is a good description of the data.

      b. UNTB is a bad fit. SLM is a better fit, and in fact (new result 2) continues to be a good empirical fit even at coarse-grained levels.

      c. Can make statements on using SLM as a null model for these types of cross-scale relationships. Could try arguing that fitting an SLM model per-OTU (instead of resampling the empirical distribution) could offer some advantage if certain properties could be computed analytically from the fit parameters, instead of averaging over multiple computational rounds of resampling.

      Do these two points accurately summarize the manuscript? If so, this presentation avoids the confusion with "prediction". If my summary is missing some important point, the presentation should be revised to clarify the points I appear to have missed.

      Author response: In our manuscript we derive predictions from the gamma distribution, the stationary distribution of the SLM, that require parameters estimated from the data (i.e., mean and variance of relative abundance). These parameters are estimated from the data using normal procedures and then plugged into our predictions that assume the appropriateness of the gamma, returning values that are then compared to estimates from empirical data. Our estimation of the mean and variance does not assume that the empirical distribution follows a gamma distribution, but the value returned by our function derived from the gamma distribution (e.g., Eq. 13) does make that assumption.

      To address the reviewer’s broader comment, we believe that the following points summarize our manuscript:

      1. The gamma distribution as a stationary solution of the SLM captures macroecological patterns and predicts typical community-level properties (i.e., mean richness and diversity) across phylogenetic and taxonomic scales.

      2. The gamma distribution fails to predict variation in community-level properties (i.e., variance of richness and diversity) across phylogenetic and taxonomic scales. This occurs because the SLM is a mean-field model that does not explicitly include interactions between community members.

      3. Despite the inability to capture interactions, the gamma distribution succeeds at predicting the fine vs. coarse-grain slope for richness, a pattern that had previously been attributed to community member interactions. This result demonstrates that the novelty of a macroecological pattern hinges on one’s choice of null model.

      4. However, the gamma cannot capture the same relationship for diversity. Simulations of the gamma distribution that incorporate correlations between community members are capable of generating reasonable estimates of the slope.

      To address the reviewer’s comments regarding the appropriateness of fitted gamma distributions, in our revision we have added fitted gamma distributions to plots of AFDs so that the reader can visually assess the ability of the gamma to describe empirical patterns (Fig. S3, S4).

      We have also obtained predictions for the slope of the fine vs. coarse-grained relationship for community richness using the same form of UNTB used by Madi et al (2020). In our revised manuscript we establish a procedure to infer the single parameter of this model, generate predictions of richness at fine and coarse-grained scales, and then evaluate whether the UNTB is capable of predicting the slope of the fine vs. coarse-grained relationship for richness (Supplementary Information; Figs. S18, 24-28; lines 277-278; 370-380).

      Other/minor comments

      1. The manuscript would be improved with more consistent terminology ("fine vs. coarse-grained relationship"/"the relationship" vs. "diversity slope"). Also, many readers may be used to OTUs referring to the rather fine level of description, as opposed to any chosen level; and could interpret indexing over groups as being in contrast with indexing over OTU's (coarse vs fine). The authors' use is perfectly correct, but keeping a consistent terminology would help.

      Author response: We have revised our manuscript to specify the “slope” as the “slope of the fine vs. coarse-grained relationship” (e.g., Line 318). We also specify in the Results and in the Methods that we use “fine” and “coarse” as relative terms, keeping with the sliding-scale approach used in Madi et al (2020).

      1. While I appreciate this "slope" is something borrowed from other work, the clarity of the paper might benefit from a cartoon of how one goes from the raw data to the slopes at a particular coarse-graining level. (Optional).

      Author response: We have added a conceptual diagram to the revision (Fig. S20).

      1. The text often colloquially references "the gamma," "predictions of the gamma," etc. This phrasing comes across as sloppy, and the manuscript would be improved by being more specific.

      Author response: We now specify “gamma” as the “gamma distribution” throughout the manuscript.

      1. Equation 6 appears to be missing some subscripts on the x terms (included on the left of the equation).

      Author response: We thank the reviewer for noticing this error and we have corrected it in the revision.

      1. In "Simulating communities of correlated...AFDs", the acronym SAD is not defined.

      Author response: We thank the reviewer for noticing this error and we have corrected it in the revision.

      1. In Figure 2:

      a. Invariant is probably the wrong word for the title, since all the AFD's were rescaled by mean and variance before being compared. Data does support that the gamma distributions are good at describing the AFD's, but as stated in the description it's the general shape that is preserved, not the distribution itself.

      Author response: When we mention the invariance of the AFD we now specify that we mean that the shape of the distribution remained qualitatively invariant.

      b. I'd recommend changing the color coding to something with more contrast, since currently it's impossible to assess the claim that the shape of the distribution collapses.

      Author response: Our coarse-graining procedure is a sequential operation that has no intuitive point that would suggest the use of a contrasting colormap (e.g., if our scale ranged from -1 to 1 then there would be a natural point of contrast at zero).

      c. The legend is missing relevant technical details: How many OTU's were used to make plot a? How many samples?

      Author response: The number of samples was listed in the Materials and Methods (line 523). In the revision we now include a table with the average and total number of OTUs as well as the average number of reads for each environment (Table S1, S2).

      d. In plot b, is the mean relative abundance referring to "mean abundance when observed" or "mean across all samples"?

      Author response: The mean relative abundance is the mean abundance across all sites. This is stated in the main text (line 204) and in the legend of Fig. 2.

      e. Since one argument here is that SLM fits these distributions better than UNTB, if possible it would be nice to see UNTB's failed fits here.

      Author response: A major feature of the UNTB is that the demographic parameters of community members are indistinguishable. Under the SLM, the variation in the mean relative abundance we observe suggests that the carrying capacities of community members vary over multiple orders of magnitude, a result that is incompatible with most forms of the UNTB (x-axis of Fig. 2b). We now mention this point in the revised manuscript (lines 110; 229; 455-471).

      1. In Figure 3:

      a. It is not clear how coarse-graining is included in model fitting. The "Deriving biodiversity measure predictions" section would benefit from including how coarse-graining is incorporated.

      Author response: We predict measures of biodiversity separately at each coarse-grained scale. We now clarify this detail in the revised manuscript (Lines 624-627).
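
      As an illustration of what predicting "separately at each coarse-grained scale" involves, the sketch below assumes that coarse-graining amounts to summing the read counts of community members assigned to the same group, after which the mean and variance of relative abundance are re-estimated for each group. The function names, group labels, and toy counts are hypothetical and are not taken from the manuscript.

```python
def coarse_grain(read_counts, group_of):
    """Sum per-site read counts of members that share a group label.

    read_counts: {member: [reads at each site]}
    group_of:    {member: group label at the chosen scale}
    """
    grouped = {}
    for member, counts in read_counts.items():
        label = group_of[member]
        if label not in grouped:
            grouped[label] = list(counts)
        else:
            grouped[label] = [a + b for a, b in zip(grouped[label], counts)]
    return grouped


def mean_and_variance(counts, reads_per_site):
    """Mean and sample variance of relative abundance across sites."""
    rel = [c / n for c, n in zip(counts, reads_per_site)]
    m = sum(rel) / len(rel)
    v = sum((x - m) ** 2 for x in rel) / (len(rel) - 1)
    return m, v


# Toy example (hypothetical): three OTUs, two sites, two groups.
counts = {"otu1": [1, 2], "otu2": [3, 4], "otu3": [5, 6]}
groups = {"otu1": "A", "otu2": "A", "otu3": "B"}
grouped = coarse_grain(counts, groups)
stats = {g: mean_and_variance(c, [10, 10]) for g, c in grouped.items()}
```

      Repeating these two steps with a different `group_of` mapping for each scale yields one set of parameters per scale, which is the input a scale-by-scale prediction would need.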

      b. Reference Shannon Diversity in Methods.

      Author response: We now cite Shannon’s diversity.

      c. What is the blue/white color coding in plots a & c? It doesn't have any color key.

      Author response: Figs. 3-6 use a uniform light-to-dark scale for all environments, with each environment having its own color. For example, Fig. 3a contains data from the human gut microbiome. Human gut data were assigned the color aquamarine, so the shade of aquamarine for a given datapoint in Fig. 3a indicates the phylogenetic scale.

      In the revision we now clarify the colorscale in the legend of Fig. 3 and specify that the same scale is used in all subsequent figure legends.

      d. Re: earlier comments, why is richness considered a prediction? (Am I correct in my interpretation that panel b is almost a tautology - counting the number of zeros in the matrix either by rows or by columns - whereas panel d is nontrivial?)

      Author response: Mean richness as a measure of biodiversity depends on the fraction of sites where a given community member is present (i.e., occupancy). The mean relative abundance of a community member and its variation across sites (beta) is clearly related to occupancy, but those two statistics do not give you a prediction of occupancy. Obtaining a prediction of occupancy and, subsequently, richness, requires 1) a probability distribution of abundances (i.e., the gamma) and 2) a probability distribution of sampling (i.e., the Poisson). Using these two pieces of information, we derived a prediction for mean richness (Eq. 13). We then compare the value of richness obtained by plugging in the mean relative abundances, betas, and known number of reads to the observed mean richness obtained from the data.
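
      The two ingredients named above can be combined in a short sketch. Eq. 13 is not reproduced in this response, so the code assumes the standard gamma-Poisson result: if a member's relative abundance is gamma distributed with mean x and shape beta = x^2 / variance, and N reads are drawn by Poisson sampling, the probability of observing zero reads is (1 + x N / beta)^(-beta). Expected richness is then the sum over members of one minus that probability. The function name and the example values are hypothetical.

```python
def expected_richness(means, variances, n_reads):
    """Expected number of members observed in a sample of n_reads reads.

    Assumes gamma-distributed relative abundances and Poisson sampling.
    means, variances: per-member mean and variance of relative abundance.
    """
    total = 0.0
    for mean_abund, var_abund in zip(means, variances):
        beta = mean_abund * mean_abund / var_abund  # gamma shape parameter
        # Zero-read probability of the gamma-Poisson mixture
        p_absent = (1.0 + mean_abund * n_reads / beta) ** (-beta)
        total += 1.0 - p_absent
    return total


# Two members with mean 0.01 and shape beta = 1, sampled with 100 reads:
# each is absent with probability (1 + 1)^(-1) = 0.5.
example = expected_richness([0.01, 0.01], [1e-4, 1e-4], 100)
```

      The means and variances enter only as plug-in estimates; the functional form comes entirely from the assumed distributions, which is why a poorly chosen distribution would give a poor prediction from the same estimates.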

      e. The lettering of subplots in Figure 3 is not consistent with Figure 4. Figure 3 subplots are also cited incorrectly in paragraph two on page six (lines 251-254).

      Author response: We thank the reviewer for noticing the error and we have corrected it in the revision.

      f. Again, if possible show UNTB predictions in plots a & c.

      Author response: In our revised manuscript we provide extensive descriptions and predictions of mean richness and the slope of the fine vs. coarse-grained relationship for richness using the form of the UNTB used in Madi et al. (2020; Figs. S18, S24 - S29; lines 277-282; 370-380). We then compare the error of these slope predictions to those obtained from the SLM, finding that the SLM generally outperforms UNTB (Figs. S27-S29).

      1. In Figure 4:

      a. What are the color codings in plots a & b?

      Author response: The color scale used in Fig. 4 is identical to the color scale used in Fig. 3. This detail is now specified in the legend of Fig. 4.

      b. What are the two lines of empirical data in plots a & b, and why is one of them dashed?

      Author response: We now specify what the two lines mean in the key within the figure.

      c. Same comment as earlier on predictions and richness.

      Author response: We now specify what the two lines mean in the key within the figure.

      1. In Figure 5:

      a. It wasn't clear to me in the manuscript how the authors generated these plots from the raw data. The manuscript would benefit from a clear cartoon/description of the data pipeline, from raw data to empirical (and analytic) slopes.

      Author response: We have added a conceptual diagram to the revised manuscript (Fig. S20).

      b. Make the figure title more descriptive to better connect it to the figure's objective (the richness slopes relationship is not novel, but the diversity slopes relationship is).

      Author response: We have revised the figure title.

      References

      Camacho-Mateu, J., Lampo, A., Sireci, M., Muñoz, M. Á., & Cuesta, J. A. (2023). Species interactions reproduce abundance correlations patterns in microbial communities (arXiv:2305.19154). arXiv. https://doi.org/10.48550/arXiv.2305.19154

      Grilli, J. (2020). Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), 4743. https://doi.org/10.1038/s41467-020-18529-y

      Madi, N., Vos, M., Murall, C. L., Legendre, P., & Shapiro, B. J. (2020). Does diversity beget diversity in microbiomes? eLife, 9, e58999. https://doi.org/10.7554/eLife.58999

      Shoemaker, W. R., Sánchez, Á., & Grilli, J. (2023). Macroecological laws in experimental microbial systems (p. 2023.07.24.550281). bioRxiv. https://doi.org/10.1101/2023.07.24.550281

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thorough assessment of our study, their overall enthusiasm, and the helpful suggestions for clarifying the methods and results, additional analyses, and discussion points. We have made earnest efforts to address the weaknesses raised in the public review and other recommendations made by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Herein, Blaeser et al. explored the impact of migraine-related cortical spreading depression (CSD) on the calcium dynamics of meningeal afferents that are considered the putative source of migraine-related pain. Critically previous studies have identified widespread activation of these meningeal afferents following CSD; however, most studies of this kind have been performed in anesthetized rodents. By conducting a series of technically challenging calcium imaging experiments in conscious head fixed mice they find in contrast that a much smaller proportion of meningeal afferents are persistently activated following CSD. Instead, they identify that post-CSD responses are differentially altered across a wide array of afferents, including increased and decreased responses to mechanical meningeal deformations and activation of previously non-responsive afferents following CSD. Given that migraine is characterized by worsening head pain in response to movement, the findings offer a potential mechanism that may explain this clinical phenomenon.

      Strengths:

      Using head fixed conscious mice overcomes the limitations of anesthetized preps and the potential impact of anaesthesia on meningeal afferent function which facilitated novel results when compared to previous anesthetized studies. Further, the authors used a closed cranial window preparation to maximize normal physiological states during recording, although the introduction of a needle prick to induce CSD will have generated a small opening in the cranial preparation, rendering it not fully closed as suggested.

      Weaknesses:

      Although this is a well conducted technically challenging study that has added valuable knowledge on the response of meningeal afferents the study would have benefited from the inclusion of more female mice. Migraine is a female dominant condition and an attempt to compare potential sex-differences in afferent responses would undoubtedly have improved the outcome.

      Our study included only two females, largely reflecting the much higher success rate of AAV-mediated meningeal afferent GCaMP expression in males than in females. The reason for the lower yield in female mice is unclear to us at present but may involve, at least partly, sex-specific differences in the mechanisms responsible for efficient transduction with this AAV vector observed in peripheral tissues (Davidoff et al. 2003). While our study did not address sex differences, a recent study (Melo-Carrillo et al. 2017) reported CSD equally activating and sensitizing second-order dorsal horn neurons that receive input from meningeal afferents in male and female rats.

      The authors imply that the current method shows clear differences when compared to older anaesthetized studies; however, many of these were conducted in rats and relied on recording from the trigeminal ganglion. Inclusion of a subgroup of anesthetized mice in the current preparation may have helped to answer these outstanding questions, namely whether the difference is species dependent or a result of the different technical approaches.

      We have tried to address the anesthesia issue by conducting imaging sessions in several isoflurane-anesthetized mice. However, during these experiments, we observed a substantial decrease in the GCaMP fluorescence signal with a much lower signal-to-noise ratio that made the analyses of the afferents’ calcium signal unreliable. Reduced GCaMP signal in meningeal axons during anesthesia may be related to the development of respiratory acidosis, since lower pH leads to decreased GCaMP signal, as also mentioned by Reviewer #3. Of note, urethane anesthesia, which was used in all previous rat experiments, also produces respiratory acidosis.

      The authors discuss meningeal deformations as a result of locomotion; however, despite referring to their previous work (Blaeser et al., 2022), the exact method of how these deformations were measured could be clearer. It is challenging to imagine that simple locomotion would induce such deformations and the one reference in the introduction refers to straining, such as cough that may induce intracranial hypertension, which is likely a more powerful stimulus than locomotion.

      As part of the revision, we now provide a better description of the methodology (“Image processing and calcium signal extraction” section) used to determine meningeal deformations, including scaling, shearing, and Z-shift. In our previous paper (Blaeser et al. 2023), we provided an extensive description of the types of meningeal deformations occurring in locomoting mice. It should also be noted that locomotion drives cerebral vasodilation and intracranial pressure increases (Gao and Drew, 2016), which likely mediate, at least in part, the movement of the meninges towards the skull (positive Z-shift) and potentially other meningeal deformation parameters. We also agree with the reviewer that sudden maneuvers such as coughing and sneezing that lead to a larger increase in intracranial pressure are likely to be even more powerful drivers of endogenous intracranial mechanical stimulation than locomotion. Thus, our finding of increased responsiveness to locomotion-related meningeal deformation post-CSD may underestimate the increased afferent responsivity post-CSD during other behaviors such as coughing. We added this point to the discussion.

      More recently, several groups have used optogenetic triggering of CSD to avoid opening of the cranium for needle prick. Given the authors robustly highlight the benefit of the closed cranium approach, would such an approach not have been more appropriate?

      We agree with the reviewer that optogenetic methods used for CSD induction in non-craniotomized animals will further ensure accurate pressurization and, thus, will be an even better approach that avoids the burr hole used for pinprick. It should be noted, however, that the burr hole used for the pinprick likely had a minimal effect on intracranial pressure, as we minimized depressurization by plugging the burr hole throughout the experiments with a silicone elastomer. We have added this information to the revised Methods section.

      It is also worth noting that the optogenetic methodology used by others to provoke CSD was optimized only recently and relies on transgenic mice with a strong expression of YFP (Thy1.ChR2-YFP mice) within the superficial cortex that is not compatible with the afferent GCaMP imaging of meningeal afferents. Modifications using red-shifted opsins may allow the use of this strategy in the future.

      It was not clear how deformations predictors increased independent of locomotion (Figure 4D) as locomotion is essentially causing the deformations as noted in the study. This point was not so clear to this reviewer.

      As noted in our previous paper (Blaeser et al., 2023), deformation variables often exhibit different time courses than locomotion, even when a deformation is initially induced by the onset of locomotion. Most notably, the scaling-related deformation ramps up slowly and often persists for tens of seconds after the onset and termination of locomotion, which may be related to the recovery dynamics of the meningeal vascular response to locomotion. Overall, while locomotion serves as a predictor of meningeal deformation, we observed previously (Blaeser et al. 2023) many afferents whose responses were more closely associated with the moment-to-moment deformations than with the state of locomotion per se, suggesting that a unique set of stimuli is responsible for the activation of this deformation-sensitive afferent population. The increased sensitivity to deformation signals we observed following CSD suggests that the afferent population sensitive to deformation has unique properties that render it most susceptible to becoming sensitized following CSD. We now discuss this possibility.

      Reviewer #2 (Public Review):

      This is an interesting study examining the question of whether CSD sensitizes meningeal afferent sensory neurons leading to spontaneous activity or whether CSD sensitizes these neurons to mechanical stimulation related to locomotion. Using two-photon in vivo calcium imaging based on viral expression of GCaMP6 in the TG, awake mice on a running wheel were imaged following CSD induction by cortical pinprick. The CSD wave evoked a rise in intracellular calcium in many sensory neurons during the propagation of the wave but several patterns of afferent activity developed after the CSD. The minority of recorded neurons (10%) showed spontaneous activity while slightly larger numbers (20%) showed depression of activity, the latter pattern developed earlier than the former. The vast majority of neurons (70%) were unaffected by the CSD. CSD decreased the time spent running and the numbers of bouts per minute but each bout was unaffected by CSD. There also was no influence of CSD on the parameters referred to as meningeal deformation including scale, shear, and Z-shift. Using GLM, the authors then determine that there is an increase in locomotion/deformation-related afferent activity in 51% of neurons, a decrease in 12% of neurons, and no change in 37%. GLM coefficients were increased for deformation related activity but not locomotion related activity after CSD. There also was an increase in afferents responsive to locomotion/deformation following CSD that were previously silent. This study shows that unlike prior reports, CSD does not lead to spontaneous activity in the majority of sensory neurons but that it increases sensitivity to mechanical deformation of the meninges. This has important implications for headache disorders like migraine where CSD is thought to contribute to the pathology in unclear ways with this new study suggesting that it may lead to increased mechanical sensitivity characteristic of migraine attacks.

      1) It would be helpful to know what is meant by "post-CSD" in many of the figures where a time course is not shown. The methods indicate that 4, 30 min runs were collected after CSD but this would span 2 hours and the data do not indicate whether there are differences across time following CSD nor whether data from all 4 runs are averaged.

      While we monitored time course changes in ongoing activity (see Figure 2), it was challenging to evaluate post-CSD changes in locomotion-related deformation responses at a fine temporal scale, as running bouts resumed at different time points post-CSD and occurred intermittently throughout the post-CSD analysis period. Our experiments were also not sufficiently powered to break out analyses at multiple different epochs post-CSD, partly because there was little locomotion. To allow comparisons using a sufficient number of bouts, we conducted our GLM analyses using all data collected during running bouts in the 2-hour post-CSD period (termed “post-CSD”) versus in the 1-hour pre-CSD period. We have now clarified this further in the main text and figure legends.

      2) Why is only the Z-shift data shown in Figures 4A-C? Each of the deformation values seems to contribute to the activity of neurons after CSD but only the Z-shift values are shown.

      In many afferents, only one deformation variable best predicted the activity at both the pre- and post-CSD epochs. However, at the population level, all deformation variables were equally predictive. In the examples provided, the afferent developed augmented sensitivity that could only be predicted by the Z-shift variable, and the other deformation variables were not included to keep the figure legible. This is now clarified in the figure legend.

      3) How much does the animal moving its skull against the head mount contribute to deformations of the meninges if the skull is potentially flexing during these movements? Even if mice are not locomoting, they can still attempt to move their heads thus creating pressure changes on the skull and underlying meninges. The authors mention in the methods that the strong cement used to bind the skull plates and headpost together minimize this, but how do they know it is minimized?

      We did not measure skull flexing during locomotion and its potential effect on meningeal deformation. However, we would like to point out several considerations. It is evident from numerous imaging studies across various brain regions in freely moving animals, utilizing brain motion registration, that brain motion of the same scale (a few microns), as that observed in our studies, also occurs in the absence of head fixation (e.g., Glas et al, 2019; Zong et al 2021). In our system, the head-fixed mouse is locomoting on a cantilevered (spring-like) running wheel (see also Ramesh et al., 2018), which dissipates most, albeit not all, upward and forward forces applied to the skull during locomotion. Furthermore, the position of the headpost, anterior to where the mouse's paws touch the wheel, makes it hard for the mouse to push straight up and apply forces to the skull. We have updated the text in the methods section (Running wheel habituation) to address this. In our previous work (See Figure 2B in Blaeser et al. 2023), we found a substantial subset of afferents showing an increase in calcium activity that began after each bout of locomotion had terminated, and that lasted for many seconds, suggesting that skull flexing during locomotion may not play a leading role. Finally, we proposed in that study that meningeal deformations play a major role in the afferent response, given our findings of (i) sigmoidal stimulus-response curves between afferent activity and meningeal deformation and (ii) of different afferents that track scaling deformations along different axes. It is unlikely that all of these are related to any residual forces generated from skull deformations.

      4) What is the mechanism by which afferents initiate the calcium wave during the CSD itself? Is this mechanical pressure due to swelling of the cortex during the wave? If so, why does the CSD have no impact on the deformation parameters? It seems that this cortical swelling would have some influence on these values unless the measurements of these values are taken well after cortical swelling subsides. Related to point 1 above, it is not clear when these measurements are taken post-CSD.

      We provide, for the first time, evidence that CSD evokes local calcium elevation in meningeal afferent fibers in a manner that is incongruent with action potential propagation, as the activity gradually advances along individual afferents across many seconds during the wave. As indicated in Figure 1H, we measured these changes during the first 2 minutes post-CSD. Based on the reviewer’s question, we have now addressed whether mechanical changes occurring in the cortex in the wake of CSD might be responsible for the acute afferent activation we observed. We now include new data (Results, “Acute afferent activation is not related to CSD-evoked meningeal deformation” and Figure S2) showing an acute phase of meningeal deformation (as expected given the changes in extracellular fluid volume) lasting 40-80 seconds following the induction of CSD. Our data suggest, however, that these meningeal deformations are unlikely to be the main driver of the acute afferent calcium response. We propose that, based on the speed of the afferent calcium wave propagation and the distinct dynamics of calcium activity as compared to the dynamics of the deformations, the acute afferent response is more likely to be mediated by the spread of algesic mediators (e.g., glutamate, K+, ATP) and their diffusion into the overlying meninges.

      Because the peri-CSD meningeal deformations return to baseline soon after the cessation of the CSD wave, they are unlikely to affect our analyses of post-CSD changes in afferent sensitivity in the following 2 hours. This is also supported by our data (see Figure 3F-H) showing similar locomotion-related deformations pre- and post-CSD, which were measured after the deformations related to the CSD itself had subsided.

      5) How does CSD cause suppression of afferent activity? This is not discussed. It is probably a good idea in this discussion to reinforce that suppression in this case is suppression of the calcium response and not necessarily suppression of all neuronal activity.

      The mechanism underlying the suppression of afferent activity remains unclear. We now discuss the following points:

      First, the pattern of afferent responses resembles the rapid loss of cortical activity in the wake of a CSD, but its faster recovery points to a mechanism distinct from the pre-and post-synaptic changes responsible for the silencing of cortical activity (Sawant-Pokam et al., 2017; Kucharz and Lauritzen, 2018). Whether CSD drives the local release of mediators capable of reducing afferent excitability and spiking dynamics will require further studies.

      Second, the reviewer proposes that the suppressed calcium activity we observed in ~20% of the afferents immediately following CSD may reflect a decreased calcium response independent of afferent spiking activity. Such a process could theoretically involve factors influencing the GCaMP fluorescence (see also our response to Reviewer #3) and/or factors modifying the afferents’ spiking-to-calcium coupling. We note that if a CSD-related factor could modify the calcium response independent of afferent spiking, one would expect a more consistent effect across axons, reflected as a reduced signal in a larger proportion of the afferents, which we did not observe.

      6) How do the authors interpret the influence of CSD on locomotor activity? There was a decrease in bouts but the bouts themselves showed similar patterns after CSD. Is CSD merely inhibiting the initiation of bouts? Is this consistent with what CSD is known to do to motor activity? And again related to point 1, how long after CSD were these measurements taken? Were there changes in locomotor activity during the actual CSD compared to post-CSD?

      To the best of our knowledge, there is very little data on the effect of CSD on motor activity, making it challenging to engage in further speculation regarding the mechanisms underlying the preservation of running-bout patterns post-CSD. Houben et al. (2017) described a similar reduction in locomotion in mice, corresponding to decreased motor cortex (M1) activity, and preservation of intermittent locomotion bouts. In the revised Results section, we now provide information about the cessation of locomotor activity during the CSD wave and have added information regarding the measurement of locomotion following CSD.

      7) The authors mention the caveats of prior work where the skull is open and is thus depressurized. Is this not also the case here given there is a hole in the skull needed to induce CSD?

      Unlike previous electrophysiological studies, which involved several large openings (~2x2 mm), including at the site of the afferents’ receptive field, our study involved only a small burr hole located remotely (1.5 mm) from the frontal edge of our imaging window. As noted in our response to Reviewer #1, this burr hole (~0.5 mm diameter) was unlikely to produce inflammation at the imaging site or cause depressurization as it was sealed with a silicone plug throughout the experiment.

      8) The authors should check the %'s and the numbers in the pie chart for Figure 4. Line 224 says 53 is 22% but it does not look this way from the chart.

      The 22% reported is the percentage of afferents that developed sensitivity post-CSD among all the non-sensitive ones pre-CSD. The pie chart illustrates only afferents that were deemed sensitive before and/or after the CSD. We removed the % to clarify.

      9) Line 319 mentions that CSD causes "powerful calcium transients" in sensory neurons but it is not clear what is meant by powerful if there are no downstream effects of these transients being measured. The speculation is that these calcium transients could cause transmitter release, which would be an important observation in the absence of AP firing, but there are no data evaluating whether this is the case.

      We changed the term to “robust”.

      Reviewer #3 (Public Review):

      Summary:

      Blaeser et al. set out to explore the link between CSD and headache pain. How does an electrochemical wave in the brain parenchyma, which lacks nociceptors, result in pain and allodynia in the V1-3 distribution? Prior work had established that CSD increased the firing rate of trigeminal neurons, measured electrophysiologically at the level of the peripheral ganglion. Here, Blaeser et al. focus on the fine afferent processes of the trigeminal neurons, resolving Ca2+ activity of individual fibers within the meninges. To accomplish these experiments, the authors injected AAV encoding the Ca2+ sensitive fluorophore GCamp6s into the trigeminal ganglion, and 8 weeks later imaged fluorescence signals from the afferent terminals within the meninges through a closed cranial window. They captured activity patterns at rest, with locomotion, and in response to CSD. They found that mechanical forces due to meningeal deformations during locomotion (shearing, scaling, and Z-shifts) drove non-spreading Ca2+ signals throughout the imaging field, whereas CSD caused propagating Ca2+ signals in the trigeminal afferent fibers, moving at the expected speed of CSD (3.8 mm/min). Following CSD, there were variable changes in basal GCamp6s signals: these signals decreased in the majority of fibers, signals increased (after a 25 min delay) in other fibers, and signals remained unchanged in the remainder of fibers. Bouts of locomotion were less frequent following CSD, but when they did occur, they elicited more robust GCamp6s signals than pre-CSD. These findings advance the field, suggesting that headache pain following CSD can be explained on the basis of peripheral cranial nerve activity, without invoking central sensitization at the brain stem/thalamic level. This insight could open new pathways for targeting the parenchymal-meningeal interface to develop novel abortive or preventive migraine treatments.

      Strengths:

      The manuscript is well-written. The studies are broadly relevant to neuroscientists and physiologists, as well as neurologists, pain clinicians, and patients with migraine with aura and acephalgic migraine. The studies are well-conceived and appear to be technically well-executed.

      Weaknesses:

      1) Lack of anatomic confirmation that the dura were intact in these studies: it is notoriously challenging to create a cranial window in mouse skull without disrupting or even removing the dura. It was unclear which meningeal layers were captured in the imaging plane. Did the visualized trigeminal afferents terminate in the dura, subarachnoid space, or pia (as suggested by Supplemental Fig 1, capturing a pial artery in the imaging plane)? Were z-stacks obtained, to maintain the imaging plane, or to follow visualized afferents when they migrated out of the imaging plane during meningeal deformations?

      We agree that avoiding disruption of the dura is challenging. Indeed, it took many months of practice before conducting the experiments in this manuscript to master methods for a craniotomy that spared the dura.

      We addressed the issue of meningeal irritation due to cranial window surgery in our previous work (Blaeser et al., 2023). In brief, we conducted vascular imaging using the same cranial window approach and showed no leakage of macromolecules from dural or pial vessels anywhere within the imaging window at 2-6 weeks after the surgery (Figure S1D in Blaeser et al. 2022). This data suggested no ongoing meningeal inflammation below the window. The very low level of ongoing activity we observed at baseline also suggests a lack of an inflammatory response that could lead to afferent sensitization before CSD. This is now mentioned in the Discussion.

      We conducted volumetric imaging for three main reasons: 1) To capture the activity of afferents throughout the meningeal volume. In our volumetric imaging approach, including in this work, we observed afferent calcium signals throughout the meningeal thickness (see Figure 5 in Blaeser et al. 2022). However, the majority of afferents were localized to the most superficial 20 microns (Figure S1E in Blaeser et al. 2022), suggesting that we mostly recorded the activity of dural afferents; 2) to enable simultaneous quantification of three-dimensional deformation and the activity of afferents throughout the thickness of the meninges. This allowed us to determine whether changes in mechanosensitivity could involve augmented activity to intracranial mechanical forces that produced meningeal deformation along the Z-axis of the meninges (e.g., increased intracranial pressure); 3) to provide a direct means to confirm that the afferent GCaMP fluorescent changes we observed were not due to artifacts related to meningeal motion along the Z-axis. We have now added this information to the “Two-photon imaging” section of the Methods.

      2) Findings here, from mice with chronic closed cranial windows, failed to fully replicate prior findings from rats with acute open cranial windows. While the species, differing levels of inflammation and intracranial pressure in these two preparations may contribute, as the authors suggested, the modality of measuring neuronal activity could also contribute to the discrepancy. In the present study, conclusions are based entirely on fluorescence signals from GCamp6s, whereas prior rat studies relied upon multiunit recordings/local field potentials from tungsten electrodes inserted in the trigeminal ganglion.

      As a family, GCamp6 fluorophores are strongly pH dependent, with decreased signal at acidic pH values (at matched Ca2+ concentration). CSD induces an impressive acidosis transient, at least in the brain parenchyma, so one wonders whether the suppression of activity reported in the wake of CSD (Figure 2) in fact reflects decreased sensitivity of the GCamp6 reporter, rather than decreased activity in the fibers. If intracellular pH in trigeminal afferent fibers acidifies in the wake of CSD, GCamp6s fluorescence may underestimate the actual neuronal activity.

      Previous in vivo rodent studies observed a tissue acidosis transient that peaks during the DC shift corresponding to the wavefront of the spreading depolarization, and lasting for ~ 10 min. (Mutch and Hansen, 1984). Since we observed a massive increase in afferent calcium activity with a propagation pattern resembling the cortical wave, it is unlikely that the cortical acidosis during the CSD wave strongly affected the GCaMP signal in the overlying meninges. Furthermore, if cortical acidosis non-discriminately affects the GCaMP signal, one would expect a more consistent effect across axons, reflected as a reduced calcium signal in a larger proportion of the afferents, which we did not observe. Finally, the finding that in affected afferents, decreased calcium activity lasted for > 20 min – a time point when cortical acidosis has fully recovered - points to a distinct underlying mechanism. We also note that any residual acidosis would not confound our main finding of increased calcium responses to meningeal deformation at later periods post-CSD, as acidosis should, if anything, decrease calcium-related fluorescence.

      The authors might consider injecting an AAV encoding a pHi sensor to the trigeminal ganglion, and evaluating pHi during and after CSD, to assess how much this might be an issue for the interpretation of GCamp6s signals. Alternatively, experiments assessing trigeminal fiber (or nerve/ganglion) activity by electrophysiology or some other orthologous method would strengthen the conclusions.

      Please see our comment above regarding the short duration of the pH changes post-CSD.

      N's are generally reported as # of afferents, obscuring the number of technical/biological replicates (# of imaging sessions, # of locomotion bouts, # of CSDs induced, # of animals).

      We now report the number of replicates (# of afferents, # of CSD events, and # of mice).

      Fig 1F trace over the heatmap is not explained in the figure legend. Is this the speed of the running wheel? Is it the apparent propagation rate of the GCamp6s transient through the imaging field?

      We have added to the legend of Figure 1 that the trace in panel F depicts locomotion speed.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable paper examines gene expression differences between male and female individuals over the course of flower development in the dioecious angiosperm Trichosanthes pilosa. Male-biased genes evolve faster than female-biased and unbiased genes, which is frequently observed in animals, but this is the first report of such a pattern in plants. In spite of the limited sample size, the evidence is mostly solid and the methods appropriate for a non-model organism. The resources produced will be used by researchers working in the Cucurbitaceae, and the results obtained advance our understanding of the mechanisms of plant sexual reproduction and its evolutionary implications: as such they will broadly appeal to evolutionary biologists and plant biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      The evolution of dioecy in angiosperms has significant implications for plant reproductive efficiency, adaptation, evolutionary potential, and resilience to environmental changes. Dioecy allows for the specialization and division of labor between male and female plants, where each sex can focus on specific aspects of reproduction and allocate resources accordingly. This division of labor creates an opportunity for sexual selection to act and can drive the evolution of sexual dimorphism.

      In the present study, the authors investigate sex-biased gene expression patterns in juvenile and mature dioecious flowers to gain insights into the molecular basis of sexual dimorphism. They find that a large proportion of the plant transcriptome is differentially regulated between males and females with the number of sex-biased genes in floral buds being approximately 15 times higher than in mature flowers. The functional analysis of sex-biased genes reveals that chemical defense pathways against herbivores are up-regulated in the female buds along with genes involved in the acquisition of resources such as carbon for fruit and seed production, whereas male buds are enriched in genes related to signaling, inflorescence development and senescence of male flowers. Furthermore, the authors implement sophisticated maximum likelihood methods to understand the forces driving the evolution of sex-biased genes. They highlight the influence of positive and relaxed purifying selection on the evolution of male-biased genes, which show significantly higher rates of non-synonymous to synonymous substitutions than female or unbiased genes. This is the first report (to my knowledge) highlighting the occurrence of this pattern in plants. Overall, this study provides important insights into the genetic basis of sexual dimorphism and the evolution of reproductive genes in Cucurbitaceae.

      Reviewer #2 (Public Review):

      Summary:

      This study uses transcriptome sequence from a dioecious plant to compare evolutionary rates between genes with male- and female-biased expression and distinguish between relaxed selection and positive selection as causes for more rapid evolution. These questions have been explored in animals and algae, but few studies have investigated this in dioecious angiosperms, and none have so far identified faster rates of evolution in male-biased genes (though see Hough et al. 2014 https://doi.org/10.1073/pnas.1319227111).

      Strengths:

      The methods are appropriate to the questions asked. Both the sample size and the depth of sequencing are sufficient, and the methods used to estimate evolutionary rates and the strength of selection are appropriate. The data presented are consistent with faster evolution of genes with male-biased expression, due to both positive and relaxed selection.

      This is a useful contribution to understanding the effect of sex-biased expression in genetic evolution in plants. It demonstrates the range of variation in evolutionary rates and selective mechanisms, and provides further context to connect these patterns to potential explanatory factors in plant diversity such as the age of sex chromosomes and the developmental trajectories of male and female flowers.

      Weaknesses:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Reviewer #3 (Public Review):

      The potential for sexual selection and the extent of sexual dimorphism in gene expression have been studied in great detail in animals, but hardly examined in plants so far. In this context, the study by Zhao, Zhou et al. represents a welcome addition to the literature.

      Relative to the previous studies in Angiosperms, the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers).

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      I have reviewed this new version and find that it now addresses some of the shortcomings of the previous manuscript. However, several important limitations still remain:

      1) The conclusion that sex-linked genes contribute relatively little to the patterns described is important and would be worth including in the manuscript briefly (not just the response letter), focusing for instance on the overall comparable proportions of sex-linked genes among male-biased (3/343=0.87%), female-biased (19/1145=1.66%) and unbiased genes (36/2378=1.51%).

      Authors’ response: Thank you for your advice. We have added these sentences in “Discussion” section (Lines 492-499).
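      The proportions quoted in the comment above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are the ones given in the reviewer's comment, while the dictionary layout and the helper name `percent` are assumptions introduced here and do not come from the manuscript.

```python
# Counts quoted in the reviewer's comment: (sex-linked genes, total genes).
counts = {
    "male-biased": (3, 343),
    "female-biased": (19, 1145),
    "unbiased": (36, 2378),
}


def percent(part, total):
    """Return part/total expressed as a percentage (hypothetical helper)."""
    return 100.0 * part / total


# Round to two decimals, matching the precision used in the comment.
proportions = {name: round(percent(k, n), 2) for name, (k, n) in counts.items()}

for name, value in proportions.items():
    print(f"{name}: {value}%")
```

      All three categories come out at roughly 1-2% or below, which is the sense in which the proportions are described as comparable.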

      2) The new sentence included in the results "we also found that most of them were members of different gene families generated by gene duplication" is too vague. The motivation of this analysis is not explained, leaving the intended message unclear.

      Authors’ response: In the previous revision, as stressed by reviewer #1 “(2) Paragraph (407-416) describes the analysis of duplicated genes under relaxed selection but there is no mention of this in the results”, we added the sentence “we also found that most of them were members of different gene families generated by gene duplication” in “Relaxed selection” paragraph of the results. Accordingly, in “Discussion” section, we discussed the associations between gene duplication and relaxed selection (Lines 461-473).

      Following your suggestion, we revised the results (Lines 304-307) to “Using the RELAX model, we detected that 18 out of 343 OGs (5.23%) showed significant evidence of relaxed selection (K = 0.0184–0.6497) (Table S9). Most of the 18 OGs are members of different gene families generated by gene duplication (Table S13)”. This makes it more coherent with the discussion.

      3) The sentences "given that dN/dS values of sex-biased genes were higher due to codon usage bias..." are very confusing. I do not understand the argument being made here. I do not see why "lower dS rates would be expected in sex-biased genes ..."

      Authors’ response: We respectfully argue that codon usage bias is positively related to synonymous substitution rates. That is, stronger codon usage bias may be related to higher synonymous substitution rates (Parvathy et al., 2022). Lower ENC values represent stronger codon usage bias. So, if ω (dN/dS) values of sex-biased genes are higher due to codon usage bias, we expect lower dS rates (that is, higher ENC values). Please refer to the relevant papers (e.g., Darolti et al., 2018; Catalan et al., 2018; Schrader et al., 2021, cited in the references of the paper).

      4) The manuscript now reports the proportion of unitigs annotated by similarity with a number of species. While this is an interesting observation, the reviewer was actually asking for a comparison between the number of unitigs (59,051) and the number of genes annotated in a typical cucurbitaceae genome. This would give an indication of the level of redundancy of the de novo assembled transcriptome.

      Authors’ response: We admit that in the final assembly, transcripts may be overestimated. We respectfully suggest that it may be inappropriate to assess the redundancy of the de novo assembled transcriptome by comparing the transcriptome sequences with the genomic sequences. An appropriate approach is to compare transcriptome sequences and transcriptome sequences among different species. For example, Hu et al., 2020 (reference cited in the paper) obtained 145,975 non-redundant unigenes from flower buds of female and male plants in Trichosanthes kirilowii. Mohanty et al. (2017) obtained 71,823 non-redundant unigenes from flower buds of female and male plants in Coccinia grandis.

      Reference:

      Mohanty JN, Nayak S, Jha S, Joshi RK. 2017. Transcriptome profiling of the floral buds and discovery of genes related to sex-differentiation in the dioecious cucurbit Coccinia grandis (L.) Voigt. Gene. 626: 395-406.

      5) From reading the text I could not understand the extent to which the permutation test actually agreed with the Wilcoxon rank sum test. The text says that the results were "almost consistent", which is too vague. This paragraph should be clarified.

      Authors’ response: We performed permutation tests for sex-biased genes in floral buds and in flowers at anthesis. However, the results of both tests (permutation test and Wilcoxon rank sum test) are significant only in floral buds. Taking your suggestions into consideration, we have revised the text to “Additionally, we found that only in floral buds, there were significant differences in ω values in the results of ‘free-ratio’ model (female-biased versus male-biased genes, P = 0.04282 and male-biased versus unbiased genes, P = 0.01114) and ‘two-ratio’ model (female-biased versus male-biased genes, P = 0.01992 and male-biased versus unbiased genes, P = 0.02127, respectively) by permutation t test, which is consistent with the results of Wilcoxon rank sum test.” (Lines 273-280).
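      For readers unfamiliar with the approach, a two-sample permutation test of the kind mentioned above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `permutation_test`, the use of the absolute difference in means as the test statistic, the number of permutations, and the example ω values are all assumptions made here for demonstration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Hypothetical helper: pools both groups, reshuffles the group labels
    many times, and reports how often the shuffled difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up placeholder values standing in for per-gene omega estimates.
male_biased = [0.42, 0.38, 0.51, 0.47, 0.35, 0.44]
unbiased = [0.21, 0.30, 0.25, 0.19, 0.33, 0.27]
p_value = permutation_test(male_biased, unbiased, n_permutations=2000)
print(f"permutation P = {p_value:.4f}")
```

      A rank-based test such as the Wilcoxon rank sum test asks a related question using ranks rather than raw values, so checking that both agree, as the response describes for floral buds, is a reasonable robustness check.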

      6) The paragraph on the link between codon usage and dN/dS is very unclear and quite unnecessary. I would suggest to simply remove lines 312-323.

      Authors’ response: We respectfully argue that codon usage bias is one of the most important factors for higher rates of sequence evolution. Please refer to Darolti et al. (2018), Catalan et al. (2018) and Schrader et al. (2021) (cited in the references of the paper). We retain these lines here.

      7) The discussion contains many unnecessary repeats from the introduction and results section. I suggest shortening drastically at several places, including:

      • remove lines 367-369

      Authors’ response: Thank you for your suggestion. We revised these lines to “In this study, we compared the expression profiles of sex-biased genes between sexes and two tissue types, investigated whether sex-biased genes exhibited evidence of rapid evolutionary rates of protein sequences and identified the evolutionary forces responsible for the observed patterns in the dioecious Trichosanthes pilosa (Lines 369-373)”.

      We removed the sentence “We compared the expression profiles of sex-biased genes between sexes and two tissue types and examined the signatures of rapid sequence evolution for sex-biased genes, as well as the contributions of potential evolutionary forces. (Lines 374-376)”.

      • remove lines 395-410

      Authors’ response: Here we mainly discussed the possible associations between sex-biased genes, adaptation and sexual dimorphic traits. We retain them here for clarity.

      • remove lines 449-483, as they are almost entirely repetitions of elements already made clear in the results section.

      Authors’ response: In these paragraphs, we discussed reasons that lead to relaxed purifying selection for sex-biased genes. They are coherent with the results section. We retain them to make it clearer.

      Minor comments:

      • line 146: remove "However"

      Authors’ response: We have revised it.

      • line 187: "female flower buds tend to masculinize": the meaning is obscure

      Authors’ response: We revised them as “Using hierarchical clustering analysis, we evaluated different levels of gene expression across sexes and tissues (Fig. 2C). Gene expression for female floral buds clustered most distantly from expression in female flowers at anthesis. However, expression in male floral buds clustered with expression in female flowers at anthesis, suggesting that male floral buds may tend toward feminization in the early stages of floral development.”.

      • line 226: "we sequenced transcriptomes of T. pilosa": rather say "we used the transcriptomes described above for T. pilosa"

      Authors’ response: We have revised it.

      • line 279: the meaning of "branch-site model A and branch site model null" is still not made clear.

      Authors’ response: We have revised it.

      • line 324: change to: "we also analysed whether female-biased and unbiased genes underwent... "

      Authors’ response: We have revised it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The apicoplast, a non-photosynthetic vestigial chloroplast, is a key metabolic organelle for the synthesis of certain lipids in apicomplexan parasites. Although it is clear metabolite exchange between the parasite cytosol and the apicoplast must occur, very few transporters associated with the apicoplast have been identified. The current study combines data from previous studies with new data from biotin proximity labeling to identify new apicoplast resident proteins including two putative monocarboxylate transporters termed MCT1 and MCT2. The authors conduct a thorough molecular phylogenetic analysis of the newly identified apicoplast proteins and they provide compelling evidence that MCT1 and MCT2 are necessary for normal growth and plaque formation in vitro along with maintenance of the apicoplast itself. They also provide indirect evidence for a possible need for these transporters in isoprenoid biosynthesis and fatty acid biosynthesis within the apicoplast. Finally, mouse infection experiments suggest that MCT1 and MCT2 are required for normal virulence, with MCT2 completely lacking at the administered dose. Overall, this study is generally of high quality, includes extensive quantitative data, and significantly advances the field by identifying several novel apicoplast proteins together with establishing a critical role for two putative transporters in the parasite. The study, however, could be further strengthened by addressing the following aspects:

      Response: We thank the reviewer very much for the positive evaluation of our work. To address the detailed function of the transporters, over the past three months we have re-constructed plasmids (with codon-optimized DNA sequences of the genes) for expression of the transporters in a standard E. coli expression strain (BL21DE3) and in a pyruvate import knockout E. coli strain (a gift from Prof. Kirsten Jung), to examine the transport capability in vitro. We have also re-constructed a new plasmid containing a new leading peptide for targeting the pyruvate sensor PyronicSF to the apicoplast in the parasite, to probe the possible substrate pyruvate. However, we did not observe expression of the transporters in the above E. coli strains, and we were unable to target the sensor to the correct localization (the apicoplast) in the parasite. As a result, the functional identification of the transporters remains as presented in the current version of the manuscript. We will keep working on this aspect, attempting to dissect the exact transport function of the transporters in the future. In the current manuscript, we have discussed the limitations of our study in the last part of the manuscript.

      Main comments

      1) The conclusion that conditional depletion of AMT1 and/or AMT2 affects apicoplast synthesis of IPP is only supported by indirect measurements (effects on host GFP uptake or trafficking, possibly due to effects on IPP-dependent proteins such as Rabs, and mitochondrial membrane potential, possibly due to effects on IPP-dependent ubiquinone). This conclusion would be more strongly supported by directly measuring levels of IPP. If there are technical limitations that prevent direct measurement of IPP then the authors should note such limitations and acknowledge in the discussion that the conclusion is based on indirect evidence.

      Response: We thank the reviewer very much for the suggestions. We have tried to establish the measurement of IPP using a commercial company in recent months, yet we have not been successful in making the assay work. Considering the problem of indirect evidence, we have discussed this limitation in the discussion.

      2) The conclusion that conditional depletion of AMT1 and/or AMT2 affects apicoplast synthesis of fatty acids is also poorly supported by the data. The authors do not distinguish between the lower fatty acid levels being due to reduced synthesis of fatty acids, reduced salvage of host fatty acids, or both. Indeed, the authors provide evidence that parasite endocytosis of GFP is dependent on AMT1 and AMT2. Host GFP likely enters the parasite within a membrane-bound vesicle derived from the PVM. The PVM is known to harbor host-derived lipids. Hence, it is possible that some of the decrease in fatty acid levels could be due to reduced lipid salvage from the host. Experiments should be conducted to measure the synthesis and salvage of fatty acids (e.g., by metabolic flux analysis), or the authors should acknowledge that both could be affected.

      Response: We thank the reviewer very much for the comments and suggestions. We partially agree that depletion of the transporters could affect lipids scavenged from the host cells, as endocytic vesicles are indeed derived from the parasite plasma membrane at the micropore and potentially from the host cell endo-membrane system, as demonstrated for micropore endocytosis in our previous study (pmid: 36813769). Our latest study addressed this by showing that the endocytic trafficking of GFP vesicles is regulated by prenylation of proteins (e.g. Rab1B and YKT6.1), depletion of which resulted in diffusion of GFP vesicles, but not disappearance of GFP vesicles, in the parasites (pmid: 37548452), indicating that the vesicles (containing lipids) still enter the parasites. In the current manuscript, the percentage of parasites containing GFP foci was significantly reduced in AMT1/AMT2-depleted parasites; instead, parasites with diffuse GFP appeared, and their percentage was almost equal to the reduction in parasites with GFP foci. These results suggest that endocytic vesicles (e.g. GFP vesicles) were continuously generated by the micropore in parasites depleted of AMT1/AMT2, and that vesicle trafficking was regulated by proteins modified by IPP derivatives that were derived from the apicoplast. Based on these observations, we consider that lipids in endocytic vesicles should not contribute to the reduced level of fatty acids and other lipids in parasites depleted of AMT1/AMT2. We have added a short discussion concerning the reduced fatty acids and lipids in the parasites.

      Reviewer #2 (Public Review):

      In this study Hui Dong et al. identified and characterized two transporters of the monocarboxylate family, which they called Apicomplexan monocarboxylate 1 and 2 (AMC1/2), and which the authors suggest are involved in the trafficking of metabolites in the non-photosynthetic plastid (apicoplast) of Toxoplasma gondii (the parasitic agent of human toxoplasmosis) to maintain parasite survival. To do so, they first identified novel apicoplast transporters by conducting proximity-dependent protein labeling (TurboID), using the sole known apicoplast transporter (TgAPT) as bait. They chose two of the three MFS transporters identified by their screen based on protein sequence similarity and confirmed their apicoplast localisation. They generated inducible knockdown parasite strains for both AMC1 and AMC2, and confirmed that both transporters are essential for parasite intracellular survival, replication, and for the proper activity of key apicoplast pathways requiring pyruvate as a carbon source (FASII and MEP/DOXP). They then show that deletion of each protein induces a loss of the apicoplast, more marked for AMC2, and affects its morphology, both at the level of its four surrounding membranes and through accumulation of material in the apicoplast stroma. This study is very timely, as the apicoplast holds several important metabolic functions (FASII, IPP, LPA, Heme, Fe-S clusters...), which have been revealed and studied in depth, but no further corresponding transporters have been identified thus far. Hence, new studies that could reveal how the apicoplast acquires and delivers all the key metabolites it deals with will have a strong impact on the parasitology community as well as the plastid evolution community. The current study is well initiated, with appropriate approaches to identify two new putatively important apicoplast transporters and to show how essential they are for parasite intracellular development and survival. However, in its current state, this is all the study provides (i.e. essential apicoplast transporters whose disruption affects apicoplast integrity and, indirectly, its major functions, FASII and IPP, as the disruption of any essential apicoplast protein does). The study fails to deliver a further message or function regarding AMC1 and 2, and thus to validate the study. Currently, the manuscript just describes how AMC1/2 deletion impacts parasite survival without answering the key question about them: what do they transport? The authors have yet to perform key experiments that would reveal their metabolic function. I would thus recommend the authors work further and determine the function of AMC1 and 2.

      Response: We thank the reviewer very much for the positive evaluation of our work. To address the detailed function of the transporters, over the past three months we re-constructed plasmids (with codon-optimized DNA sequences of the genes) to express the transporters in a regular E. coli expression strain (BL21DE3) and in a pyruvate import knockout E. coli strain (a gift from Prof. Kirsten Jung), in order to examine their transport capability in vitro. We also constructed a new plasmid containing a new leading peptide to target the pyruvate sensor PyronicSF to the apicoplast in the parasite, to probe pyruvate as the possible substrate. However, we were unable to observe expression of the transporters in either E. coli strain, and we were unable to target the sensor to the correct location (the apicoplast) in the parasite. As a result, the functional identification of the transporters remains as presented in the current version of the manuscript. We will keep working on this aspect and attempt to dissect the exact transport function of the transporters in the near future. In the current manuscript, we have discussed the limitations of our study in the last part of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      Line 35: ...appears to have evolved...

      Line 67: remove first comma

      Line 105: thereafter or therefore?

      Line 130: define ACP

      Line 131: define TMD

      Response: We thank the reviewer very much for the suggestions, and we have revised these points in the current manuscript.

      Figure 1: more information on APT1 would be helpful for readers to interpret the results from turboID e.g., consider showing an illustration showing, according to Karnataki et al 2007 that APT1 likely occupies all 4 membranes of the apicoplast. Also, according to DeRocher et al 2012, APT1 N-term and C-term are both cytosolically exposed, at least in the outermost membrane. The orientation in the other membranes is not known.

      Response: We thank the reviewer very much for the suggestions. We analyzed the localization information for APT1 in T. gondii, based on the studies the reviewer proposed (Karnataki et al., 2007; DeRocher et al., 2012). The HA tag at the C-terminus of APT1 was distributed across the four membranes of the apicoplast, indicating that the topology of APT1 at these membranes might be difficult to define. Considering this, we were hesitant to depict the topology of APT1 in a schematic diagram. Nevertheless, TurboID tagging at the C-terminus of APT1 was an excellent model for identifying potential transporters localized at the membranes of the apicoplast. We have added more information about the topology of APT1 to the manuscript, thus providing a better understanding of the proteomic results.

      Figure 2: add a space between "T." and "gondii"

      Figure 2: remove period between "Fitness" and "scores"

      Figure 2: different fonts are used within the figure. Consider using only one font such as arial. Same for Figure 4.

      Figure 2: "Fitness scores" is not bold in panel A but is bold in panel B.

      Response: We thank the reviewer very much for the suggestions. We have revised these points in the current version of the manuscript.

      Line 187: superscript -7

      Line 249: Caution should be used in interpreting two bands as being a precursor and mature product without additional experiments to establish such a relationship. Consider using the term "might" rather than "appear to". The presence of multiple bands could be due to phenomena other than proteolytic processing e.g., alternative splicing, alternative initiator codons, etc.

      Response: We thank the reviewer very much for the suggestions. We have revised the sentences in the current version of the manuscript.

      Line 291: define IPP

      Figure 3E. The data points for KD strains appear to be positioned above the zero value on the y-axis. Is this correct?

      Response: We thank the reviewer very much for the suggestions. We have rechecked the figure and replaced it with the correct one.

      Figure 3 G/H legend. Please describe what a single data point represents e.g., the average of one field of view, the average of a certain number of fields of view, or something else? Are the data combined from three experiments or from a representative experiment?

      Response: We thank the reviewer very much for the suggestions. Three independent experiments were performed, each with at least three replicates. At least 150 vacuoles were scored in each replicate, resulting in at least 9 data points in total. Each data point represents the result from one replicate.

      Line 325: define MEP and explain how it is connected to IPP

      Response: We thank the reviewer very much for the suggestions. We have provided this information in the current version of the manuscript.

      Lines 351-355: The authors refer to Figure 4D to support this statement, but presumably they mean 4E. Also, the authors use the terms C14, C16, and C18. They should more precisely use the terms myristic acid, palmitoleic acid, and trans_oleic acid if this is what they are referring to. Finally, the authors should determine if there is a statistically significant difference between levels of these fatty acids between AMT1 KD and AMT2 KD. If not, they should suggest there is an overall trend toward lower levels of these fatty acids in AMT2 KD parasites compared to AMT1 KD parasites.

      Response: We thank the reviewer very much for the suggestions. We have revised this information in the current version of the manuscript.

      Lines 363-364: The basis of this comment is unclear. Please clarify.

      Lines 369-370: the authors have not shown that the observed lower levels of fatty acids are due to synthesis, as noted above

      Response: We thank the reviewer very much for the suggestions. We have revised this information accordingly in the current version of the manuscript.

      Line 383: Should be Figure S6D

      Line 386: An entire section of the results is used to describe data that are entirely in a supplemental figure. Consider moving this data to a main figure.

      Response: We thank the reviewer very much for the suggestions. We have moved the data to a main figure in the current version of the manuscript.

      Line 391: Consider using the term virulence instead of growth, since no experiments were performed to specifically assess parasite growth in the infected mice.

      Response: We thank the reviewer very much for the suggestions. We have revised the terms in the Results section.

      Line 427: Perhaps the authors mean "...strong growth defect..." or "...strong growth impairment..."

      Line 460-461: This statement is unclear. Please explain how strong backgrounds in proteomics have made it difficult to identify apicoplast transporters. Because they are low abundance? Because they are membrane proteins?

      Response: We thank the reviewer very much for the suggestions. We have revised the corresponding sentences in the current version. The strong background in the proteomics resulted from the high activity and nonspecific labeling of the biotin ligase fused to the apicoplast proteins.

      518-521: It would be helpful for non-specialists if the authors explained how pyruvate is connected to IPP biosynthesis.

      523: delete period after "Escherichia"

      548-549: "We observed similar decreases in level of the MEP biosynthesis activity upon depletion of AMT1 and AMT2..." Reword this since no experiments were done to measure MEP biosynthesis activity.

      Response: We thank the reviewer very much for the suggestions. We have revised the relevant sentences in the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      • The metabolomic data on fatty acid synthesis and isoprenoid levels is relevant but cannot inform about the function of the transporter, since any protein causing loss of the apicoplast would behave in such a manner, i.e. block the apicoplast pathways.

      Response: We thank the reviewer very much for the comment, with which we agree. We have therefore discussed these points in a subsection of the Discussion, pointing out some of the limitations of the study.

      • Currently, the manuscript fails to directly prove what AMC1 and AMC2 transport, potentially pyruvate, as suggested, to fuel FASII and MEP/DOXP. Further experimental approaches using exogenous complementation and/or metabolomic analyses using stable isotope labelling (for example) could shed light on the putative functions of AMC1/2.

      Response: We thank the reviewer very much for the comments. As described above, we attempted several approaches to identify the substrates that AMT1 and AMT2 transport. However, we could not express the proteins in E. coli strains, and we could not generate a T. gondii strain in which a pyruvate sensor was properly targeted to the apicoplast. At the end of the Discussion, we have included a subsection on the limitations of this study. We hope that our future approaches will overcome these difficulties in substrate identification.

      Furthermore, the authors have not considered other pathways of interest, like heme or lysophosphatidic acid (LPA) synthesis, two other key pathways that may be related to AMC1/2 function. The proposed experiments represent an important body of work, required to shed light on their metabolic functions.

      Response: We thank the reviewer very much for the comments. We considered this, but finally decided to discuss mainly the two pathways in which the transporters might participate, since the protein sequences of the transporters contain specific domains that are potentially associated with pyruvate.

      Further, the authors might have missed some references and data about the apicoplast in their introduction (which could also help address other facets of the apicoplast's metabolic functions/capacities with regard to AMC1/2 function). The referencing and explanations in the introduction are not fully exact/precise regarding the apicoplast and its pathways: the references on the apicoplast's discovery and origin do not cite the original work (that should be Wilson et al. 1996, McFadden et al. 1996, Kohler et al. 1997), and the same holds for the discovery of FASII and MEP/DOXP (Waller 1998, Jomaa et al...). The introduction (and the study?) lacks information about other key functions of the apicoplast: heme synthesis and lysophosphatidic acid synthesis (using FASII products). The explanations about the roles of FASII/DOXP are partial and do not cite important references: Krishnan et al. 2020 and Amiar et al. 2020 are also key to understanding how the role of FASII is metabolically flexible depending on nutrient content. A whole part on the fact that FASII is not only dispensable but can also become essential under conditions of metabolic adaptation is missing (Botté et al. 2013, Amiar et al. 2020, Primo et al. 2021). These novel important facets of parasite biology should be mentioned and directly linked to the authors' topic. This is more minor but could bring new ideas to the authors.

      Response: We thank the reviewer very much for the suggestions. We have revised the relevant part of the introduction.

      We are grateful for the suggestions to improve the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The evolution of transporter specificity is currently unclear. Did solute carrier systems evolve independently in response to a cellular need to transport a specific metabolite in combination with a specific ion or counter metabolite, or did they evolve specificity from an ancestral protein that could transport and counter-transport most metabolites? The present study addresses this question by applying selective pressure to Saccharomyces cerevisiae and studying the mutational landscape of two well-characterised amino acid transporters. The data suggest that AA transporters likely evolved from an ancestral transporter and then specific sub-families evolved specificity depending on specific evolutionary pressure.

      Strengths:

      The work is based on sound logic and the experimental methodology is well thought through. The data appear accurate, and where ambiguity is observed (as in the case of citrulline uptake by AGP1), in vitro transport assays are carried out to verify transport function.

      Weaknesses:

      Although the data and findings are well described, the study lacked additional contextual information that would support a clear take-home message.

      We appreciate the reviewer’s positive assessment of the work, and the helpful comment to summarize the findings into a short take-home message. We chose not to discuss protein evolution theories in detail to keep the text as concise as possible. However, we do acknowledge the fact that the reader might want to see our results embedded in more context. In a revised version, we will integrate our findings more with the pertinent literature, which will show how our results align with theoretical models for protein evolution towards novel functions. We will also discuss in more detail how our laboratory results could be translated into a “natural” setting of evolution.

      Reviewer #2 (Public Review):

      Summary:

      This paper describes evolution experiments performed on yeast amino acid transporters aiming at the enlargement of the substrate range of these proteins. Yeast cells lacking 10 endogenous amino acid transporters and thus being strongly impaired to feed on amino acids were again complemented with amino acid transporters from yeast and grown on media with amino acids as the sole nitrogen source.

      In the first set of experiments, complementation was done with seven different yeast amino acid transporters, followed by measuring growth rates. Although most of them have been described before in other experimental contexts, the authors could show that many of them have a broader substrate range than initially thought.

      Moving to the evolution experiments, the authors used the OrthoRep system to perform random mutagenesis of the transporter gene while it is actively expressed in yeast. The evolution experiments were conducted such that the medium would allow for poor/slow growth of cells expressing the wt transporters, but much better/faster growth if the amino acid transporter would mutate to efficiently take up a poorly transported (as in the case of citrulline and AGP1) or non-transported (as in case of Asp/Glu and PUT4) amino acid.

      In this way, and using Sanger sequencing of plasmids isolated from faster-growing clones, the authors identified a number of mutations that were repeatedly present in biological replicates. When these mutations were re-introduced into the transporter using site-directed mutagenesis, faster growth on the said amino acids was confirmed. The authors attempted to confirm the growth phenotype data by uptake experiments using radioactive amino acids; however, the radioactive uptake data and growth-dependent analyses do not fully match, hinting that parameters other than amino acid uptake alone affect the growth rates.

      When mapped to Alphafold prediction models on the transporters, the mutations mapped to the substrate permeation site, which suggests that the changes allow for more favourable molecular interactions with the newly transported amino acids.

      Finally, the authors compared the growth rates of the evolved transporter variants with those of the wt transporter and found that some variants exhibit a somewhat diminished capacity to transport their original range of amino acids, while other variants were as fit as the wt transporter in terms of uptake of its original range of amino acids.

      Based on these findings, the authors conclude that transporters can evolve novel substrates through generalist intermediates, either by increasing a weak activity or by establishing a new one.

      Strengths:

      The study provides evidence in favour of an evolutionary model, wherein a transporter can "learn" to translocate novel substrates without "forgetting" what it used to transport before. This evolutionary concept has been proposed for enzymes before, and this study shows that it also can be applied to transporters. The concept behind the study is easy to understand, i.e. improving growth by uptake of more amino acids as nitrogen source. In addition, the study contains a large and extensive characterization of the transporter variants, including growth assays and radioactive uptake measurements.

      Weaknesses:

      The authors took a genetic gain-of-function approach based on random mutagenesis of the transporter. While this has worked out for two transporters/substrate combinations, I wonder how comprehensive and general the insights are. In such approaches, it is difficult to know which mutation space is finally covered/tested. And information that can be gained from loss-of-function analyses is missed. The entire conclusions are grounded on a handful of variants analyzed. Accordingly, the outcome is somewhat anecdotal; in some cases, the fitness of the variants was changed and in others not. Highlighting the amino acid changes in the context of the structural models is interesting, but does not fully explain why the variants exhibit changed substrate ranges. Two important technical elements have not been studied in detail by the authors, but may well play a certain role in the interpretation of the results. Firstly, the authors did not quantify the amount of transporter being present on the cell surface; altered surface expression can impact uptake rates and thus growth rates. Secondly, the authors have not assessed whether overexpressing wt versus variant transporters has an impact on the growth rate per se. Overexpressing transporters from plasmids is quite a burden for the cells and often impacts growth rates. Variants may be more or less of a burden, an effect that may (or may also not) go hand in hand with increased/decreased surface production levels.

      And finally, I was somewhat missing an evolutionary analysis of these transporters to gain insights into whether the identified substitutions also occurred during natural evolution under real-life conditions.

      First of all, we thank the reviewer for the attention to detail with which they have read the manuscript, and the very helpful comments on how to improve it. We will indeed take on some of the suggestions in a revised version of the text:

      Regarding the match of growth rate and uptake rate measurements, we plan to plot their correlation in a graph.
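
      As a minimal sketch of how the agreement between growth rates and uptake rates could be summarized before plotting, the snippet below computes a Pearson correlation coefficient. The function name `pearson_r` and all numerical values are hypothetical placeholders for illustration; they are not data or analysis code from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT measurements from the study:
# one growth rate and one uptake rate per transporter variant.
growth_rates = [0.05, 0.10, 0.15, 0.22, 0.30]
uptake_rates = [1.0, 2.1, 2.9, 4.6, 5.8]

r = pearson_r(growth_rates, uptake_rates)
print(f"Pearson r = {r:.3f}")
```

      A coefficient close to 1 would indicate that uptake largely explains growth, whereas a lower value would be consistent with additional parameters influencing the growth rate.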

      Regarding the amount of transporter on the plasma membrane, we acknowledge that the visual representation of the fluorescence micrographs already in the text might not be enough. We therefore will quantify expression levels from said micrographs and include the information in the manuscript.

      On a similar note, we had already measured the growth rates of all transporter variant cultures in the absence of selection for amino acid uptake (i.e., in medium with ammonium as the nitrogen source; Figure 4 - Supplement figure 1). We will include the measured growth rates in the text to give an indication of what the impact of transporter overexpression is on the growth rate per se.

      Regarding the proposed analysis of natural transporter sequences, we do see the possible value in such an analysis. However, it is currently out of scope for the present study. The reasons are 1) that preliminary analyses show that the sequence similarity of functionally verified/annotated transporters is too low to reliably pinpoint a phenotype to a single residue, and 2) that we do not envision that the variants that we discovered are necessarily beneficial in a natural setting, where fine-grained regulation of amino acid transport may be more important than a broad substrate range. Regarding the generality of the insights, we do agree with the reviewer’s comment that we “only” analyzed a relatively small number of variants. However, the aim of the study was not to generate high-throughput data on a large set of variants (e.g., by NGS of the whole culture) but to provide in-depth data for characterized and verified variants in a clean genetic background (i.e., verified phenotype and fitness measurements on all native and novel substrates).

      As to the mutation space, we will include an estimate in a revised version of the text. We estimate that a majority of all possible single mutants is covered in the first and second passages of the selection experiment, which is corroborated by the fact that we repeatedly find the same mutants in biological replicates.
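
      One simple way to reason about such an estimate is a Poisson coverage model, sketched below. This is an illustrative back-of-the-envelope calculation, not the estimate used in the study: the function `single_mutant_coverage`, the per-base mutation rate, the numbers of cell generations, and the assumption that all three alternative bases are equally likely are all assumptions made for the example.

```python
import math


def single_mutant_coverage(cell_generations, rate_per_base, alternatives=3):
    """Probability that one specific single-nucleotide substitution has
    occurred at least once, under a Poisson model.

    cell_generations: total number of gene replication events in the culture
                      (assumed value, roughly cells x generations)
    rate_per_base:    substitutions per base per replication (assumed value)
    alternatives:     number of equally likely alternative bases (assumption;
                      real mutational spectra are usually biased)
    """
    expected_hits = cell_generations * rate_per_base / alternatives
    return 1.0 - math.exp(-expected_hits)


# Assumed, illustrative parameters (not taken from the manuscript)
ASSUMED_RATE = 1e-5  # substitutions per base per replication

for n in (1e4, 1e5, 1e6, 1e7):
    cov = single_mutant_coverage(n, ASSUMED_RATE)
    print(f"{n:.0e} replication events -> coverage per substitution {cov:.3f}")
```

      Under these assumptions, coverage of any given substitution approaches saturation once the number of replication events greatly exceeds the inverse of the per-substitution rate, which is the regime in which the same mutants would be expected to recur across biological replicates.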

      Regarding the mentioned loss-of-function analyses, we are unsure about what the reviewer intends with this statement at this point. To briefly summarize, we feel that our results are a good indication that transporters can evolve new functions analogously to enzymes. We explicitly do not imply that this is the only way to evolve novelty.

      Reviewer #3 (Public Review):

      The goal of the current manuscript is to investigate how changes in transporter substrate specificity emerge through experimental evolution. The authors investigate the APC family of amino acid transporters, a large family with many related transporters that together cover the spectrum of amino acid uptake in yeast.

      The authors use a clever approach for their experimental evolutions. By deleting 10 amino acid uptake transporters in yeast, they develop a strain that relies on amino acid import by introducing APC transporters under nitrogen-limiting conditions. They can thus evolve transporters towards the transport of new substrates if no other nitrogen source is available. The main takeaway from the paper is that it is relatively easy for the spectrum of substrates in a particular transporter of this family to shift, as a number of single mutants are identified that modulate substrate specificity. In general, transporters evolved towards gain-of-function mutations (better or new activities) and also confer transport promiscuity, expanding the range of amino acids transported.

      The data in the paper support the conclusions, in general, and the outcomes (evolution towards promiscuity) agree with the literature available for soluble enzymes. However, it is also a possibility that the design of these experiments selects for promiscuity among amino acids. The selections were designed such that yeast had access to amino acids that were already transported, with a greater abundance of the amino acid that was the target of selection. Under these conditions, it seems probable that the fittest variants will provide the yeast access to all amino acid substrates in the media, and unlikely that a specificity swap would occur, limiting the yeast to only the new amino acid.

      The authors also examine the fitness costs of mutants, but only in the narrow context of growth on a single (original) amino acid under conditions of nitrogen limitation. Amino acid uptake is typically tightly controlled because some amino acids (or their carbon degradation products) are toxic in excess. This paper does not address or discuss whether there might be a fitness cost to promiscuous mutants in conditions where nitrogen is not limiting.

      We are grateful for the reviewer’s insightful comments on the paper.

      Regarding the design of our experiments, we followed the concept of directed evolution as described by pioneers of the field, in which the starting point for evolving a protein is to have a basic level of that activity. In the case of AGP1, the promiscuous activity is Cit uptake. We recognize that elimination of all the already transported amino acids from the evolution media could also yield very insightful results. However, we aimed to simulate the effect of the evolutionary pressure acting in a “natural” environment, where the uptake of the specific amino acid is not initially crucial for its survival. In the case of PUT4, the experimental design was chosen to ensure the initial survival of the culture (since neither Glu nor Asp support the growth of the strain) by providing a low level of already transported amino acids. In the revised manuscript, we will state this more clearly.

      Regarding the second point, we agree that a short discussion about the potentially detrimental effects of promiscuous transporters would be beneficial for the reader. We will touch on this aspect in the revised version of the text. Indeed, our system is intentionally simplified, as we try to take regulation of transport out of the equation (e.g., by using the constitutive ADH1 promoter as opposed to a nitrogen-regulated one). In a natural setting, microorganisms encounter fluctuations of nutrient availability, necessitating tight control of nutrient transport. This is probably a major reason why microorganisms typically encode transporters with redundant specificities (i.e., promiscuous and specific ones). Otherwise, one very broad-range nutrient transporter would suffice. In our system, we artificially select for broad-range transport, which is reflected in the observed phenotypes of the evolved transporters. We expect that in a natural setting, a broad-range transporter would be a stepping stone to evolve a narrow-range transporter with a new specificity (which is actually what we see in the double-mutant AGP1-NV, with lowered fitness in original substrates and increased fitness in Cit).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our understanding of the ways in which different types of communication signals differentially affect mouse behaviors and amygdala cholinergic/dopaminergic neuromodulation. Researchers interested in the complex interaction between prior experience, sex, behavior, hormonal status, and neuromodulation should benefit from this study. Nevertheless, the data analysis is incomplete at this stage, requiring additional analysis and description, justification, and - potentially - power to support the conclusions fully. With the analytical part strengthened, this paper will be of interest to neuroscientists and ethologists.

      GENERAL COMMENTS ON REVIEWS AND REVISIONS

      Experimental design

      Here we address questions from several reviewers regarding our periods of neuromodulator and behavioral analysis. First, we recognize that the text would benefit from an overview of the experimental structure different from the narrative we provide in the first paragraphs of the Results. We now include this near the beginning of the Materials and Methods (page 17). We further articulate that the 10-minute time periods were dictated by the sampling duration required to perform accurate neurochemical analyses (and to reserve half of the sample in the event of a catastrophic failure of batch-processing samples). Since neurochemical release may display multiple temporal components (e.g., ACh: Aitta-aho et al., 2018) during playback stimulation, and since these could differ across neurochemicals of interest, we decided to collect, analyze, and report in two stimulus periods as well as one Pre-Stim control. We now clarify this in additional text in the Materials and Methods (p. 24, lines 20-22; p. 26, lines 17-19). We decided not to include analyses of the post-stimulus period because this is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question: the change in neuromodulator release DURING vocal playback.

      We also sought to clarify the meaning of the periods “Stim 1” and “Stim 2”; they are two data collection periods, using the same exemplar sequences in the same order. We have added statements in the Materials and Methods (p. 18, lines 4-7; Fig. caption, p. 39, lines 11-13) to clarify these periods.

      For behavioral analyses, observation periods were much shorter than 10 mins, but the main purpose of behavioral analyses in this report is to relate to the neurochemical data. As a result, we matched the temporal features of the behavioral and neurochemical analyses (p. 22, lines 17-22). We plan a separate report, focused exclusively on a broader set of behavioral responses to playback, that may examine behaviors at a more granular level.

      Data and statistical analyses

      Reviewers 1 and 3 expressed concerns about our normalization of neurochemical data, suggesting that it diminishes statistical power or is not transparent. We note that normalization is a very common form of data transformation that does not diminish statistical power. It is particularly useful for data forms in which the absolute value of the measurement across experiments may be uninformative. Normalization is routine in microdialysis studies, because data can be affected by probe placement and factors affecting neurochemical recovery and processing. Recent examples include:

      Li, Chaoqun, Tianping Sun, Yimu Zhang, Yan Gao, Zhou Sun, Wei Li, Heping Cheng, Yu Gu, and Nashat Abumaria. "A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice." Neuron (2023).

      Gálvez-Márquez, Donovan K., Mildred Salgado-Ménez, Perla Moreno-Castilla, Luis Rodríguez-Durán, Martha L. Escobar, Fatuel Tecuapetla, and Federico Bermudez-Rattoni. "Spatial contextual recognition memory updating is modulated by dopamine release in the dorsal hippocampus from the locus coeruleus." Proceedings of the National Academy of Sciences 119, no. 49 (2022): e2208254119.

      Holly, Elizabeth N., Christopher O. Boyson, Sandra Montagud-Romero, Dirson J. Stein, Kyle L. Gobrogge, Joseph F. DeBold, and Klaus A. Miczek. "Episodic social stress-escalated cocaine self-administration: role of phasic and tonic corticotropin releasing factor in the anterior and posterior ventral tegmental area." Journal of Neuroscience 36, no. 14 (2016): 4093-4105.

      Bagley, Elena E., Jennifer Hacker, Vladimir I. Chefer, Christophe Mallet, Gavan P. McNally, Billy CH Chieng, Julie Perroud, Toni S. Shippenberg, and MacDonald J. Christie. "Drug-induced GABA transporter currents enhance GABA release to induce opioid withdrawal behaviors." Nature neuroscience 14, no. 12 (2011): 1548-1554.

      However, since all reviewers requested raw values of neurochemicals, we provide these in supplementary tables 1-3. The manuscript references these tables early in the Results (p. 6, lines 18-19) and in the Materials and Methods (p. 27, lines 3-4).
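
      The baseline normalization discussed above can be illustrated with a minimal sketch. It assumes the normalized value is the percent change of each sample relative to that animal's Pre-Stim sample, which is consistent with the reviewers' description of a baseline fixed at 0; the exact formula used in the manuscript is not reproduced in this response, and the function name and concentrations below are hypothetical.

```python
def percent_change_from_baseline(baseline, samples):
    """Return the percent change of each sample relative to the baseline.

    Assumed formula (not taken from the manuscript):
        100 * (sample - baseline) / baseline
    """
    if baseline <= 0:
        raise ValueError("baseline concentration must be positive")
    return [100.0 * (s - baseline) / baseline for s in samples]


# Hypothetical concentrations (arbitrary units) for one animal:
# Pre-Stim, Stim 1, Stim 2.
pre_stim, stim_1, stim_2 = 2.0, 3.0, 1.5
print(percent_change_from_baseline(pre_stim, [pre_stim, stim_1, stim_2]))
```

      Under this assumption the Pre-Stim value is 0 for every animal by construction, which is why the reviewers note that the baseline period shows no variation after normalization.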

      All reviewers commented on correlation analyses that we presented, with different perspectives. Reviewer 2 questioned the validity of such analyses, performed across experimental groups, while Reviewer 1 pointed out that the analyses were redundant with the GLM. We agree with these criticisms, and note the challenges associated with correlations involving behaviors for which there is a “floor” in the number of observations. As a result, we have removed most correlation analyses from the manuscript. The text and figures have been modified accordingly. Due to these changes, we have to decline the requests of Reviewer 3 to include many more such analyses. While correlation analyses could still be performed between neurochemicals and behaviors for each group, the relatively small size of each experimental group, the large number of groups, and the even larger number of pairings between neurochemicals and behaviors mean that statistical power would be very low. The only correlations we utilize in the manuscript concern the interpretation of our increased acetylcholine levels.

      As part of this revision, we re-ran our statistical analyses on neuromodulators because of a calculation error in 3 animals (regarding baseline values). In a few instances, a significance level changed, but none of these changed a conclusion regarding neuromodulator changes under our experimental conditions.

      Other revisions

      INTRODUCTION: We modified the Introduction to provide both a more general framework and specific gaps in our understanding relating neuromodulators with vocal communication.

      DISCUSSION: We have added material in the first two pages of the Discussion to provide more framework to our conclusions, to address the issues of the temporal aspects of neurochemical release and behavioral observations, and to identify limitations that should be addressed in future studies.

      FIGURES: All figures are now in the main part of the manuscript. We modified most figures in response to reviewer comments. We removed neuromodulator – behavior correlations from several figures. We modified all box plots to ensure that all data points are visible. The visible data points match the numbers reported in figure captions. We brought 5-HIAA data into the main figures reporting on neuromodulator results.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript addresses a fundamental question about how different types of communication signals differentially affect brain states and neurochemistry. In addition, the manuscript highlights the various processes that modulate brain responses to communication signals, including prior experience, sex, and hormonal status. Overall, the manuscript is well-written and the research is appropriately contextualized. The authors are thoughtful about their quantitative approaches and interpretations of the data.

      That being said, the authors need to work on justifying some of their analytical approaches (e.g., normalization of neurochemical data, dividing the experimental period into two periods (as opposed to just analyzing the entire experimental period as a whole)) and should provide a greater discussion of how their data also demonstrate dissociations between neurochemical release in the basolateral amygdala and behavior (e.g., neurochemical differences during both of the experimental periods but behavioral differences only during the first half of the experimental period). The normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis and could be problematic; by normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) that could inflate statistical power.

      Please see our general responses to structure of observation periods and normalization of neuromodulator data. Normalization is a common and appropriate procedure in microdialysis studies that does not alter statistical power.

      We have included a section in the Discussion concerning the temporal relationship between behavioral responses and neurochemical changes in response to vocal playback (p. 12, lines 3-17). We note where the linkage is particularly strong (e.g., ACh release and flinching). This points to a need to examine these phenomena with finer temporal resolution, but also with the recognition that the brain circuits driving a behavioral response may extend beyond the BLA.

      The Introduction could benefit from a priori predictions about the differential release of specific neuromodulators based on previous literature.

      We added some material to the Introduction to provide additional rationale for the study. However, we did not attempt to develop predictions for the range of neuromodulators that we sought to test. The literature can lead to opposite predictions for a given neuromodulator. For example, acetylcholine could be associated with both positive and negative valence. Instead, we note in the Introduction the association of both DA and ACh with vocalizations.

      The manuscript would also benefit from a description of space use and locomotion in response to different valence vocalizations.

      We have provided additional descriptions of space use and video tracking data in Material and Methods (p. 23, lines 1-6). We now report a few correlations based on these data in the Results to demonstrate that increased ACh in Restraint males and Mating estrus females was not related to the amount of locomotion (p. 9, lines 8-14).

      Nevertheless, the current manuscript seems to provide some compelling support for how positive and negative valence vocalizations differentially affect behavior and the release of acetylcholine and dopamine in the basolateral amygdala. The research is relevant to broad fields of neuroscience and has implications for the neural circuits underlying social behavior.

      Reviewer #2 (Public Review):

      Ghasemahmad et al. report findings on the influence of salient vocalization playback, sex, and previous experience, on mice behaviors, and on cholinergic and dopaminergic neuromodulation within the basolateral amygdala (BLA). Specifically, the authors played back mice vocalizations recorded during two behaviors of opposite valence (mating and restraint) and measured the behaviors and release of acetylcholine (ACh), dopamine (DA), and serotonin in the BLA triggered in response to those sounds.

      Strength: The authors identified that mating and restraint sounds have a differential impact on cholinergic and dopaminergic release. In male mice, these two distinct vocalizations exert an opposite effect on the release of ACh and DA. Mating sounds elicited a decrease of ACh release and an increase of DA release. Conversely, restraint sounds induced an increase in ACh release and a trend to decrease in DA. These neurotransmission changes were different in estrus females for whom the mating vocalization resulted in an increase of both DA and ACh release.

      Weaknesses: The behavioral analysis and results remain elusive, and although addressing interesting questions, the study contains major flaws, and the interpretations are overstating the findings.

      Although Reviewer 2 raises several valid issues that we have addressed in our response and revision, we believe that none represent “major flaws” in the study that challenge the validity of our central conclusions. In brief, we will:

      --provide enhanced description of behaviors (pp. 22-23 and Table 1)

      --clarify / modify box-plot representations of data (p. 28, lines 3-9)

      --point to our methods that describe corrections for multiple comparisons (p. 27; lines 15-16)

      --revise figures to clarify sample size (Figs. 3-6)

      Reviewer #3 (Public Review):

      Ghasemahmad et al. examined behavioral and neurochemical responses of male and female mice to vocalizations associated with mating and restraint. The authors made two significant and exciting discoveries. They revealed that the affective content of vocalizations modulated both behavioral responses and the release of acetylcholine (ACh) and dopamine (DA) but not serotonin (5-HIAA) in the basolateral amygdala (BLA) of male and female mice. Moreover, the results show sex-based differences in behavioral responses to vocalizations associated with mating. The authors conclude that behavior and neurochemical responses in male and female mice are experience-dependent and are altered by vocalizations associated with restraint and mating. The findings suggest that ACh and DA release may shape behavioral responses to context-dependent vocalizations. The study has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the BLA while an animal listens to social vocalizations; however, multiple concerns must be addressed to substantiate their conclusions.

      Major concerns:

      1) The authors normalized all neurochemical data to the background level obtained from a single pre-stimulus sample immediately preceding playback. The percentage change from the background level was calculated based on a formula, and the underlying concentrations were not reported. The authors should report the sample and background concentrations to make the results and analyses more transparent. The authors stated that NE and 5-HT had low recovery from the mouse brain and hence could not be tracked in the experiment. The authors could be more specific here by relating the concentrations to ACh, DA, and 5-HIAA included in the analyses.

      Please see our general statement regarding normalization of neurochemical data. We have added supplemental tables that show concentrations of dopamine, acetylcholine, and 5-HIAA. We do not report serotonin or noradrenalin since these were below the detection threshold.

      2) For the EXP group, the authors stated that each animal underwent 90-min sessions on two consecutive days that provided mating and restraint experiences. Did the authors record mating or copulation during these experiments? If yes, what was the frequency of copulation? What other behaviors were recorded during these experiences? Did the experiment encompass other courtship behaviors along with mating experiences? Was the female mouse in estrus during the experience sessions?

      In the mating experience, mounting or attempted mounting was required for the animal to be included in subsequent testing. Since the session lasted 90 minutes, more general courtship behavior was likely. However, we did not record detailed behaviors or track estrous stage for the mating experience. See p. 21, lines 20-22.

      3) For the mating playback, the authors stated that the mating stimulus blocks contained five exemplars of vocal sequences emitted during mating interactions. The authors should clarify whether the vocal sequences were emitted while animals were mating/copulating or when the male and female mice were inside the test box. If the latter was the case, it might be better to call the playback "courtship playback" instead of "mating playback".

      We have modified the Results (p. 5, lines 18-20) and Materials and Methods (p. 21, lines 8-15) to clarify our meaning. We continue to use the term “mating” because this refers to a specific set of behaviors associated with mounting and copulation, rather than the more general term “courtship”. We also indicate that we based these behaviors on previous work (e.g., Gaub et al., 2016).

      4) Since most differences that the authors reported in Figure 3 were observed in Stim 1 and not in Stim 2, it might be better to perform a temporal analysis - looking at behaviors and neurochemicals over time instead of dividing them into two 10-minute bins. The temporal analysis will provide a more accurate representation of changes in behavior and neurochemicals over time.

      Please see our general response to the structuring of experimental periods. The 10-min periods are the minimum for the neurochemical analyses, and we adopted the same periods for behavioral analyses to match the two types of observations. Our repeated measures analysis is a form of temporal analysis, since it compares values in three observation periods.

      5) In Figures 2 and 3, the authors show the correlation between Flinching behavior and ACh concentration. The authors should report correlations between concentrations of all neurochemicals (not just ACh) and all behaviors recorded (not just Flinching), even if they are insignificant. The analyses performed for the stim 1 data should also be performed on the stim 2 data. Reporting these findings would benefit the field.

      Please see general comments regarding correlation analyses. We removed almost all such analyses and references to them from the manuscript based on concerns of the other reviewers.

      6) The mice used in the study were between p90 - p180. The mice were old, and the range of ages was considerable. Are the findings correlated with age? The authors should also discuss how age might affect the experiment's results.

      Our p90-p180 mice are not “old”. CBA/CaJ mice display normal hearing for at least 1 year (Ohlemiller, Dahl, and Gagnon, JARO 11: 605-623, 2010) and adult sexual and social behavior throughout our observation period. They are sexually mature adults, appropriate for this study. We decline to perform correlation analyses with age, both because this was not a question for this study and because the very large number of correlations, for each experimental group (as requested by reviewer #2), render this approach statistically problematic.

      7) The authors reported neurochemical levels estimated as the animals listened to the sounds played back. What about the sustained effects of changes in neurochemicals? Are there any potential long-term effects of social vocalizations on behavior and neurochemical levels? The authors might consider discussing long-term effects.

      We have not included discussion of long-term effects of neuromodulatory release, both because our data analysis doesn’t address it (see response to Comment #10) and because we desired to keep the Discussion focused on topics more closely related to the results.

      8) Histology from a single recording was shown in supplementary figure 1. It would benefit the readers if additional histology was shown for all the animals, not just the colored schematics summarizing the recording probe locations. Further explanation of the track location is also needed to help the readers. Make it clear for the readers which dextran-fluorescein labeling image is associated with which track in the schematic.

      Based on the recent publications cited in our overall response to reviewer comments about statistical methods, our reporting of histological location of microdialysis exceeds the standard. We believe that the inclusion of all histology is unnecessary and not particularly helpful. Raw photomicrographs do not always illustrate boundaries, so interpretation is required. However, we added a second photomicrograph example and we identified which tracks correspond to these photomicrographs (see Figure 2; now in main body of manuscript).

      9) The authors did not control for the sounds being played back with a speaker. This control may be necessary since the effects are more pronounced in Stim 1 than in Stim 2. Playing white noise rather than restraint or courtship vocalizations would be an excellent control. However, the authors could perform a permutation analysis and computationally break the relationship between what sound is playing and the neurochemical data. This control would allow the authors to show that the actual neurochemical levels are above or below chance.

      We considered a potential “control” stimulus in our experimental design. We concluded, based on our previous work (e.g., Grimsley et al., 2013; Gadziola et al., 2016), that white noise is not necessarily a neutral stimulus and therefore the results would not clarify the responses to the two vocal stimuli. Instead, we opted to use experience as a type of control. This control shows very clearly that temporal patterns and across-group differences in neurochemical response to playback disappear in the absence of experience with the associated behavior.
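
      For readers unfamiliar with the permutation analysis proposed in comment 9, the following is a minimal sketch of the general technique, not an analysis performed in this study. It shuffles the stimulus labels across animals to estimate how often a group difference at least as large as the observed one arises by chance. The function name, the choice of a difference in means as the test statistic, and all values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Labels are shuffled across the pooled values, which breaks any link
    between stimulus type and the measured response.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (count + 1) / (n_permutations + 1)


# Hypothetical percent-change values for two playback groups.
mating = [35.0, 42.0, 28.0, 50.0, 31.0, 44.0, 39.0]
restraint = [-12.0, -5.0, -20.0, 3.0, -8.0, -15.0, -10.0]
print(permutation_p_value(mating, restraint))
```

      With group sizes as small as those discussed here, the number of distinct label assignments is limited, which bounds how small the estimated p value can be.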

      10) The authors indicated that each animal's post-vocalization session was also recorded. No data in the manuscript related to the post-vocalization playback period was included. This omission was a missed opportunity to show that the neurochemical levels returned to baseline, and the results were not dependent on the normalization process described in major concern #1. The data should be included in the manuscript and analyzed. It would add further support for the model described in Figure 6.

      We decided not to include analyses of the post-stimulus period because this period is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question—the change in neuromodulator release DURING vocal playback. We agree that the general question is of interest to the field, but we don’t think our study is best designed to answer that question.

      11) The authors could use a predictive model, such as a binary classifier trained on the CSF sampling data, to predict the type of vocalizations played back. The predictive model could support the conclusions and provide additional support for the model in Figure 6.

      We recognize that a binary classifier could provide an interesting approach to support conclusions. However, we do not believe that the sample size per group is sufficient to both create and test the classifier.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • Introduction: It would be useful to set up an experimental framework before delving into the results. What are the predictions about specific neuromodulators based on previous literature?

      Because this narrative is laid out in the first two paragraphs of the Results, which immediately follow the Introduction, we believe that additional text in the Introduction on the experimental framework is redundant. As stated above, detailing predictions for a range of neuromodulators would make for a long and not particularly illuminating Introduction. We instead have related our findings to more general understanding of DA and ACh in the Discussion.

      • There really isn't a major difference in stimuli during the "Stim 1" and "Stim 2" phases, and it's not clear why the authors divided the experimental period into two phases. Therefore, the authors need to justify their experimental approach. For example, the authors could first anecdotally mention that behavioral responses to playbacks seem to be larger in the first half of the playbacks than during the second half, therefore they individually analyzed each half of the experimental period. Or adopt a different approach to justify their design. Overall, the analytical approach is reasonable but it is currently not justified.

      See general comment for analysis periods. As noted, we clarified these issues in several locations within Materials and Methods (p. 24, lines 20-22; p. 26, lines 17-19). We also sought to clarify the meaning of the periods “Stim 1” and “Stim 2”; they are two data collection periods, using the same exemplar sequences in the same order. We have added statements in the Materials and Methods (p. 18, lines 4-7; Fig. caption, p. 39, lines 11-13).

      • The normalization of neurochemical data seems problematic and unnecessary. By normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) and this has implications for statistical power. Because the analysis is a within-subjects analysis, this normalization is not necessary for the analysis itself. It can be useful to normalize data for visualization purposes, but raw data should be analyzed. Indeed, behavioral data are qualitatively similar to the neurochemical data, and those data are not normalized to baseline values.

      Please see our general comment on this issue. We believe normalization does not affect statistical power and is both the standard way and an appropriate way to analyze microdialysis results. We include concentrations of ACh, DA, and 5-HIAA in supplementary tables 1-3.

      • The authors should include a discussion (in the Discussion section) of how behavior and neurochemical release are associated during the first half of the experimental session but not in the second half (e.g., differences in ACh and DA release between mating and restraint groups during stim 1 and 2, but behavioral differences only during stim 1).

      We have included a section in the Discussion concerning the temporal relationship between behavioral responses and neurochemical changes in response to vocal playback. We note that the linkage is particularly strong in some cases (e.g., ACh release and flinching). This points to a need to examine these phenomena with finer temporal resolution, but also with the recognition that the brain circuits driving a behavioral response may extend beyond the BLA.

      Minor comments:

      • Keywords: add "serotonin" (even though there are no significant differences on 5-HIAA, people interested in serotonin would find this interesting).

      Added to keywords list.

      • Do the authors collect data on the vocalizations of mice in response to these playbacks?

      We monitored vocalizations during playback, noting that vocalizations, especially “Noisy” vocalizations, were common. However, we did not record vocalizations and are therefore unable to quantify our observations.

      • First line of page 7: readers do not know about "stim 1" and "stim 2". Therefore, the authors need to describe their approach to analyzing behavior and neurochemical release.

      We first introduce these terms earlier, citing Figure 1D,E. We have added some additional wording for further clarification (p. 7, lines 4-5).

      • Make sure citations are uniformly formatted (e.g., Inconsistencies in: "As male and female mice emit different vocalizations during mating (Finton et al., 2017; J. M. S. Grimsley et al., 2013; Neunuebel et al., 2015; Sales (née Sewell), 1972)").

      We have reviewed and corrected citations throughout the manuscript.

      • Last paragraph of page 7: "attending behavior" has not been defined yet.

      Table 1 contains our description of the behaviors analyzed in this study. We have now inserted a reference to Table 1 earlier in the Results (p. 6, line 12).

      • Figure 2E and 3G: I find these correlations to be redundant with the GLMs. This is because the significant relationship is likely to be driven by group differences in behavior and in neurochemical release.

      Please see general comments regarding correlation analyses. We removed such analyses and references to them from the manuscript.

      • Page 2, 2nd paragraph, 2nd sentence: this paragraph seems to be rooted in comparing and contrasting experienced and inexperienced mice, so there should be explicit comparisons in each sentence. For example, the 2nd sentence should read: "Whereas EXP estrus females demonstrated increased flinching behaviors in response to mating vocalizations, INEXP ....". This paragraph overall could use some refining.

      We believe this refers to page 9. We have revised the paragraph to clarify our findings (Beginning p. 9, line 23).

      • Page 9: "Further, there were no significant differences across groups during Stim 1 or Stim 2 periods. These results contrast sharply with those from all EXP groups, in which both ACh and DA release changed significantly during playback (Figs. 2C, 2D, 3E, 3F)." While I understand their perspective, this is misleading because changes were only observed during the Stim 1 period.

      We have slightly revised the wording in this paragraph, because the restraint males did not show significant ACh decreases. However, we do not believe our statements mislead readers just because some changes are observed in only one of the stimulation periods (p 10, lines 13-16).

      • Last paragraph of page 14: it would be useful to mention the increase in flinching in experienced females in response to mating vocalizations.

      We have added a sentence in this paragraph relating flinching in estrus females to increased ACh (p. 15, lines 18-20).

      • Was there a full analysis of locomotion in response to playbacks? I see that locomotion was correlated with neurochemical release but was it different in response to different stimuli? Were there changes to the part of the arena that mice occupied in response to restraint vs. mating vocalizations? Given their methods section, it would be useful for the authors to mention the results of the analyses of these aspects of movement.

      We have provided additional descriptions of space use and video tracking data in Material and Methods (p. 23, lines 1-6). We now report additional results associated with these analyses (p. 8, lines 13-15; p. 9, lines 8-14).

      • I believe that each experimental mouse only heard one of the stimuli (given the analytical approach). Because it is plausible to measure neurochemical release in response to both types of stimuli, I encourage the authors to be more explicit about this aspect of the experimental design (e.g., mention in Results section).

      Sentence modified to read: “Each mouse received playback of either the mating or restraint stimuli, but not both: same-day presentation of both stimuli would require excessively long playback sessions, the condition of the same probe would likely change on subsequent days, and quality of a second implanted probe on a subsequent day was uncertain.” (p. 7, lines 5-9).

      • Figure 1A and 1B: add labels to the panels so readers don't have to read the legend to know what spectrogram is associated with what context.

      We added these labels to Figure 1.

      • Table 1: in the definition of "still and alert", should this mention "abrupt attending" instead of "abrupt freezing"? The latter isn't described.

      Yes, we intended “abrupt attending”, and now indicate that in Table 1.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      • The authors report they performed manual behavioral analysis, and provide a table defining the different behaviors. However, it remains unclear how some of these behaviors were detected (such as still-and-alert events). A thorough description of the criteria used to define these events needs to be provided.

      We have modified some descriptions of manually analyzed behaviors in Table 1, and have added additional description of how we developed this set of behaviors for analysis in the study (pp. 22-23).

      • The box plots do not appear to represent the "minimum, first quartile, median, third quartile, and maximum values." as specified on page 24 (Methods). Indeed, the individual data points sometimes do not reach the max or min of the bar plot, and sometimes are way beyond them.

      We used the “inclusive median” function in Excel to generate final boxplots. These boxplots will sometimes result in a data point being placed outside of the whiskers. SPSS considers these to be “outliers”, but our GLM analysis includes these values. We describe this in the Data Analysis section of Materials and Methods (p. 28, lines 3-9).
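
      To show how a data point can fall outside the whiskers of a box plot, here is a minimal sketch. It assumes that inclusive quartiles behave like Python's `statistics.quantiles` with `method="inclusive"` and that whiskers extend to the most extreme points within 1.5 times the interquartile range; both are common conventions but are assumptions here, not a description of Excel's or SPSS's exact implementation. The function name and the data are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Summarize values as quartiles, whisker ends, and points beyond whiskers."""
    data = sorted(values)
    # Inclusive quartiles: the sample minimum and maximum are treated as
    # the 0th and 100th percentiles.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    beyond = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "beyond_whiskers": beyond,
    }


# Hypothetical values with one extreme observation.
print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 20]))
```

      Points listed under `beyond_whiskers` are only drawn differently; in this sketch they remain in the data set, in line with the statement above that the GLM analysis includes these values.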

      • Some of the data are replicated in different Figures: Figure 2A and Figure 3C. While this is acceptable, the authors did not correct for multiple comparisons (dividing the p value by the number of comparisons).

      Our analysis included corrections for multiple comparisons, as we have indicated on p. 27, lines 15-16.
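
      The correction the reviewer describes is commonly implemented as a Bonferroni adjustment: the significance threshold is divided by the number of comparisons or, equivalently, each p value is multiplied by that number and capped at 1. The sketch below illustrates that general technique only; this response does not state which correction the manuscript used, and the function name and p values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p values (each multiplied by m, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p values from three pairwise comparisons.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni_adjust(raw)
alpha = 0.05
for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw={p_raw:.3f} adjusted={p_adj:.3f} significant={p_adj < alpha}")
```

      In this example only the first comparison remains below 0.05 after adjustment, which shows how a result that is nominally significant on its own can fail the corrected threshold.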

      • Overall, the sample sizes are too small (for example in Figure 3, non-estrus females are at n=3), and are different in experiments where they should be equal (Figure 2B: mating stim 1 is at n=5 and mating stim 2 is at n=3).

      We apologize that sample sizes were not properly displayed in figures. Please note that sample sizes are identified in the figure captions. For neuromodulator data, all sample sizes are at least 7. For behavioral data, the minimum sample size is 5. We have revised Figures 3-6 to ensure that all data points are visible.

      • It remains unclear why the impact of mating vocalizations has been tested only in males.

      We assume the reviewer meant that only males were tested in restraint. We now indicate that our preliminary evidence indicated no difference in behavioral responses to restraint vocalization between males and females, so we opted to perform the neurochemical analysis for restraint only in males (page 22 lines 4-5). If there were no limitations to time and cost, we would have preferred to test responses to restraint in females as well. We note that such inclusion would have added up to 4 experimental groups (estrus and non-estrus groups in both EXP and INEXP groups).

      • The correlation between the number of flinching and ACh release changes (Figure 2E) visually appears to be opposite between mating and restraint playbacks. The authors should perform independent correlations for these 2 playbacks.

      Please see general comments regarding correlation analyses. We removed such analyses and references to them from the manuscript.

      • The authors state that their findings "indicate that behavioral responses to salient vocalizations result from interactions between sex of the listener or context of vocal stimuli with the previous behavioral experience associated with these vocalizations.". However, in male mice, they do not report any difference in previous experience on flinching for both restraint and mating sounds, as well as no difference in rearing for the restraint sounds (Figure 4A-B). Thus, the discussion of these results should be completely revisited.

      We revised the paragraph in question (p. 9, line 22 through p. 10, line 9). For instance, we note that significant differences between EXP male-mating and male-restraint flinching do not exist between the INEXP groups. We believe that the last sentence correctly summarizes findings described in this paragraph.

      • For serotonin experiments in Figure S2 there are strong outliers (150% increase in 5HIAA release). Did the authors correlate these levels with the behavior of the animals?

      Outliers are identified by the Excel function that generated the boxplots, but we have no reason to consider these as outliers and exclude them. As noted above, we have clarified in the Materials and Methods (p. 28, lines 3-9) that these “outliers” are the result of the Excel function, and we have revised the plotting of data points.

      Minor comments:

      • Mating vocalization playback is mainly emitted by males, thus, instead of a positive valence signal, this could also be interpreted as a competitive signal to other males.

      There is support in the literature for viewing our mating stimulus as having positive valence. Gaub et al., 2016 describe the emission of stepped calls, lower frequency harmonics, and increased sound level as indicators of “positive emotion”. We have shown (Grimsley et al., 2013) that the female LFH vocalization can be highly attractive to male mice, under the right conditions, indicating something like “sex is happening”. The inclusion of both the male and female vocalizations in our stimuli was a key piece of our experimental design, based on our understanding of the contributions of both vocalizations to the meaning of the overall acoustic experience.

      • Figure 1 should include panel titles.

      No change. This information is available in the Figure caption.

      • n=31 should be indicated in the EXP group.

      We’re not sure where the reviewer is referring to this value.

      • The color legend of Figure 1E is absent, making the Figure not understandable.

      We added text in the Figure 1 caption to indicate that each color represents a different exemplar. We don’t think a legend provides additional useful information.

      • The point of making two blocks (stim 1 and stim2) should be stated more clearly.

      Please see general statement regarding experimental blocks. We have modified our description of these in an Experimental overview section in the Materials and Methods.

      • Including raw data of micro-dialysis in the supplementary figures would allow assessment of the variability and quality of the measurements.

      We have added concentrations of neurochemicals in supplemental tables 1-3.

      • Baseline (prestimulus) number of flinch and rearing should systematically be indicated (missing in Figure 4).

      The focus in this figure is on the differences that occur in Stim 1 values. There are no differences between EXP and INEXP animals of any group during the Pre-Stim period. We now state that in the Figure 4 caption.

      • Discussion: "increase in AMPA/NMDA currents". We believe the authors are referring to the ratio of AMPA to NMDA currents. This sentence should be reformulated.

      These are modified to refer to “… the AMPA/NMDA current ratio…” in two locations in the Discussion (p. 14, lines 8-9; p. 15, line 4).

      • Overall the discussion is very speculative and should rely more on the data.

      We believe that the Discussion provides appropriate speculation that is based on our experimental data and previous literature. We have added a paragraph to identify limitations of our findings and recommendations of future experiments to resolve some issues (p. 12, lines 3-17).

      Reviewer #3 (Recommendations For The Authors):

      Minor concerns:

      1) The authors stated that USVs are most likely to be emitted by males, and LFH are likely to be emitted by females. However, Oliveira-Stahl et al. 2023, Matsumoto et al. 2022, Warren et al. 2018, Heckman et al. 2017, Neunuebel et al., 2015 showed that females also emit USVs. The authors should mention that USVs are emitted by both males and females and discuss how the sex of the vocalizing animal (both males and females) can influence neuromodulator release.

      The reviewer slightly mis-stated the wording of our text, changing the meaning significantly. Our wording is “These sequences included ultrasonic vocalizations (USVs) with harmonics, steps, and complex structure, mostly emitted by males, and low frequency harmonic calls (LFHs) emitted by females (Fig. 1A,C)…” This phrasing is correct and carefully chosen. The Discussion in Oliveira-Stahl et al 2023 (p. 10-11) supports our statement: “The exact fraction of USVs emitted by females as concluded in all previous studies on dyadic courtship has varied, ranging from 18%, 17.5%, and 16% to 10.5% in the present study…”.

      2) The authors should explain why ECF from BLA was collected unilaterally from the left hemisphere.

      p. 23, lines 9-11: We inserted a sentence to explain why we targeted the BLA unilaterally. “Since both left and right amygdala are responsive to vocal stimuli in human and experimental animal studies (Wenstrup et al., 2020), we implanted microdialysis probes into the left amygdala to maintain consistency with other studies in our laboratory.” Beyond that, the choice was arbitrary.

      3) The authors said each animal recovered in its home cage for four days before the playback experiment. A 4-day period may not be sufficient for every animal to recover from surgery, so the authors should describe how a mouse's recovery was assessed.

      p. 23, lines 20-23: We provide more description about the recovery and how it was assessed. Except for a few animals that were not included in the experiments, all animals recovered within 4 days.

      4) The authors stated that each animal was exposed to 90-min sessions with mating and restraint behaviors in a counterbalanced design. This description for Figure 1D should also include the duration of the mating and restraint experience.

      The Results that immediately precede citation to this figure include this information.

      5) The authors stated, "Data are reported only from mice with more than 75% of the microdialysis probe implanted within the BLA". What are the implications of having 25% of the probe outside the BLA? The authors should shed more light on this by discussing this issue as it relates to the findings and commenting on where the other 25% of the probe was located.

      We inserted a sentence to explain the rationale for this inclusion criterion. “We verified placement of microdialysis probes to minimize variability that could arise because regions surrounding BLA receive neurochemical inputs from different sources (e.g., cholinergic inputs to putamen and central amygdala).” (p. 25, lines 21-23).

      All brain regions that surround BLA, dorsal, medial, ventral, or lateral, could have been sampled by the “other” 25%. Some of these, e.g., the central amygdala or caudate-putamen, have different sources of cholinergic input that may not have the same release pattern. We do not think it is worthy of further speculation in the Discussion. Due to the high cost of the neurochemical analysis, we often did not process the neurochemistry data if histology indicated that a probe missed the BLA target.

      6) The authors confirmed that the estrus stage did not change during the experiment day by evaluating and comparing estrus prior to and after data collection. This strategy was a fantastic experimental approach, but the authors should have discussed the results. How did the results the authors included change when the females were in estrus before but not after data collection? What percentage of females started in estrus but ended in metestrus? Assuming that some females changed estrus state, were these animals excluded from the analyses?

      All animals were in the same estrus state at the beginning and end of the playback session.

      7) Authors cite Neunuebel et al., 2015 for the sentence "As male and female mice emit different vocalizations during mating". However, Neunuebel et al., 2015 showed vocalizations emitted during chasing--not mating. If mating is a general term for courtship, then this reference is appropriate, but see major concern #3.

      In the Results (p. 8, line 5), we changed the phrasing to “courtship and mating” to include the Neunuebel et al. study.

      As we indicate in our response to Public Comment #3, we have modified the Results (p. 5, lines 18-20) and Materials and Methods (p. 21, lines 8-15) to clarify our meaning. We continue to use the term “mating” because this refers to a specific set of behaviors associated with mounting and copulation, rather than the more general term “courtship”. We also indicate that we based these behaviors on previous work (e.g., Gaub et al., 2016).

      8) Authors interpret Figure 3F as DA release showed a "consistent" increase during mating playback across all three experimental groups. However, the increase in the estrus female group is inconsistent, as seen in the graph. This verbiage should be reworded to describe the data more accurately.

      p. 8, line 23 “consistent” was deleted.

      9) In all the box plots, multiple data points overlay each other. A more transparent way of showing the data would be adding some jitter to the x value to make each data point visible. The mean (X's) in Figure 3D (pre-stim mating and mating estrus) are difficult to see, as are all the data points in mating non-estrus. Adding all the symbols to the figure legend or a key in the figure instead of the method section would aid the reader and make the plots easier to interpret.

      We have revised the boxplots to ensure that all data points are visible.
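      The jitter the reviewer suggests can be sketched as follows. This is a hypothetical illustration of the general technique, not the procedure the authors used to revise the figures; the function name `jitter_positions`, the jitter width, and the seed are assumptions.

```python
import random


def jitter_positions(n_points, center, width=0.2, seed=0):
    """Return x positions spread uniformly around a category center.

    Each point keeps its y value; only the x position is shifted by a small
    random offset so that overlapping points become individually visible.
    """
    rng = random.Random(seed)  # fixed seed keeps the figure reproducible
    half = width / 2
    return [center + rng.uniform(-half, half) for _ in range(n_points)]


# Five data points that belong to the group plotted at x = 1
xs = jitter_positions(5, center=1.0)
print(xs)
```

      Keeping the jitter width well below the spacing between groups avoids points from one group drifting visually into the next.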

      10) Some verbiage used in the discussion should be toned down. For example, "intense" experiences and "emotionally charged" vocalizations should be removed.

      We have not changed these terms, which we believe are appropriate to describe these experiences and vocalizations.

      11) The authors include "Emotional Vocalizations" in the title. It would be beneficial if the authors included more detail and references in the introduction to help set up the emotional content of vocalizations. It may benefit a broader readership as typically targeted by eLife.

      We now cite Darwin and some more recent publications that articulate the general understanding that social vocalizations carry emotional content.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Recommendations For The Authors):

      While the details are mostly well-explained, I think that the authors could better bring forth the goals and potential usages of hippocampome.org overall.

      I think that this is a great and helpful tool that can leverage various and detailed cellular experimental studies that are out there in the literature to garner potential insights, direct future experimental studies, observe/classify experimental 'differences' (e.g., the deep and superficial pyramidal studies they mention) and so on. Say that one gets some mechanistic insight from more abstract theoretical models, hippocampome can be used to determine whether the experimental data where available is supportive of the theory. They also describe CA3 model and grid cells. While I am not suggesting that the authors completely re-organize the manuscript, I did feel that the last section 'potential applications...' could have perhaps been brought forth earlier (in a summarized form) for the reader/user to better appreciate hippocampome - indeed it is line 288 that should be near the beginning of the paper I thought.

      We thank the Reviewer for the suggestion. We have now included a summary of the simulation readiness of Hippocampome.org in the Introduction.

      I thought the 'application' paragraph (starting line 288) needed expansion to appreciate - I did not have a chance to look at the cited papers in that section - but maybe 2 paragraphs, one on CA3 and the other on grid cells, with a few more sentences of goal/context and tool usage details could be provided?

      We thank the Reviewer for the suggestion. We have added expanded paragraphs describing the simulation work on CA3 and grid cells.

      The authors start their Discussion by mentioning other resources (e.g. blue brain) in comparison. I thought that this was not too helpful without a bit more expansion about these other resources and what in particular is comparable. For example, the blue brain project is different in that it does not mine the literature per se (I think)? But then I am not sure of the extent of the comparison that the authors intend with blue brain and the other mentioned resources.

      Thank you for the helpful suggestion. We have now expanded upon the paragraph to draw more explicit parallels and contrasts among the various projects, in particular between the Blue Brain Project and Hippocampome.org.

      Minor comments

      • Fig 3D caption missing

      Thank you for pointing this out. We have now amended the figure caption.

      • Fig 5A line 211-12 refers to v2.0 but Fig 5 caption says v1.0?

      We apologize for the confusion. We have now added text clarifying the V1.X relevant descriptions around Figure 5.

      • Fig 6A confusing with thin and thick arrows and direction?

      We apologize for the confusion. We have re-colored the thick arrows orange to emphasize the fact that they are feeding directly into the spiking neural simulations.

      • Line 260 - not sure what this means - how is importance defined?

      We apologize for the confusion. We have now added text clarifying that “importance” refers to the role the neuron type plays in the functioning circuitry of the hippocampal formation.

      • CARLsim vs Brian/NEST in choosing - maybe a sentence or two for rationale

      Thank you for the suggestion. We have now added a sentence explaining the selection of CARLsim. CARLsim was selected due to its ability to run on collections of GPUs. CARLsim was the only simulator with this capability at the time the simulation work was being planned, and the power of a GPU supercomputer was needed to simulate the millions of neurons that comprise a full simulation of the complete hippocampal formation.

      • Fig 9 mv should be mV, and the voltage values specified there refer to which dash?

      Thank you for pointing these situations out. We have amended the millivolts label and have made changes to the figure to help clarify which specific tick marks are being labeled.

      Reviewer #3 (Recommendations For The Authors):

      Compliments to the authors on this nicely organized and structured presentation of V 2.0 of hippocampome.org. The paper is well prepared giving a useful short summary of the history of hippocampome for the newcomers and refreshing the memory of users, switching to highlighting the new data additions, why these are relevant and how these complement the existing database, and opening up to new applications. The added potential is well illustrated and in addition, the authors provide numerical information on the usage of this amazing resource. I enjoyed roaming around in the new version, which was made available for reviewers, and although it has been a while since I worked with the system, the new version is easy to work with. I have not had the time to use it extensively so cannot comment in detail but based on the long experience of the authors and their support team, I trust that version 2 will be almost, if not completely, flawless; however that will for sure become clear when it is released.

      One could always wish for more, disagree, or even criticize choices made to cluster neurons, divide areas, and so forth, though in my view that does not contribute to what the resource has to offer. Having said this, the authors might consider addressing briefly issues about differences in the nomenclature used in original descriptions and how they handled the translation into their nomenclature. To mention one that is constantly being debated: how does one define the border between SMo and SMi.

      Thank you for the suggestion. We have added text to the Introduction that addresses the nomenclature issue, as presented in Hamilton et al. (2017), and provide a definition for SMo and SMi.

      Another confusing issue is presented by layers in the entorhinal cortex or its subdivisions (how many and how are these defined). So, some remarks for newcomers in the field who might use the database without spending too much energy to read the original data, might be useful.

      Thank you for the suggestion to clarify this situation pertaining to the entorhinal cortex. Often, we have assumed the authors’ own definitions of the layers and subdivisions (medial and lateral), when naming neuron types. When our name is a hybrid of two published names that include both medial and lateral neurons, our name is prefixed by a simple EC, rather than by MEC or LEC.

      As noted, the authors present version 2 nicely and comprehensibly and I have only a few additional comments, meant to further improve the already high quality of the paper.

      1) The figures, nice as they are, are incredibly information-dense, so they require serious study to get the details; the legends do help, but the many abbreviations coming from totally different fields make it challenging to keep track of them while reading. This is a pity since there is a lot of new information in this version of the dataset, compared to previous versions and the authors overall succeed in emphasizing what is new and why this might be of use/importance.

      So a few suggestions: i) add relevant/most important abbreviations to the legends of the individual figures; ii) introduce all abbreviations upon first use and do not simply refer to the table in the methods. Interestingly, even the authors lose track in the introduction where they use BICCN in line 43 and refer to the abbreviation list, though the full name is given two lines below.

      We apologize for the confusion. We have amended the main text to clarify abbreviations. We have added the abbreviation definitions to the captions of the figures, and in some instances, removed the abbreviations from the figures altogether where space allowed.

      2) Figure 3 and even more so figure 5 depend strongly on the color differences red/green; please change since generally red/green is no longer used for obvious reasons.

      Thank you for pointing this out. We have switched the fonts in Figure 3 to black (excitatory) and gray (inhibitory) to match our previous publication. We have also changed the color schemes in Figure 5 to avoid red and green.

      Reviewer #3 commented on the complexity of our figures and how the figures are information dense. To partially address this, we have decided to remove panel A2 of Figure 3. It was originally meant to emphasize where the information came from to add new axonal projections to two v1.0 neuron types; however, it is not necessary to make the point in the illustration. Thus, we have removed the panel and amended the caption for Figure 3A to include the cited reference.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for The Authors):

      1) While the specificity of the observed muscle phenotypes seems clear, the subsequent molecular analysis of Numb protein interactors does not seem to consider the potential involvement of Numb-like. The authors should demonstrate the relative expression levels of Numb and Numb-like in the models used, and establish the specificity of the antibodies used in IP, western and staining experiments.

      Response: Perhaps the most convincing evidence that the anti-Numb antibody did not pull down Numb-like is that this protein was not detected among immunoprecipitated protein complexes pulled down by the anti-Numb antibody used. The antibody used in the immunoprecipitation was validated by the supplier and was previously reported to immunoprecipitate Numb [1, 2]. We previously demonstrated that a morpholino against Numb mRNA almost completely eliminated the band detected by this antibody and that this band was at the expected molecular weight [ref]. In our hands, mRNA levels for Numb-like in skeletal muscle are 5-10-fold lower than those for Numb [3]. We have been unable to detect Numb-like protein in healthy adult skeletal muscle by immunoblotting or immunofluorescence staining. Taking all of these findings together, it seems unlikely that the antibodies used for immunoprecipitating Numb-protein complexes pull down Numb-like.

      2) The authors use PCR to investigate Numb isoform expression and conclude that p65 is likely the dominant protein isoform expressed. While this agrees with the single band observed in Supp Figure 4A, a positive control for exon 9 excluded and included isoforms in the PCR reactions would strengthen this conclusion.

      Response: The amplicons shown in Supplemental 4 were sequenced. The clones corresponded to the isoforms with the exon 3 present or removed. No amplicons containing exon 9 were detected. The following sentence was added to the Analysis of Splice Variants section of Methods to address this point: “PCR products were cloned using the TOPO TA cloning system (ThermoFisher) and multiple resulting clones were sequenced to confirm that the expected products were generated.”

      3) PCR analysis of total Numb and Numb-like expression levels are not shown. This is important given the specificity of the Numb antibodies used for AP-MS experiments are not described and some Numb antibodies are well known to also recognize Numb-like. Two different Numb antibodies were used for Western and immunoprecipitation but the specificity for Numb and Numb-like is not described. In particular, does the antibody used in the AP-MS experiment recognize both Numb and Numb-like? Supplementary Table 1 does not list Numb or Numb-like, but presumably peptides were identified?

      Response: As noted above, the specificity of anti-Numb antibodies was confirmed in previous studies [3]. Importantly, Numb-like mRNA levels are 5-10-fold lower than Numb mRNA, and NumbL protein is undetectable in healthy adult skeletal muscle by Western. The physiology data reported in this manuscript support the conclusion that a single KO of Numb is sufficient to recapitulate the physiological phenotype of Numb/Numb-like KO. We therefore reason that the majority, if not all, of the physiological contribution of these proteins to muscle contractility is due to Numb (Fig. 1).

      4) The validation experiment used the same Numb antibody for immunoprecipitation, immunoblotted with Septin 7. A reciprocal IP of Septin 7 and blotted with Numb should be performed. In addition, a Numb-like IP or immunoblot would also be useful to demonstrate the specificity of the interaction. Efforts to map the interaction between Numb and Septin 7 would be useful to demonstrate specificity of the interaction and strategies to establish the biological relevance of the interaction.

      Response: We agree with the reviewer and attempted several IPs with anti-Septin7 antibodies. These were unsuccessful. In a new collaboration, Dr. Italo Cavini (University of Sao Paulo) has used machine-learning-based approaches to model binding between Numb and several septins, including Septin 7. The analysis suggests that binding of Numb with septins involves a domain of Numb that has not yet been ascribed a function in protein-protein interactions. These computational predictions require experimental validation but provide a rational starting point for experiments to define the domains responsible for these interactions. Such experiments were included in our recent NIH R01 renewal application. We hope to be able to report on results of confirmatory experiments of these computational models in the future.

      5) Other septins were identified in the AP-MS experiment and might have been anticipated to also be disrupted by Numb/Numb-like deletion. Are these septins known to interact in a complex?

      Response: This is an excellent question. Septins have conserved motifs providing a clear reason to imagine that many different mammalian septins could directly interact with Numb. Septins form heterooligomers consisting of complexes formed by 3, 6 or 8 septins [4]. It is likely that when Numb binds to one septin, antibodies against Numb pull down other septins present in the septin oligomer to which Numb is bound. The following paragraph was added to the discussion: “Our findings suggest that Numb may also interact with other septins such as septins 2, 9 and 10, which were also identified with a high level of confidence as Numb interacting proteins by our LC/MS/MS analysis. Our data do not allow us to determine if Numb binds directly to these septins. Septins contain highly conserved regions, and, consequently, if one such region of septin 7 interacts with Numb, then many septins would be expected to directly bind Numb through the same domain. However, because septins self-oligomerize, it is possible that when Numb binds to one septin, antibodies against Numb could also pull down other septins present in the septin oligomer to which Numb is bound regardless of whether or not they are also bound by Numb.”

      6) The text for Figure 5 describes analysis of Septin localization in inducible Numb/Numb-like cKO muscle, but the figure indicates only Numb is knocked out. Please clarify.

      Response: We apologize for this oversight on our part. The Legend to Figure 5 has been corrected.

      7) Supplementary Figure 2 seems to show that TAM treatment increases Numb expression. Please clarify. Also, please correct reference 9.

      Response: The figure was incorrectly labeled. We apologize for this oversight and have corrected the figure in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Overall, the manuscript is well written. I do have a few minor issues/concerns, which are detailed below.

      Abstract: Please be a little more specific regarding where the tissue came from (i.e. humans, mice, cell) when referring to your previous studies.

      Response: The abstract has been revised as requested.

      Introduction: Please be more specific regarding the technique used for detecting ultrastructural changes. I assume it was done with TEM, but the reference is listed as an "invalid citation" in your reference list.

      Response: The introduction was revised as requested and the citation was updated to reference a valid citation.

      Methods / Numb Co-Immunoprecipitation: Please indicate the level of confluency of the C2C12 cells as this will alter gene expression.

      Response: As indicated in the updated Methods section, confluent C2C12 cells were switched to differentiation media (low serum) for seven days. When harvested, the cells had differentiated and fused into myotubes.

      Methods / Immunohistochemical Staining: The first sentence needs to be edited regarding plurality and grammar.

      Response: Thank you for this comment. The text was revised accordingly.

      Results / GWAS and WGS Identify...: Please spell out phosphodiesterase (I assume) for PDE4D.

      Response: This change was incorporated in the text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reports jAspSnFR3, a biosensor that enables high spatiotemporal resolution of aspartate levels in living cells. To develop this sensor, the authors used a structurally guided amino acid substitution in a glutamate/aspartate periplasmic binding protein to switch its specificity towards aspartate. The in vitro and in cellulo functional characterization of the biosensor is convincing, but evidence of the sensor's effectiveness in detecting small perturbations of aspartate levels and information on its behavior in response to acute aspartate elevations in the cytosol are still lacking.

      We thank the reviewers and editors for the detailed assessment of our work and for their constructive feedback. Most comments have now been experimentally addressed in the revised manuscript, which we feel is substantially improved from the initial draft.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Davidsen and coworkers describe the development of a novel aspartate biosensor, jAspSnFR3. This collaborative work supports and complements what was reported in a recent preprint by Hellweg et al. (bioRxiv; doi: 10.1101/2023.05.04.537313). In both studies, the newly engineered aspartate sensor was developed from the same glutamate biosensor previously developed by the authors of this manuscript. This coincidence is not casual but is the result of the need to find tools capable of measuring aspartate levels in vivo. Therefore, it is undoubtedly a relevant and timely work carried out by groups experienced in aspartate metabolism and in the generation of metabolite biosensors.

      Reviewer #2 (Public Review):

      In this work the iGluSnFR3 sensor, recently developed by Marvin et al (2023), is mutated at position S72, which was previously reported to switch the specificity from Glu to Asp. They made 3 mutations at this position, selected a S72P mutant, then made a second mutation at S27 to generate an Asp-specific version of the sensor. This was then characterized thoroughly and used on some test experiments, where it was shown to detect and allow visualization of aspartate concentration changes over time. It is an incremental advance on the iGluSnFR3 study, where 2 predictable mutations are used to generate a sensor that works on a close analog of Glu, Asp. It is shown to have utility and will be useful in the field of Asp-mediated biological effects.

      Reviewer #3 (Public Review):

      In this manuscript, Davidsen and collaborators introduce jAspSnFR3, a new version of aspartate biosensor derived from iGluSnFR3, that allows monitoring in real-time aspartate levels in cultured cells. A selective amino acid substitution was applied in a key region of the template to switch its specificity from glutamate to aspartate. The jAspSnFR3 does not respond to other tested metabolites and performs well, is not toxic for cultured cells, and is not affected by temperature, ensuring the possibility of using this tool in physiologically more relevant tissues. The high affinity for aspartate (KD=50 uM) allowed the authors to measure fluctuations of this amino acid in the physiological range. Different strategies were used to bring aspartate to the minimal level. Finally, the authors used jAspSnFR3 to estimate the intracellular aspartate concentration. One of the highlights of the manuscript was a treatment with asparagine during glutamine starvation. Although it didn't corroborate the essentiality of asparagine in glutamine depletion, the measurement of aspartate during this supplementation is a glimpse of how useful this sensor can be.

      Reviewer #1 (Recommendations For The Authors):

      The authors should evaluate the effectiveness of the sensor in detecting small perturbations of aspartate levels and its behavior in response to acute aspartate elevations in the cytosol. In vivo aspartate determinations were performed exclusively in conditions that cause aspartate depletion. By using mitochondrial respiratory inhibitors or aspartate withdrawal, the reliability of the sensor was determined for readings over relatively long periods, until reaching a steady state of aspartate depletion 12-60 hours later. Although Hellweg and coworkers demonstrated that a related aspartate sensor could detect increases in aspartate in cells overexpressing the aspartate-glutamate GLAST transporter, the differences reported here between both sensors advise testing whether this aspect is also improved, or not, using jAspSnFR3.

      Similarly, Davidsen et al. did not test whether the sensor is able to detect transient variations in cytosolic aspartate levels. In proliferative cells, aspartate synthesis is linked to NAD+ regeneration by the ETC (Sullivan et al., 2015, Cell); indeed, the authors deplete aspartate using CI or CIII inhibitors but do not analyze whether aspartate levels recover, and increase, after removal of the inhibitors. Furthermore, the sequential addition of oligomycin and uncouplers could generate measurable fluctuations of aspartate in the cytosol.

      We agree with the reviewer that only including situations of aspartate depletion in our cell culture experiments provided an incomplete evaluation of the utility of this biosensor. In the revised manuscript we provide three additional experiments using secondary treatments that restore aspartate synthesis to conditions that initially caused aspartate depletion. First, we conducted experiments where cells expressing jAspSnFR3/NucRFP were changed into media without glutamine, inducing aspartate depletion, with glutamine being replenished at various time points to observe whether GFP/RFP measurements recover. As expected, glutamine withdrawal caused a decay in the GFP/RFP signal, and we found that restoring glutamine caused a subsequent restoration of the GFP/RFP signal at all time points, with each fully recovering the GFP/RFP signal over time (Revised Manuscript Figure 2E). Next, we conducted the experiment suggested by the reviewer, testing whether the published finding that oligomycin-induced aspartate limitation can be remedied by co-treatment with electron transport chain uncouplers could be visualized using jAspSnFR3 measurements of GFP/RFP. Indeed, after 24 hours of oligomycin-induced aspartate depletion, treatment with the ETC uncoupler BAM15 dose-dependently restored GFP/RFP signal (Revised Manuscript Figure 2G). Finally, we also measured whether the ability of pyruvate to mitigate the decrease in aspartate upon co-treatment with rotenone (Figure 2B) could also be detected in a sequential treatment protocol after aspartate depletion. Indeed, after 24 hours of aspartate depletion by rotenone treatment, the GFP/RFP signal was rapidly restored by additional treatment with pyruvate (Revised Manuscript Figure 2, figure supplement 1C). Collectively, these results provide support for the utility of jAspSnFR3 to measure transient changes in aspartate levels in diverse metabolic situations, including conditions that restore aspartate to cells that had been experiencing aspartate depletion.

      Reviewer #2 (Recommendations For The Authors):

      Weaknesses: Sensor basically identical to iGluSnFR3, but nevertheless useful and specific. The results support the conclusions, and the paper is very straightforward. I think the work will be useful to people working on the effects of free aspartate in biology and given it is basically iGluSnFR3, which is widely used, should be very reproducible and reliable.

      We appreciate the reviewer’s comment that the sensor is useful for specific detection of aspartate. We agree that the advance of the paper is primarily in demonstrating its utility to measure aspartate, rather than any fundamental innovation on the biosensor approach. We hope the fact that jAspSnFR3 derives from a well-validated biosensor (iGluSnFR3) will support its adoption.

      Reviewer #3 (Recommendations For The Authors):

      Although this is a well-performed study, I have some comments for the authors to address:

      1) A red-tagged version of the sensor (jAspSnFR3-mRuby3) was generated for normalization purposes; with this, the authors plan to correct the GFP signal for expression and movement artifacts. I naturally interpret "movement artifacts" as those generated by variations in cell volume and focal plane during time-lapse experiments. However, it was mentioned that jAspSnFR3-mRuby3 included a histidine tag that may induce a non-specific effect (responses to the treatment with some amino acids). This suggests that a version without the tag needs to be generated and that an alternative design needs to be set for normalization purposes. A nuclear-localized RFP was expressed in a second attempt to incorporate RFP as a normalization signal. Here the cell lines that express both signals (sensor and RFP) were generated by independent lentiviral transductions (insertions). Unless the number of insertions for each construct is known, this approach will not ensure an equimolar expression of both proteins (sensor and RFP). In this scenario it is not clear how the nuclear expression of RFP will help correct for expression or monitor changes in cell volume. The authors may be interested in attempting a bicistronic system to express both the sensor and RFP.

      The reviewer noted several potential issues concerning the use of RFP for normalization, which will be separated into sections below:

      Movement artifacts:

      We are glad the reviewer raised this issue since we see how it was confusingly worded. We have deleted the text “and movement artefacts” from the sentence.

      His-tag and non-specific responses to some amino acids:

      We also found it concerning that non-specific responses to amino acids could potentially contribute to our RFP normalization signal, and so we conducted additional experiments to address whether this was likely to be an issue in intracellular measurements. We first tested whether the non-specific signal was related to the histidine tag, or was intrinsic to the mRuby3 protein itself, by comparing the fluorescence response to a titration of histidine (which showed the largest effect on red fluorescence), aspartate, and GABA (structurally related to glutamate and aspartate, but lacking a carboxylate group) across a group of mRuby-containing variants, with or without histidine tags. We replicated the non-specific signal originally observed in jAspSnFR3-mRuby3-His and found that another biosensor with a histidine tag on the C terminus of mRuby3 had a similar response (iGlucoSnFR2.mRuby3-His), as did mRuby3-His alone, indicating that fusion with jAspSnFR3 or another binding protein was not required for this effect. Additionally, we compared the fluorescence response of lysates expressing mRuby2 and mRuby3 without histidine tags and found that the non-specific signal was essentially absent (Revised Manuscript Figure 1, figure supplement 4B-D). Collectively, these data support our original hypothesis that the histidine tag was responsible for the non-specific signal, alleviating concerns about more substantial protein design issues or about using nuc-RFP for normalization. Since we also found that measurements of aspartate signal using GFP/RFP ratios from cells expressing the linked jAspSnFR3-mRuby3-His agreed with measurements from cells separately expressing jAspSnFR3 and nucRFP (without a His tag), and since the amino acid concentrations needed to significantly alter His-tagged mRuby3 signal are above those typically found in cells, we conclude that this is unlikely to be a significant factor in cells. Nonetheless, we have added all the relevant data to the manuscript to allow readers to make their own decision about which construct would be best for their purposes.

      Original text:

      "Surprisingly, the mRuby3 component responds to some amino acids at high millimolar concentrations, indicating a non-specific effect, potentially interactions with the C-terminal histidine tag (Figure 1—figure Supplement 2, panel B). Notably, this increase in fluorescence is still an order of magnitude lower than the green fluorescence response and it occurs at amino acid concentrations that are unlikely to be achieved in most cell types."

      Revised text:

      "Surprisingly, the mRuby3 fluorescence of affinity-purified jAspSnFR3.mRuby3 responds to some amino acids at high millimolar concentrations, indicating a non-specific effect (Figure 1—figure Supplement 4, panel A). This was determined to be due to an unexpected interaction with the C-terminal histidine tag and could be reproduced with other proteins containing mRuby3 and purified via the same C-terminal histidine tag (Figure 1—figure Supplement 4, panel B and C). Interestingly, a structurally related, non-amino acid compound, GABA, does not elicit a change in red fluorescence; indicating, that only amino acids are interacting with the histidine tag (Figure 1—figure Supplement 4, panel D). Nevertheless, most of our cell culture experiments were performed with nuclear localized mRuby2, which lacks a C-terminal histidine tag, and these measurements correlated with those using the histidine tagged jAspSnFR3-mRuby3 construct (Figure 1—figure Supplement 1 panel D)."

      Lentiviral transductions

      We agree that splitting the two fluorescent proteins across two expression constructs and infections effectively guarantees that there will not be equimolar expression of jAspSnFR3 and RFP; however, we do not think equimolar expression is necessary in this context. The primary goal of RFP measurements in these experiments (and in experiments using the jAspSnFR3-mRuby3 fused construct) is to control for global alterations in protein expression that might confound the interpretation that a change in GFP fluorescence corresponds to a change in aspartate levels. While a bicistronic system is arguably a better approach to improve the similarity of expression of jAspSnFR3 and nuc-RFP in a cell, we only require that the cells have consistent expression of both proteins across all cells in the population, not that the expression of one necessarily be a similar molarity to the other. We accomplish consistent expression of proteins by single cell cloning after expression of jAspSnFR3 and nucRFP (or jAspSnFR3-mRuby3), and screening for clones that have high enough expression of both proteins such that they are well detected by standard Incucyte conditions. Given that our data do not identify an obvious downside to separate expression of jAspSnFR3 and nuc-RFP compared to the fused jAspSnFR3-mRuby3 construct (where the fluorescent proteins are truly equimolar) (Figure 2, Figure Supplement 1C), we elected to prioritize the separate jAspSnFR3 and nuc-RFP combination, which provides additional opportunities to measure cell number in the same experiment (see below).
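
      The argument above, that a constant but non-equimolar GFP:RFP expression ratio does not distort relative readouts, can be illustrated with a minimal sketch. The function names and all fluorescence values below are hypothetical and are not taken from the manuscript or its analysis pipeline; the sketch only assumes that each clone keeps a fixed sensor-to-RFP expression ratio over the experiment.

```python
from typing import List, Sequence


def gfp_rfp_ratio(gfp: Sequence[float], rfp: Sequence[float]) -> List[float]:
    """Return the per-timepoint GFP/RFP ratio for one well or clone."""
    if len(gfp) != len(rfp):
        raise ValueError("GFP and RFP traces must have the same length")
    return [g / r for g, r in zip(gfp, rfp)]


def relative_to_baseline(ratios: Sequence[float]) -> List[float]:
    """Express each ratio relative to the first (pre-treatment) timepoint."""
    baseline = ratios[0]
    return [x / baseline for x in ratios]


# Hypothetical clone A: sensor expressed at about 2x the RFP level.
clone_a = gfp_rfp_ratio([200.0, 150.0, 100.0], [100.0, 100.0, 100.0])
# Hypothetical clone B: sensor expressed at about 4x the RFP level.
clone_b = gfp_rfp_ratio([400.0, 300.0, 200.0], [100.0, 100.0, 100.0])

# The raw ratios differ between clones, but the baseline-relative
# time courses are identical.
print(relative_to_baseline(clone_a))
print(relative_to_baseline(clone_b))
```

      Under that assumption, the raw ratio differs between clones while the baseline-relative time course is the same, which is why relative changes within a clonal line can be compared without equimolar expression.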

      2) The authors were interested in establishing the temporal dynamics of aspartate depletion by genetic and pharmacological means. For the inhibition of mitochondrial complex I, rotenone and metformin were used. Although the assays clearly show aspartate depletion, a report of cell viability is missing. Considering that glutamine deprivation induces arrest in cell proliferation, I think it will be important to know the condition of the cell cultures after 60 hours of treatment with such inhibitors.

      We agree that ensuring that cells are still viable in conditions where aspartate is depleted, as determined by GFP/RFP in jAspSnFR3-expressing cells, is an important goal. To this end, we added a new experiment investigating the effect of glutamine restoration on the GFP/RFP signal at different time points after glutamine depletion (Revised Manuscript Figure 2E, see response to reviewer 1). One advantage of using the nuclear RFP as a normalization marker is that it also enables measurements of nuclei counts, a surrogate measurement for cell number. In the same glutamine depletion experiment we therefore measured cell counts using nuclear RFP incidences and confluency as measurements of cell proliferation/growth. In both cases, the arrest in cell proliferation upon glutamine withdrawal was obvious, as was the restoration of cell proliferation following glutamine replenishment, with the amount of growth delay corresponding to the length of glutamine withdrawal (Revised Manuscript Figure 2, Figure Supplement 2A-B). Nonetheless, there were no obvious lasting defects in restarting cell proliferation even after 12 hours of glutamine withdrawal, indicating that cell viability is preserved. In the case of mitochondrial inhibitors, we also observe that even after 24 hours of treatment with oligomycin or rotenone, restoration of aspartate synthesis by BAM15 or pyruvate, respectively, can also restore GFP/RFP signal, supporting the conclusion that cellular metabolism is still active in these conditions (Revised Manuscript Figure 2G; Revised Manuscript Figure 2, figure supplement 1C).

      3) The pH sensitivity was checked in vitro with jAspSnFR3-mRuby3 and the sensor reported suitable for measurements at physiological pH. It would be an opportunity to revisit the analysis for pH sensitivity in cultured cells using an untagged version of jAspSnFR3 coupled, for example, to a sensor for pH.

      We thank the reviewer for the suggestion and agree that pH effects on sensor signal could be a confounding factor in some conditions. Unfortunately, measuring intracellular pH is not trivial and using multiple fluorescent sensors that change simultaneously would be complex to interpret, particularly in the absence of controls to unambiguously control intracellular pH and aspartate concentrations. Thus, we believe that proper investigation of the variable of pH is beyond the scope of this study. Nonetheless, we agree that measuring the contribution of pH to sensor signal is an important goal for future work, particularly if deploying it in conditions likely to cause substantial pH differences, such as comparing compartmentalized signal of jAspSnFR3 in the cytosol and mitochondria. We have added the following italicized text to the conclusions section to underscore this point:

      “Another potential use for this sensor would be to dissect compartmentalized metabolism, with mitochondria being a critical target, although incorporating the influence of pH on sensor fluorescence will be an important consideration in this context.”

      4) While the authors take an interesting approach to measuring intracellular aspartate concentration, it will be highly desirable if a calibration protocol can be designed for this sensor. Clearly, glutamine depletion grants a minimal ("zero") aspartate concentration. However, having a more dynamic way for calibration will facilitate the introduction of this tool for metabolism studies. This may be achieved by incorporating a cultured cell that already expresses the transporter or by ectopic expression in the cells that have already been used.

      We appreciate the suggestion and would similarly desire a calibration protocol to serve as a quantitative readout of aspartate levels from fluorescence signal, if possible. While we do calibrate jAspSnFR3 fluorescence in purified settings, conducting an analogous experiment intracellularly is currently difficult, if not impossible. While we have several methods to constrain the production rate of aspartate (glutamine withdrawal, mitochondrial inhibitors, and genetic knockouts of GOT1 and GOT2), we cannot prevent cells from decreasing aspartate consumption and so cannot get a true intracellular zero to aid in calibration. Additionally, the impermeability of cell membranes to aspartate makes it challenging to specifically control intracellular concentrations using environmental aspartate, and the best-known aspartate transporter (SLC1A3) is concentrative and so has the reciprocal problem. Considering these issues, we are wary of implying to readers that any specific fluorescence measurement can be used to directly interpret aspartate concentration given the many variables that can impact its signal, both related to the biosensor system itself (expression of jAspSnFR3, expression of Nuc-RFP, sensitivity and settings of the fluorescence detector) and based on cell intrinsic variability (differences in basal ASP levels, different sensitivity to treatments, influence of pH, etc.). We maintain that jAspSnFR3 has utility to measure relative changes in aspartate within a cell line across treatment conditions and over time, but absolute quantitation of aspartate will still require complementary approaches, like mass spectrometry, enzymatic assays, or NMR.
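
      To illustrate why the missing intracellular zero matters for calibration, the sketch below assumes a simple one-site binding model with the KD of 50 µM quoted in Reviewer #3's summary. The model form, the function names, and every fluorescence value are assumptions made for illustration only; the sensor's actual response in cells may not follow this form, and the sketch is not a recommended calibration procedure.

```python
KD_UM = 50.0  # apparent affinity quoted in the review; treated here as an assumption


def fractional_occupancy(asp_um: float, kd_um: float = KD_UM) -> float:
    """Fraction of sensor bound at a given aspartate concentration (one-site model)."""
    return asp_um / (kd_um + asp_um)


def concentration_from_signal(f: float, f_min: float, f_max: float,
                              kd_um: float = KD_UM) -> float:
    """Invert the one-site model; requires the signal at zero and saturating ligand."""
    if not f_min < f < f_max:
        raise ValueError("signal must lie strictly between f_min and f_max")
    theta = (f - f_min) / (f_max - f_min)
    return kd_um * theta / (1.0 - theta)


# With the true (hypothetical) zero-ligand signal, the estimate is 50 uM.
estimate_true_zero = concentration_from_signal(1.5, f_min=1.0, f_max=2.0)

# If the zero-ligand signal is misjudged, the same reading maps to
# roughly twice the concentration.
estimate_wrong_zero = concentration_from_signal(1.5, f_min=0.5, f_max=2.0)

print(estimate_true_zero, estimate_wrong_zero)
```

      Even in this idealized model, the inferred concentration depends strongly on the zero-ligand signal, which is consistent with the authors' caution about reading absolute concentrations from intracellular fluorescence.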

      5) jAspSnFR3 seems to have the potential to be incorporated easily for several research groups as a main tool. In general, a minor correction to replace F/F with ΔF/F in the text.

      Thank you for catching this error; the text has been edited accordingly.
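
      For readers unfamiliar with the notation, the distinction between F/F and ΔF/F can be sketched as follows. This assumes the conventional definition ΔF/F = (F − F0)/F0 with F0 a baseline fluorescence; the function name and values are hypothetical and are not taken from the manuscript.

```python
from typing import List, Sequence


def delta_f_over_f(trace: Sequence[float], f0: float) -> List[float]:
    """Return (F - F0) / F0 for each point of a fluorescence trace."""
    if f0 == 0:
        raise ValueError("baseline fluorescence F0 must be non-zero")
    return [(f - f0) / f0 for f in trace]


# Hypothetical trace with a baseline of 100 arbitrary units.
trace = [100.0, 150.0, 50.0]
print(delta_f_over_f(trace, f0=100.0))  # fractional change from baseline
print([f / 100.0 for f in trace])       # plain F/F0, offset by exactly 1
```

      The two quantities differ by a constant offset of 1, so mislabeling one as the other changes how a reported value should be read.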

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors provide evidence to show that an increase in Kv7 channels in hilar mossy cells of Fmr1 knockout mice results in a marked decrease in their excitability. The reduction in excitatory drive onto local hilar interneurons produces an increased excitation/inhibition ratio in granule cells. Inhibiting Kv7 channels can help normalize the excitatory drive in this circuit, suggesting that they may represent a viable target for therapeutics for Fragile X syndrome.

      Strengths:

      The work is supported by a compelling and thorough set of electrophysiological studies. The authors do an excellent job of analysing their data and present a very complete data set.

      We thank the Reviewer for the positive comments.

      Weaknesses:

      There are no significant weaknesses in the experimental work, however the complexity of the data presentation and the lack of a schematic showing the organizational framework of this circuit make the data less accessible to non-experts in the field. I highly encourage a graphical abstract and network diagram to help individuals understand the implications of this work.

      We thank the Reviewer for the suggestion and have added a schematic of the dentate network organization (Figure 1A).

      The work is important as it identifies a unique regional and cell-specific abnormality in Fmr1 KO mice, showing how the loss of one gene can result in region-specific changes in brain circuits.

      Reviewer #2 (Public Review):

      Summary:

      Deng et al. investigate, for the first time to my knowledge, the role that hippocampal dentate gyrus mossy cells play in Fragile X Syndrome. They provide strong evidence that, in slice preparations from Fmr1 knockout mice, mossy cells are hypoactive due to increased Kv7 function whereas granule cells are hyperactive compared to slices from wild-type mice. They provide indirect evidence that the weakness of mossy cell-interneuron connections contributes to granule cell hyperexcitability, despite converse adaptations to mossy cell inputs. The authors show that application of the Kv7 inhibitor XE991 is able to rescue granule cell hyperexcitability back to wild-type baseline, supporting the overall conclusion that inhibition of Kv7 in the dentate may be a potential therapeutic approach for Fragile X Syndrome. However, any claims regarding specific circuit-based intervention or analysis are limited by the exclusively pharmacological approach of the manipulations.

      Strengths:

      Thorough electrophysiological characterization of mossy cells in Fmr1 knockout mice, a novel finding.

      Their electrophysiological approach is quite rigorous: they patched different neuron types (GC, MC, INs) one at a time within the dentate gyrus in Fmr1 KO and WT, with and without 'circuit blockade' by pharmacologically inhibiting neurotransmission. This allows the most detailed characterization possible of passive membrane/intrinsic cell differences in the dentate gyrus of Fmr1 knockout mice.

      They provide several examples showing that the Kv7 inhibitor XE991 is able to rescue excitability of the granule cell circuit in Fmr1 knockout mice (AP firing in the intact circuit, postsynaptic current recordings, theta-gamma coupling stimulation).

      We thank the Reviewer for the positive comments.

      Weaknesses:

      The implications for these findings and the applicability of the potential treatment for the disorder in a whole animal are limited due to the fact that all experiments were done in slices.

      We appreciate the Reviewer’s point and agree. To address this concern, we have revised the Discussion to state that “the applicability of a circuit-wide approach as a potential treatment in vivo will require extensive future behavioral analyses, which are beyond the scope of the current study”. We also now emphasize in Discussion that “these findings provide a proof-of-principle demonstration that a circuit-based intervention can normalize dynamic E/I balance and restore dentate circuit output in vitro”.

      The authors' interpretation of the word 'circuit-based' is problematic - there are no truly circuit-specific manipulations in this study due to the reliance on pharmacology for their manipulations. While the application of the Kv7 inhibitor may have a predominant effect on the circuit through changes to mossy cell excitability, this manipulation would affect many other cells within the dentate and adjacent brain regions that connect to the dentate that express Kv7 as well.

      We appreciate the reviewer’s point but would like to clarify that by using the term “circuit-based” we did not intend to imply that it is a “circuit-specific” intervention. Our intended interpretation of the term ‘circuit-based’ stems from the following reasoning: the dentate circuit has two types of excitatory neurons which show opposite excitability defects in FXS mice, thus presenting an irreconcilable conflict to correct pharmacologically for each cell type individually. Instead, we sought an approach to correct the overall dentate circuit output, rather than to restore excitability defects of individual cell types. Notably, when we pharmacologically isolated granule cells from the circuit, inhibition of Kv7 failed to restore their excitability, suggesting that normalization of the dentate output depends on the circuit activity. Since we focused on correcting dentate output using such a circuit-dependent approach, we used the term ‘circuit-based intervention’ to emphasize this notion.

      Reviewer #3 (Public Review):

      The paper by Deng, Kumar, Cavalli, Klyachko describes that, unlike in other cell types, loss of Fmr1 decreases the excitability of hippocampal mossy cells due to up-regulation of Kv7 currents. They also show evidence that while muting mossy cells appears to be a compensatory mechanism, it contributes to the higher activity of the dentate gyrus, because the removal of mossy cell output alleviates the inhibition of dentate principal cells. This may be important for the patho-mechanism in Fragile X syndrome caused by the loss of Fmr1.

      These experiments were carefully designed, and the results are presented in a very logical, insightful, and self-explanatory way. Therefore, this paper represents strong evidence for the claims of the authors. In the current state of the manuscript, there are only a few points that need additional explanation.

      We thank the Reviewer for the positive comments.

      One of the results, which is shown in the supplementary dataset, does not fit the main conclusions. Changes in the mEPSC frequency suggest that in addition to the proposed network effects, there are additional changes in the synaptic machinery or synapse number that are independent of the actual activity of the neurons. Since the differences of the mEPSC and sEPSC frequencies are similar and because only the latter can signal network effects, while the former is typically interpreted as a presynaptic change, it cannot be claimed that sEPSC frequency changes are due to the hypo-excitability of mossy cells.

      We thank the Reviewer for this important point and agree. To address this concern, we now state in Results that “We note that changes in the excitatory drive onto interneurons include both mEPSC and sEPSC frequencies, which reflect not only potential deficits in excitability of their input cells, such as MCs, but also changes in synaptic connectivity/function, that may arise from homeostatic circuit reorganization/compensation (see Discussion)”.

      We also now emphasize this point in Discussion by stating that “alterations in excitatory drives, including both mEPSC and sEPSC frequencies onto interneurons, suggest changes in the excitatory synapse number and/or function. Together with alterations in inhibitory drives these changes may reflect compensatory circuit reorganization of both excitatory and inhibitory connections, including mossy cell synapses”.

      We also note in Discussion that “Such circuit reorganization can explain the balanced E/I drive onto granule cells in Fmr1 KO mice we observed in the basal state, which can result from reorganization of excitatory and inhibitory axonal terminals”.

      Notably, our finding that a Kv7 blocker, acting by increasing MC excitability, is sufficient to correct dentate output supports the notion that hypo-excitability of mossy cells is a major factor contributing to dentate circuit E/I imbalance. This does not exclude the presence of additional mechanisms contributing to E/I imbalance, such as changes of synaptic connectivity or release machinery. To reflect this point, we revised the Results to temper the initial claim that “this analysis supports the notion that the hypo-excitability of MCs in Fmr1 KO mice caused (now replaced with “is a major factor contributing to”) the reduction of excitatory drive onto hilar interneurons, which ultimately results in reduced local inhibition”.

      An apparent technical issue may imply a second weak point in the interpretation of the results. Because the IPSCs in the PP stimulation experiments (Fig 8) start within a few milliseconds, it is unlikely that their first components originate from the PP-GC-MC-IN feedforward inhibitory circuit. The involvement of this circuit and MCs in the Kv7-dependent excitability changes is the main implication of the results of this paper. But this feedforward inhibition requires three consecutive synaptic steps and EPSP-AP couplings, each of them lasting for at least 1 ms + 2-5 ms. Therefore, the inhibition via the PP-GC-MC-IN circuit can only be seen from 10-20 ms after PP stimulation. The earlier components of the cPSCs should originate from other circuit elements that are not related to the rest of the paper. Therefore, more isolated measurements on the cPSC recordings are needed which consider only the later phase of the IPSCs. This can be either a measurement of the decay phase or a pharmacological manipulation that selectively enhances/inhibits a specific component of the proposed circuit.

      We appreciate the Reviewer’s point. As we mentioned in Results: “The EPSP measured in granule cells in response to the PP stimulation integrates both excitatory and inhibitory synaptic inputs onto granule cells, including the direct synaptic input from the PP and all the PP stimulation-associated feedforward and feedback synaptic inputs. In other words, the EPSP in granule cells integrates all dentate circuit ‘operations’.” As the Reviewer pointed out, this is also the case in the measurements of cPSCs, which comprise all of the PP stimulation-associated feedforward and feedback inhibition. We thank the Reviewer for the suggestion to isolate specific components of the IPSC. However, we did not attempt to do it in this study for three reasons. First, the activity of all of these circuit components likely overlaps extensively in time, and it is difficult to identify a specific time point that can separate contributions from earlier canonical feed-forward and feed-back components from the contribution of the later MC-dependent PP-GC-MC-IN feed-forward component. Notably, the tri-synaptic PP-GC-MC-IN component differs temporally from the canonical di-synaptic (PP-GC-IN) feed-back inhibition only by a single synaptic activation step, resulting in only a few milliseconds difference. Moreover, the temporal differences in the contributions of these components vary widely among different recordings, making a uniform analysis very difficult. Second, we used three different metrics to assess E/I changes in cPSC measurements, which capture a wide range of temporal processes and their integration, including peak-to-peak measurements, the charge transfer, and the excitation window metrics. Third, the principal readout in our study was the overall dentate output (i.e., granule cell firing), which reflects the integration of all dentate circuit ‘operations’, thus making the overall cPSC measurements appropriate, in our view, for this readout.
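
      The timing argument in this exchange can be made concrete with a short back-of-the-envelope sketch. It uses only the per-step figures given by the Reviewer (about 1 ms of synaptic delay plus 2-5 ms of EPSP-AP coupling per step) and the step counts used in the text (three for PP-GC-MC-IN, two for PP-GC-IN); the function name is hypothetical and the numbers are rough estimates rather than measurements.

```python
from typing import Tuple


def pathway_latency_ms(n_steps: int,
                       synaptic_delay_ms: float = 1.0,
                       coupling_ms: Tuple[float, float] = (2.0, 5.0)) -> Tuple[float, float]:
    """Return the (minimum, maximum) latency for n consecutive synaptic steps."""
    fastest = n_steps * (synaptic_delay_ms + coupling_ms[0])
    slowest = n_steps * (synaptic_delay_ms + coupling_ms[1])
    return fastest, slowest


tri_synaptic = pathway_latency_ms(3)  # PP-GC-MC-IN, per the Reviewer's step count
di_synaptic = pathway_latency_ms(2)   # canonical PP-GC-IN, per the authors
one_step = pathway_latency_ms(1)      # the single extra step separating the two

print(tri_synaptic, di_synaptic, one_step)
```

      With these figures the tri-synaptic route gives 9-18 ms, close to the Reviewer's 10-20 ms estimate, while the two routes differ by a single step of 3-6 ms and their latency ranges overlap, in line with the authors' point that the components are hard to separate in time.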

      I suggest refraining from conclusions such as "MCs provide at least ~51% of the excitatory drive onto interneurons in WT and ~41% in KO mice", because too many factors (e.g., IN cell types, slice condition, synaptic reliability) are not accounted for in these actual numbers, and these values are not necessary for the general observation of the paper.

      We thank the reviewer for this suggestion, and have revised the manuscript accordingly.

      There are additional minor issues about the presentation of the results.

      We have carefully checked and corrected the minor errors that the reviewer pointed out.

      Recommendations for the authors:

      Revisions that are considered essential for improved assessment regarding the strengths of support of the claims:

      • Temper claims regarding circuit-based effects

      • Temper claims regarding very specific quantitative assessments of synaptic drives

      • Differentiate between monosynaptic inputs and inputs arriving through multiple synaptic contacts with proper analytical techniques.

      We appreciate these suggestions and have revised the manuscript to address the concerns raised by the reviewers.

      Reviewer #1 (Recommendations For The Authors):

      The authors do an outstanding job of reviewing and presenting all of their data. This is a paper I will recommend all of my trainees read, as it is an excellent example of a complete research project. While I am impressed with the effort involved, I also wondered if the complexity and thoroughness of their presentations could make the story less accessible to non-expert readers. My comments are simply intended to help them present a more coherent and succinct story to a wider audience, though I am not sure I really provide any meaningful changes. This is simply a very thorough and complete body of work that the authors should be commended for. After reading it I felt they had gone above and beyond what most authors would provide in terms of data to support their story, and thus I had no doubt that a change in Kv7 plays a role in changing the excitability of the network.

      We thank the Reviewer for the positive comments and great suggestions. We have made numerous changes to present our work in a more coherent and succinct way, in part by re-plotting some of the figures, as well as by adding a schematic of the dentate circuit in Figure 1.

      Figure 1. A visual of mossy cells and the local circuit they are studying would be a useful addition to Figure 1. I also feel this is important for conveying the story of how hypo-excitability can impact the E/I of the network. I think it has to be more of a cell structure/circuit-based figure than is presented in Supplementary Figure 8.

      We thank the reviewer for this suggestion. We have added a schematic of the dentate circuit with all major cell types involved in Figure 1A.

      Figure 1. A, B, and C tell a coherent story and are easy to understand. The interpretation of the phase plot in D is harder to access. Perhaps having this as a separate figure and providing a clearer presentation of the way the phase plot was created (see Figure 3 Bove et al., 2019, Neuroscience 418; DOI: 10.1016/j.neuroscience.2019.08.048).

      We appreciate the Reviewer’s point and agree. In order to keep Figure 1 more concise and readable, we removed the phase plot in the revised version. This change did not negatively impact the result presentation because the primary aim of this plot was to visualize changes in voltage threshold in an alternative way, but it was already clearly shown by the ramp-evoked AP traces (revised Figure 1D, insert), and thus was not essential to show.

      Figure 1 E-N might be better situated in a supplementary graph as the characteristics of the AP aren't changing.

      We understand the Reviewer’s point, but we feel it would be better to keep all action potential metrics together in one figure, to show that only a specific subset of parameters was affected in Fmr1 KO mice.

      Figure 2: (A-D) I am not sure having so many figures is required given the focus is on having a small change in Ir at one membrane potential. I do worry that the significance appears to be due to 2 cells with an IR of over 100 in the WT group and 2 with an IR of around 62 in the KO group. All other cells are between 75-100 in both groups. I also worry a bit bc in the literature IRs between 55 and 125 seem to be commonly reported by groups that do this work normally (Buzsacki, Westbrook, etc.). I would be cautious about making too much out of this result.

      We thank the Reviewer for these comments. We have performed additional analyses of these data, as also suggested by Reviewer 3 (Point #1), and improved presentation of the data in Figure 2D-F by showing the effect of XE991 on increasing input resistance in WT vs KO. We also plotted other panels in a similar way to show the comparisons between WT and KO, as well as comparisons within genotype +/- XE991, which makes the results easy to follow. For more details, please also see the response to Reviewer 3, Point 1.

      Figure 2D-E: As in the text, this result is really pointing towards there being a Kv7 issue. Worries about the data in D aside, I think these two figures alone tell a clearer story. Figure 3 on the other hand tells a story of the effects of blocking Kv7 on membrane potential. Is this central to the story the others are trying to tell?

      We thank the reviewer for this point. We believe that Figure 2, Figure 3 and Figure 4—figure supplement 1 together provide strong and multifaceted evidence to support changes in Kv7 function in Fmr1 KO mossy cells.

      Figure 3. This is an interesting finding that shows how detailed their analysis was. Showing that the change in holding current in KO animals is greater than in WT is the first solid piece of evidence that there is a change in Kv7 in these cells that affects their excitability.

      We appreciate the reviewer’s comment. As mentioned above, we believe that Figure 2, Figure 3 and Figure 4—figure supplement 1 together provide strong and multifaceted evidence to support changes in Kv7 function in Fmr1 KO mossy cells.

      Figures 4 and 5 provide additional detail to support the idea that Kv7 changes by showing how the E/I ratio and spontaneous minis are shifted in KO animals.

      We thank the Reviewer for the comments.

      Figures 6-8 build a compelling story for the reduction in excitatory drive in mossy cells affecting the network dynamics in excitatory/inhibitory interactions in DG cells.

      We appreciate the Reviewer’s comment.

      Reviewer #2 (Recommendations For The Authors):

      1) Other than location and characteristic morphology, the other parameters that were used to identify mossy cells and granule cells were also parameters used to find differences in cellular properties between wild-type and Fmr1 KO mice (RMP, sEPSC frequency, etc.), which would confound the results shown. The use of available transgenic mouse lines would provide for a more unbiased screen of these cells. Afterhyperpolarization was also used as a parameter while screening cells, yet none of the data on this measurement is shown.

      We thank the reviewer for this point and agree that transgenic mouse lines provide a more unbiased way to identify various types of neurons. However, since the present study involves analyses of at least three different types of neurons, establishing multiple transgenic lines labeling different types of dentate neurons in the Fmr1 KO mouse model would be very time consuming and beyond the current resources of the lab. We would also like to clarify that the three types of dentate neurons are easily distinguished according to the large differences in location, morphology and basal electrophysiological properties, none of which were essential in defining differences between genotypes. Specifically, granule cells are located in the granule cell layer, have a small cell body (<10 µm), RMP around -80 mV, capacitance ~20 pF, and infrequent sEPSCs (<20 events/min); mossy cells are located in the hilus, have a large cell body (>15 µm), RMP around -65 mV, capacitance >100 pF, and fast afterhyperpolarization less than -10 mV (WT -5.1 ± 0.7 mV, KO -5.8 ± 0.5 mV); interneurons are located in the hilus or border of granule cell layer, have a relatively smaller cell body (10-15 µm), RMP around -55 mV, capacitance <60 pF, and afterhyperpolarization larger than -15 mV (WT -20.4 ± 1.3 mV, KO -19.8 ± 1.4 mV). We note that the cells that could not be definitively classified into the three categories were not included in analyses, and we have now clarified this further in the Methods. To address the reviewer’s second concern regarding AHP, we have now provided the corresponding values in the Methods.

      2) A definitive way to test the cell-autonomous nature of the Kv7 changes would be to use female mice, who will have a mosaic of cells affected by the fragile X chromosome, and the Fmr1 KO cells could be engineered to express GFP to help identify them from wild-type cells.

      We agree and appreciate this suggestion. This could be an interesting follow up study to further verify the cell-autonomous nature of Kv7 changes.

      3) The authors heavily rely on XE991 as a selective Kv7 blocker. Is it blocking all Kv7 channels at the concentration used? If so, given the significant expression of Kv7 in the dentate as shown by Western blot, is it surprising that there is no effect of this inhibitor on wild-type slices in most cases?

      We thank the reviewer for this important point. We used a concentration 10 times the IC50 in the present study, which should block more than 80% of Kv7 channels. Notably, we observed several effects of XE991 in WT mice: it significantly increased input resistance (new Figure 2D-F) and strongly enhanced AP firing evoked by step depolarization (Figure 7E-H), although we did not observe an effect of XE991 in WT in the analyses of spiking evoked by theta-gamma stimulation in Figure 8. However, this is not surprising. If a parameter we measured is predominantly cell-autonomous (for example, input resistance), the effects of XE991 are easy to observe. However, if a parameter reflects integration of all dentate circuit operations (for example, AP probability in response to theta-gamma stimulation), it is difficult to detect the effect of XE991 in WT mice because the dentate circuit of WT mice has a greater capacity to maintain E/I balance in response to XE991.
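
      As a rough check of the block expected at this concentration, the sketch below evaluates a simple Hill relationship. The Hill coefficient of 1 and the function name fraction_blocked are assumptions made for illustration, and concentrations are expressed as multiples of the IC50 rather than as measured values.

```python
def fraction_blocked(concentration, ic50, hill=1.0):
    """Fractional block predicted by the Hill equation.

    A Hill coefficient of 1 is an assumption; it is not taken from the study.
    """
    if concentration < 0 or ic50 <= 0 or hill <= 0:
        raise ValueError("concentration must be >= 0; ic50 and hill must be > 0")
    scaled = concentration ** hill
    return scaled / (scaled + ic50 ** hill)


# Concentration expressed in units of the IC50 (10x the IC50, as in the response).
block_at_10x = fraction_blocked(10.0, 1.0)
print(f"Predicted block at 10x IC50: {block_at_10x:.1%}")
```

      Under these assumptions, a concentration of 10 times the IC50 corresponds to a predicted block of about 91%, consistent with the estimate of more than 80%.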

      4) E/I ratio is a helpful concept, and it is heavily relied upon in the results text, but statistically shaky, especially for sEPSC:sIPSCs since you are combining uncertainty in the sEPSC and sIPSC to make one very uncertain ratio that doesn't undergo any subsequent statistical confirmation (such as in Fig 4I).

      We appreciate the reviewer’s point and apologize for the confusion in presentation of Fig 4I (and 5I), due to lack of detailed explanation. The E/I ratio shown in Figs. 4I (and 5I) is a single data-point estimate calculated from the mean values of independent sEPSC and sIPSC measurements (Figs. 4G-H and 5G-H, respectively). This ratio was used only as an estimate/illustration of the changes, rather than a precise determination of the shift in E/I balance. Because there is only one data-point for this ratio, statistical analysis is not possible. For this reason we performed extensive additional analyses in Figures 7 and 8, in which the EPSC and IPSC were measured from the same cells and at the same time to define the actual E/I ratio with the corresponding statistical analyses (i.e., a real matched and dynamic E/I ratio).
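
      The distinction between the two ways of estimating an E/I ratio can be sketched as follows. The values are hypothetical and chosen only for illustration; they are not data from this study, and the function names are likewise illustrative.

```python
from statistics import mean


def ratio_of_means(excitatory, inhibitory):
    """Single-point estimate: mean excitation divided by mean inhibition.

    The two lists may come from independent groups of cells.
    """
    return mean(excitatory) / mean(inhibitory)


def matched_ratios(excitatory, inhibitory):
    """One E/I ratio per cell, for measurements made in the same cells."""
    if len(excitatory) != len(inhibitory):
        raise ValueError("matched measurements need one E and one I value per cell")
    return [e / i for e, i in zip(excitatory, inhibitory)]


# Hypothetical values, arbitrary units.
epsc = [2.0, 4.0, 6.0]
ipsc = [1.0, 2.0, 6.0]

single_estimate = ratio_of_means(epsc, ipsc)
per_cell = matched_ratios(epsc, ipsc)
print(single_estimate, per_cell)
```

      The ratio of group means yields a single number, whereas matched recordings yield one ratio per cell and therefore a distribution that can be tested statistically.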

      5) Is this mGlur2/CB1 specificity to PP/granule and MC axons, respectively, true in the Fmr1 KO mice? It is possible that mGluR2 and CB1 expression patterns are altered in FMR1 KO, thus the assumption used to isolate these distinct inputs may not hold true.

      This is a very good point. We do assume that the specificity of Group II mGluR and CB1 is similar between Fmr1 KO and WT mice, but this is an assumption that we have not directly verified. However, our results in Figures 7 and 8 strongly support this assumption, because if it were not true, then our intervention would be unlikely to correct the excessive dentate output.

      6) XE991 only normalized GC firing when other cells were not pharmacologically blocked. The authors suggest this means blockage of MC Kv7 reduces GC excitability back to normal...presumably by increasing MC --> IN --> GC firing. This is a conclusion from many indirect comparisons (comparing XE991 effect on GC with/without GABA and glutamate blockers; comparing MC firing rates with/without XE991, and using CB1 agonist versus mGluR2 agonist to say it is mossy cells that are mostly controlling INs) - a clincher experiment would be to acutely knockdown Kv7 in mossy cells specifically and measure GC and IN firing.

      Thank you, this is a great suggestion. Indeed, as an expansion of this project, in future studies we are planning to manipulate excitability of mossy cells through manipulating Kv7, or using chemogenetic or optogenetic approaches.

      7) The reasoning behind the FMRP-Kv7 connection is quite weak, citing the paper Darnell 2011 as "translational target", but FMRP has myriad translational targets.

      We agree, and attempted to define the mechanism of increased Kv7 function using a co-immunoprecipitation approach, as well as immunostaining to look at cell-type specific expression changes. However, the results of both approaches were difficult to interpret due to technical limitations of the available antibodies. We also note that “We did not further investigate the precise mechanisms underlying enhancement of Kv7 function in the absence of FMRP, since the present study primarily focuses on the functional consequences of abnormal cellular and circuit excitability”. To address this concern, we extensively discussed the potential mechanisms of the FMRP-Kv7 connection, acknowledged in Discussion that “further studies will be needed to elucidate the precise mechanism responsible for the increased Kv7 function in Fmr1 KO mice”, and will continue to investigate it in future studies.

      8) The authors attempt to look for changes in Kv7 expression with Western blot, but since they hypothesize that Kv7 changes are mainly in the mossy cells, it is perhaps not surprising that they would not be able to see any changes when they look at dentate as a whole. Staining for Kv7 subunits to look at expression on a cellular level would be beneficial.

      We appreciate the reviewer’s suggestion. We attempted to perform the suggested experiments using immunostaining for KCNQ2, KCNQ3 and KCNQ5 in different subtypes of dentate neurons. However, these experiments failed to produce interpretable results due to technical limitations of the available antibodies.

      9) Is Kv7 localization or splice/composition different in FMR1 KO mice?

      This is a very good point. As we mentioned in Point 8 above, we were not able to perform these experiments and do not have the answer at this point.

      10) Regarding the 3 subtypes of interneurons in the dentate, the authors are pooling data based on similar intrinsic properties, but this conclusion may be affected by the low number of recorded neurons for the regular-spiking type. In addition, it is unclear whether these different interneuron types have differential circuit connectivity (most likely) which would make it imperative to keep circuit analysis for interneurons segregated into these cell types.

      We appreciate the reviewer’s point. Indeed, these different interneuron types may have distinct circuit connectivity and contributions to circuit activity. However, identification of these 3 types of interneurons and determination of their respective functions is in itself a very extensive set of experiments which is beyond the scope of the current manuscript. We also note that the functional readout of circuit activity in our measurements was the AP firing and EPSPs evoked in granule cells by PP stimulation, which integrate all dentate circuit operations, including all of the feedforward and feedback loops which are mediated by all of these different types of interneurons. For simplicity, we thus pooled all interneuron data for the purposes of this study. But we fully agree that extensive future work is required to elucidate interneuron-type specific changes in Fmr1 KO mice and their contributions to the dentate circuit dysfunction.

      11) To do statistics treating each cell individually, and therefore assuming each cell is independent of one another, is not correct. Two cells from the same mouse will be more similar than two cells from different mice, therefore they are not independent data points. Nested statistical methods (n cells from o slices from p mice) will be important in future work, as discussed by (Aarts et al., Nat. Neurosci. 2014).

      We agree with the Reviewer’s point and appreciate this suggestion. In the present study, the cells tested in electrophysiological experiments were from at least 3 different mice for each condition, which helps to minimize this kind of error.
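
      One simple way to respect the nesting that the Reviewer describes is to average cells within each animal before comparing groups, so that the animal rather than the cell is the unit of analysis. The sketch below illustrates this summary approach only; it is not the multilevel model discussed by Aarts et al., and the mouse IDs and values are hypothetical rather than data from this study.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(records):
    """Collapse (mouse_id, value) pairs into one mean value per mouse."""
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in by_mouse.items()}


# Hypothetical measurements: six cells recorded from three mice.
cells = [
    ("m1", 80.0), ("m1", 90.0),
    ("m2", 100.0),
    ("m3", 70.0), ("m3", 75.0), ("m3", 80.0),
]

mouse_means = per_mouse_means(cells)
print(len(cells), "cells ->", len(mouse_means), "independent animals")
print(mouse_means)
```

      Group comparisons would then be run on the per-mouse means, at the cost of a smaller n than a cell-level analysis.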

      Reviewer #3 (Recommendations For The Authors):

      Is there a difference in the Rin at -45mV of the control cell after the application of XE991? This is important to appreciate whether the XE991-sensitive conductances contribute to the basal excitability of MCs. Furthermore, the statistical comparison of the Rin at -45mV of the FXS animals in the control solution and in the presence of XE991 would also be important. Actually, the most accurate measurement would be to show a difference in the acute Kv7-blockade between control and FXS animals, if that is possible with this blocker. Additionally, it would be also informative if the bar graphs in Fig.2 D & E were merged for this purpose, similarly as in the later figures.

      We thank the Reviewer for this suggestion and agree. Following this suggestion, we have re-plotted the data in Figure 2 accordingly. Specifically, we now show that XE991 significantly increased input resistance in both WT and KO mossy cells, and the effect of XE991 on increasing input resistance was markedly larger in KO than WT mossy cells. For other figures, we have plotted data in a similar way to show the comparisons between WT and KO, as well as comparisons within genotype +/- XE991.

      Because of the cell-to-cell variability of the voltage responses, it would be more informative and representative if the average of traces from all cells were shown in Fig.2 D & E.

      We agree with the Reviewer’s point. For clarity of presentation, we presented the cell-to-cell variability of the data as scatter points of input resistance values in the bar graph (Figure 2E), together with the representative traces (Figure 2D). Plotting the average traces from all cells would result in a total of 30 traces for all the WT and KO mice, which is difficult to visually assess clearly.

      On page 7, please clarify the recorded cell type in this sentence: "In contrast, WIN markedly reduced the number of sEPSCs in both WT and KO mice...".

      We thank the Reviewer for pointing out this omission and have clarified it in the revised version.

      In Figures 6 C, F, and I, the title of the Y-axis should be normalized frequency. Please also correct the figure legend accordingly because the current sentence can be also interpreted as the absolute or total number of events that were compared, irrespective of the duration of the recordings.

      We thank the Reviewer for this point and have corrected the revised version accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I highly appreciate this study and found the paper to be very well-written and easy to follow. However, a more extensive discussion of what I summarized under "weakness" would strengthen the paper. This may include a broader discussion of the canopy effect itself and the most relevant literature on its extent in rainforest settings in general and primate foods in particular, as well as more details on the dietary behavior of modern orangutans (stratigraphy of orangutan foods) and how seasonal their diet is. The extreme seasonality in orangutan plant food availability should be discussed. Now there are only 2 sentences in the discussion (lines 304-312) and I find the word "plant' only twice overall, though variation in plant food d18O is what drives variation in orangutan dental d18O values.

      We very much appreciate the support of this reviewer, and their feedback about the clarity of the paper. As noted in the provisional reply to reviewers, we are happy to add additional context about the issue of isotopic enrichment within forest canopies, and have expanded the original paragraph in the discussion devoted to this subject. We made reference to the fact that orangutan diets vary by season and site in the original submission, and have now acknowledged that seasonal diet variation may also contribute to variation in enamel isotope values.

      Also, I'd like to note that there has been only one recent study so far that made some level of an attempt to find a breastfeeding effect in orangutans using fecal isotope data. Tsutaya et al. 2022 (AJBA) report some seasonality in adult orangutan fecal isotope values, which could be relevant here as well. But also they reported some data from 2 to 7-year-old orangutan offspring and did not see any breastfeeding pattern in isotope values here either. Probably not too surprising at this older age, but still worth noting in the context of this study.

      There is a 2019 study that sampled fecal isotopes in 43 mother-infant orangutan pairs and found a different pattern than Tsutaya et al. (2022), although these data have not been published in full (Knott et al. (2019) AJBA 168, S68, 128-129). Given these contradictions, the fact that neither study serially sampled the first two years of life, and caveats to fecal isotope sampling of wild primates reviewed in Bădescu et al. (2023: American Journal of Primatology 2023;e235), introducing these nitrogen isotope studies does not aid in the interpretation of oxygen isotope data during intensive nursing, and thus is beyond the scope of this paper. The seasonality Tsutaya et al. (2022) reported in adult fecal samples was for carbon isotopes rather than nitrogen isotopes, and its relevance to the current study is unclear given that the orangutan plant foods measured did not show seasonal variation in carbon isotopes. As requested above, we have noted orangutans’ dietary seasonality might influence the variation of oxygen isotope values.

      Reviewer #2 (Recommendations For The Authors):

      First, the manuscript offers upfront flashy numbers with respect to the number of samples, but what the reader really needs to know upfront is the number of individuals and the number of teeth per individual. These facts are buried and make the reader work too hard to keep track. While the specimen ID numbers are valuable in the table, perhaps a different ID could be used in the text, such as individuals modern Borneo A and B, fossil Sumatra A and B, etc.? Similarly, it would be helpful to remind readers of each locality - Borneo or Sumatra, modern or fossil.

      Tables 1 and 2 and the first sentence of the results and the materials and methods stated that we measured 18 teeth in this study. It is likely that the placement of the tables at the very end of the manuscript in the submitted version made the sample sizes and specimen information less evident to the reviewer. In response to this critique we have now added the number of teeth to the abstract, and trust that when the tables are placed within the text as indicated it will be easier to follow textual references to particular individuals. Museum identification codes have been provided in two previous publications of these teeth, and we retain them here for consistency.

      Second, the manuscript mentions some climate change in Sumatra, but what about Borneo?

      The results on the Bornean fossil teeth stated: “The range of values from these two fossil molars (14.2–24.8 ‰) markedly exceeds the range of modern Bornean orangutans (12.7–20.0 ‰) (Figure 4), with the mean δ18O value at least 2‰ heavier, suggesting possibly drier conditions with greater seasonality during their formation.” In the final section of the discussion, we devoted two paragraphs to discussing evidence for climate change at Niah Cave in Borneo - more than we devote to discussing such data from Sumatra.

      The most valuable figure in the manuscript is Figure 3 showing the serial sampling of modern teeth. It would be incredibly useful to see a similar graph for the fossils and a graph of the modern and fossils together for each island. The violin plots demonstrate a range of values but fail to provide the important seasonality signals. The manuscript is promising but as written is difficult to follow, and the results and conclusions with regard to climate change need more demonstration. On a minor note, I found myself wanting to know about the dates of fossils before knowing the isotopic values. You might wish to move the dating section to precede the isotopes.

      As requested, we have added an additional Supplemental figure making the comparisons of seasonality between fossil and modern individuals more evident.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addressed an alternative hypothesis to temporal binding phenomena. In temporal binding, two events that are separated in time are "pulled" towards one another, such that they appear more coincidental. Previous research has shown evidence of temporal binding events in the context of actions and multisensory events. In this context, the author revisits the well-known Libet clock paradigm, in which subjects view a moving clock face, press a button at a time of their choosing to stop the clock, a tone is played (after some delay), and then subjects move the clock dial to the point where the tone occurred (or when the action occurred). Classically, the reported clock time is a combination of the action and sound times. The author here suggests that attention can explain this by a mechanism in which the clock dial leads to a roving window of spatiotemporal attention (that is, it extends in both space and time around the dial). To test this, the author conducted a number of experiments where subjects performed the Libet clock experiment, but with a variety of different stimulus combinations. Crucially, a visual detection task was introduced by flashing a disc at different positions along the clock face. The results showed that detection performance was also "pulled" towards the action event or sensory event, depending on the condition. A model of roving spatiotemporal attention replicated these effects, providing further evidence of the attentional window.

      Strengths:

      The study provides a novel explanation for temporal binding phenomena, with clear and cleverly designed experiments. The results provide a nice fit to the proposed model, and the model itself is able to recapitulate the observed effects.

      Weaknesses:

      Despite the above, the paper could be clearer on why these effects are occurring. In particular, the control experiment introduced in Experiment 3 is not well justified. Why should a tactile stimulus not lead to a similar effect? There are possibilities here, but the author could do well to lay them out. Further, from a perspective related to the attentional explanation, other alternatives are not explored. The author cites and considers work suggesting that temporal binding relies on a Bayesian cue combination mechanism, in which the estimate is pulled towards the stimulus with the lowest variance, but this is not discussed. None of this necessarily detracts from the findings, but otherwise makes the case for attention less clear.

      I would like to thank the reviewer for the helpful comments and recommendations. Regarding Experiment 3, the rationale is this. We showed in Experiments 1 and 2 that, for outcome binding, there were two types of difference between the Action Sound condition and the Sound Only condition: the reported time of sound onset (i.e. the reported clock hand location at the sound onset) and the attention distribution. To experimentally test the relevance of the attention difference to the difference of reported time, we created a situation where the attention difference could be minimised and then checked the difference of reported time. We found that when the attention difference was controlled for between the two conditions, the difference of reported time was also gone, thus providing further evidence for a close link between attention and time report in the current testing paradigm. Therefore, Experiment 3 was primarily targeting the experimental evidence for the claim of the current study. What we needed in Experiment 3 was a condition that could have a smaller attention difference with the Action Sound condition than the attention difference between Sound Only and Action Sound conditions in Experiments 1 and 2. We expected that a tactile stimulus before the sound onset could work, without a clear prediction of the strength of the tactile stimulus in shifting attention, which was also not necessary. This experimental manipulation was a nice fit for the purpose of Experiment 3, as we could empirically measure the effectiveness of the tactile stimulus on attention shift and then relate it to the changes in outcome binding.

      As the reviewer correctly suggested, the Bayesian framework has been applied in several studies to explain the time judgement distortion in sensorimotor situations (e.g. the temporal binding effect studied here). However, the current study asked what temporal binding is really about when it is measured with the Libet clock method. Is it really about a distortion in time perception (which the Bayesian account tries to explain)? Or is it also about attention? The results showed that the spatiotemporal attention distribution is at least a confound in measuring the perceived time of an event using the Libet clock method. Therefore, the Bayesian account raised in previous studies is relevant when explaining the distortion in time perception, given that it really exists. We here asked if the distortion really exists, and to what extent.

      Reviewer #2 (Public Review):

      Summary:

      Temporal binding, generally considered a timing illusion, results from actions triggering outcomes after a brief delay, distorting perceived timing. The present study investigates the relationship between attention and the perception of timing by employing a series of tasks involving auditory and visual stimuli. The results highlight the role of attention in event timing and the functional relevance of attention in outcome binding.

      Strengths:

      • Experimental Design: The manuscript details a well-structured sequence of experiments investigating the attention effect in outcome binding. Thoughtful variations in manipulation conditions and stimuli contribute to a thorough and meaningful investigation of the phenomenon.

      • Statistical Analysis: The manuscript employs a diverse set of statistical tests, demonstrating careful selection and execution. This statistical approach enhances the reliability of the reported findings.

      • Narrative Clarity: Both in-text descriptions and figures provide clear insights into the experiments and their results, facilitating readers in following the logic of the study.

      Weaknesses:

      • Conceptual Clarity: The manuscript aims to integrate key concepts in human cognitive functions, including attention, timing perception, and sensorimotor processes. However, before introducing experiments, there's a need for clearer definitions and explanations of these concepts and their known and unknown interrelationships. Given the complexity of attention, a more detailed discussion, including specific types and properties, would enhance reader comprehension.

      • Computational Modeling: The manuscript lacks clarity in explaining the model architecture and setup, and it's unclear if control comparisons were conducted. These details are critical for readers to properly interpret attention-related findings in the modeling section. Providing a clearer overview of these aspects will improve the overall understanding of the computational models used.

      I would like to thank the reviewer for the helpful comments and recommendations. The attention in the current study, which has been made clearer in the revised manuscript, refers specifically to visuospatial attention. It is presented as a key factor shaping the results of timing report obtained with the clock method, thereby contributing to the explanation of temporal binding. Indeed, attention has been mentioned previously in a similar context, but was treated vaguely as a kind of general cognitive resources. The current study specifically tested and verified that the visuospatial attention paid to the clock face influenced the timing reports. This point has been discussed in a dedicated paragraph in the discussion section of the revised manuscript.

      The modelling of the timing report using the attention data was based on a very simple idea: The clock hand location receiving more attention should be given more weight when participants made the timing report (i.e. reporting the clock hand position). The weight for each location was calculated using the detection rate at each location. The relevant methods section has been extensively revised to provide a step-by-step implementation of the modelling, with rationales and pitfalls in the interpretation of the modelling results given (also in the discussion section).
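
      The weighting idea described above can be sketched as follows. This is an illustration only, not the implementation from the revised manuscript: normalising the detection rates so that the weights sum to 1, treating clock-hand locations as linear offsets (in ms) from the sound onset, and the function name weighted_report are all assumptions, and the numbers are hypothetical.

```python
def weighted_report(offsets_ms, detection_rates):
    """Attention-weighted average of clock-hand locations.

    offsets_ms: clock-hand locations, as time offsets from the sound onset.
    detection_rates: detection rate measured at each location (used as weights).
    """
    if len(offsets_ms) != len(detection_rates):
        raise ValueError("need one detection rate per location")
    total = sum(detection_rates)
    if total <= 0:
        raise ValueError("detection rates must sum to a positive value")
    weights = [rate / total for rate in detection_rates]
    return sum(w * x for w, x in zip(weights, offsets_ms))


# Hypothetical probe locations around the sound onset (0 ms).
offsets = [-60.0, -30.0, 0.0, 30.0, 60.0]

# Evenly spread attention: the modelled report sits at the true onset.
print(weighted_report(offsets, [0.5, 0.5, 0.5, 0.5, 0.5]))

# Attention skewed towards earlier locations: the modelled report shifts earlier.
print(weighted_report(offsets, [0.8, 0.7, 0.6, 0.5, 0.4]))
```

      In this sketch, a symmetric attention distribution leaves the modelled report at the sound onset, whereas a distribution skewed towards earlier clock-hand locations pulls the modelled report earlier.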

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editors for their constructive and critical comments and suggestions regarding our paper. We have since extensively revised the manuscript accordingly, including the addition of new experimental data. We hope that the readers, reviewers, and editors are now satisfied with the quality and significance of the revised paper.

      Our responses to the eLife assessment and the reviewers’ comments, as well as the details of the revisions, are described below.

      Wang et al present a useful manuscript that builds modestly on the group's previous publication on KLF1 (EKLF) K47R mice focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo (Shyu et al., Adv Sci (Weinh). 2022). The data demonstrates that Eklf (K74R) imparts these advantages in a background, age, and gender independent manner, not the consequence of the specific amino acid substitution, and transferable by BMT. However, the authors overstate the meaning of these results and the strength of evidence is incomplete, since only a melanoma model of cancer is used, it is unclear why only homozygous mutation is needed when only a small fraction of cells during BMT confer benefit, they do not show EKLF expression in any cells analyzed, and the PD-1 and PDL-1 experiments are not conclusive. The definitive mechanism relative to the prior publication from this group on this topic remains unclear.

      The issues raised in the editor's assessment of our paper were also brought up by the reviewers. We have addressed them by carrying out new experiments and by rewriting the paper to highlight the rationales and novel aspects of the current study, as described below in our responses to the three reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors Wang et al. present a study of a mouse model K74R that they claim can extend the life span of mice, and also has some anti-cancer properties. Importantly, this mechanism seems to be mediated by the hematopoietic system, and protective effects can be transferred with bone marrow transplantation.

      The authors need to be more specific in the title and abstract as to what is actually novel in this manuscript (a single tumor model), and what relies on previously published data (lifespan). Many of these claims derive from previously published data, and the current manuscript is an extension of previously published work. The authors need to be more specific as to the actual data they present (they only use the B16 melanoma model) and the actual novelty of this manuscript.

      Especially experiments on life span are published and not sufficiently addressed in this actual paper, as the title would suggest.

      It is indeed important to point out the novelty of this paper in comparison to the previous paper. First, we have modified the title, the abstract, and the text to emphasize that the extended lifespan as well as the tumor resistance could be transferred from Eklf(K74R) mice to WT mice by a single transplantation of Eklf(K74R) bone marrow mononuclear cells (BMT) into WT mice at a young age (2 months).

      We now also provide several new experimental data, including data demonstrating that Eklf(K74R) mice are resistant to tumorigenesis of hepatocellular carcinoma as well (new Fig. 1E). These points are elaborated in more detail below in our responses to the reviewers’ comments and suggestions.

      Reviewer #2 (Public Review):

      The manuscript by Wang et al. follows up on the group's previous publication on KLF1 (EKLF) K47R mice and reduced susceptibility to tumorigenesis and increased life span (Shyu et al., Adv Sci (Weinh). Sep 2022;9(25):e2201409. doi:10.1002/advs.202201409). In the current manuscript, the authors have described the dependence of these phenotypes on age, gender, genetic background, and hematopoietic translation of bone marrow mononuclear cells. Considering the current study is centered on the phenotypes described in the previous study, the novelty is diminished. Further, there are significant conceptual concerns in the study that make the inferences in the manuscript far less convincing. Major concerns are listed below:

      1) The authors mention more than once in the manuscript that KLF1 is expressed in a range of blood cells including hematopoietic stem cells, megakaryocytes, T cells and NK cells. In the case of megakaryocytes, studies from multiple labs have shown that while EKLF is expressed in megakaryocyte-erythroid progenitors, EKLF is important for the bipotential lineage decision of these progenitors, and its high expression promotes erythropoiesis, while its expression is antagonized during megakaryopoiesis. In the case of HSCs, the authors refer to their previous publication for KLF1's expression in these cells; however, neither that study nor the current study documents a western blot that convincingly shows that KLF1 protein is expressed at detectable levels in these cells. For T cells, the authors have referenced a study which is based on ectopic expression of KLF1. For NK cells, the authors reference bioGPS; however, upon inspection, this is also questionable.

      2) The current study rests on the premise that KLF1 is expressed in HSCs, NK cells and leukocytes, and the references cited are not sufficient to make this assumption, for the reasons mentioned in the first point. Therefore, the authors will have to show both KLF1 mRNA and protein levels in these cells, and also compare them to the expression levels seen in KLF1 wild type erythroid cells along with knockout erythroid cells as controls, for context and specificity.

      Regarding the novelties of the current study: besides the demonstration that the healthy longevity characteristics, as exemplified by the tumor resistance, are independent of age, gender, and genetic background, another novelty is that these characteristics, in particular the tumor resistance and extended lifespan, could be transferred by one-time long-term transplantation of the Eklf(K74R) bone marrow mononuclear cells from young Eklf(K74R) mice to young WT mice. Also, since submission of the last version of the paper, we have carried out new experiments, including the characterization of the anti-cancer capability of NK cells (new Fig. 6) as well as an assay of the tumor resistance of Eklf(K74R) mice to hepatocellular carcinoma (new Fig. 1E).

      We have also modified the title, Abstract, and different parts of the text to highlight the novelties of the current study.

      As to the expression of EKLF in different hematopoietic blood cell types, we have now added a paragraph in the Results (p.6 and p.7) describing what is known in the literature in relation to the data presented in the paper. Importantly, following the reviewer’s comments, we have since carried out Western blot analysis of EKLF expression in NK, T, and B cells (p.6, p.7 and new Fig. S4B). Note also that the level of EKLF in B cells is very low and could only be detected by RT-qPCR (Fig. S4C) and RNA-Seq (Bio-GPS database).

      3) To get to the mechanism driving the reduced susceptibility to tumorigenesis and increased life span phenotypes in EKLF K74R mice, the authors report some observations. However, how these observations are connected to the phenotypes is unclear.

      a. For example, in Figure S3, they report that the frequency of NK1.1+ cells is higher in the mutant mice. The significance of this in relation to EKLF expression in these cells and the tumorigenesis and life span related phenotypes are not described. Again, as mentioned in the second point, KLF1 protein levels are not shown in these cells.

      b. In Figure 4, the authors show mRNA levels of immune check point genes, PD-1 and PD-l1 are lower in EKLF K74R mice in PB, CD3+ T cells and B220+ B cells. Again, the questions remain on how these genes are regulated by EKLF, and whether and at what levels EKLF protein is expressed in T cells and B cells relative to erythroid cells. Further, while the study they reference for EKLF's role in T cells is based on ectopic expression of EKLF in CD4+ T cells, in the current study, CD3+ T cells are used. Also, there are no references for the status of EKLF in B cells. These details are not discussed in the manuscript.

      We respond to this part of the reviewer's questions and comments as follows.

      First, we have since assayed the effect of the K74R substitution of EKLF on the in vitro cancer cell-killing ability of NK cells (termed NK1.1 cells in the previous version). The data showed that NK(K74R) cells have a higher cancer cell-killing ability than the WT NK cells (new Fig. 6). This property, together with the higher expression level of NK(K74R) cells in 24-month-old Eklf (K74R) mice than of NK cells in 24-month-old WT mice, would contribute to the higher tumor resistance of the Eklf (K74R) mice. This point is also addressed on p.8 and p.9.

      Second, as stated in previous sections, we have since carried out comparative Western blot analysis of the expression of EKLF protein in NK, CD3 T, and B cells of the WT and Eklf(K74R) mice, respectively (please see the new Fig. S4B). Also, a description of what is known in the literature in relation to our data on the expression of EKLF protein/Eklf mRNA in different types of hematopoietic blood cells is now included in the Results (please see p.6 and p.7). Notably, though, the level of EKLF protein in B cells was too low to be detected by WB (Fig. S4B).

      4) The authors perform comparative proteomics in the leukocytes of EKLF K74R and WT mice as shown in Figure S5. What is the status of EKLF levels in the mutant lysate vs wild type lysates based on this analysis? More clarity needs to be provided on what cells were used for this analysis and how they were isolated since leukocytes is a very broad term.

      The leukocytes we used were isolated from the peripheral blood after removal of red blood cells, as described in the Materials and Methods.

      Also, the Western blot analysis of EKLF expression in the lysates of leukocytes/white blood cells (WBC) was shown previously and is now presented in the new Figure S4A.

      5) In the discussion the authors make broad inferences that go beyond the data shown in the manuscript. They mention that the tumorigenesis resistance and long lifespan is most likely due to changes in transcription regulatory properties and changes in global gene expression profile of the mutant protein relative to WT leukocytes. And based on reduced mRNA levels of the Pd-1 and Pd-l1 genes in the CD3+ T cells and B220+ B cells from mutant mice, they "assert" that EKLF is an upstream regulator of these genes and regulates the transcriptomes of a diverse range of hematopoietic cells. The lack of a ChIP assay to show binding of WT EKLF on genes in these cells, and whether this binding is reduced or abolished in the mutant cells, makes the above statements unsubstantiated.

      We have since carried out ChIP-PCR analysis of EKLF-binding in the Pd-1 promoter (new Fig. S5). The data showed that EKLF was bound on the CACCC box at -103 of the promoter in WT CD3+T as well as in CD3+T(K74R) cells. This result is discussed on p.7.

      6) Where westerns are shown, the authors need to show the molecular weight ladder, and where qPCR data are shown for EKLF, it will be helpful to show the absolute levels and compare these levels to those in erythroid cells, along the corresponding EKLF knock out cells as controls.

      We have since included the molecular weight markers by the side of the Western blots in Fig. S4. Also, we have added a new figure (Fig. S4C) comparing the expression levels of Eklf mRNA in B cells and CD3+ T cells with those in mouse erythroleukemia (MEL) cells, as analyzed by RT-qPCR.

      Also, as now indicated in the Materials and Methods section, the specificity of the primers used for RT-qPCR quantitation of mouse Eklf mRNA has been validated before by comparative analysis of wild type and EKLF-knockout mouse erythroid cells (Hung et al., IJMS, 2020).

      7) Figure S1D does not have a figure legend. Therefore, it is unclear what the blot in this figure is showing. In the text of the manuscript where they reference this figure, they mention that the levels of the mutant EKLF vs WT EKLF does not change in peripheral blood, while in the figure they have labeled WBCs for the blot, and the mRNA levels shown do seem to decrease in the mutant compared to WT peripheral blood.

      We apologize for this oversight. The data shown in the original Fig. S1D (new Fig. S4A) are from Western blot analysis of EKLF protein and RT-qPCR analysis of Eklf mRNA in leukocytes/white blood cells (WBC) isolated from the peripheral blood samples. We have now restored the figure legend and also rewritten the corresponding description in the text on p.6.

      Reviewer #3 (Public Review):

      Hung et al provide a well-written manuscript focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo. The work is fundamental and the data is convincing although several details remain incompletely elucidated. The major strengths of the manuscript include the clarity of the effect and the appropriate controls. For instance, the authors query whether Eklf (K74R) imparts these advantages in a background, age, and gender dependent manner, demonstrating that the findings are independent. In addition, the authors demonstrate that the effect is not the consequence of the specific amino acid substitution, with a similar effect on anticancer activity. Furthermore, the authors provide some evidence that PD-1 and PDL-1 are altered in Eklf (K74R) mice.

      We thank the reviewer for these encouraging comments.

      Finally, they demonstrate that the effects are transferrable with BMT. Several weaknesses are also evident. For instance, only melanoma is tested as a model of cancer such that a broad claim of "anti-cancer activity" may be somewhat of an overreach.

      We have now included new data showing that the Eklf(K74R) mice also carry a higher anti-cancer ability against hepatocellular carcinoma than the WT mice (new Fig. 1E).

      It is also unclear why a homozygous mutation is needed when only a small fraction of cells during BMT can confer benefit. It is also difficult to explain how transplanted donor Eklf (K74R) HSCs confer anti-melanoma effect 7 and 14 days after BMT.

      First, these two observations do not necessarily conflict with each other. It is likely that homozygosity, but not heterozygosity, of the K74R substitution in EKLF allows one or more types of hematopoietic blood cells to gain new functions, e.g. the higher cancer cell-killing capability of NK(K74R) cells (new Fig. 6), that help the mice to live long and healthy. Also, the data in Fig. 2D indicated that as few as 20% of the blood cells carrying homozygous Eklf(K74R) alleles in the recipient mice upon BMT could be sufficient to confer on the mice a higher anti-cancer capability, likely in part due to cells such as NK(K74R). These points are now clarified in the Discussion (p.9 and p.10).

      Second, we think the NK(K74R) cells contributed a significant part to the anti-cancer capability of the transplanted Eklf(K74R) blood in the recipient WT mice. As documented in some literature, e.g. Ferreira et al., Journal of Molecular Medicine (2019), the hematopoietic lineage of the NK cells would be fully reconstituted as early as 2 weeks after BMT. Of course, there could be other still unknown factors or cells that also contribute to the tumor resistance of the recipient mice at 7 days following BMT. This point is now touched upon on p.8 and p.9.

      Furthermore, it would be useful to see whether there are virulence marker alterations in the melanoma loci in WT vs Eklf (K74R) mice.

      As noted in our response to the Public Reviews, we will analyze this in the future together with other types of tumors in a separate study.

      Finally, the data in Fig 4c is difficult to interpret as decreased PD-1 and PDL-1 after knockdown of EKLF in vitro is not a useful experiment to corroborate how mutation without changing EKLF expression impacts immune cells. The work is impactful as it provides evidence that healthspan and lifespan may be modulated by specific hematological mutation but the mechanism by which this occurs is not completely elucidated by this work.

      As described in a previous section, we have since also carried out ChIP-qPCR analysis of the binding of WT EKLF and EKLF (K74R) on the Pd-1 promoter (new Fig. S5).

      Reviewer #1 (Recommendations For The Authors):

      The authors present interesting melanoma model data but need to tone down their claim of multiple effects of their model system. It needs to be clear what is new and what is previously known.

      As noted in our response to the Public Reviews, we have since added new data on the tumor resistance of the Eklf(K74R) mice to hepatocellular carcinoma (new Fig. 1E). We have also modified the title and highlighted the novel points in the Abstract and text of the revised draft.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the major concerns listed in the public review, the minor concerns that the authors could address are listed below:

      1) It would be helpful to describe why the pulmonary melanoma focus assay was chosen for the metastasis assay.

      We now describe on p.4 the rationale behind the initial choice of this assay for analysis of the anti-cancer capability of the Eklf(K74R) mice. Also, we have since included data from an experiment using the subcutaneous cancer cell inoculation assay for comparative analysis of the anti-hepatocellular carcinoma capability of Eklf(K74R) and WT mice (Fig. 1E and p.5).

      2) Reference #61 for B16-F10-luc cells cited in the methods does not have details on the generation of these cells. What these cells are and why this model was chosen needs to be described.

      We apologize for not providing this information before. We now describe the generation of B16-F10-luc cells in the Materials and Methods section (p.13). The rationale for choosing the B16-F10 cells for the pulmonary foci assay is also added on p.4.

      3) The DNA binding consensus site for EKLF needs to be expanded in the introduction.

      This has now been addressed on p.13.

      Reviewer #3 (Recommendations For The Authors):

      Hung et al provide a well-written manuscript focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo. The work is fundamental and the data is convincing although several details remain incompletely elucidated.

      1) Only melanoma is tested as a model of cancer such that a broad claim of "anti-cancer activity" may be somewhat of an overreach. The authors, therefore, need to provide evidence of a second type of malignancy to which Eklf mutation confers anticancer and longevity advantages or temper the claims in the discussion that the effect still needs to be tested in non-melanoma cancer models to determine the broad anti-cancer effect.

      As noted in our response to the Public Reviews, we have since shown that Eklf(K74R) mice also exhibited a higher resistance to the carcinogenesis of hepatocellular carcinoma (new Fig. 1E).

      2) Why is a homozygous mutation needed when only a small fraction of cells during BMT can confer benefit of Eklf mutation? Is there evidence that the cellular effect is binary but only a few such cells are needed? This is confusing and requires further clarification.

      As noted in our response to the Public Reviews, these two observations do not necessarily conflict with each other. It is likely that homozygosity, but not heterozygosity, of the K74R substitution in EKLF allows one or more types of hematopoietic blood cells to gain new functions, e.g. the higher cancer cell-killing capability of NK(K74R) cells (new Fig. 6), that help the mice to live long and healthy. Also, the data in Fig. 2D indicated that as few as 20% of the blood cells carrying homozygous Eklf(K74R) alleles in the recipient mice upon BMT could be sufficient to confer on the mice a higher anti-cancer capability, likely in part due to cells such as NK(K74R). This point is now clarified in the Discussion (p.9).

      3) BMT typically requires at least 3-4 weeks to reconstitute the marrow compartment but the authors are able to see effects of Eklf mutation as early as 7 days following BMT. This is surprising and brings into question the mechanism of effect.

      As noted in our response to the Public Reviews, we think the NK(K74R) cells contributed a significant part to the anti-cancer capability of the transplanted Eklf(K74R) blood in the recipient WT mice. As documented in some literature, e.g. Ferreira et al., Journal of Molecular Medicine (2019), the hematopoietic lineage of the NK cells would be fully reconstituted as early as 2 weeks after BMT. Of course, there could be other still unknown factors or cells that also contribute to the tumor resistance of the recipient mice at 7 days following BMT (please see the discussion of this point on p.9).

      4) It would be useful to see whether there are virulence marker alterations in the melanoma loci in WT vs Eklf (K74R) mice.

      As noted in our response to the Public Reviews, we will analyze this in the future together with other types of tumors in a separate study.

      5) The data in Fig 4c is difficult to interpret as decreased PD-1 and PDL-1 after knockdown of EKLF in vitro is not a useful experiment to corroborate how mutation WITHOUT changing EKLF expression impacts immune cells.

      Indeed, the RNAi knockdown experiment only demonstrated a positive regulatory role of EKLF in Pd-1/Pd-l1 gene expression. We have followed the reviewer’s suggestion and carried out ChIP-qPCR analysis, which showed that the factor is bound to the Pd-1 promoter in both WT CD3+T cells and CD3+T(K74R) cells (new Fig. S5). We briefly discuss these data on p.7 in relation to the possible effect of the K74R substitution of EKLF on Pd-1 expression.

      We have now further clarified this point on p. 7.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on the very nice structure! In my opinion, which you can feel free to take or leave, this would work better as a short report focused on the improvement of the structure relative to the current published model. To my mind, while the functional and dimerization studies are supportive of the cryo-EM studies (specifically, the purified protein is functional, and does tend to dimerize in various membrane mimetics), these experiments don't provide a lot of new mechanistic insight on their own. The dimerization, in particular, could be developed further.

      Response: Thank you for the comments. We have chosen to stick with the current article format. That the protein is dimeric is exciting in our view and we are working to further define the functional significance of this formation.

      Reviewer #2 (Recommendations For The Authors):

      Ln 48. Abstract. "highlighting feature of the complex interface" sounds a bit vague. I was wondering if the authors considered including more specific findings here.

      Response: This sentence has been removed.

      Ln 149 and elsewhere. The authors refer to the previously published structure of HiSiaQM as "low resolution". It may just be me and likely not the intention of the authors, but this comes across as an attempt to diminish the validity of this previous work from another group, which is not necessary. I would recommend rewording these parts slightly, even if it is just to say "lower resolution" instead of "low resolution".

      Response: It was not our intention to diminish the excellent work published by another group, we have changed “low resolution” to “lower resolution” throughout.

      Ln 160. The authors state that the inward-open conformation is likely "the resting state of the transporter". I think this statement should be modified slightly to acknowledge that this is only true under these conditions, i.e. in the absence of the bilayer, membrane potential and chemical gradients.

      Response: We have edited this as follows “That we observe the inward-open conformation without either a bound P-subunit or fiducial marker, suggests that this is the resting state of the transporter under experimental conditions (in the absence of a membrane bilayer, membrane potential and chemical gradients).”

      Ln 202. I'm not convinced that the use of the word "probable" is appropriate here; "possible" would likely fit better in the absence of compelling evidence that this dimer forms in a bacterial cell membrane with physiological levels of HiSiaQM expression.

      Response: We have changed “probable” to “possible”.

      The authors show an SEC trace for DDM solubilised protein, which is a single peak, whereas the LMNG extracted protein has 2 distinctly different elution profiles depending on the LMNG concentration. Was the same phenomenon observed when varying the DDM concentration?

      Response: We observed significantly more aggregation with DDM than L-MNG, so it was infrequently used and considerably less well characterised. In one purification, moderately higher DDM shifted the elution peak to be slightly later but retained a similar profile. Overall, we did not observe the same phenomenon of distinctly different elution profiles with DDM, but we have limited data.

      Ln 245. The two positions cited as important for the elevator-type mechanism are the fusion helix and the dimer interface. However, there is no evidence that the dimer interface observed in this work has any relevance to the transport mechanism. To make this statement, the interface would need to be disrupted and the effects on transport evaluated.

      Response: This has been edited as follows. “Evident in our cryo-EM maps are well-defined phospholipid densities associated with areas of HiSiaQM that may be important for the function of an elevator-type mechanism (Figure 4), but require further testing.”

      Ln 257. The authors state that the lipids form "specific and strong interactions" with the protein, but without knowing the identity of the lipids present, it is difficult to say anything about the specificity of this interaction. I think the authors could consider rewording this.

      Response: We have edited this by removing the term “specific” and describing the lipid interactions only as strong interactions.

      Ln 270. The authors identify a lipid-binding site and residues that likely interact with the headgroup. It would be interesting if the authors could speculate on the purpose of this lipid binding site and how it could affect transport. The residues are not conserved, which the authors suggest reflects the variety of lipid compositions in different bacteria. Are the authors suggesting that this lipid binding site is a general feature for all fused TRAP transporters and that the identity of the lipid changes depending on the species?

      Response: Yes, we speculate that the lipid binding site may be a general feature for fused TRAP transporters. We have added speculation about this binding site, specifically that “the fusion helix and concomitant lipid molecule may provide a more structurally rigid scaffold than a Q-M heterodimer, i.e., PpSiaQM, although how this impacts the elevator transition requires further testing” at Line 283.

      Though we believe that a binding pocket is likely found in a number of fused TRAPs (based on sequence and Alphafold predictions, e.g., FnSiaQM and AaSiaQM), we have now acknowledged that some fusions may not necessarily bind a lipid molecule here, by stating “While this binding pocket is likely found in a number of fused TRAPs (based on sequence predictions, e.g., FnSiaQM and AaSiaQM in Supplementary Figure 8), it is not clear whether they also bind lipids here without experimental data” at Line 290.

      Ln 306. The authors state that the HiSiaPQM has a 10-fold higher transport activity than PpSiaPQM. Unless the transport assays were performed in parallel (to mitigate small changes in experimental set-up) and the reconstitution efficiency for each proteoliposome preparation was carefully analysed, it is very difficult for this to be a meaningful comparison. Even if the amount of protein incorporated into the proteoliposomes is quantified (e.g. by evaluating protein band intensity when the proteoliposomes are analysed using SDS-PAGE), this does not account for an inactive protein that was incorporated, nor the proportion of the protein that was incorporated in the inside-out orientation, which would be functionally silent in these assays. I'm not suggesting these assays actually need to be performed, but I think the text should be modified to reflect what can actually be compared.

      Response: We agree with the reviewer that a meaningful comparison is difficult to make without a careful analysis of the reconstitution efficiency and have modified the text to reflect this. We have altered the paragraph beginning at Line 319 to the following: “The fused HiSiaPQM system appears to have a higher transport activity than the non-fused PpSiaPQM system. With the same experimental setup used for PpSiaPQM (5 µM Neu5Ac, 50 µM SiaP) (33), the accumulation of [3H]-Neu5Ac by the fused HiSiaPQM is ~10-fold greater. Although this difference may reflect the reconstitution efficiency of each proteoliposome preparation, it is possible that it has evolved as a result of the origins of each transporter system: P. profundum is a deep-sea bacterium and as such the transporter is required to be functional at low temperatures and high pressures… ”

      Ln 335. "S298A did not show an effect on growth when mutated to alanine previously." Suggest changing "S298A" here to "S298".

      Response: This has been changed.

      Ln 340. In addition to PpSiaQM, the large cavity was also presumably observed in the lower resolution structure of HiSiaQM?

      Response: The cavity is detectable in the lower resolution structure (7qe5), though very poorly defined by the density. Furthermore, the AlphaFold model fitted to this density has positioned sidechains inside the cavity, which we consider very likely to be an error (in comparison to our structures, VcINDY and our estimates of the volume required to house sialic acid). The cavity is generally much better defined by the structures we have referenced.

      Ln 345. Reference missing after "previously reported"?

      Response: This has been added.

      Measuring the affinity for the P-to-QM interaction is very useful, but it would have enhanced the study if some of the residues identified as important for this interaction (detailed on p.13) had been tested for their contributions to binding using this approach.

      Response: We do aim to perform this assay with these mutants in the future, but are also developing parallel assays to further test this interaction in different membrane mimetics.

      Ln 436. As stated previously, it is more accurate to say that "this is the most stable conformation" under these conditions.

      Response: We have edited this to say “The ‘elevator down’ (inward-facing) conformation is preferred in experimental conditions”. We have also changed the last sentence of this paragraph to say “However, the dimeric structures we have presented have no other proteins bound, yet exist stably in the elevator down state, suggesting this is the most stable conformation in experimental conditions, where there is no membrane bilayer, membrane potential, or chemical gradient present.”

      Ln 438. "Lipids associated with HiSiaQM are structurally and mechanistically important." This conclusion is not supported by the data presented; there is no evidence that the bound lipids influence the mechanism at all. The lipids observed are certainly interestingly placed and one could speculate about their relevance, but this statement of fact is not supported. Therefore, their importance to the mechanism needs to be tested or this conclusion needs to be substantially softened.

      Response: We have softened this statement by changing it to “Lipids have strong interactions with HiSiaQM and are likely to be important for the transport mechanism.”

      Reviewer #3 (Recommendations For The Authors):

      The fact that HiSiaQM samples consist of a mixture of compact monomer and dimer is clear, from Fig. S5 and S6. However, the analysis displayed in Fig 3 and Fig S4 would require more explanation. To my understanding, it requires the values of the sedimentation and diffusion coefficients. It could be good to provide the experimental values of D, and explain a little more about the method in the material and method section.

      Response: Yes, the analysis requires the experimental diffusion coefficients. These have been added to the Figure 3 and S4 legends and more detail has been added to the method section.

      In addition, I am puzzled when reading, in the legend of Fig 3, considerations that peak 2 could not correspond to a monomer or trimer: do these sentences correspond to other mathematical solutions, or is a given frictional ratio considered, or do they refer to Fig. S5 analysis?

      Response: We can see where this confusion could arise. These sentences do not correspond to a given frictional ratio or to the Fig. S5 analysis (this is a separate, complementary analysis). Ruling out a monomer for peak 2 is strictly a physical justification: with pure protein and an observed peak smaller than peak 2, a monomer is not possible for peak 2. Ruling out a trimer for peak 2 is a mathematical solution using the s and D coefficients. The solutions show that an unreasonably low amount of detergent (32 molecules for L-MNG or 0 for DDM) would have to be bound to a trimer for it to exist at those s and D values, so we have ruled the trimer out. Reassuringly, the complementary analysis in Fig. S5/S6 agrees with the monomer-dimer outputs from the s and D analysis. We have adjusted the text in the legends of Fig. 3 and S4 to better convey these points.
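
      As an illustration only, the kind of s and D calculation described above can be sketched as follows. This is a minimal sketch, assuming the standard Svedberg relation for the buoyant molar mass and a simple two-component (protein plus detergent) complex; the function names and every numerical value are hypothetical placeholders, not the experimental values or the exact procedure used in the manuscript.

```python
R = 8.314462618  # gas constant, J/(mol*K)


def buoyant_molar_mass(s, D, temperature=293.15):
    """Svedberg relation: M_b = s*R*T/D.

    s in seconds, D in m^2/s, result in kg/mol (numerically equal to kDa).
    """
    return s * R * temperature / D


def bound_detergent(m_buoyant, n_protomers, m_protomer, vbar_protein,
                    m_detergent, vbar_detergent, rho):
    """Number of detergent molecules needed to account for the buoyant mass.

    Masses in kg/mol, partial specific volumes in mL/g, solvent density in g/mL.
    Assumes the detergent is not density-matched (vbar_detergent * rho != 1).
    """
    protein_part = n_protomers * m_protomer * (1.0 - vbar_protein * rho)
    per_detergent = m_detergent * (1.0 - vbar_detergent * rho)
    return (m_buoyant - protein_part) / per_detergent


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the calculation.
    s_peak, d_peak = 7.0e-13, 4.0e-11          # s (seconds) and D (m^2/s)
    m_b = buoyant_molar_mass(s_peak, d_peak)
    for n in (1, 2, 3):                        # monomer, dimer, trimer
        n_det = bound_detergent(m_b, n, 70.0, 0.73, 1.0, 0.80, 1.0)
        print(f"{n} protomer(s): {n_det:.0f} detergent molecules implied")
```

      In a sketch like this, an oligomeric state is disfavoured when the implied detergent count is negative or implausibly small for a membrane protein in a micelle.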

    1. Author Response

      eLife assessment

      This useful study uses a mouse model of pancreatic cancer to examine mitochondrial mass and structure in atrophying muscle along with aspects of mitochondrial metabolism in the same tissue. Most relevant are the solid transcriptomics and proteomics approaches to map out related changes in gene expression networks in muscle during cancer cachexia.

      Response: We very much appreciate the positive feedback from the editors on our article and are delighted to have it published in eLife. Our sincere thanks to the Reviewers for their positive feedback on our work, and for their insightful and constructive comments.

      Reviewer #1 (Public Review):

      Summary:

      This important study provides a comprehensive evaluation of skeletal muscle mitochondrial function and remodeling in a genetically engineered mouse model of pancreatic cancer cachexia. The study builds upon and extends previous findings that implicate mitochondrial defects in the pathophysiology of cancer cachexia. The authors demonstrate that while the total quantity of mitochondria from skeletal muscles of mice with pancreatic cancer cachexia is similar to controls, mitochondria were elongated with disorganized cristae, and had reduced oxidative capacity. The mitochondrial dysfunction was not associated with exercise-induced metabolic stress (insufficient ATP production), suggesting compensation by glycolysis or other metabolic pathways. However, mitochondrial dysfunction can lead to increased production of ROS/oxidative stress and would be expected to interfere with carbohydrate and lipid metabolism, events that are linked to cancer-induced muscle loss. The data are convincing and were collected and analyzed using state-of-the-art techniques, with unbiased proteomics and transcriptomics analyses supporting most of their conclusions.

      Additional Strengths:

      The authors utilize a genetically engineered mouse model of pancreatic cancer which recapitulates key aspects of human PDAC including the development of cachexia, making the model highly appropriate and translational.

      The authors perform transcriptomic and proteomics analyses on the same tissue, providing a comprehensive analysis of the transcriptional networks and protein networks changed in the context of PDAC cachexia.

      Weaknesses:

      The authors refer to skeletal muscle wasting induced by PDAC as sarcopenia. However, the term sarcopenia is typically reserved for the loss of skeletal muscle mass associated with aging.

      Response: We agree that the term sarcopenia initially refers to aged muscle, but its use has spread to other fields, including oncology (for example, in this article, which we quote: Mintziras I et al. Sarcopenia and sarcopenic obesity are significantly associated with poorer overall survival in patients with pancreatic cancer: Systematic review and meta-analysis. Int J Surg 2018;59:19-26). Actually, the term sarcopenia is now widely used in the literature and in the clinic to describe the loss of muscle mass and strength in cancer patients (see for example, this recent review: Papadopetraki A. et al. The Role of Exercise in Cancer-Related Sarcopenia and Sarcopenic Obesity. Cancers 2023;15;5856).

      In Figure 2, the MuRF1 IHC staining appears localized to the extracellular space surrounding blood vessels and myofibers-which causes concern as to the specificity of the antibody staining. MuRF1, as a muscle-specific E3 ubiquitin ligase that degrades myofibrillar proteins, would be expected to be expressed in the cytosol of muscle fibers.

      Response: We agree that MuRF1 IHC staining was also observed in the extracellular space, which was a surprise, for which we have no explanation to date.

      Disruptions to skeletal muscle metabolism in PDAC mice are predicted based on mitochondrial dysfunction and the transcriptomic and proteomics data. The manuscript could therefore be strengthened by additional measures looking at skeletal muscle metabolites, or linking the findings to previous work that has looked at the skeletal muscle metabolome in related models of PDAC cachexia (Neyroud et al., 2023).

      Response: We agree that our omics data could be strengthened by additional measures looking at skeletal muscle metabolites. It's an excellent suggestion to parallel the transcriptomic and proteomic data we obtained on the gastrocnemius muscle with the metabolomic data obtained by Neyroud et al. on the same muscle. These authors used a different mouse model of PDAC from our KIC GEMM model, namely the allograft model in which KPC cells (derived from the pancreatic tumor of KPC mice, another PDAC GEMM model) are implanted into syngeneic recipient mice. They carried out a proteomic study on the tibialis anterior muscle and a metabolomic study on the gastrocnemius muscle. Proteomics data identified in particular a KPC-induced reduction in the relative abundance of proteins annotating to oxidative phosphorylation, consistent with our data showing reduced mitochondrial activity pathways. Metabolomic data showed reduced abundance of many amino acids, as expected, and of intermediates of the mitochondrial TCA cycle (malate and fumarate) in KPC-atrophied muscle, consistent with the reduced mitochondrial metabolic pathways that we illustrated. In contrast, metabolites that were increased in abundance included those related to oxidative stress and redox homeostasis, which is not surprising given the profound oxidative stress affecting atrophied muscle. Finally, we noted in Neyroud's metabolomic data the dysregulation of certain lipids and nucleotides in atrophied muscle, which is very interesting to relate to our study describing alterations in lipid and nucleotide metabolic pathways.

      Reviewer #2 (Public Review):

      The present work analyzed the mitochondrial function and bioenergetics in the context of cancer cachexia induced by pancreatic cancer (PDAC). The authors used the KIC transgenic mice that spontaneously develop PDAC within 9-11 weeks of age. They deeply characterize bioenergetics in living mice by magnetic resonance (MR) and mitochondrial function/morphology mainly by oxygraphy and imaging on ex vivo muscles. By MR they found that phosphocreatine resynthesis and maximal oxidative capacity were reduced in the gastrocnemius muscle of tumor-bearing mice during the recovery phase after 6 minutes of 1 Hz electrical stimulation while pH was reduced in muscle during the stimulation time. By oxygraphy, the authors showed a decrease in basal respiration, proton leak, and maximal respiration in tumor-bearing mice that was associated with the decrease of complex I, II, and IV activity, a reduction of OXPHOS proteins, mitochondrial mass, mtDNA, and to several morphological alterations of mitochondrial shape. The authors performed transcriptomic and proteomic analyses to get insights into mitochondrial defects in the muscles of PDAC mice. By IPA analyses on transcriptomics, they found an increase in the signature of protein degradation, atrophy, and glycolysis and a downregulation of muscle function. Focusing on mitochondria they showed a downregulation mainly in OXPHOS, TCA cycle, and mitochondrial dynamics genes and upregulation of glycolysis, ROS defense, mitophagy, and amino acid metabolism. IPA analysis on proteomics revealed major changes in muscle contraction and metabolic pathways related to lipids, protein, nucleotide, and DNA metabolism. Focusing on mitochondria, the protein changes mainly were related to OXPHOS, TCA cycle, translation, and amino acid metabolism.

      The major strength of the paper is the bioenergetics and mitochondrial characterization associated with the transcriptomic and proteomic analyses in PDAC mice that confirmed some published data of mitochondrial dysfunction but underlined some novel metabolic insights such as nucleotide metabolism.

      There are minor weaknesses related to some analyses on mitochondrial proteins and to the fact that proteomic and transcriptomic comparison may be problematic in catabolic conditions because some gene expression is required to maintain or re-establish enzymes/proteins that are destroyed by the proteolytic systems (including the autophagy proteins and ubiquitin ligases). The authors should consider the following points.

      Point 1. The authors used the name sarcopenia as synonymous with muscle atrophy. However, sarcopenia clearly defines the disease state (disease code: ICD-10-CM (M62.84)) of excessive muscle loss and force drop during ageing (Ref: Anker SD et al. J Cachexia Sarcopenia Muscle 2016 Dec;7(5):512-514.). Therefore, the word sarcopenia must be used only when pathological age-related muscle loss is the subject of study. Sarcopenia can be present in cancer patients who also experience cachexia, however since the age of tumor-bearing mice in this study is 7-9 weeks old, the authors should refrain from using sarcopenia and instead replace it with the words muscle atrophy/ muscle wasting/muscle loss.

      Response: This issue has also been raised by the Reviewer #1. We agree that the term sarcopenia historically refers to aged muscle, but it is also used in oncology (for example, in this article, which we quote: Mintziras I et al. Sarcopenia and sarcopenic obesity are significantly associated with poorer overall survival in patients with pancreatic cancer: Systematic review and meta-analysis. Int J Surg 2018;59:19-26). Actually, the term sarcopenia is now widely used in the literature and in the clinic to describe the loss of muscle mass and strength in cancer patients (see for example, this recent review: Papadopetraki A. et al. The Role of Exercise in Cancer-Related Sarcopenia and Sarcopenic Obesity. Cancers 2023;15;5856).

      Point 2. Most of the analyses of mitochondrial function are appropriate. However, the methodological approach to determining mitochondrial fusion and fission machinery shown in Fig. 5F is wrong. The correct way is to normalize OPA1 and MFN1/2 to mitochondrial proteins such as VDAC/porin. In fact, by loading the same amount of total protein (see actin in panel 5F) the difference between a normal muscle and a muscle with enhanced protein breakdown is lost. We should expect a decrease in actin level in tumor-bearing mice with muscle atrophy, while the blots clearly show the same level due to the normalization of protein content. Moreover, by loading the same amount of protein in the gel, the atrophying muscle lysates become enriched in the proteins/organelles that are less affected by the proteolysis, resulting in an artefactual increase. The correct way should be to lyse the whole muscle of control and tumor-bearing mice in an identical volume and to load in western blot the same volume for control and cachectic muscles. Alternatively, expressing the abundance of mitochondrial shaping proteins relative to mitochondrial transmembrane or matrix proteins (mito mass) should compensate for the loading normalization. Because the authors showed elongated mitochondria despite mitophagy genes being up, fragmentation may be altered. Moreover, the DNM1l gene is suppressed and therefore DRP1 protein must be analyzed. Finally, OPA1 protein has different isoforms due to the action of proteases like OMA1 and YME1L, and these isoforms have different functions: the long one is pro-fusion while the short ones are not. The authors must quantify the long and short isoforms of OPA1.

      Response: We acknowledge that our analysis of a minor set of proteins involved in mitochondrial dynamics by Western blotting (Figure 5F) is basic and could have been improved. We thank the Reviewer for all the suggestions, which will be very useful in future projects studying the subject in greater depth and according to the molecular characteristics of each player in mitochondrial fusion, fission, mitophagy and biogenesis.

      Point 3. The comparison of proteomic and transcriptomic profiles to identify concordance or not is problematic when atrophy programs are induced. In fact, most of the transcriptional-dependent upregulation is to preserve/maintain/reestablish enzymes that are consumed during enhanced protein breakdown. For instance, the ubiquitin ligases when activated undergo autoubiquitination and proteasome degradation. The same happens for several autophagy-related genes belonging to the conjugation system (LC3, Gabarap), the cargo recognition pathways (e.g. Ubiquitin, p62/SQSTM1) and the selective autophagy system (e.g. BNIP3, PINK/PARKIN) and metabolic enzymes (e.g. GAPDH, lipin). Finally, if identical amounts of protein were loaded for mass spec, the issues of selective enrichment raised in point 2 should be considered. Therefore, these issues should be addressed in the discussion when comparing proteomic and transcriptomic data.

      Response: We fully agree with the Reviewer that seeking concordance between transcriptomic and proteomic data in the case of an organ affected by a high level of proteolysis is a difficult business. Another major difficulty we discussed in the Discussion section of the article is the fact that there is no concordance between RNA and protein level for a good proportion of proteins, for multiple reasons, so each level of omics has to be interpreted independently to give information on the pathophysiology of the organ studied.

    1. Author Response

      We thank the editors and reviewers for taking the time to provide a critical assessment of our manuscript. We are delighted our work was found to have merit, and will revise the manuscript based on their valuable input.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors):

      Major comments:

      1) The immunolabeling data in Figure S4 shows no change in puncta number but reduced puncta size in Kit KO. sIPSC data show reduced frequency but little change in amplitude. These data would seem contradictory in that one suggests reduced synaptic strength, but not number, and the other suggests reduced synapse number, but not strength. How do the authors reconcile these results?

      Regarding the synaptic puncta, in Kit KO (or KL KO) we have not detected an overt reduction in the average VGAT/Gephyrin/Calbindin positive puncta density or puncta size per animal. With respect to puncta size, only in the Kit KO condition, and only when individual puncta are assessed, does this modest (~10%) difference in size become statistically significant. In the revision, we eliminate this figure and focus on the per animal averages.

      We interpret that the reduction in sIPSC and mIPSC frequency likely stems from a decreased proportion of functional synapse sites. The number of MLIs, their action potential generation, the density of synaptic puncta, and the ability of direct stimulation to evoke release and equivalent postsynaptic currents are all similar in Control vs Kit KO. It is therefore feasible that a reduced frequency of postsynaptic inhibitory events is due to a reduced ability of MLI action potentials to invade the axon terminal, and/or an impaired ability for depolarization to drive transmitter release (e.g. through coordinated calcium flux). That is, while the number of MLIs and their synapses appear similar, the reduced mIPSC frequency suggests that a reduced proportion of Kit KO synapse sites function properly, or that each site has a reduced probability of doing so.

      2) Related to point 1, it would be helpful to see immunolabeling data from Kit ligand KO mice? Do these show the same pattern of reduced puncta size but no change in number?

      Although we have not added a figure, we have now added experiments and a corresponding analysis to the manuscript. As we had done previously for Kit KO, we conducted IHC for VGAT, Gephyrin, and Calbindin in KL KO, and we analyzed triple-positive synaptic puncta in the molecular layer of Pcp2 Cre KL KO mice and Control (Pcp2 Cre negative, KL floxed homozygous) mice. We did not find a gross reduction in the average synaptic puncta size or density, or in the PSD-95 pinceau size. From this initial analysis, it appears that the presynaptic hypotrophy is more notable in the receptor than in the ligand knockout. We speculate that this is perhaps because the Kit receptor may have basal activity in the absence of Kit ligand, that Kit may serve a presynaptic scaffolding role that is lost in the receptor (but not the ligand) knockout, or simply that the embryonic timing of the Pax2 Cre vs Pcp2 Cre recombination events is more relevant to pinceaux development, especially as basket cells are born primarily prenatally.

      3) The data using KL overexpression in PC (figure 4E,F) are intriguing, but puzzling. The reduction in sIPSC frequency and amplitude in the control PC is much greater than seen in the Kit or KL KO. The interpretation of these data, "Thus, KL-Kit levels may not set the number of MLI:PC release sites, but may instead influence the proportion of synapses that are functional for neurotransmission (Figure 4G)" is not clear and the reasoning here should be explained in more detail, perhaps in the discussion.

      We have attempted to clarify this portion of the manuscript by eliminating the cartoon of the proposed model, and by revising and adding to the discussion. Either MLI Kit KO or PC KL KO seems to preserve the absolute number of MLI:PC anatomical synapse sites (IHC) but to reduce the proportion of those synapses that are contributing to neurotransmission (mIPSC). We speculate that sparse PC KL overexpression (OX) may 1) weaken inhibition to surrounding control PCs by diminishing KL OX PC to KL Control PC inhibition, and/or 2) act retrogradely through MLI Kit to potentiate MLI:MLI inhibition, reducing the MLI:PC inhibition at neighboring Control PCs.

      Minor comments:

      1) In the first sentence of the results, should "Figure 1A, B" be "Figure C, D"?

      Yes, corrected.

      2) The top of page 6 states "the mean mIPSC amplitude was ~10% greater in PC KL KO than in control", this does not appear to be the case in Figure 3E. control and KL KO look very similar here.

      In this portion of the text citing the modest 10% increase in mIPSC amplitude, we are referring to the average amplitude of all individual mIPSC events in the PC KL KO condition; in the figure referred to by the reviewer (3E), we are instead referring to the average of all mIPSC event amplitudes per KL KO PC. Because of the dramatic difference in sample size for individual events vs cells, this modest difference rises to statistical, if not biological, significance. We include this individual event analysis only to suggest that, since we in fact saw a slightly higher event amplitude in the KL KO condition, it is unlikely that a reduced amplitude would have been a technical reason that we detected a lower event frequency.
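
      The distinction between per-event and per-cell statistics can be illustrated with a short sketch. This is not the analysis code used in the study; it is a minimal example, assuming a Welch t statistic as the comparison, with hypothetical function names and toy numbers. It shows that pooling thousands of individual events inflates the test statistic for the same small difference in means, whereas averaging per cell keeps n equal to the number of cells.

```python
import math
import statistics


def welch_t(a, b):
    """Welch t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def per_cell_means(events_by_cell):
    """Collapse each cell's event amplitudes to a single mean value."""
    return [statistics.mean(events) for events in events_by_cell]


if __name__ == "__main__":
    # Toy amplitudes (pA): three cells per group, many events per cell.
    control = [[40, 50, 60] * 200, [45, 55, 65] * 200, [35, 45, 55] * 200]
    knockout = [[44, 55, 66] * 200, [50, 60, 71] * 200, [38, 50, 61] * 200]

    pooled_control = [x for cell in control for x in cell]
    pooled_knockout = [x for cell in knockout for x in cell]

    print("per-event t:", welch_t(pooled_control, pooled_knockout))
    print("per-cell t: ", welch_t(per_cell_means(control),
                                  per_cell_means(knockout)))
```

      Treating each event as an independent observation ignores the fact that events from the same cell are correlated, which is why the per-cell (or per-animal) average is the more conservative unit of analysis.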

      3) Figure 3 D, duration, y-axis should be labelled "ms"

      Event duration is no longer graphed or referenced. This has been replaced with total inhibitory charge.

      Reviewer #2 (Recommendations For The Authors):

      Methods:

      • Pax2-Cre line: embryonal Cre lines sometimes suffer from germline recombination. Was this evaluated, and if yes, how?

      The global loss of Kit signaling is incompatible with life, as seen from perinatal lethality in other Kit Ligand or Kit mutant mouse lines or other conditional approaches. Furthermore, a loss of Kit signaling in germ cells impedes fertility. Thus, while not explicitly ruled out, since conditional Pax2 Cre mediated Kit KO animals were born, survived, and produced offspring in normal ratios, we do not suspect that germline recombination was a major issue in this specific study.

      • Include rationale for using different virus types in different studies (AAV vs. Lenti).

      This rationale is now included and reflects the intention to achieve infection sparsity in the smaller and less dense tissue of perinatal mouse brains.

      • How, if at all, was blinding performed for histological and electrophysiological experiments?

      It was not possible for electrophysiology to be conducted blinded for the Kit KO experiments, owing to the subjects’ hypopigmentation. However, whenever feasible, resultant microscopy images or electrophysiological data sets were analyzed under Transnetyx Animal ID alone, and the genotypes were unmasked after analysis.

      • Provide justification for limiting electrophysiology recordings to lobule IV/V and why MLIs in the middle third of the molecular layer were prioritized when inhibition of PCs is dominated by large IPSCs from basket cells. Why were 2 different internals used for recording IPSCs and EPSCs in PCs and MLIs? While that choice is justified for action potential recordings, it provides poor voltage control in PC voltage clamp. Both IPSCs and EPSCs could have been isolated pharmacologically using a CsCl internal.

      The rationale for regional focus has been added to the text. For MLI action potential recordings, we opted to sample the middle third of the molecular layer so that we would not be completely biased to either classic distal stellate vs proximal basket subtypes. It is our hope, in future optogenetic interrogations, to simultaneously record the dynamics of all MLI subtypes in a more unbiased way. With respect to internal solutions, we initially utilized a cesium chloride internal to maximize our ability to resolve differences in GABAA mediated currents, which was the hypothesis-driven focus of our study. While we agree that utilizing a single internal and changing the voltage clamp to arrive at per-cell analysis of Excitatory/Inhibitory input would have been most informative, our decision to utilize pharmacological methods was driven by our experience that achieving adequate voltage clamp across large Purkinje cells was often problematic, particularly in adult animals.

      Introduction:

      In the introduction, the authors state that inactivating Kit contributes to neurological dysfunction - their examples highlight neurological, psychiatric, and neurodevelopmental conditions.

      The language has been changed.

      General:

      Using violin plots illustrates the data distribution better than bar graphs/SEM.

      We have included violin plots throughout, and we have changed p values to numeric values, both in the interest of presenting the totality of the data more clearly.

      Synapses 'onto' PCs sounds more common than 'upon' PCs.

      We have changed the wording throughout.

      Figure 1:

      1F - there seems to be an antero-posterior gradient of Kit expression.

      Though not explicitly pursued in the manuscript, it is possible that such a gradient may reflect differences in the timing of the genesis and maturation of the cerebellum along the AP axis. Regional variability is however now briefly addressed as a motivator for focused studies within lobules IV/V.

      E doesn't show male/female ratios but only hypopigmentation.

      This language has been corrected.

      Figure 2 and associated supplementary figures:

      2A/B: The frequency of sIPSCs is very high in PCs, making the detection of single events challenging. How was this accomplished? Please add strategy to the methods.

      We have added methodological detail for electrophysiology analysis.

      How were multi-peak events detected and analyzed? 'Duration' is not specified - do the authors refer to kinetics? If so, report rise and decay. It is likely impossible to show individual aligned sIPSCs with averages superimposed, given that sIPSCs strongly overlap. Alternatively, since no clear baseline can be determined in between events, and therefore frequency, amplitude, and kinetics quantification is near-impossible, consider plotting inhibitory charge.

      Given the heterogeneity of events, we now do not refer to individual event kinetics. As suggested, we have now included an analysis of the total inhibitory charge transferred by all events during the recording epoch.
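
      For readers unfamiliar with this measure, total inhibitory charge can be sketched as the time integral of the baseline-subtracted current over the recording epoch. The example below is a minimal illustration, assuming trapezoidal integration of an evenly sampled trace; the function name, the baseline handling, and the sampling interval are hypothetical and do not describe the analysis pipeline used in the manuscript.

```python
def total_charge(current_pa, dt_s, baseline_pa=0.0):
    """Trapezoidal integral of (current - baseline) over the epoch.

    current_pa: evenly sampled current trace in pA
    dt_s: sampling interval in seconds
    Returns charge in pC (pA * s).
    """
    charge = 0.0
    for left, right in zip(current_pa, current_pa[1:]):
        charge += 0.5 * ((left - baseline_pa) + (right - baseline_pa)) * dt_s
    return charge


if __name__ == "__main__":
    # Toy trace: a constant 10 pA deflection below baseline for 1 s at 1 kHz.
    trace = [-10.0] * 1001
    print(total_charge(trace, dt_s=0.001), "pC")
```

      Because the integral does not require individual events to be separated, it remains well defined even when events overlap heavily, although it is sensitive to how the baseline is chosen.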

      S2: Specify how density, distribution, and ML thickness were determined in methods. How many animals/cells/lobules?

      For consistency with viral injections and electrophysiology, the immunohistochemical analysis was restricted to lobule IV/V. This is clearer in the revision and detail is added in the methods.

      S3:

      S3B: the labels of Capacitance and Input resistance are switched.

      This has been corrected.

      How were these parameters determined? Add to methods.

      Added

      In the previous figure the authors refer to 'frequency', in this figure to 'rate' - make consistent

      This has been corrected.

      D: example does not seem representative. Add amplitude of current pulse underneath traces.

      We added new traces from nearer the group means and we now include the current trace.

      F/G example traces (aligned individual events + average) are necessary.

      We added example traces near the relevant group means for each condition.

      Statement based on evoked IPSCs that 'synapses function normally' is a bit sweeping and can only be fully justified with paired recordings. Closer to the data would be that the release probability of individual synapses is similar between control and Kit KO.

      Paired recordings in both Kit Ligand and Kit receptor conditional knockout conditions are indeed an informative aim of future studies, should support permit. For now, we have clarified the language to be more in line with the reviewer’s welcome suggestion.

      S4:

      Histological strategy cannot unambiguously distinguish MLI-PC and PC-PC synapses. Consider adding this confound to the text.

      We have added this confound to the discussion.

      The observation that the pinceau is decreased in size could have important implications for ephaptic coupling of MLI and PC and could be mentioned.

      We agree and have added this notion to the discussion.

      Y-label is missing in B.

      Corrected.

      Figure 3 and associated supplementary figures:

      In the text, change PC-Cre to L7-Cre or Pcp2-Cre.

      Changed

      How do the authors explain a reduction in frequency, amplitude, and duration of sIPSCs in the KL KO but not in the Kit KO? Add to the discussion

      We now address this apparent discordance in the discussion. Pax2 Cre mediates recombination weeks ahead of Pcp2 Cre. We therefore suspect that postnatal PC KL KO may be more phenotypic than embryonic MLI Kit KO because there is less time for developmental compensation. A future evaluation of the impact of postnatal Kit KO would be informative to this end.

      As in Figure 2, plotting the charge might be more accurate.

      We now plot total charge transfer.

      Are the intrinsic properties in KL KO PCs altered? (Spontaneous firing, capacitance, input resistance).

      We have added to the text that we found no difference in capacitance or input resistance between Purkinje cells from KL floxed homozygous Control animals versus those from KL floxed homozygous, PCP2 Cre positive KL KO animals. We plan to characterize both basal and MLI modulated PC firing in a future manuscript. Since Pcp2 Cre mediated KL KO seems more phenotypic than Pax2 Cre mediated Kit KO, we agree that it is a better testbed for investigating differences in both basal PC firing and its MLI-mediated modulation.

      3D-F - Example traces would be desirable (see above, analogous to Fig. 2).

      More example traces have been added.

      Figure 4: 'In vivo mixtures' sounds unusual. Consider revision (e.g., 'to sparsely delete KL').

      Changed

      The observation that control PC sIPSC frequency is lower in KL OX PCs than in sham is interesting. This observation would be consistent with overall inhibitory synapse density being preserved. This could be evaluated with immunohistochemistry. For how far away from the injection area does this observation hold true?

      Because we have now analyzed and failed to find an overt (per animal average) change in synaptic puncta size or density in the whole animal Control vs PCP2 Cre mediated KL KO conditions, we do not have confidence that it is feasible to pursue this IHC strategy in the sparse viral-mediated KL KO or OX conditions. To the reviewer’s valid point however, we intend to probe the spatial extent/specificity of the sparse phenomenon when we are resourced to complement the KL/Kit manipulations with transgenic methods for evaluating MLI-PC synapses specifically, potentially by GRASP or related methods that would not be confounded by PC-PC synapses. Transgenic MLI access would also facilitate determining the spatial extent to which opto-genetically activated MLIs evoke equivalent responses in Control vs KL manipulated PCs.

      Y-legend in D clipped.

      Corrected

      Existing literature suggests that MLI inhibition regulates the regularity of PC firing - this could be tested in Kit and KL mutants.

      Based upon transgenic animal availability, we have now included an evaluation of PC firing in the (Pax2 Cre mediated) Kit KO condition. PC average firing frequency, mean ISI, and ISI CV2 were not significantly different across genotypes. A KS test of individual ISI durations for Control vs Kit KO did reveal a difference (p<0.0001). We have added a supplementary figure (S6) with these data. It is possible that we may find a difference in these PC firing patterns in the more phenotypic PC KL KO condition; however, we are also eager to test in future studies whether postnatal KL or Kit KO impairs the ability of MLI activation to produce pauses or other alterations in PC firing or in PF-PC mediated plasticity.
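
      The firing regularity measures named above can be sketched as follows. This is a minimal illustration, assuming a commonly used definition of CV2 (the mean, over adjacent interspike intervals, of twice their absolute difference divided by their sum); the function names are hypothetical and the manuscript's own implementation may differ.

```python
import statistics


def interspike_intervals(spike_times):
    """Differences between consecutive spike times (same unit as input)."""
    return [later - earlier
            for earlier, later in zip(spike_times, spike_times[1:])]


def mean_rate(spike_times):
    """Average firing rate: number of intervals divided by elapsed time."""
    return (len(spike_times) - 1) / (spike_times[-1] - spike_times[0])


def cv2(isis):
    """Mean of 2*|ISI[i+1] - ISI[i]| / (ISI[i+1] + ISI[i]) over adjacent pairs."""
    ratios = [2.0 * abs(b - a) / (a + b) for a, b in zip(isis, isis[1:])]
    return statistics.mean(ratios)


if __name__ == "__main__":
    spikes = [0.00, 0.02, 0.04, 0.07, 0.09]   # toy spike times in seconds
    isis = interspike_intervals(spikes)
    print("mean ISI:", statistics.mean(isis))
    print("rate (Hz):", mean_rate(spikes))
    print("CV2:", cv2(isis))
```

      CV2 compares only neighbouring intervals, so it is less affected by slow drifts in firing rate than the ordinary coefficient of variation of the ISIs.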

      Reviewer #3 (Recommendations For The Authors):

      Reference to Figure 1A in the Results section is slightly inaccurate. Kit gene modifications are illustrated in Figures 1A, B. Where Figure 1A shows Kit distribution. Please rephrase. Relatedly, the reference to Figs 1B - D are shifted in the results section, and 1E is skipped.

      We have changed the text.

      Please show cumulative histograms for frequency too for consistency with amplitude (e.g. Fig 2).

      We have instead, for reasons outlined by other reviewers, documented total charge transfer for both Kit KO and KL KO experiments where sIPSC events were analyzed.

      Fig S3: include example traces of PPR.

      This is now included.

      Include quantifications of GABAergic synapse density in Fig S4.

      This is now included.

      Include inset examples of KO in Fig S4A.

      This is now included.

      Add average puncta size graphs along Figure S4B. The effect apparent in the histogram of S4B is small and statistics using individual puncta as n values (in the 20,000s) therefore misleading.

      Per animal analysis is now instead included in the figure and text.
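
      As an illustration of the per-animal approach the reviewer asked for, the sketch below collapses individual puncta measurements to one mean per animal, so that n equals the number of animals rather than the number of puncta. It is a minimal example under assumed inputs; the function name `per_animal_means`, the animal identifiers, and the values are hypothetical and are not data from the study.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(measurements):
    """Average individual measurements within each animal.

    measurements: iterable of (animal_id, value) pairs.
    Returns a dict mapping animal_id to the mean value for that animal.
    """
    by_animal = defaultdict(list)
    for animal_id, value in measurements:
        by_animal[animal_id].append(value)
    return {animal_id: mean(values) for animal_id, values in by_animal.items()}


# Hypothetical puncta sizes (arbitrary units) from three animals
puncta = [
    ("ctrl_1", 1.0), ("ctrl_1", 3.0),
    ("ctrl_2", 2.0), ("ctrl_2", 2.0), ("ctrl_2", 5.0),
    ("ko_1", 4.0),
]
animal_means = per_animal_means(puncta)
```

      Group comparisons would then be run on the values of `animal_means`, which avoids inflating significance by treating thousands of puncta as independent samples.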

      Figure S4B y axis label blocked.

      Corrected

      Include quantification referenced in "As PSD95 immunoreactivity faithfully follows multiple markers of pinceaux size 40, we quantified PSD95 immunoreactive pinceau area and determined that pinceaux area was decreased by ~50% in Kit KO (n 26 Control vs 43 Kit KO, p<0.0001, two-tailed t-test)."

      We added a graph of per animal averages, instead of in text individual pinceau areas.

      Include antibody dilutions in the methods.

      Added.

      It's unclear from the text where the Mirow lab code comes from.

      Detail has now been added in text.

      Typo in methods "The Kit tm1c alle was bred...".

      Corrected

      Typo in Figure S4 legend "POSD-95 immuno-reactivity".

      Corrected

    1. Author Response

      The following is the authors’ response to the original reviews.

      First of all, we would like to thank the three reviewers for their meticulous work, which has enabled us to present an improved manuscript. Substantial changes were made to the article following the reviewers' and editors' recommendations. We read all their comments and suggestions very carefully. Apart from a few misunderstandings, all comments were very pertinent. We responded positively to almost all the comments and suggestions, and as a result, we have made extensive changes to the document and the figures. This manuscript now contains 16 principal figures and 15 figure supplements.

      The number of principal figures is now 16 (1 new figure), and additional panels have been added to certain figures. In addition, we have added 7 additional figures (supplement figures) to answer the reviewers' questions and/or comments.

      Main figures

      ▪ Figures 1, 4, 5, 10, 11, 12, 13, 14: unchanged

      ▪ Figures 7 and 8 were switched.

      ▪ Figure 2: we added panel F in response to reviewer 3's request for sperm defect statistics

      ▪ Figure 3: the contrast in panel B has been adjusted to homogenize colors

      ▪ Figure 6: This figure was recomposed. The WB on testicular extract was removed and we present a new WB allowing comparison of the presence of CCDC146 in the flagella fraction. Using an anti-HA Ab, we demonstrate that the protein is localized in the flagella in epididymal sperm. Requested by the three reviewers.

      ▪ Figure 7 (old 8): to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum. Moreover, the WB was removed and is now presented in figure 6 (improved as requested).

      ▪ Figure 8: formerly figure 7

      ▪ Figure 9: figure 9 was recomposed and improved for increased clarity as suggested by reviewers 2 and 3.

      ▪ Figure 16: formerly appendix 11

      Figure supplements and supplementary files

      ▪ Figure 1-Figure supplement 1 New. Sperm parameters of the 2 patients. Requested by the editor (remark #1) and by reviewer 1 (note #3)

      ▪ Figure 2-Figure supplement 1 New. Sperm parameters of line 2 (KO animals). Requested by reviewer 1 (note #5)

      ▪ Figure 4-Figure supplement 1 New. Experiment to evaluate the specificity of the human CCDC146 antibody. Minimal revision request and reviewer 1 note #8

      ▪ Figure 6-Figure supplement 1 New. Figure recomposed; Asked by reviewer 2 note #4 and reviewer 3

      ▪ Figure 8-Figure supplement 1 New. We now provide new images to show the non-specific staining of the midpiece of human sperm by secondary Abs in ExM experiments; Asked by reviewer 2

      ▪ Figure 10-Figure supplement 1 New. We added new images to show the non-specific staining of the midpiece of mouse sperm by secondary Abs in IF (panel B). Reviewer 1 note #9 and reviewer 2 note #5

      ▪ Figure 12-Figure supplement 1 New. Control requested by reviewer 3 Note #23

      ▪ Figure 13-Figure supplement 1 New. We provide a graph and a statistical analysis demonstrating the increase of the length of the manchette in the Ccdc146 KO. Requested by editor and reviewer 3 Note 24

      ▪ Figure 15-Figure supplement 1 New. Control requested by reviewer 2. Minor comments

      ▪ Figure supplementary 1 New. Answer to the question raised by reviewer 2 note #1

      All the reviewers' and editors’ comments have been answered (see our point-by-point response) and we resubmit what we believe to be a significantly improved manuscript. We strongly hope that we meet all your expectations and that our manuscript will be suitable for publication in "eLife". We look forward to your feedback.

      Point by point answer

      Please note that there has been active discussion of the manuscript and the summarize points below is the minimal revision request that the reviewers think the authors should address even under this new review model system. It was the reviewers' consensus that the manuscript is prepared with a lot of oversights - please see all the minor points to improve your manuscript.

      All minimal revision requests have been addressed.

      Minimal revision request

      1) Clinical report/evaluation of the two patients should be given as it was not described even in their previous study as well as full description of CCDC146.

      We now provide a new Figure 1-figure supplement 1 describing the patients’ sperm parameters.

      2) Antibody specificity should be provided, especially given two of the reviewers were not convinced that the mid piece signal is non-specific as the authors claim. As both KO and KI model in their hands, this should be straightforward.

      To validate the specificity of the antibody, we transfected HEK cells with a human DDK-tagged CCDC146 plasmid and performed a double immunostaining with a DDK antibody and the CCDC146 antibody. We show that the two stainings are superimposable, strongly suggesting that the CCDC146 Ab specifically targets CCDC146. This experiment is now presented in Figure 4-Figure supplement 1. Next, to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum.

      3) The authors should improve statistical analysis to support their experimental results for the reader can make fair assessment. Combined with clear demonstration of ab specificity, this lack of statistical analysis with very few sample number is a major driver of dampening enthusiasm towards the current study.

      Several statistical analyses were carried out and are now included:

      1) distribution of the HA signal in mouse sperm cells (see point 2 Figure 7 panel B)

      2) quantification and statistical analyses of the defect observed in Ccdc146 KO sperm (figure 2 panel E)

      3) Quantification and statistical analyses of the length of the manchette in spermatids 13-15 steps (Figure 13-Figure supplement 1 new)

      4) The authors need to clarify (peri-centriolar vs. centriole)

      In figure 4A, we have clearly shown that the protein colocalizes with centrin, a centriolar core protein in somatic cells. This colocalization strongly suggests that CCDC146 is a centriolar protein, and this is now clearly indicated in lines 211-212. However, its localization is not restricted to the centrioles and a clear staining was also observed in the pericentriolar material (PCM). The presence of a protein in both the PCM and the centriole has already been described, and the best example is perhaps gamma-tubulin (PMID: 8749391).

      or tone down (CCDC146 to be a MIP) of their claim/description.

      Concerning its localization in sperm, we agree with the reviewer that our claim that CCDC146 is a MIP would require more results. We have therefore toned down the MIP hypothesis throughout the manuscript. See lines 491-495.

      Testis-specific expression of CCDC146 as it is not consistent with their data.

      We have also modified our claim concerning the testis expression of CCDC146. See line 176.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      1) As described in general comments, this study limits how the CCDC146 deficiency impairs abnormal centriole and manchette formation. The authors should explain their relationship in developing germ cells.

      In fact, there is limited information about the relationship between the manchette and the centriole. However, a few articles have highlighted that both organelles share molecular components. For instance, WDR62 is required for centriole duplication in spermatogenesis and manchette removal in spermiogenesis (Commun Biol. 2021; 4: 645. doi: 10.1038/s42003-021-02171-5). Another study demonstrates that CCDC42 localizes to the manchette, the connecting piece and the tail (Front. Cell Dev. Biol. 2019 https://doi.org/10.3389/fcell.2019.00151). These articles underline that centrosomal proteins are involved in manchette formation and removal during spermiogenesis and support our results showing the impact of the lack of CCDC146 on centriole and manchette biogenesis. This information is now discussed. See lines 596-603.

      2) The authors generated knock-in mouse model. If then, are the transgene can rescue the MMAF phenotype in CCDC146-null mice? This reviewer strongly suggest to test this part to clearly support the pathogenicity by CCDC146.

      We indeed wrote that we created “transgenic mice”, which was misleading. We actually created a CCDC146 knock-in expressing a tagged protein. The strain was made by CRISPR-Cas9, and a sequence coding for the HA-tag was inserted just before the first amino acid in exon 2, leading to the translation of an endogenous HA-tagged CCDC146 protein. We have removed the word transgenic from the text and made changes accordingly (see lines 250-253). We therefore cannot use this strain to rescue the MMAF phenotype as suggested by the reviewer.

      3) Although the authors cite the previous study (Coutton et al., 2019), the study does not describe any information for CCDC146 and clinical information for the patients. The authors must show the results for clinical analysis to clarify the attended patients are MMAF patients without other phenotypic defects.

      We have now inserted a table indicating all sperm parameters for the patients harboring a mutation in the CCDC146 gene (Figure 1-Figure supplement 1); this is now indicated in lines 159-160.

      4) The authors describe CCDC146 expression is dominant in testes, However, the level in testis is only moderate in human (Supp Figure 1). Thus, this description is not suitable.

      In Figure 1-figure supplement 2 (old FigS1), the median of expression in testis is around 12 in human, a value considered as high expression by the analysis software from Genevestigator. However, for mouse, it is true that the level of expression is medium. We assumed that the reviewer’s comment concerned testis expression in mouse. To take this remark into account, we changed the text accordingly. See line 176.

      5) Although the authors mentioned that two mice lines are generated, only one line information is provided. Authors must include information for another line and provide basic characterization results to support the shared phenotype within the lines.

      We now provide a revised Figure 2-figure supplement 1CD presenting the second line; the corresponding description in the main text is found in lines 178-183.

      6) In somatic cells, the CCDC146 localizes at both peri-centriole and microtubule but its intracellular localization in sperm is distinguished. The authors should explain this discrepancy.

      The multi-localization of a centriolar protein is already discussed in detail in the discussion, lines 520-526. We have written:

      “Despite its broad cellular distribution, the association of CCDC146 with tubulin-dependent structures is remarkable. However, centrosomal and axonemal localizations in somatic and germ cells, respectively, have also been reported for CFAP58 [37, 55], thus the re-use of centrosomal proteins in the sperm flagellar axoneme is not unheard of. In addition, 80% of all proteins identified as centrosomal are found in multiple localizations (https://www.proteinatlas.org/humanproteome/subcellular/centrosome). The ability of a protein to home to several locations depending on its cellular environment has been widely described, in particular for MAP. The different localizations are linked to the presence of distinct binding sites on the protein…”

      7) Authors mention CCDC146 is a centriolar protein in the title and results subtitle. However, the description in results part depicts CCDC146 is a peri-centriolar protein, which makes confusion. Do the authors claim CCDC146 is centrosomal protein?

      In figure 4A, we have clearly shown that the protein colocalizes with centrin, a centriolar core protein. This colocalization strongly suggests that CCDC146 is a centriolar protein in somatic cells, and this is now clearly indicated in lines 211-212. However, its localization is not restricted to the centrioles and a clear staining was also observed in the pericentriolar material (PCM). The presence of a protein in both the PCM and the centriole has already been described, and the best example is perhaps gamma-tubulin (PMID: 8749391).

      8) Verification of the antibody against CCDC146 must be performed and shown to support the observed signal are correct. 2nd antibody only signal is not proper negative control.

      This is a very important remark. The commercial antibody raised against human CCDC146 was validated in HEK293 cells expressing a DDK-tagged CCDC146 protein. Cells were co-stained with anti-DDK and anti-CCDC146 antibodies. We observed perfect colocalization of the two stainings. This experiment is now presented in Figure 4-figure supplement 1 and described in the text (lines 206-208).

      9) In human sperm, conventional immunostaining reveals CCDC146 is detected from acrosome head and midpiece. However, in ExM, the signal at acrosome is not detected. How is this discrepancy explained? The major concern for the ExM could be physical (dimension) and biochemical (properties) distortion of the sample. Without clear positive and negative control, current conclusion is not clearly understood. Furthermore, it is unclear why the authors conclude the midpiece signal is non-specific. The authors must provide experimental evidence.

      Staining of the acrosome should always be interpreted with caution in sperm. Indeed, numerous glycosylated proteins are present at the surface of the plasma membrane facing the outer acrosomal membrane for sperm attachment, and they are responsible for much nonspecific staining. Moreover, this acrosomal staining was not observed in mouse sperm, strongly suggesting that it is not specific.

      Concerning the staining in the midpiece observed in both conventional and expansion microscopy, it also seems to be nonspecific and associated with secondary Abs.

      For IF, we now provide new images clearly showing the nonspecific staining of the midpiece when secondary Abs were used alone (see Figure 10-figure supplement 1B).

      For ExM, we provide new images in Figure 8-figure supplement 1B (POC5 staining) showing a staining of the midpiece (likely mitochondria), although POC5 was never described to be present in the midpiece. Both experiments (CCDC146 and POC5 staining by ExM) shared the same secondary Ab and the midpiece signal was likely due to it.

      Moreover, we now provide new ExM images (figure 7C) of mouse sperm showing no staining in the midpiece and demonstrating that the punctate signal is present all along the flagellum. Finally, we would like to underline that we now provide new IF results, using an anti-HA antibody conjugated with Alexa Fluor 488, that confirm the ExM results.

      These points are now discussed in lines 498-502 for acrosome staining and lines 503-511 for midpiece staining.

      10) For intracellular localization of the CCDC146 in mouse sperm, the authors should provide clear negative control using WT sperm which do not carry the transgene.

      This experiment was performed.

      To avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum.

      11) Current imaging data do not clearly support the intracellular localization of the CCDC146. Although western blot imaging reveal that CCDC146 is detected from sperm flagella, this is crude approach. Thus, this reviewer highly recommends the authors provide more clear experimental evidence, such as immuno EM.

      We now provide a WB comparing the presence of the protein in the flagellum and in the head fractions; see new figure 6. We show that CCDC146 is only present in the flagellum fraction. The band was detected very quickly at visualization and became very strong after a few minutes, demonstrating that the protein is abundant in the flagella. It is important to note that epididymal sperm do not have centrioles and therefore this signal is not a centriolar signal. We also now provide new statistical analyses showing that the immunostaining observed in the principal piece is very specific (Figure 7B). Altogether, these results demonstrate unequivocally the intracellular localization of CCDC146 in the flagellum. This point is now discussed in lines 480-489.

      12) Although sarkosyl is known to dissociate tubulin, it is not well understood and accepted that the enhanced detection of CCDC146 by the detergent indicates its microtubule inner space. Sperm axoneme to carry microtubule is also wrapped peri-axonemal components with structural proteins, which are even not well solubilized by high concentration of the ionic detergent like SDS.

      We agree with the reviewer that the solubilization of the protein by sarkosyl is not proof of the presence of the protein inside microtubules. Taking this point into account, the MIP hypothesis was toned down and we now discuss alternative hypotheses concerning these results; see discussion lines 490-497.

      13) SEM image is not suitable to explain internal structure (line 317-323).

      We agree with the reviewer and changes were made accordingly. See lines 354-357.

      Minor comments

      1) In main text, supplementary figures are cited "Supp Figure". And the corresponding legends are written in "Appendix - Figure". Please unify them.

      Done. They are now labelled “Figure X-figure supplement Y”.

      2) Line 159, "exon 9/19" is not clear.

      We now write “exon 9” and indicate earlier that the gene contains 19 exons.

      3) Line 188, "positive cells" are vague.

      “Positive” was changed to “fluorescent”.

      4) Representative TUNEL assay image for knockout testes were not shown in Supp Figure 3B.

      This was a mistake; the image is now shown in Figure 2-figure supplement 2C.

      5) Please provide full description for "IF" and "AB" when described first.

      Done

      6) Line 262, It is unclear what is "main piece".

      Changed to “principal piece”.

      7) Line 340, Although the "stage" information might be applicable, this is information for "seminiferous tubule" rather than "spermatid". This reviewer suggests to provide step information rather than stage information.

      We agree with the reviewer that there was a confusion between “stage” and “step”. We changed the text to refer to spermatid steps.

      8) Line 342, Step 1 is not correct in here.

      Corrected; the text now reads “steps 13-15 spermatids”.

      9) Line 803, "C." is duplicated.

      Removed

      10) Figure 3A, it will be good to mark the defective nuclei which are described in figure legends.

      These cells are now indicated by white arrowheads.

      11) Figure 5, Please provide what MT stands for.

      Now explained in the legend of figure 5

      12) Figure 6. Author requires clear blot images for C. In addition, Panel B information is not correct. If the blot was performed using HA antibody, then how "WT" lane shows bands rather than "HA" bands?

      The reviewer is correct. It was a mistake; the figure was recomposed and improved.

      Reviewer #2 (Recommendations For The Authors):

      Overall, editing oversights are present throughout the manuscript, which has made the review process quite difficult. Some repetitive figures can be removed to streamline to grasp the overall story easier. Some claims are not fully supported by evidence that need to tone down. Some figures not referenced in the main text need to be mentioned at least once.

      All figures are now referenced in the text

      Major comments:

      1) 163-164 - Please clarify the claim that there is going to be an absence of the protein or nonfunctional protein, especially for the patient with a deletion that could generate a truncated protein at two third size of the full-length protein. Similarly, 35% of the protein level is present for the patient with a nonsense mutation. Some in silico structural analysis or analysis of conserved domains would be beneficial to support these claims.

      Both mutations are predicted to produce a premature stop codon: p.Arg362Ter and p.Arg704SerfsTer7, leading either to the complete absence of the protein in the case of nonsense-mediated mRNA decay or to the production of a truncated protein missing almost two thirds or one fourth of the protein, respectively. CCDC146 is very well conserved throughout evolution (Figure supplementary 1), including the 3’ end of the protein, which contains a large coiled-coil domain (Figure 1B). In view of the very high degree of conservation, it is most likely that the 3’ end of the protein, absent in both subjects, is critical for CCDC146 function and hence that both mutations are deleterious. This explanation is now added to the discussion; see lines 439-448.

      2) 173, 423 - Please clearly state a rationale of your mouse model design (i.e., why a mouse model that recapitulate human mutation is not generated) as the truncations identified in human patients are located further towards the C-terminus, and it is not clear whether truncated proteins are present, and if so, they could still be functional. Basically, the current mouse model supports the causality of the human mutations.

      This is an important question, which goes beyond the scope of this article, and raises the question of how to confirm the pathogenicity of mutations identified by high-throughput sequencing. The production of KO or KI animals is an important tool to help confirm one’s suspicions, but the first element to take into consideration is the nature of the genetic data.

      Here we had two patients with homozygous truncating variants. In humans, it is well established that the presence of premature stop codons usually induces nonsense-mediated mRNA decay (NMD), resulting in the complete absence of the protein or a strong reduction in protein production. In the unlikely absence of NMD in our two patients, the identified variants would induce the production of proteins missing 60% and 30% of their C-terminal part. Often (and this is particularly true for structural proteins) the production of abnormal proteins is more deleterious than the complete absence of the protein (and it is most likely the purpose of NMD to limit the production of abnormal “toxic” proteins). For these reasons, to try to recapitulate the most likely consequences of the human variants without risking an even more severe effect, we decided to introduce a stop codon in the first exon in order to remove the totality of the protein in the KO mice.

      The second element is to interpret the phenotype of the KO animals. Here, the human sperm phenotype is perfectly recapitulated in the KO mice.

      Overall, we have strong genetic arguments in humans, and the reproduction of the phenotype in KO mice confirms the pathogenicity of the variants identified in men.

      This point is now discussed; see lines 433-438.

      3) Figure 6A - the labelling is misleading as it seems to suggest that the specific cells were isolated from the testes for RT-PCR.

      We have modified the labelling to avoid any confusion.

      Figure 6B -Signal of HA-tag is shown in WT, not in transgenic. Please check the order of the labels. Figure 6C - This blot is NOT a publication-quality figure. The bands are very difficult to observe, especially in lane D18. Because it is one of the important data of this study, replacing this figure is a must.

      The figure has been completely remade, including new results. See new figure 6. Figure 6C was removed.

      4) Supplementary fig 6 is also not a publication-level figure, and the top part seems largely unnecessary (already in the figure legend).

      The figure has been completely remade as well (now Figure 6-Figure Supplement 1).

      5) 261/267- The conclusion that mitochondrial staining in the flagellum (in both mice and humans) is non-specific is not convincing. Supplementary fig 8 shows that the signal from secondary only IF possibly extends beyond the midpiece - but it is hard to determine as no mitochondrial-specific staining is present. Either need to tone down the conclusion or provide supporting experimental evidence.

      First, to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum. These experiments are now described lines 271-279

      Second, we provide new images of the signal obtained with secondary Abs only that show more clearly that the secondary Ab gave a non-specific staining (Figure 10-Figure supplement 1B). This point is discussed in lines 503-511.

      6) Figure 9 A - Please relate the white line to Fig. 9B label in X-axis. The information from Fig 9A+D and 9E+F are redundant. The main text nor the figure legends indicate why these specific two sperm were chosen for quantification and demonstrating the outcomes. One of them could be moved to supplementary information or removed, or the two could be combined.

      As suggested by the reviewer, we have combined the two sperm to demonstrate that CCDC146 staining is mostly located on microtubule doublets. Moreover, the figure was recomposed to make it clearer.

      Minor comments:

      All of the supplementary figures are referred to as Supp Fig X in the text, however, they are actually titled Appendix - Figure X. This needs to be consistent.

      The figures are now referred to as “figure supplement x” in both text and figures.

      Line 125 - edit spacing.

      We think this issue (long internet link) will be curated later and more efficiently by the journal, during the step of formatting necessary for publication.

      144 - With which to study → with which we studied?

      We made the change as suggested.

      151 - Supp Fig 1 - the text says that the gene is highly transcribed in human and mouse testes, but the information in the figure states that the level in mouse tissues is "medium"

      We have corrected this mistake in the text; see line 176.

      165 - The two mutations are most likely deleterious. Please specifically mention what analyses done to predict the deleterious nature to support these claims.

      Both variants, c.1084C>T and c.2112del, are extremely rare in the general population, with reported allele frequencies of 6.5 × 10^-5 and 6.5 × 10^-6 respectively in gnomAD v3. Moreover, these variants are annotated with a high impact on the protein structure (MoBiDiC prioritization algorithm (MPA) score = 10, DOI: 10.1016/j.jmoldx.2018.03.009) and are each predicted to induce a premature termination codon, p.(Arg362Ter) and p.(Arg704SerfsTer7) respectively, leading to the production of a truncated protein. This information is now given in lines 164-169.

      196-200/Figure 4 - As serum starved cells/basal body (B) are not mentioned in the main text, as is, Fig 4A would be sufficient/is relevant to the text. Please make the text reflect the contents of the whole figure, or re/move to supplement.

      We agree with the reviewer that the full description of the figure should be in the text. We added two sentences to describe figure 4B see lines 217-218.

      224 - spermatozoa (plural) fits better here, not spermatozoon

      OK changed accordingly

      236 - According to the figure legend, 6B is only showing data from the epididymal sperm, not postnatal time points; should be referencing 6C. Alignment of Marker label

      As indicated above, the figure has been completely remade, including new results. See new figure 6. Figure 6C was suppressed. The corresponding text was changed accordingly see lines 249-266

      255-256 - Referenced figure 7B3, however, 7B3 only shows tubulin staining, so no CCDC146 can be observed. Did authors mean to reference fig 7B as a whole?

      Sorry for this mistake. We agree, and the text now refers to figure 8B6 (figures 7 and 8 were switched).

      305 - "of tubules" - I presume it is meant to be microtubules?

      Yes; The text was changed as suggested

      317-321 - a diagram of HTCA would be useful here

      We have added a reference where a diagram of the HTCA is available (see line 363). Moreover, a TEM view of the HTCA is presented in figure 12A.

      322/Fig 11A - an arrow denoting the damage might be useful, as A1 and A3 look similar. The size of the marker bar is missing. Please update the information on figure legend.

      Concerning the comparison between A1 and A3, the take-home message is that there is great variability in the morphological damage. This point is now underlined in the corresponding text. We updated the size of the marker bar as suggested (200 nm). See lines 365-367.

      323 - Please mark where capitulum is in the figure

      Capitulum was changed for nucleus

      Since Fig 11B2 is not referenced in the main text, it does not seem to add anything to the data, and could be removed/moved to supplement.

      We added a sentence to describe figure 11B2 line 370

      342-343 - manchette in step I is not seen clearly - the figure needs to be annotated better. However, DPY19L2 is absent in step I in the KO, but the main text does not reflect that - why is that?

      We do not understand the remark of the reviewer “manchette in step I is not seen clearly”. The figure shows clearly the manchette (red signal) in both WT and KO (Figure 13 D1/D2).

      For steps 13-15 WT spermatids, the size of the manchette decreases and it becomes undetectable. In KO spermatids, the shrinkage of the manchette is hampered and, in contrast, it continues to expand (Figure 13D2). We also provide a new Figure 13-figure supplement 1 with other illustrations of very long manchettes and a statistical analysis. In the meantime, the acrosome is strongly remodeled, as shown in figure 16-new, with detached acrosome (panel H). This morphological defect may induce a loss of the DPY19L2 staining (Figure 13 D2 stage I-III). This explanation is now inserted in the text, lines 396-399.

      Figure 15B and 15C only show KO, corresponding images from the WT should be present for comparison.

      WT images are now provided in Figure 1-figure supplement 1 new

      Figure 12 - Figure 12 - JM?.

      JM was removed. It does not mean anything

      Figure 12C and Supplementary Fig 10 - structures need to be labelled, as it is unclear what is where

      Done

      338 - text mentions step III, but only sperm from step VII are shown in Figure 13

      As suggested by reviewer 3, we changed stage by step. The text was modified to take into account this remark see lines 388-396

      360 - This is likely supposed to say Supp Figure 11E-G, not 13??

      Yes, it is a mistake. Corrected

      388 Typo "in a in a".

      Yes, it is a mistake. Corrected

      820 - Fig 3 legend - in KO spermatid nuclei were elongated - could this be labelled by arrows? I am not convinced this phenotype is that different from the WT.

      In fact, the nuclei of elongating KO spermatids are elongated and also very thin, a shape not observed in the WT. We have added arrowheads and modified the text to indicate this point (line 200).

      836 - Figure 5 legend says that in yellow is centrin, but that is not true for 5A, where the figure shows labelling for y-tubulin (presumably, according to the figure itself).

      We have modified the text of the legend to take into account the remark

      837- 5A supposedly corresponds to synchronized HEK293T cells, but the reasoning behind using synchronized cells is not mentioned at all in the main text; furthermore, how this synchronization is achieved is not explained in materials and methods (serum starvation? Thymidine block?).

      Yes, figure 5A was obtained with synchronized cells. We have added one paragraph in the MM section. For cell synchronization experiments, cells underwent S-phase blockade with thymidine (5 mM, Sigma-Aldrich) for 17 h followed by incubation in a control culture medium for 5 h, then a second blockade at the G2-M transition with nocodazole (200 nM, Sigma-Aldrich) for 12 h. Cells were then fixed with cold methanol at different times for IF labelling. See line 224 for changes made in the result section and lines 700-704 for changes made in the MM section.

      845- figure legend says that the RT-PCR was done on CCDC146-HA tagged mice, but the main text does not reflect that.

      We made changes and the description of the KI is now presented before (line 240) the RT-PCR experiment (line 257).

      949 - it is likely supposed to say A2, not B1 (B1 does not exist in Fig 15)

      Yes, it is a mistake. Corrected

      971 - Appendix Fig 3 legend - I believe that the description for B and C are swapped.

      Yes, it is a mistake. Corrected

      Furthermore, some questions to address in A would be: Which cross sections were from which animal/points? How many per animal? Were they always in the same location?

      Yes, we have a protocol for arranging and orienting all testes in the same way during the paraffin embedding phase. The cross-sections are therefore not taken at random, and we can compare sections from the same part of the testis. The number of animals was already indicated in the figure legend (see line 1128)

      Reviewer #3 (Recommendations For The Authors):

      1) There are a number of grammatical and orthographical errors in the text. Careful proofreading should be performed.

      We have sent the manuscript to a professional proofreader

      2) The author should also check for redundancies between the introduction and the discussion.

      The discussion has been modified to take into account the reviewers’ remarks, and we did our best to avoid redundancies between the introduction and the discussion.

      3) Can the authors provide a rationale why they have chosen to tag their gene with an HA tag for localisation? One would rather think of fluorescent proteins or a Halo tag.

      Because the functional domains of the protein are unknown, adding a fluorescent protein of 24 kDa may interfere with both the localization and the function of CCDC146. For this reason, we chose a small tag of only 1.1 kDa, to limit as much as possible the risk of interfering with the structure of the protein. This rationale is now indicated in the manuscript, lines 251-254. It is worth noting that the tagged strain shows no sperm defect, demonstrating that the HA tag does not interfere with CCDC146 function.

      4) In the abstract, line 53, "provide evidence" is not the right term for something that is just suggestive. The term "suggests" would be more appropriate.

      The text was modified to take into account this remark

      5) Line 74: "genetic deficiency" sounds strange here, do the authors mean simply "mutation"?

      Infertility may be due to several genetic deficiencies, such as chromosomal defects (XXY (Klinefelter syndrome)), microdeletion of the Y chromosome or mutations in a single gene. Therefore, "mutation" is too restrictive. Nevertheless, we modified the sentence, which is now “…or a genetic disorder including chromosomal or single gene deficiencies”.

      6) Lines 163-164: the authors describe the mutations (premature stop mutations) and say that they could either lead to complete absence of the gene product, or the expression of a truncated protein. Did they test this, for example, with some immuno blot analyses?

      As stated above, unfortunately, we were unable to verify the presence of RNA-decay in these patients for lack of biological material.

      7) Line 184 and Fig 2E: the sperm head morphologies should be quantitatively assessed.

      We provide now a full statistical analysis of the observed defects: see new panel in Figure 2 F

      8) Fig 3: The annotation should be more precise - KO certainly means CDCC146-KO. The colours of the IH panels is different, which attracts attention but is clearly a colour-adjustment artefact. Colours should be adjusted for the panels to look comparable. It would be also helpful to add arrowheads into the figure to point at the phenotypes that are highlighted in the text.

      We have added Ccdc146 KO in all figures. We have added arrow heads to point out the spermatids showing a thin and elongated nucleus. Concerning adjustment of colors, we attempted to make images of panel B comparable. See new figure 3.

      9) Fig 6A: the authors use RT PCR to determine expression dynamics of their gene of interested, and use actin (apparently) as control. However, actin and CDCC146 expression levels follow the same trend. How is the interpreted?

      The reviewer did not understand the figure. The orange and grey bars do not correspond to actin and Ccdc146 expression, respectively; both represent the mRNA expression level of Ccdc146, relative to Actb (orange) and Hprt (grey) expression, in CCDC146-HA mouse pups’ testes. We tested two housekeeping genes as references to be sure that our results were not distorted by unstable expression of a housekeeping gene. We did not see a significant difference between the two housekeeping genes. Actin was not used.

      10) In line 235, the authors suggest posttranslational modifications of their protein as potential cause for a slightly different migration in SDS PAGE as predicted from the theoretical molecular weight. This is not necessarily the case, some proteins do migrate just differently as predicted.

      We have changed the text accordingly and now provide alternative explanation for the slightly different migration. See lines 258-259

      11) The annotation of Fig 6 panels is problematic. First, why do the authors write "Laemmli" as description of the gel? It would be more helpful to write what is loaded on the gel, such as "sperm". Second, in panels B and C it would be helpful to add the antibodies used. It is not clear why there is a signal in the WT lane of panel B, but not in the HA lane (supposing an anti-HA antibody is used: why has WT a specific HA band?). In panel C, it is not clear why the blot that has so beautifully shown a single band in panel B suddenly gives such a bad labelling. Can the authors explain this? Also, they cut off the blot, likely because to too much background, but this is bad practice as full blots should be shown. In the current state, the panel C does not allow any clear conclusion. To make it conclusive, it must be repeated.

      Several mistakes were present in this figure. This figure was recomposed. The WB on testicular extract was suppressed and we now present a new WB allowing to compare the presence of CCDC146 in the flagella and head fractions from WT and HA-CCDC146 sperm. Using an anti-HA Ab, we demonstrate that in epididymal sperm the protein is localized in the flagella only. See new figure 6. The corresponding text was changed accordingly.

      12) The authors have raised an HA-knockin mouse for CDCC146, which they explained by the unavailability of specific antibodies. However, in Fig 7, they use a CDCC146 antibody. Can they clarify?

      The commercial Ab works for human CCDC146 but not for mouse CCDC146. To make the situation clearer, we have added the following information: “the commercial Ab works for human CCDC146 only”. See line 240.

      13) In Fig 7A (line 258), the authors hypothesise that they stain mitochondria - why not test this directly by co-staining with mitochondria markers?

      We chose another solution to resolve this question:

      To avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the whole flagellum.

      14) It seems that in both, Fig 7 and 8, the authors use expansion microscopy to localise CDCC146 in sperm tails. However, the staining differs substantially between the two figures. How is this explained?

      In figure 8 we used the commercial Ab in human sperm, whereas in figure 7 we used the anti-HA Abs in mouse sperm. Because the antibodies do not target the same part of the CCDC146 protein (the tag is placed at the N-terminus of the protein, and the HPA020082 Ab targets the last 130 amino acids of the Cter), their accessibility to the antigenic site could be different. However, it is important to note that both antibodies target the flagellum. This explanation is now inserted see lines 304-312

      15) Fig 8D and line 274: the authors do a fractionation, but only show the flagella fraction. Why?

      Showing all fractions of their experiment would have underpinned the specific enrichment of CDCC146 in the flagella fraction, which is what they aim to show. Actually, given the absence of control proteins, the fact that the band in the flagellar fraction appears to be weaker than in total sperm, one could even conclude that there is more CDCC146 in another (not analysed) fraction of this experiment. Thus, the experiment as it stands is incomplete and does not, as the authors claim, confirm the flagellar localisation of the protein.

      We agree with the reviewer’s remark. We provide now new results showing both flagella and nuclei fractions in new figure 6A. This experiment is presented lines 253-256

      16) Line 283, Fig 9D,F: The description of the microtubules in this experiment is not easy to understand. Do the authors mean to say that the labelling shows that the protein is associated with doublet microtubules, but not with the two central microtubules? They should try to find a clearer way to explain their result.

      As suggested by reviewer 2, we have changed the figure to make it clearer. The text was changed accordingly. See new figure 9 and the new corresponding legend, line 1006.

      17) Fig 9G - how often could the authors observe this? Why is the axoneme frayed? Does this happen randomly, or did the authors apply a specific treatment?

      Yes, it happens randomly during the fixation process.

      18) Line 300 and Fig 10A - the authors talk about the 90-kDa band, but do not say anything about what they think this band represents.

      We have now added the following sentence lines 340-342: “This band may correspond to proteolytic fragment of CCDC146, the solubilization of microtubules by sarkosyl may have made CCDC146 more accessible to endogenous proteases.”

      19) Fig 11A, lines 321-322: the authors write that the connecting piece is severely damaged. This is not obvious for somebody who does not work in sperm. Perhaps the authors could add some arrow heads to point out the defects, and briefly describe them in the text.

      We realized from your remark that our message was not clear. In fact, there is a great variability in the morphological damages of the HTCA. For instance, the HTCA of Ccdc146 KO sperm presented in figure 10A2 is quite normal, whereas that in figure 10A4 is completely distorted. This point is now underlined in the corresponding text. See lines 367-369

      We also added the size of the marker bar (200 nm), which was missing in the figure’s legend.

      20) Line 323: it will be important to name which tubulin antibody has been used to identify centrioles, as they are heavily posttranslationally modified.

      The different types of anti-tubulin Abs are described in the corresponding figure’s legend

      21) Fig 11B - phenotypes must be quantified to make these observations meaningful.

      We agree that a quantification would improve the message. However, testicular sperm are obtained by enzymatic separation of spermatogenic cells and the number of testicular sperm is very low. Moreover, not all sperm are stained. Taking these two points into account, it seems to us that a quantification would be difficult to analyze. For this reason, the quantification was not done; however, it is important to note that these defects were not observed in WT sperm, demonstrating that these defects are caused by the lack of CCDC146. We have added a sentence to underline this point. See lines 374-375.

      22) Line 329: Figure 12AB - is this a typo - should it read Figure 12B?

      We have split the panel A in A1 and A2 and changed the text accordingly. See line 378

      23) Why are there not wildtype controls in Fig 12B, C?

      We provide now as Figure 12-figure supplement 1, a control image for fig 12B. For figure 12C, the emergence of the flagellum from the distal centriole in WT is already shown in Fig 12A1

      24) Fig 13: the authors write that the manchette is "clearly longer and wider than in WT cells" (lines 342-343). How can they claim this without quantitative data?

      We now provide a statistical analysis of the length of the manchette. See Figure 13-figure supplement 1A. We also provide a new image illustrating the length of the manchette in Ccdc146 KO spermatids. See Figure 13-figure supplement 1B.

    1. Author Response

      We appreciate the insightful and constructive feedback from the reviewers regarding our manuscript, "Gain neuromodulation mediates perceptual switches: evidence from pupillometry, fMRI, and RNN Modelling." The comments have provided us with a number of valuable perspectives that will undoubtedly strengthen the impact and clarity of our work.

      We recognize the need for a more detailed and comparative analysis of the perceptual tasks used in our pupil and fMRI experiments. To address these points directly: the jittered intertrial intervals (ITIs) in the fMRI work were deemed necessary to effectively deconvolve the BOLD response (see Stottinger et al., 2018). In our fMRI work, each image was randomly preceded and followed by varying ITIs (2, 4, 6, and 8 seconds), ensuring an equitable distribution across sets and subjects. Importantly, our analysis of both fMRI and behavioral studies, including eye tracking data, indicates that perceptual switch behavior – the point at which switches occur – is consistent across modalities. If more predictive or preparatory activity were present in the fMRI version of the task, we would expect earlier switches or choices and altered reaction time distributions – neither of these signatures was observed in the original study (Stottinger et al., 2018). Importantly, this suggests that the additional time available in the fMRI experiments did not significantly alter behavioral outcomes. Thus, our findings suggest that despite the differences in timing and task structure, the behavioural responses remain consistent across both experimental setups. We will clarify this in the revised manuscript.
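
      The jittering scheme described above can be illustrated with a minimal sketch. It assumes that "equitable distribution" means each ITI value occurs equally often within a run; the function name, the seed, and the balancing rule are hypothetical and are not taken from the study's actual presentation code.

```python
import random
from collections import Counter

ITI_SECONDS = (2, 4, 6, 8)  # ITI values stated in the response


def balanced_jittered_itis(n_trials, seed=None):
    """Return n_trials ITIs, each value equally frequent, in random order.

    Hypothetical helper: the balancing rule is an assumption for illustration.
    """
    if n_trials % len(ITI_SECONDS) != 0:
        raise ValueError("n_trials must be a multiple of the number of ITI values")
    rng = random.Random(seed)
    itis = list(ITI_SECONDS) * (n_trials // len(ITI_SECONDS))
    rng.shuffle(itis)  # random order, balanced counts
    return itis


if __name__ == "__main__":
    itis = balanced_jittered_itis(16, seed=0)
    print(itis)
    print(Counter(itis))
```

      Shuffling a balanced list, rather than drawing each ITI independently, guarantees equal counts per value while keeping the order unpredictable from trial to trial.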

      In response to the reviewer's comments on our computational model, particularly regarding the modelling of noradrenaline (NA) effects in the RNN, we agree that modelling gain as stationary is a substantial approximation. However, given the slow ramping of pupil diameter, which served as our proxy for gain, it is an approximation that we believe is justified: in the revised manuscript, we will run additional simulations to ensure the validity of this approximation. In addition, whilst we agree that the model is more complicated than is needed for the task, we opted for RNN modelling, in lieu of a simpler modelling approach, because we wanted to use RNN modelling as a method for both hypothesis testing and generation. To build the RNN, the only key elements of model structure we had to specify in advance were the inputs and the target outputs of the network. The solution the RNN arrived at, although involving many more parameters than a simpler model, was entirely determined by optimisation (i.e., not our a priori hypotheses). We feel that this strengthens the result considerably. Importantly, this approach also allowed us to be surprised by the results of the model – for instance, we did not anticipate that the effect of gain on the energy landscape would be primarily mediated by inhibitory gain. In the revised manuscript, we will integrate this line of thinking into the paper. We are also sensitive to the fact that this result is both counterintuitive and difficult to study in high-dimensional dynamical systems like RNNs. In revisions, we will provide further analysis of the RNN and build a 2D approximation to the RNN that can be studied on the phase plane to better conceptually illuminate the mechanisms at play.
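
      To make the notion of a stationary gain concrete, the following is a minimal sketch of one common way gain enters a rate model: as the slope of a sigmoid activation, held constant over time. It is not the authors' RNN; the two-unit network, the weights, the time constant, and all function names are assumptions chosen for illustration.

```python
import math


def sigmoid_with_gain(x, gain=1.0):
    """Sigmoid activation whose slope is scaled by a gain parameter."""
    return 1.0 / (1.0 + math.exp(-gain * x))


def euler_step(rates, weights, inputs, gain, dt=0.1, tau=1.0):
    """One Euler step of tau * dr/dt = -r + f(gain * (W r + input)).

    The gain is the same at every step, i.e. stationary.
    """
    new_rates = []
    for i, r in enumerate(rates):
        drive = sum(w * rj for w, rj in zip(weights[i], rates)) + inputs[i]
        new_rates.append(r + dt / tau * (-r + sigmoid_with_gain(drive, gain)))
    return new_rates


if __name__ == "__main__":
    # Two mutually inhibiting units (hypothetical toy network).
    weights = [[0.0, -1.0], [-1.0, 0.0]]
    rates = [0.0, 0.0]
    for _ in range(200):
        rates = euler_step(rates, weights, inputs=[1.0, 0.0], gain=2.0)
    print(rates)
```

      Replacing the constant `gain` argument with a time-varying value would be one way to probe how much the stationary approximation matters in a toy setting like this.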

      Furthermore, we agree with the suggestion to consider alternative mechanisms that might contribute to perceptual switches, such as attention and top-down processing. While our study primarily focuses on LC-mediated gain modulation, we acknowledge the complexity of neural processes involved in perception and will expand our discussion to include these potential mechanisms. We also note the importance of moderating the causal language used in our manuscript. We will revise our wording to more accurately reflect the correlational nature of our findings and ensure that our conclusions are firmly grounded in the data presented.

      In conclusion, we are enthusiastic about the opportunity to refine our manuscript based on these valuable comments. In an updated version, we will address the overall points by providing clearer explanations of our methods, refining our figures for better readability, and ensuring that our conclusions are supported by robust analysis. We believe that these revisions will not only address the concerns raised but also significantly enhance the overall quality of our research. We thank the reviewers for their thorough and thoughtful critiques and look forward to submitting our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors explore the effects of DNA methylation on the strength of regulatory activity using massively parallel reporter assays in cell lines on a genome-wide level. This is a follow-up of their first paper from 2018 that describes this method for the first time. In addition to adding more in-depth information on sequences that are explored by many researchers using two main methods, reduced bisulfite sequencing and sites represented on the Illumina EPIC array, they now show also that DNA methylation can influence changes in regulatory activity following a specific stimulation, even in absence of baseline effects of DNA methylation on activity. In this manuscript, the authors explore the effects of DNA methylation on the response to Interferon alpha (IFNA) and a glucocorticoid receptor agonist (dexamethasone). The authors validate their baseline findings using additional datasets, including RNAseq data, and show convergences across two cell lines. The authors then map the methylation x environmental challenge (IFNA and dex) sequences identified in vitro to explore whether their methylation status is also predictive of regulatory activity in vivo. This is very convincingly shown for IFNA response sequences, where baseline methylation is predictive of the transcriptional response to flu infection in human macrophages, an infection that triggers the IFN pathways.

      Thank you for your strong assessment of our work!

      The extension of the functional validity of the dex-response altering sequences is less convincing.

      We agree. We note that genes close to dex-specific mSTARR-seq enhancers tend to be more strongly upregulated after dex stimulation than those near shared enhancers, which parallels our results for IFNA (lines 341-344). However, there is unfortunately no comparable data set to the human flu data set (i.e., with population-based whole genome-bisulfite sequencing data before and after dex challenge), so we could not perform a parallel in vivo validation step. We have added this caveat to the revised manuscript (lines 555-557).

      Sequences altering the response to glucocorticoids, however, were not enriched in DNA methylation sites associated with exposure to early adversity. The authors interpret that "they are not links on the causal pathway between early life disadvantage and later life health outcomes, but rather passive biomarkers". However, this approach does not seem an optimal model to explore this relationship in vivo. This is because exposure to early adversity and its consequences is not directly correlated with glucocorticoid release and changes in DNA methylation levels following early adversity could be related to many physiological mechanisms, and overall, large datasets and meta-analyses do not show robust associations of exposure to early adversity and DNA methylation changes. Here, other datasets, such as from Cushing patients may be of more interest.

      Thank you for making these important points. We have expanded the set of caveats regarding the lack of enrichment of early adversity-reported sites in the mSTARR-data set (lines 527-533). Specifically, we note that the relationship between early adversity and glucocorticoid physiology is complex (e.g., Eisenberger and Cole, 2012; Koss and Gunnar, 2018) and that dex challenge models one aspect of glucocorticoid signaling but not others (e.g., glucocorticoid resistance). Nevertheless, we also see little evidence for enrichment of early adversity-associated sites in the mSTARR data set at baseline, independently of the dex challenge experiment (lines 483-485; Figure 4).

      We also agree that large data sets (e.g., Houtepen et al., 2018; Marzi et al., 2018) and reviews (e.g., Cecil et al., 2020) of early adversity and DNA methylation in humans show limited evidence of associations between early adversity and DNA methylation levels. However, the idea that early adversity impacts downstream outcomes remains pervasive in the literature and popular science (see Dubois et al., 2019), which we believe makes tests like ours important to pursue. We also hope that our data set (and others generated through these methods) will be useful in interpreting other settings in which differential methylation is of interest as well—in line with your comment below. We have clarified both of these points in the revised manuscript (lines 520-522; 536-539).

      Overall, the authors provide a great resource of DNA methylation-sensitive enhancers that can now be used for functional interpretation of large-scale datasets (that are widely generated in the research community), given the focus on sites included in RBSS and the Illumina EPIC array. In addition, their data lends support that differences in DNA methylation can alter responses to environmental stimuli and thus of the possibility that environmental exposures that alter DNA methylation can also alter the subsequent response to this exposure, in line with the theory of epigenetic embedding of prior stimuli/experiences. The conclusions related to the early adversity data should be reconsidered in light of the comments above.

      Thank you! And yes, we have revised our discussion of early life adversity effects as discussed above.

      Reviewer #1 (Recommendations For The Authors):

      While the paper has a lot of strengths and provides new insight into the epigenomic regulation of enhancers as well as being a great resource, there are some aspects that would benefit from clarification.

      a. It would be great to have a clearer description of how many sequences are actually passing QC in the different datasets and what the respective overlaps are in bps or 600bp windows. Now often only % are given. Maybe a table/Venn diagram for overview of the experiments and assessed sequences would help here. This concerns the different experiments in the K562, A549, and HepG2 cell lines, including stimulations.

      We now provide a supplementary figure and supplementary table providing, for each dataset, the number of 600 bp windows passing each filter (Figure 2-figure supplement 1; Supplementary File 9), as well as a supplementary figure providing an upset plot to show the number of assessed sequences shared across the experiments (Figure 2-figure supplement 2).

      b. It would also be helpful to have a brief description of the main differences in assessed sequences and their coverage of the old (2018) and new libraries in the main text to be able better interpret the validation experiments.

      We now provide information on the following characteristics for the 2018 data set versus the data set presented for the first time here: mean (± SD) number of CpGs per fragment; mean (± SD) DNA sequencing depth; and mean (± SD) RNA sequencing depth (lines 169-170 provide values for the new data set; in line 194, we reference Supplementary File 5, which provides the same values for the old data set). Notably, the coverage characteristics of analyzed windows in both data sets are quite high (mean DNA-seq read coverage = 94x and mean RNA-seq read coverage = 165x in the new data set at baseline; mean DNA-seq read coverage = 22x and mean RNA-seq read coverage = 54x in Lea et al. 2018).

      c. Statements of genome-wide analyses in the abstract and discussion should be a bit tempered, as quite a number of tested sites do not pass QC and do not enter the analysis. From the results it seems like from over 4.5 million sequences, only 200,000 are entering the analysis.

      The reason why many of the windows are not taken forward into our formal modeling analysis is that they fail our filter for RNA reads because they are never (or almost never) transcribed—not because there was no opportunity for transcription (i.e., the region was indeed assessed in our DNA library, and did not show output transcription, as now shown in Figure 2-figure supplement 1). We have added a rarefaction analysis (lines 715-722 in Materials and Methods) of the DNA fragment reads to the revised manuscript which supports this point. Specifically, it shows that we are saturated for representation of unique genomic windows (i.e., we are above the stage in the curve where the proportion of active windows would increase with more sequencing: Figure 1-figure supplement 4). Similarly, a parallel rarefaction curve for the mSTARR-seq RNA-seq data (Figure 1-figure supplement 4) shows that we would gain minimal additional evidence for regulatory activity with more sequencing depth. We now reference these analyses in revised lines 179-184 and point to the supporting figure in line 182.

      In other words, our analysis is truly genome-wide, based on the input sequences we tested. Most of the genome just doesn’t have regulatory activity in this assay, despite the potential for it to be detected given that the relevant sequences were successfully transfected into the cells.
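As an illustration only (not the authors' pipeline), the idea behind a rarefaction analysis can be sketched in a few lines of Python: subsample the reads at increasing fractions and count how many unique windows are represented. The input format (one window ID per read), the function name `rarefaction_curve`, and the toy data are assumptions made for this sketch.

```python
import random


def rarefaction_curve(read_windows, fractions, seed=0):
    """Return (fraction, n_unique_windows) pairs for nested random subsamples.

    read_windows: list holding one window ID per sequenced read (hypothetical input).
    fractions: subsampling fractions in (0, 1], in increasing order.
    """
    rng = random.Random(seed)
    shuffled = list(read_windows)
    rng.shuffle(shuffled)  # shuffle once; prefixes of the shuffle are nested subsamples
    curve = []
    for frac in fractions:
        n_reads = int(round(frac * len(shuffled)))
        curve.append((frac, len(set(shuffled[:n_reads]))))
    return curve


# Toy data: 50 windows, each covered by 40 reads.
reads = [window for window in range(50) for _ in range(40)]
curve = rarefaction_curve(reads, [0.1, 0.25, 0.5, 0.75, 1.0])
for frac, n_unique in curve:
    print(f"{frac:.2f} of reads -> {n_unique} unique windows")
```

If the count of unique windows has already plateaued well before the full read depth, additional sequencing would be expected to add few new windows, which is the sense of "saturated" used in the response.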

      d. Could the authors comment on the validity of the analysis if only one copy is present (cut-off for QC)?

      We think this question reflects a misunderstanding of our filtering criteria due to lack of clarity on our part, which we have addressed in the revision. We now specify that the mean DNA-seq sequencing depth per sample for the windows we subjected to formal modeling was quite high: 93.91 ± 10.09 SD (range = 74.5 – 113.5x) (see revised lines 169-170). In other words, we never analyze windows in which there is scant evidence that plasmids containing the relevant sequence were successfully transfected (lines 170-172).

      Our minimal RNA-seq criteria require non-zero counts in at least 3 replicate samples within either the methylated condition or the unmethylated condition, or both (lines 166-168). Because we know that multiple plasmids containing the corresponding sequence are present for all of these windows, even those that just cross the minimal RNA-seq filtering threshold, we believe our results provide valid evidence that all analyzed windows present the opportunity to detect enhancer activity, but many do not act as enhancers (i.e., do not result in transcribed RNA). Notably, we observe a negligible correlation between DNA sequencing depth for a fragment, among analyzed windows, and mSTARR-seq enhancer activity (R² = 0.029; now reported in lines 183-184). We also now report reproducibility between replicates, in which all replicate pairs have r > 0.89, on par with previously published STARR-seq datasets (e.g., Klein et al., 2020; Figure 1-figure supplement 6, pointed to in line 193).
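The minimal RNA-seq criterion described above can be written as a short predicate. This is a sketch under stated assumptions: the function name `passes_rna_filter`, the per-window count lists, and the window names are hypothetical, and only the rule itself (non-zero counts in at least 3 replicates in either condition) comes from the response.

```python
def passes_rna_filter(meth_counts, unmeth_counts, min_nonzero=3):
    """True if at least `min_nonzero` replicates have non-zero RNA counts in the
    methylated condition, the unmethylated condition, or both."""

    def n_nonzero(counts):
        return sum(1 for c in counts if c > 0)

    return (n_nonzero(meth_counts) >= min_nonzero
            or n_nonzero(unmeth_counts) >= min_nonzero)


# Hypothetical RNA counts per replicate: (methylated, unmethylated).
windows = {
    "win_a": ([0, 0, 0, 0, 0, 0], [5, 0, 2, 0, 1, 0]),  # passes via unmethylated
    "win_b": ([1, 0, 0, 0, 0, 0], [0, 2, 0, 0, 0, 0]),  # too few non-zero replicates
    "win_c": ([3, 4, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0]),  # passes via methylated
}
kept = [name for name, (meth, unmeth) in windows.items()
        if passes_rna_filter(meth, unmeth)]
print(kept)
```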

      e. While the authors state that almost all of the control sequences contain CpG sites, could the authors also give information on the total number of CpG sites in the different subsets? Was the number of CpGs in a 600 bp window related to the effects of DNA methylation on enhancer activity?

      We now provide the number of CpG sites per window in the different subsets in lines 282-284. As expected, they are higher for EPIC array sites and for RRBS sites because the EPIC array is biased towards CpG-rich promoter regions, and the enzyme typically used in the starting step of RRBS digests DNA at CpG motifs (but control sequences still contain an average of ~13 CpG sites per fragment). We also now model the magnitude of the effects of DNA methylation on regulatory activity as a function of number of CpG sites within the 600 bp windows. Consistent with our previous work in Lea et al., 2018, we find that mSTARR-seq enhancers with more CpGs tend to be repressed by DNA methylation (now reported in lines 216-219 and Figure 1-figure supplement 11).

      f. In the discussion, a statement on the underrepresented regions, likely regulatory elements with lower CG content, that nonetheless can be highly relevant for gene regulation would be important to put the data in perspective.

      Thanks for this suggestion. We agree that regulatory regions, independent of CpG methylation, can be highly relevant, and now clarify in the main text that the “unmethylated” condition of mSTARR-seq is essentially akin to a conventional STARR-seq experiment, in that it assesses regulatory activity regardless of CpG content or methylation status (lines 128-130).

      Consequently, our study is well-designed to detect enhancer-like activity, even in windows with low GC content. We now show with additional analyses that we generated adequate DNA-seq coverage on the transfected plasmids to analyze 90.2% of the human genome, including target regions with no or low CpG content (lines 148-149; 153-156; Supplementary file 2). As noted above, we also now clarify that regions dropped out of our formal analysis because we had little to no evidence that any transcription was occurring at those loci, not because sequences for those regions were not successfully transfected into cells (see responses above and new Figure 1-figure supplement 4 and Figure 2-figure supplement 1).

      g. To control for differences in methylation of the two libraries, the authors sequence a single CpG in the vector. Could the authors look at DNA methylation of the 600 bp windows at the end of the experiment? Could DNA methylation of these windows be differently affected according to sequence? 48 hours could be enough for de-methylation or re-methylation.

      We agree that variation in demethylation or remethylation depending on fragment sequence is possible. We now state this caveat in the main text (lines 158-159), and specify that genomic coverage of our bisulfite sequencing data across replicates is (unfortunately) too variable to perform reliable site-by-site analysis of DNA methylation levels before and after the 48-hour experiment (lines 1182-1185). Instead, we focus on a CpG site contained in the adapter sequence (and thus included in all plasmids) to generate a global estimate of per replicate methylation levels. We also now note that any de-methylation or re-methylation would reduce our power to detect methylation-dependent activity, rather than leading to false positives (lines 163-165).

      h. The section on the method for correction for multiple testing should be more detailed as it is very difficult to follow. Why were only 100 permutations used, given that the empirical p-value could then only be <0.01? The description of a subsample of the N windows with positive betas is unclear: should the permutation not include the actual values and thus all windows, or were there no negative betas? Was FDR accounting for all elements and pairs?

      We have now expanded the text in the Materials and Methods section to clarify the FDR calculation (lines 691, 695-699, 702, 706). We clarify that the 100 permutations were used to generate a null distribution of p-values for the data set (e.g., 100 x 17,461 p-values for the baseline data set), which we used to derive a false discovery rate. Because we base our evidence on FDRs, we therefore compare the distribution of observed p-values to the distribution of p-values obtained via permutation; we do not calculate individual p-values by comparing an observed test statistic against the test statistics for permuted data for that individual window.

      We compare the data to permutations with only positive betas because in the observed data, we observe many negative betas. These correspond to windows which have no regulatory activity (i.e., they have many more input DNA reads than RNA-seq reads) and thus have very small p-values in a model testing for DNA-RNA abundance differences. However, we are interested in controlling the false discovery rate of windows that do have regulatory activity (positive betas). In the permuted data, by contrast and because of the randomization we impose, test statistics are centered around 0 and essentially symmetrical (approximately equally likely to be positive or negative). Retaining all p-values to construct the null therefore leads to highly miscalibrated false discovery rates because the distribution of observed values is skewed towards smaller values (because of windows with “significantly” no regulatory activity) compared to the permuted data. We address that problem by using only positive betas from the permutations.
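One plausible way to turn this description into a calculation is sketched below. It is not the authors' code: the function name `empirical_fdr`, the `(beta, p_value)` input format, the way the null rate is pooled across permutations, and the toy numbers are all assumptions made for illustration. The only elements taken from the response are that the FDR compares observed p-values with permutation-derived p-values and that only positive betas are used.

```python
def empirical_fdr(observed, permutations, threshold):
    """Estimate the FDR at a p-value threshold among positive-beta windows.

    observed: list of (beta, p_value) pairs for the real data.
    permutations: list of permuted data sets, each a list of (beta, p_value) pairs.
    """
    obs_pos = [p for beta, p in observed if beta > 0]
    null_pos = [p for perm in permutations for beta, p in perm if beta > 0]
    if not null_pos:
        raise ValueError("no positive-beta windows in the permuted data")

    n_hits = sum(1 for p in obs_pos if p <= threshold)
    if n_hits == 0:
        return 0.0  # nothing called significant, so no false discoveries to estimate

    # Share of positive-beta null p-values at or below the threshold,
    # pooled over permutations, scaled to the number of observed positive-beta windows.
    null_rate = sum(1 for p in null_pos if p <= threshold) / len(null_pos)
    expected_false = null_rate * len(obs_pos)
    return min(1.0, expected_false / n_hits)


# Toy example: negative-beta windows have tiny p-values but are ignored.
observed = [(1.0, 0.001), (0.8, 0.002), (0.5, 0.2), (-2.0, 1e-6), (-1.5, 1e-5)]
permutations = [
    [(0.1, 0.5), (-0.2, 0.3), (0.3, 0.004), (-0.1, 0.9), (0.2, 0.7)],
    [(0.1, 0.6), (0.2, 0.8), (-0.3, 0.2), (0.4, 0.35), (-0.1, 0.05)],
]
fdr = empirical_fdr(observed, permutations, threshold=0.01)
print(f"estimated FDR at p <= 0.01: {fdr:.2f}")
```

In the toy data, the two observed windows with very small p-values and negative betas do not enter the calculation at all, which mirrors the motivation given above for restricting the comparison to positive betas.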

      i. The interpretation of the overlap of Dex-response windows with CpG sites associated with early adversity should be revisited according to the points also mentioned in the public review and the authors may want to consider exploring additional datasets with other challenges.

      Thank you, see our responses to the public review above and our revisions in lines 555-559. We agree that comparisons with more data sets and generation of more mSTARR-seq data in other challenge conditions would be of interest. While beyond the scope of this manuscript, we hope the resource we have developed and our methods set the stage for just such analyses.

      Reviewer #2 (Public Review):

      This work presents a remarkably extensive set of experiments, assaying the interaction between methylation and expression across most CpG positions in the genome in two cell types. To this end, the authors use mSTARR-seq, a high-throughput method, which they have previously developed, where sequences are tested for their regulatory activity in two conditions (methylated and unmethylated) using a reporter gene. The authors use these data to study two aspects of DNA methylation:

      1) its effect on expression, and 2) its interaction with the environment. Overall, they identify a small number of 600 bp windows that show regulatory potential, and a relatively large fraction of these show an effect of methylation on expression. In addition, the authors find regions exhibiting methylation-dependent responses to two environmental stimuli (interferon alpha and glucocorticoid dexamethasone).

      The questions the authors address represent some of the most central in functional genomics, and the method utilized is currently the best method to do so. The scope of this study is very impressive and I am certain that these data will become an important resource for the community. The authors are also able to report several important findings, including that pre-existing DNA methylation patterns can influence the response to subsequent environmental exposures.

      Thank you for this generous summary!

      The main weaknesses of the study are: 1) The large number of regions tested seems to have come at the expense of the depth of coverage per region (1 DNA read per region per replicate). I have not been convinced that the study has sufficient statistical power to detect regulatory activity, and differential regulatory activity to the extent needed. This is likely reflected in the extremely low number of regions showing significant activity.

      We apologize for our lack of clarity in the previous version of the manuscript. Nonzero coverage for half the plasmid-derived DNA-seq replicates is a minimum criterion, but for the baseline dataset, the mean depth of DNA coverage per replicate for windows passing the DNA filter is quite high: 12.723 ± 41.696 s.d. overall, and 93.907 ± 10.091 s.d. in the windows we subjected to full analysis (i.e., windows that also passed the RNA read filter). We now provide these summary statistics in lines 148-149 and 169-170 and Supplementary file 5 (see also our responses to Reviewer 1 above). We also now show, using a rarefaction analysis, that our data set saturates the ability to detect regulatory windows based on DNA and RNA sequencing depth (new Figure 1-figure supplement 4; lines 179-184; 715-722).

      2) Due to the position of the tested sequence at the 3' end of the construct, the mSTARR-seq approach cannot detect the effect of methylation on promoter activity, which is perhaps the most central role of methylation in gene regulation, and where the link between methylation and expression is the strongest. This limitation is evident in Fig. 1C and Figure 1-figure supplement 5C, where even active promoters have activity lower than 1. Considering these two points, I suspect that most effects of methylation on expression have been missed.

      Thank you for pointing this out. We agree that we have not exhaustively detected methylation-dependent activity in all promoter regions, given that not all promoter regions are active in STARR-seq. However, there is good evidence that some promoter regions can function like enhancers and thus be detected in STARR-seq-type assays (Klein et al., 2020). This important point is now noted in lines 187-189; an example promoter showing methylation-dependent regulatory activity in our dataset is shown in Figure 3E.

      We also now clarify that Figure 1C shows significant enrichment of regulatory activity in windows that overlap promoter sequence (line 239). The y-axis is not a measure of activity, but rather the log-transformed odds ratio, with positive values corresponding to overrepresentation of promoter sequences in regions of mSTARR-seq regulatory activity. Active promoters are 1.640 times more likely to be detected with regulatory activity than expected by chance (p = 1.560 x 10^-18), which we now report in a table that presents enrichment statistics for all ENCODE elements shown in Figure 1C for clarity (Supplementary file 4). Moreover, 74.1% of active promoters that show regulatory activity have methylation-dependent activity, also now reported in Supplementary file 4.
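To make the y-axis interpretation concrete, the following sketch computes a log odds ratio from a 2x2 table of window counts. The counts are invented for illustration, the function name `log_odds_ratio` is hypothetical, and the use of the natural logarithm is an assumption (the response does not state the base).

```python
import math


def log_odds_ratio(active_in, active_out, inactive_in, inactive_out):
    """Natural-log odds ratio for a 2x2 table of window counts.

    active_in / active_out: windows with regulatory activity that do / do not
    overlap the annotation; inactive_in / inactive_out: the same for windows
    without regulatory activity.
    """
    if min(active_in, active_out, inactive_in, inactive_out) <= 0:
        raise ValueError("all four cells must be positive for this simple estimator")
    return math.log((active_in * inactive_out) / (active_out * inactive_in))


# Invented counts: 30 of 100 active windows overlap promoters,
# versus 200 of 900 inactive windows.
lor = log_odds_ratio(active_in=30, active_out=70, inactive_in=200, inactive_out=700)
print(f"log odds ratio = {lor:.3f}, odds ratio = {math.exp(lor):.2f}")
```

A value above 0 on this scale corresponds to an odds ratio above 1 (overrepresentation), so a bar sitting below 1 on a log axis can still indicate enrichment.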

      Overall, the combination of an extensive resource addressing key questions in functional genomics, together with the findings regarding the relationship between methylation and environmental stimuli makes this a key study in the field of DNA methylation.

      Thank you again for the positive assessment!

      Reviewer #2 (Recommendations For The Authors):

      I suggest the authors conduct several tests to estimate and/or increase the power of the study:

      1) To estimate the potential contribution of additional sequencing depth, I suggest the authors conduct a downsampling analysis. If the results are not saturated (e.g., the number of active windows is not saturated or the number of differentially active windows is not saturated), then additional sequencing is called for.

      We appreciate the suggestion. We have now performed a downsampling/rarefaction curve analysis in which we downsampled the number of DNA reads, and separately, the number of RNA reads. We show that for both DNA-seq depth and RNA-seq depth, we are within the range of sequencing depth in which additional sequencing would add minimal new analysis windows in the dataset (Figure 1-figure supplement 4; lines 179-184; 715-722).

      2) Correlation between replicates should be reported and displayed in a figure because low correlations might also point to too few reads. The authors mention: "This difference likely stems from lower variance between replicates in the present study, which increases power", but I couldn't find the data.

      We now report the correlations between RNA and DNA replicates within the current dataset and within the Lea et al., 2018 dataset (Figure 1-figure supplement 6). The between-replicate correlations in both our RNA libraries and DNA libraries are consistently high (r ≥ 0.89).
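For readers who want to reproduce this kind of check on their own count tables, pairwise Pearson correlations between replicates can be computed as sketched below. The replicate names and counts are made up, and the helper `pearson_r` is written out by hand so the example needs only the standard library.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-window read counts for three replicates.
replicates = {
    "rep1": [10, 20, 30, 40],
    "rep2": [12, 19, 33, 41],
    "rep3": [20, 40, 60, 80],
}
pairwise = {
    (a, b): pearson_r(replicates[a], replicates[b])
    for a, b in combinations(replicates, 2)
}
for pair, r in pairwise.items():
    print(pair, round(r, 3))
print("minimum pairwise r:", round(min(pairwise.values()), 3))
```

Reporting the minimum pairwise value, as in the response, gives a conservative summary of replicate reproducibility.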

      3) The correlation between the previous and current K562 datasets is surprisingly low. Given that these datasets were generated in the same cell type, in the same lab, and using the same protocol, I expected a higher correlation, as seen in other massively parallel reporter assays. The fact that the correlations are almost identical for a comparison of the same cell and a comparison of very different cell types is also suspicious.

      Thanks for raising this point. We think it is in reference to our original Figure 1-figure supplement 6, for which we now provide Pearson correlations in addition to R² values (now Figure 1-figure supplement 8). We note that this is not a correlation in raw data, but rather the correlation in estimated effect sizes from a statistical model for methylation-dependent activity. We now provide Pearson correlations for the raw data between replicates within each dataset (Figure 1-figure supplement 6), which for the baseline dataset are all r > 0.89 for RNA replicates and r > 0.98 for DNA replicates, showing that replicate reproducibility in this study is on par with other published studies (e.g., Klein et al., 2020 report r > 0.89 for RNA replicates and r > 0.91 for DNA replicates).

      We do not know of any comparable reports in other MPRAs for effect size correlations between two separately constructed libraries, so it’s unclear to us what the expectation should be. However, we note that all effect sizes are estimated with uncertainty, so it would be surprising to us to observe a very high correlation for effect sizes in two experiments, with two independently constructed libraries (i.e., with different DNA fragments), run several years apart, especially given the importance of winner’s curse effects and other phenomena that affect point estimates of effect sizes. Nevertheless, we find that regions we identify as regulatory elements in this study are 74-fold more likely to have been identified as regulatory elements in Lea et al., 2018 (p < 1 x 10^-300).

      4) The authors cite Johnson et al. 2018 to support their finding that merely 0.073% of the human genome shows activity (1.7% of 4.3%), but:

      a. the percent cited is incorrect: this study found that 27,498 out of 560 million regions (0.005%) were active, and not 0.165% as the authors report.

      We have modified the text to clarify the numerator and denominator used for the 0.165% estimate from Johnson et al., 2018 (lines 175-176). The numerator is their union set of all basepairs showing regulatory activity in unstimulated cells, which is 5,547,090 basepairs. The denominator is the total length of the hg38 human genome, which is 3,298,912,062 basepairs.

      Notably, the denominator (the total human genome) is not 560 million: while Johnson et al. (2018) tested 560 million unique ~400 basepair fragments, these fragments were overlapping, such that the 560 million fragments covered the human genome 59 times (i.e., 59x coverage).

      b. other studies that used massively parallel reporter assays report substantially higher percentages, suggesting that the current study is possibly underpowered. Indeed, the previous mSTARR-seq found a substantially larger percentage of regions showing regulatory activity (8%). The current study should be compared against other studies (preferably those that did not filter for putatively active sequences, or at least to the random genomic sequences used in these studies).

      We appreciate this point and have double-checked comparisons to Johnson et al., 2018 and Lea et al., 2018. Our numbers are not unusual relative to Johnson et al., 2018 (0.165%), which surveyed the whole genome. Also, in comparing to the data from Lea et al., 2018, when processed in an identical manner (our criteria are more stringent here), our values of the percent of the tested genome showing significant regulatory activity are also similar: 0.108% in the Lea et al., 2018 dataset versus 0.082% in the baseline dataset. Finally, our rarefaction analyses (see our responses above) indicate that we are not underpowered based on sequencing depth for RNA or DNA samples. We also note that there are several differences in our analysis pipeline from other studies: we use more technical replicates than is typical (compare to 2-5 replicates in Arnold et al., 2013; Johnson et al., 2018; Muerdter et al., 2018), we measure DNA library composition based on DNA extracted from each replicate post-transfection (as opposed to basing it on the pre-transfection library [Johnson et al., 2018]), and we use linear mixed models to identify regulatory activity as opposed to binomial tests [Johnson et al., 2018; Arnold et al., 2013; Muerdter et al., 2018].

      I find it confusing that the four sets of CpG positions used (EPIC, RRBS, NR3C1, and random control loci) add up together to 27.3M CpG positions. Are the 600 bp windows around each of these positions sufficient to result in whole-genome coverage? If so, a clear explanation of how this is achieved should be added.

      Thanks for this comment. Although our sequencing data are enriched for reads that cover these targeted sites, the original capture to create the input library included some off-target reads (as is typical of most capture experiments, which are rarely 100% efficient). We then sequenced at such high depth that we ultimately obtained sequencing coverage that encompassed nearly the whole genome. We now clarify in the main text that our protocol assesses 27.3 million CpG sites by assessing 600 bp windows encompassing 93.5% of all genomic CpG sites (line 89), which includes off-target sites (line 149).

      A scatter plot showing the RNA to DNA ratios of the methylated (x-axis) vs unmethylated (y-axis) library would be informative. I expect to see a shift up from the x=y diagonal in the unmethylated values.

      We have added a supplementary figure showing this information, which shows the expected shift upwards (Figure 1-figure supplement 9).

      Another important figure missing is a histogram showing the ratios between the unmethylated and methylated libraries for all active windows, with the significantly differentially active windows marked.

      We have added a supplementary figure showing this information (Figure 1-figure supplement 10).

      Perhaps I missed it, but what is the distribution of effect sizes (differential activity) following the various stimuli?

      This information is provided in table form in Supplementary Files 3, 10, and 11, which we now reference in the Figure 2 legend (lines 365-366).

      Minor changes

      It is unclear what the lines connecting the two groups in Fig.3C represent, as these are two separate groups of regions.

      We now clarify in the figure legend that values connected by a line are the same regions, not two different sets of regions. They show the correlation between DNA methylation and gene expression at mSTARR-seq-identified enhancers in individuals before and after IAV stimulation, separately for enhancers that are shared between conditions (left) versus those that are IFNA-specific (right). The two plots therefore do show two different sets of regions, which we have depicted to visualize the contrast in the effect of stimulation on the correlation at IFNA-specific enhancers versus shared enhancers. We have revised the figure legend to clarify these points (lines 458-460).

      L235-242 are unclear. Specifically - isn't the same filter mentioned in L241-242 applied to all regions?

      Yes, the same filter for minimal RNA transcription was applied to all regions. We have modified the text (lines 264-265, 271, 275-277) to clarify that the enrichment analyses were performed twice, to test whether the target types were: 1) enriched in the dataset passing the RNA filter (i.e., the dataset showing plasmid-derived RNA reads in at least half the sham or methylated replicates; n = 216,091 windows) and 2) enriched in the set of windows showing significant regulatory activity (at FDR < 1%; n = 3,721 windows).

      To improve cohesiveness, the section about most CpG sites associated with early life adversity not showing regulatory activity in K562s can be moved to the supplementary in my opinion.

      Thank you for this suggestion. Because ELA and the biological embedding hypothesis (via DNA methylation) were major motivations for our analysis (see Introduction lines 42-48; 75-79), and we also discuss these results in the Discussion (lines 518-520), we have respectfully elected to retain this section in the main manuscript. We have added text in the Discussion explaining why we think experimental tests of methylation effects on regulation are relevant to the literature on early life adversity (lines 520-522), and have added discussion on limits to these analyses (lines 527-533).

      References:

      Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A (2013) Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science, 339, 1074-1077.

      Cecil CA, Zhang Y, Nolte T (2020) Childhood maltreatment and DNA methylation: A systematic review. Neuroscience & Biobehavioral Reviews, 112, 392-409.

      Dubois M, Louvel S, Le Goff A, Guaspare C, Allard P (2019) Epigenetics in the public sphere: interdisciplinary perspectives. Environmental Epigenetics, 5, dvz019.

      Eisenberger NI, Cole SW (2012) Social neuroscience and health: neurophysiological mechanisms linking social ties with physical health. Nature Neuroscience, 15, 669-674.

      Houtepen L, Hardy R, Maddock J, Kuh D, Anderson E, Relton C, Suderman M, Howe L (2018) Childhood adversity and DNA methylation in two population-based cohorts. Translational Psychiatry, 8, 1-12.

      Johnson GD, Barrera A, McDowell IC, D’Ippolito AM, Majoros WH, Vockley CM, Wang X, Allen AS, Reddy TE (2018) Human genome-wide measurement of drug-responsive regulatory activity. Nature Communications, 9, 1-9.

      Klein JC, Agarwal V, Inoue F, Keith A, Martin B, Kircher M, Ahituv N, Shendure J (2020) A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nature Methods, 17, 1083-1091.

      Koss KJ, Gunnar MR (2018) Annual research review: Early adversity, the hypothalamic–pituitary–adrenocortical axis, and child psychopathology. Journal of Child Psychology and Psychiatry, 59, 327-346.

      Marzi SJ, Sugden K, Arseneault L, Belsky DW, Burrage J, Corcoran DL, Danese A, Fisher HL, Hannon E, Moffitt TE (2018) Analysis of DNA methylation in young people: limited evidence for an association between victimization stress and epigenetic variation in blood. American Journal of Psychiatry, 175, 517-529.

      Muerdter F, Boryń ŁM, Woodfin AR, Neumayr C, Rath M, Zabidi MA, Pagani M, Haberle V, Kazmar T, Catarino RR (2018) Resolving systematic errors in widely used enhancer activity assays in human cells. Nature Methods, 15, 141-149.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1) Can the authors statistically define the egg-laying classes? In some parts of the manuscript, the division between the different classes could be more ambiguous. I understand that the class III strains are divided by the kcnl-1 genotype, but given the different results for diverse traits, it could be more clear to keep them as one class. Also, overall, the authors choose a collection of 15 strains across the different classes to phenotype for many traits and perform genome edits. It is understandable that they cannot test all strains, but given the variation across traits and classes, it might be good to add a few more caveats about how these strains might not be representative of all strains across the species.

      Response: The egg-laying classes were defined as in Figure 1A by arbitrarily chosen cut-offs (at 10, 10-25, and 25 eggs in utero) to simplify subsequent analyses. We added this explanation to the first paragraph of the results section. However, average egg retention differs significantly between the four defined classes among the 15 selected strains (Fig. 2A).

      We think that the distinction between Class IIIA and IIIB strains is important and justified because the two Classes differ significantly in mean egg retention (Fig. 2A) and because Class IIIB strains harbour the large-effect variant KCNL-1 V530L whereas Class IIIA strains do not.

      We agree that the 15 selected strains are not necessarily representative of all strains across the species. We have added a note of caution regarding this point to the first paragraph of the section “Temporal progression of egg retention and internal hatching”: “Note that this strain selection, especially concerning the largest Class II, is unlikely to reflect the overall strain diversity observed across the species”. In addition, we have reworded the first sentence of this paragraph as follows: “To better characterize natural variation in C. elegans egg retention, we focused on a subset of 15 strains from divergent phenotypic Classes I-III, with an emphasis on Class III strains exhibiting strong egg retention (at mid-L4 + 30h) (Fig. 2A and 2B).”
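The class definitions discussed in this response can be summarised as a small decision rule. The sketch below is an interpretation, not the authors' code: the function name `egg_laying_class`, the handling of the exact boundary values (10 and 25 are both placed in Class II here), and the example inputs are assumptions, while the cut-offs themselves and the use of the KCNL-1 V530L variant to separate Class IIIB from Class IIIA come from the response.

```python
def egg_laying_class(mean_eggs_in_utero, has_kcnl1_v530l=False):
    """Assign a strain to an egg-retention class from its mean number of eggs in utero.

    Boundary handling is an assumption: values of exactly 10 or 25 fall in Class II.
    """
    if mean_eggs_in_utero < 10:
        return "I"
    if mean_eggs_in_utero <= 25:
        return "II"
    # Class III is split by genotype: IIIB carries KCNL-1 V530L, IIIA does not.
    return "IIIB" if has_kcnl1_v530l else "IIIA"


# Hypothetical strains: (mean eggs in utero, carries KCNL-1 V530L).
examples = {
    "strain_low": (7.5, False),
    "strain_mid": (15.0, False),
    "strain_high": (30.0, False),
    "strain_high_v530l": (40.0, True),
}
classes = {name: egg_laying_class(eggs, variant)
           for name, (eggs, variant) in examples.items()}
print(classes)
```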

      2) For the GWAS experiments, the authors should describe if any of the QTL overlap with hyper-divergent regions in the strain set. The QTL could be driven by these less well defined regions.

      Response: We have added the following sentence: “The three QTLs do not align with any of the recently identified hyper-divergent regions of the genome (Lee et al., 2021).”

      3) The authors should look at correlations between the mod-5(n822) edit phenotypes and the exogenous 5-HT and SSRI phenotypes to demonstrate how the traits can differ. Some correlation plots might help that point as well.

      Response: We examined all possible correlations as suggested: none are significant and strain effects on trait differences are idiosyncratic, as written in our results section. The correlational analyses remain of limited value due to small samples: N=10 for mean strain values for measured phenotypes. We therefore feel that these analyses do not provide any additional insights beyond our figures (4C, 4D, 5C, 5D, S5A-C) and our statement on page 15: “As in previous experiments (Fig. 4C and 5C), we find again that strains sharing the same egg retention phenotype may differ strongly in egg-laying behaviour in response to modulation of both exo- and endogenous serotonin levels (Class IIIA: ED3005 and JU2829) (Fig. 5D and S5C).”

      4) Figure 6D, was there any censoring of the data? Normally, these types of studies are plagued by an increase in censored animals that can decrease significance. The effects among the classes seem large, but statistical comparisons might help as well.

      Response: There was no censoring of animals (censoring of animals in lifespan studies is usually done by removing “bags of worms”, which here was our study phenotype). We now mention this in the corresponding figure legend. We also added a statistical analysis showing that mean survival was significantly different between all Classes.

      5) Many of the traits, edits, and deeper analyses are performed on the JU751 genetic background. This choice is sensible, otherwise, the work can increase exponentially. However, the authors should add a caveat about how these results might be limited to JU751 and other strains might respond differently.

      Response: For certain experiments, it was not feasible to include multiple strains from all phenotypic classes, so we selected JU751 (Class IIIB) and JU1200 (Class II), for which we had established CRISPR-engineered lines to modulate the egg retention phenotype by a single amino acid change in KCNL-1. To emphasize that these experimental observations cannot be generalized, we added the following statement in the relevant results section: “These experimental results offer preliminary evidence (bearing in mind that our analysis was primarily centered on a single genetic background) that laying of advanced-stage embryos may enhance intraspecific competitive ability, particularly in scenarios where multiple genotypes compete for colonization and exploitation of limited, patchily distributed resources.”

      6) The authors argue that evolution could be acting on specific parts of the egg-laying machinery (e.g., muscle-directed signaling components). It might be useful to look at levels of standing variation and selection at groups of loci compared to genomic controls to see if this conclusion can be strengthened.

      Response: This is a good idea but how to select pertinent candidate loci is unclear (there are over 300 genes with effects on egg laying, www.wormbase.org). In addition, the genetics of muscle-directed signalling components in egg laying is only starting to be explored, with no specific candidate genes having been identified (Medrano & Collins, 2023, Curr Biol). We therefore think that such an analysis is currently not possible.

      7) Completely optional: The authors present a compelling and interesting case for transitions and trade-offs between oviparity and viviparity. The C. vivipara species has a different egg-laying mode than other Caenorhabditis species. The authors could add a short section describing their expectations about the neuronal morphology, 5-HT circuits, and muscle function in this species given their results. What genes or circuits should be the focus of future studies to address this question in Caenorhabditis? Also, Loer and Rivard present some similar ideas based on the differences in 5-HT staining neurons across diverse nematodes. Those results can be incorporated and discussed as well.

      Response: Our current research focuses on the evolution of egg laying in different Caenorhabditis species. So far, however, it remains difficult to provide specific hypotheses on how the egg-laying circuit has changed in C. vivipara. We rephrased the final paragraph of the discussion to incorporate some of the reviewer’s suggestions: “Nematodes display frequent transitions from oviparity to obligate viviparity in many distinct genera (Sudhaus, 1976; Ostrovsky et al., 2015), including in the genus Caenorhabditis, with at least one viviparous species, C. vivipara (Stevens et al., 2019). Although evidence exists for the evolution of egg-laying circuitry across oviparous Caenorhabditis species (Loer and Rivard, 2007), the specific cellular and genetic changes responsible for the transition to obligate viviparity in C. vivipara have yet to be examined. Resolving the genetic basis of intraspecific variation in C. elegans egg retention, including partial or facultative viviparity, may thus shed light on the molecular changes underlying the initial steps of evolutionary transitions from oviparity to obligate viviparity in invertebrates.”

      Specific edits:

      1) Perhaps a silly point, but "parity" (to my knowledge) does not have a biological meaning on its own. I suggest "egg-laying mode" or "birth mode".

      Response: This term has been used previously in the literature (e.g., https://onlinelibrary.wiley.com/doi/10.1111/jeb.13886 or https://doi.org/10.1101/2023.10.22.563505). However, as the referee rightly points out, this is not a standard term. We therefore replaced “parity mode” with “egg-laying mode”.

      2) "Against fluctuating environmental fluctuations" is a bit strange

      Response: Corrected.

      3) The first publications of Egl mutants were by the Horvitz lab so some citations are not in all of the first descriptions of the trait (early in Results)

      Response: We have added the relevant work (Trent 1982, Trent 1983, Desai & Horvitz 1989) to this paragraph in the early results section.

      4) "Strong egg retention usually strongly..." is a bit strange

      Response: Corrected.

      5) Figure 8G font looks smaller than the others.

      Response: Corrected.

      Reviewer #2:

      1) In Figure 1A, I infer that in the graph class I measurements are represented by dark blue dots and class II by purple dots. I am having a really hard time distinguishing between these two colors in the graph. In the pie chart I have no problem, but in the graph the black lines around the colored dots seem to obscure the colors. Not sure how to fix this graphical problem, but it is preventing the graph from communicating the results effectively.

      Response: We have changed the colours, spacing and format of this figure to resolve this problem.

      2) The behavioral analysis of Figure 3B-3F is problematic. The experimental methods used and the interpretation of the results each have issues. This is cause for concern since this is the most direct analysis of the actual variations in egg-laying behavior across strains presented in this paper.

      This experiment is modeled after the work of Waggoner et al. 1998, who recorded egg laying events of individual worms on video over several hours and noted the exact time of individual egg laying events. Waggoner et al. found in the reference C. elegans strain N2 that egg-laying events occurred in ~2 minute clusters ("active phases") separated by ~20 minute silent periods ("inactive phases"). Mignerot et al. did not take continuous videos of animals, but rather examined plates bearing a single worm only every 5 minutes and noted the number of new eggs that appeared on the plate in each 5-minute interval. From these data, the authors claim they have measured the intervals between "egg-laying phases" (the term used in the Figure 3 legend). In the Results, the authors explicitly claim they are measuring the timing and frequency of actual active and inactive egg-laying phases. Apparently, all the eggs laid within one 5-minute interval are considered to have been laid in a single active phase, and the time between 5-minute intervals containing egg laying events is considered an "inactive phase" and is measured only with a resolution of 5 minutes. It is not explained anywhere how the authors handle the situation of seeing eggs laid in two consecutive 5-minute intervals. Is that one active phase that is 10 minutes long, or is that two separate active phases with a 5-minute active phase in between? Because of this ambiguity in how they define active and inactive phases, I find it impossible to understand and judge the data presented in Fig. 3D-3F. The authors in the results state that "Class I and Class IIIB displayed significantly accelerated and reduced egg laying activity respectively (Fig. 3C to 3E)". I assume they are referring to the statistical analysis described in the figure legend, which is quite difficult to understand. Frankly, just looking at the graphs in Fig. 3D-3F, it is hard for the reader to identify which specific features shown in the graphs can explain why, for example, Class I strains have fewer retained eggs than Class III strains. So, I found this analysis very unsatisfying.

      I also feel the authors are making an unwarranted assumption that their non-N2 strains will have distinguishable active and inactive phases of egg-laying behavior analogous to those seen in the N2 strain. Given the possibly large variations in egg-laying behavior in the various strains examined, that assumption should be questioned. Thus, framing the entire analysis of behavior patterns in terms of the length of active and inactive phases might not be appropriate.

      Response: This comment validly highlights important problems and limitations of our scan-sampling method to quantify strain differences in egg-laying behaviour. We acknowledge that we failed to present the data with due diligence, and clarity regarding terminology and interpretation. However, we think that some of these results are still of value after revised presentation. Our biggest mistake was to use the terms “active and inactive phase”, as coined by Waggoner et al. 1998. We are aware that our measures are not equivalent to these previously defined measures but have been sloppy with terminology. We therefore carefully reworded this entire results section, using clear definitions to indicate differences between the Waggoner assay and our assay (including a graphical representation of our assay design in the revised Fig. 3B). In brief, our simplified assay is useful to estimate the frequency and approximate duration of prolonged inactive periods of egg laying because we can unambiguously determine intervals in which eggs were laid or not. In contrast, as pointed out by the reviewer, we cannot determine if multiple active phases occurred within a 5-min interval, nor can we estimate the duration of an active “phase”. We now state this limitation explicitly in the manuscript. What our results do show is that the number of intervals during which egg laying occurred is significantly different between strains and Classes: Class I (low retention) have a higher number of intervals with egg-laying events, whereas Class IIIB showed a reduced number of such events (Fig. 3D). We can therefore also roughly estimate the mean time (per individual) between two egg-laying intervals, giving us a proxy for prolonged periods when egg-laying is inactive (Fig. 3E); we note that our estimate for N2 is very close to what has been previously measured (~20 min). 
Therefore, we can confidently conclude that there are natural strains which have both shorter (Class I) and longer (Class IIIB) inactive periods of egg laying. These results partly align with observed variation in egg retention. However, we agree with the reviewer – as we had stated both in results and discussion sections – that these behavioural differences act together with differences in the sensing of egg accumulation in utero (as suggested by results shown in Fig. 3G and 3H). We also agree that it seems very plausible that the observed behavioural differences, as revealed by scan-sampling, may only have a secondary role in accounting for natural variation in egg retention. We will be testing these hypotheses specifically in our future research.
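
      As an illustration only (this is not the code used in the study), the two interval-based measures described above could be computed from a scan-sampling record along the following lines. The function name, the example record, and the choice to measure spacing from one egg-laying interval to the next are assumptions made for this sketch.

```python
from statistics import mean

INTERVAL_MIN = 5  # scan interval in minutes, as in the assay described above


def scan_summary(eggs_per_interval, interval_min=INTERVAL_MIN):
    """Summarize one animal's scan-sampling record.

    eggs_per_interval: number of new eggs counted at each successive scan.
    Returns a tuple:
      - number of intervals in which at least one egg was laid
      - mean time (minutes) between successive egg-laying intervals,
        or None if fewer than two such intervals were observed
    """
    # Indices of the intervals in which egg laying occurred
    active = [i for i, n in enumerate(eggs_per_interval) if n > 0]
    # Spacing between consecutive egg-laying intervals, in minutes
    gaps = [(b - a) * interval_min for a, b in zip(active, active[1:])]
    return len(active), (mean(gaps) if gaps else None)


# Hypothetical one-hour record for a single animal (12 scans of 5 min each)
record = [0, 3, 0, 0, 0, 4, 2, 0, 0, 0, 0, 5]
n_active, mean_gap = scan_summary(record)
print(n_active, mean_gap)
```

      Note that eggs counted in two consecutive scans are treated here as two separate egg-laying intervals spaced 5 minutes apart; the resolution of the assay does not allow anything finer to be inferred.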

      Note: The statistical analyses are nested ANOVAs to ask (a) does the value differ between strains within a given class and (b) does the value differ between Classes? Classes labelled with different letters in the figures therefore significantly differ in their mean values, demonstrating that measured behavioural phenotypes consistently differ between some (but not all) phenotypic classes, yet largely in line with their egg retention phenotypes (Fig. 3D and 3E).

      3) Figure 4A is a schematic diagram of how the egg-laying circuit works based on previous literature, and the authors cite Collins et al. 2015 and Kopchock et al. 2021 as their sources. One feature of this figure seems unwarranted, namely the part indicating that egg accumulation acts on the UM muscles, and the statement in the legend that "mechanical excitation of uterine muscles (UM) in response to egg accumulation favours exit from the inactive state (Collins et al., 2016)". I believe Collins et al. 2016 showed that egg accumulation favors egg laying and may have speculated that it does so by stretching the um muscles, but this idea remains speculative and has not been established by any experimental data. I point out this issue, in particular, because it may bear on the nice data the authors of this manuscript show in Figure 3G and 3H, which show that some strains accumulate many eggs in the uterus before they initiate egg laying.

      Also, in Figure 4A and 4B, the legend does not explain the logic of the green areas labeled "egg-laying active phase" and the yellow area labeled "egg-laying inactive state". I was not sure how to interpret these features of the graphics.

      Response: The input from uterine muscles remains indeed hypothetical, and we have corrected the figure accordingly, now simply referring to the feedback of egg accumulation on egg laying activity, as recently characterized in more detail by Medrano & Collins (2023, Curr Biol).

      The green/yellow backgrounds shown in figures 4A (and 4B) are not useful and we have removed them.

      4) Results, page 11: "We used standard assays, in which animals are reared in liquid M9 buffer without bacterial food." In the standard assays, animals are reared on NGM agar plates with bacterial food, and then at the start of the egg-laying assay, are transferred to liquid M9 buffer without bacterial food. I assume that is what these authors did, and they should correct the language of the text to make it more accurate.

      Response: The reviewer is correct. We have incorporated this change to improve accuracy.

      5) The authors note that "serotonin induced a much stronger egg-laying response in the Class IIIA strain ED3005 than in other strains (Fig. 4C)". I would like to point out to the authors that strains such as ED3005 that have a very large number of unlaid eggs in their uterus are prone to lay a very large number of eggs when treated with exogenous serotonin, simply for the trivial reason that they have more eggs to release. This was previously seen, for example, in Desai and Horvitz (1989) in certain egg-laying defective mutants.

      Response: This is an important point and our comparison of ED3005 to ALL other strains is problematic. We changed this result description by stating that ED3005 shows possible serotonin hypersensitivity compared to strains with similar levels of egg retention (Class IIIA): “In addition, serotonin induced a much stronger egg-laying response in the strain ED3005 than in other Class IIIA strains with similar levels of egg retention (Fig. 4B). ED3005 may thus exhibit serotonin hypersensitivity, which has been observed in certain egg-laying mutants where perturbed synaptic transmission impacts serotonin signalling (Schafer and Kenyon, 1995; Schafer et al., 1996).”

      6) In Figure 4 the authors show that all strains lay eggs in response to fluoxetine and imipramine, but some strains (Class IIIB) do not lay eggs in response to serotonin. They then cite a series of papers, starting with Trent et al. 1983, that they claim show that this specific phenotype demonstrates that the HSN neurons are functionally releasing serotonin (bottom of page 11). This statement needs to be removed - it is incorrect. It is true that egg laying in response to fluoxetine and/or imipramine AS WELL AS egg laying in response to serotonin has been interpreted as indicating the presence of HSN neurons that functionally release serotonin to stimulate egg laying (these were referred to as Category C by Trent et al., 1983). However, the mutants that Mignerot et al. are talking about (those that don't respond to serotonin but do respond to imipramine/fluoxetine) were called Category D by Trent et al., 1983, and to my knowledge these have never been interpreted as necessarily having functionally intact HSN neurons. Mutants such as these that can lay eggs in some circumstances but cannot lay eggs in response to exogenous serotonin have usually been interpreted as having egg-laying muscles that are defective in responding to serotonin.

      How can we interpret strains that respond to imipramine/fluoxetine and not serotonin? Mignerot et al. cite some of the papers (Kullyev et al. 2010; Weinshenker et al., 1999; Yue et al., 2018) showing that imipramine and fluoxetine have off-target effects and can stimulate egg laying by acting through proteins other than the serotonin-reuptake inhibitor. The authors later in their discussion at the top of Page 24 also cite Dempsey et al 2005, a paper that also argues that imipramine and fluoxetine act via off-target effects. However, currently in Figure 4B Mignerot et al. emphasize that the serotonin reuptake inhibitor is the target of these drugs. Since the results presented for Class IIIB strains are not in accord with this interpretation, this seems misleading to me. The bottom line for me is that class IIIB strains cannot respond to exogenous serotonin, but can lay eggs in other conditions, so perhaps there is something specifically wrong with their ability to respond to serotonin.

      Response: We thank the reviewer for this important comment – we misinterpreted some of these past findings and our statements were either inexact or incorrect. We have revised this section accordingly: “Both drugs also stimulated egg laying in the Class IIIB strains and the Class IIIA strain JU2829 for which exogenous serotonin either inhibited egg laying or had no effect on it (Fig. 4B). In the past, mutants unresponsive to serotonin yet responsive to other drugs, including fluoxetine and imipramine, have been interpreted as being defective in the serotonin response of vulval muscles (Trent et al., 1983; Reiner et al., 1995; Weinshenker et al., 1995). This is indeed the likely case of Class IIIB strains carrying the KCNL-1 V530L variant thought to specifically reduce excitability of vulval muscles (Vigne et al., 2021). Our results therefore suggest that JU2829 (Class IIIA) may exhibit a similar defect in vulval muscle activation via serotonin caused by an alternative genetic change. Overall, these pharmacological assays do not allow us to conclude if and how HSN function has diverged among strains because the mode of action and targets of tested drugs has not been fully resolved. Nevertheless, our results are consistent with previous models proposing that these drugs do not simply block serotonin reuptake but can stimulate egg laying, to some extent, through mechanisms independent of serotonergic signaling (Trent et al., 1983; Desai and Horvitz, 1989; Reiner et al., 1995; Weinshenker et al., 1995, 1999; Dempsey et al., 2005; Kullyev et al., 2010; Branicky et al., 2014; Yue et al., 2018).”

      We removed the oversimplified Fig. 4B to avoid any misinterpretation.

      8) In Figure 7B and 7C, the authors should add some type of error bars to the graphs to give the readers an idea of whether the differences between strains that they write about are statistically significant or not.

      Response: These are frequency data to describe temporal dynamics of hatching (N=45-72 eggs per strain) (Fig. 7B) and development in single cohorts (N=48-177 eggs per strain) (Fig. 7C), hence, the absence of error bars.

      We agree that this representation of the data is not very telling. We therefore changed the data representation in these two figures to show that there are clear, statistically significant, negative correlations between egg retention and time to hatching / egg-to-adult developmental time.

      9) When the authors reference a list of papers in a single list, e.g. "(Burton et al., 2021; Fausett et al., 2021; Garsin et al., 2001; Padilla et al., 2002; Van Voorhies and Ward, 2000)" they seem to do so in alphabetical order by the first author's last name. I believe the usual practice is to list references by year of publication, with the earliest first.

      Response: We corrected the citation style according to the eLife format.

      10) At the top of page 24, the authors write "It seems unlikely, however, that any of these variants strongly alter central function of HSN and HSN-mediated signalling because fluoxetine and imipramine, known to act via HSN (Dempsey et al., 2005; Trent et al., 1983; Weinshenker et al., 1995), triggered a robust stimulatory effect on egg laying in all examined strains (Fig. 4C)." I believe that the Weinshenker paper in fact showed that imipramine does not act via the HSN, and the Dempsey paper suggested that both drugs can act at least in part independently of the HSN. Therefore, the authors should revise their statement.

      Response: We have removed the sentence.

      Reviewing Editor:

      Minor suggestions:

      1) p. 2, fifth line from bottom: "lead" instead of "leads";

      2) p. 2, last line: "muscle" instead of "muscles";

      3) p. 3, first full paragraph, 17th line: "populations" instead of "population";

      4) p. 5, fourth line from bottom: Delete first comma;

      5) p. 6, Figure 1D: "of" instead of "off";

      6) p. 7, fifth line: "KCNL-1";

      7) p. 9, third paragraph, second line: please clarify "late mid-L4";

      8) p. 16, first line: "exogenous";

      9) p. 20, first paragraph, beginning of second sentence: "Whether" instead of "If";

      10) p. 22, ninth line from bottom: delete "shaped by";

      11) p. 23, last paragraph, third and eighth lines from bottom: change "between" to "among"

      Response: Thank you. All corrected.

      Additional changes:

      Figure 5A: We removed figure 5A showing a cartoon of mod-5/SERT and its effects on serotonin signalling. This figure was incorrectly showing that MOD-5 is expressed in HSN (Jafari et al 2011 J. Neuroscience, Hammarlund et al 2018 Neuron).

      Abstract: We reworded the abstract to reduce its length.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work describes new validated conditional double KO (cDKO) mice for LRRK1 and LRRK2 that will be useful for the field, given that LRRK2 is widely expressed in the brain and periphery, and many divergent phenotypes have been attributed previously to LRRK2 expression. The manuscript presents solid data demonstrating that it is the loss of LRRK1 and LRRK2 expression within the SNpc DA cells that is not well tolerated, as it was previously unclear from past work whether neurodegeneration in the LRRK double Knock Out (DKO) was cell autonomous or the result of loss of LRRK1/LRRK2 expression in other types of cells. Future studies may pursue the biochemical mechanisms underlying the reason for the apoptotic cells noted in this study, as here, the LRRK1/LRRK2 KO mice did not replicate the dramatic increase in the number of autophagic vacuoles previously noted in germline global LRRK1/LRRK2 KO mice.

      We thank the editors for handling our manuscript and for the succinct summary that recognizes the significance of our findings and points out interesting directions for future studies. We also thank the reviewers for their helpful comments and positive evaluation of our work. Below, we have provided point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Summary:

      This is an important work showing that loss of LRRK function causes late-onset dopaminergic neurodegeneration in a cell-autonomous manner. One of the LRRK members, LRRK2, is of significant translational importance as mutations in LRRK2 cause late-onset autosomal dominant Parkinson's disease (PD). While many in the field assume that LRRK2 mutant causes PD via increased LRRK2 activity (i.e., kinase activity), it is not a settled issue as not all disease-causing mutant LRRK2 exhibit increased activity. Further, while LRRK2 inhibitors are under clinical trials for PD, the consequence of chronic, long-term LRRK2 inhibition is unknown. Thus, studies evaluating the long-term impact of LRRK deficit have important translational implications. Moreover, because LRRK proteins, particularly LRRK2, are known to modulate immune response and intracellular membrane trafficking, the study's results and the reagents will be valuable for others interested in LRRK function.

      Strengths:

      This report describes a mouse model where the LRRK1 and LRRK2 genes are conditionally deleted in dopaminergic neurons. Previously, this group showed that while loss of LRRK2 expression does not cause brain phenotype, loss of both LRRK1 and LRRK2 causes a later onset, progressive degeneration of catecholaminergic neurons and dopaminergic (DAergic) neurons in the substantia nigra (SN), and noradrenergic neurons in the locus coeruleus (LC). However, because LRRK genes are widely expressed with some peripheral phenotypes, it was unknown if the neurodegeneration in the LRRK double knockout (DKO) was cell autonomous. To rigorously test this question, the authors have generated a double conditional (cDKO) allele where both LRRK1 and LRRK2 genes were targeted to contain loxP sites. In my view, this was beyond what is usually required, as most investigators might combine one KO allele with another floxed allele. The authors provide a rigorous validation showing that the Driver (DAT-Cre) is expressed in most DAergic neurons in the SN and that LRRK levels are decreased selectively in the ventral midbrain. Using these mice, the authors show that the number of DAergic neurons is normal at 15 but significantly decreased at 20 months of age. Moreover, the authors show that the number of apoptotic neurons is increased by ~2X in aged SN, demonstrating increased ongoing cell death, as well as an increase in activated microglia. The degeneration is limited to DAergic neurons as LC neurons are not lost as this population does not express DAT. Overall, the mouse genetics and experimental analysis were performed rigorously, and the results were statistically sound and compelling.

      Weaknesses:

      I only have a few minor comments. First is that in PD and other degenerative conditions, loss of axons and terminals occurs prior to cell bodies. It might be beneficial to show the status of DAergic markers in the striatum. Second, previous studies indicate that very little, if any, LRRK1 is expressed in SN DAergic neurons. This is also the case with the Allen Brain Atlas profile. Thus, the authors should discuss the discrepancy, as they seem to imply significant LRRK1 expression in DA neurons.

      We appreciate the reviewer’s recognition of the importance of the study as well as our rigorous experimental approaches and compelling results. Our responses to the reviewer's two minor comments are below.

      1) DAergic markers in the striatum: We performed TH immunostaining in the striatum and quantified TH+ DA terminals in the striatum of DA neuron-specific LRRK cDKO and littermate control mice at the ages of 15 and 24 months. We found similar levels of TH immunoreactivity in the striatum of LRRK cDKO and littermate control mice at the age of 15 months (p = 0.6565, unpaired Student’s t-test) and significantly reduced levels of TH immunoreactivity in the striatum of LRRK cDKO, compared to control mice at the age of 24 months (~19%, p = 0.0215), suggesting an age-dependent loss of dopaminergic terminals in the striatum of DA neuron-specific LRRK cDKO mice. These results are now included as Figure 5 of the revised manuscript.

      2) LRRK1 expression in the SNpc: It is shown in the Mouse brain RNA-seq dataset and the Allen Mouse brain ISH dataset (https://www.proteinatlas.org/ENSG00000154237-LRRK1/brain) that LRRK1 is broadly expressed in the mouse brain and is expressed at modest levels in the midbrain, comparable to the cerebral cortex. Indeed, our Western analysis also showed that levels of LRRK1 detected in the dissected ventral midbrain and the cerebral cortex of control mice are similar (40µg total protein loaded per lane; Figure 2E). Furthermore, we previously demonstrated that deletion of LRRK2 (or LRRK1) alone does not cause age-dependent loss of DA neurons in the SNpc, but deletions of both LRRK1 and LRRK2 result in age-dependent loss of DA neurons in LRRK DKO mice, indicating the functional importance of LRRK1 in the protection of DA neuron survival in the aging mouse brain (Tong et al., PNAS 2010, 107: 9879-9884, Giaime et al., Neuron 2017, 96: 796-807).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shen and collaborators described the generation of cDKO mice lacking LRRK1 and LRRK2 selectively in DAT-positive DAergic neurons. The Authors asked whether selective deletion of both LRRK isoforms could lead to a Parkinsonian phenotype, as previously reported by the same group in germline double LRRK1 and LRRK2 knockout mice (PMID: 29056298). Indeed, cDKO mice developed a late reduction of TH+ neurons in SNpc that partially correlated with the reduction of NeuN+ cells. This was associated with increased apoptotic cell and microglial cell numbers in SNpc.

      Unlike the constitutive DKO mice described earlier, however, cDKO mice did not replicate the dramatic increase in the number of autophagic vacuoles. The study supports the authors' hypothesis that loss of function rather than gain of function of LRRK2 leads to PD.

      Strengths:

      The study described for the first time a model where both the PD-associated gene LRRK2 and its homolog LRRK1 are deleted selectively in DAergic neurons, offering a new tool to understand the physiopathological role of LRRK2 and the compensating role of LRRK1 in modulating DAergic cell function.

      Weaknesses:

      The model has no construct validity since loss of function mutations of LRRK2 are well-tolerated in humans and do not lead to PD. The evidence of a Parkinsonian phenotype in these cDKO mice is limited and should be considered preliminary.

      We thank the reviewer for commenting on the usefulness of this new PD mouse model.

      The reviewer did not include a reference citation for the statement "loss of function mutations of LRRK2 are well-tolerated in humans and do not lead to PD." It is possible that the reviewer was referring to a human population study (Whiffin et al., Nat Med 2020, 26: 869-877), entitled "The effect of LRRK2 loss-of-function variants in humans." In this study, the authors analyzed 141,456 individuals sequenced in the Genome Aggregation Database, 49,960 exome-sequenced individuals from the UK Biobank, and more than 4 million participants in the 23andMe genotyped dataset, and they looked for human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants). The reported findings were interesting, and the authors were careful in stating their conclusions. However, this is not a linkage study of large pedigrees carrying a single, clear-cut loss-of-function mutation (e.g. large deletions of most exons and coding sequences). Therefore, the experimental evidence is not compelling enough to conclude whether loss-of-function mutations in LRRK2 cause PD or do not cause PD.

      The current report is an unbiased genetic study in an effort to reveal the normal physiological role of LRRK in dopaminergic neurons. It was not intended to produce Parkinsonian phenotypes in LRRK cDKO mice, which would be a biased effort. However, the unequivocal discovery of the cell intrinsic role of LRRK in the protection of DA neurons from age-dependent degeneration and apoptotic cell death should be considered seriously, while we contemplate the disease mechanism and how LRRK2 mutations may cause DA neuron loss and PD.

      Reviewer #3 (Public Review):

      Kang, Huang, and colleagues investigated the impact of LRRK1 and LRRK2 deletion, specifically in dopaminergic neurons, using a novel cDKO mouse model. They observed a significant reduction in DAergic neurons in the substantia nigra in their conditional LRRK1 and LRRK2 KO mice and a corresponding increase in markers of apoptosis and gliosis. This work set out to address a longstanding question within the field around the role and importance of LRRK1 and LRRK2 in DAergic neurons and suggests that the loss of both proteins triggers some neurodegeneration and glial activation.

      The studies included in this work are carefully performed and clearly communicated, but additional studies are needed to strengthen further the authors' claims around the consequences of LRRK2 deletion in DAergic neurons.

      1) In Figures 2E and F, the authors assess the protein levels of LRRK1 and LRRK2 in their cDKO mouse model to confirm the deletion of both proteins. They observe a mild loss of LRRK1 and LRRK2 signals in the ventral midbrain compared to wild-type animals. While this is not surprising given other cell types that still express LRRK1 and LRRK2 would be present in their dissected ventral midbrain samples, it does not sufficiently confirm that LRRK1 and LRRK2 are not expressed in DAergic neurons. Additional data is needed to more directly demonstrate that LRRK1 and LRRK2 protein levels are reduced in DAergic neurons, including analysis of LRRK1 and LRRK2 protein levels via immunohistochemistry or FACS-based analysis of TH+ neurons.

      We thank the reviewer for highlighting this incredibly important but often overlooked issue. We agree that the data in Figure 2E, F alone would be inadequate to validate DA neuron-specific LRRK cDKO mice.

      Cell type-specific conditional knockouts are a mosaic with KO cells mixed with other cell types expressing the gene normally. DA neuron-specific cDKO is particularly challenging, as DA neurons are a subset of cells embedded in the ventral midbrain. Rather than using immunostaining, which relies upon specific, good LRRK1 and LRRK2 antibodies for IHC, or FACS sorting of TH+ neurons followed by Western blotting (few cells, mixed cell populations, etc.), we chose a clean genetic approach by generating germline mutant mice carrying the deleted LRRK1 and LRRK2 alleles in all cells from the floxed LRRK1 and LRRK2 alleles. This approach permits characterization of these deletion mutations in germline mutant mice using molecular approaches that yield unambiguous results.

      We crossed CMV-Cre deleter mice with floxed LRRK1 and LRRK2 mice to generate respective germline LRRK1 KO and LRRK2 KO mice, in which all cells carry the LRRK1 or LRRK2 deleted alleles that are identical to those in DA neurons of cDKO mice. We then performed Northern, extensive RT-PCR followed by sequencing, and Western analyses to show the absence of the full-length LRRK1 and LRRK2 mRNA (Figure 1G, H, Figure 1-figure supplement 8 and 10), the expected truncation of LRRK1 and LRRK2 mRNA (Figure 1-figure supplement 9 and 11), and the absence of LRRK1 and LRRK2 proteins (Figure 1I). These analyses together demonstrate that in the presence of Cre, either CMV-Cre expressed in all cells or DAT-Cre expressed selectively in DA neurons, the floxed LRRK1 and LRRK2 exons are deleted, resulting in null alleles. We further demonstrated the specificity of DAT-Cre-mediated recombination (deletion) by crossing DAT-Cre mice with a GFP reporter, showing that 99% of TH+ DA neurons in the SNpc are also GFP+ (Figure 2A, B), indicating that DAT-Cre-mediated recombination of the floxed alleles occurs in essentially all TH+ DA neurons in the SNpc.

      2) The authors observed a significant but modest effect of LRRK1 and LRRK2 deletion on the number of TH+ neurons in the substantia nigra (12-15% loss at 20-24 months of age). It is unclear whether this extent of neuron loss is functionally relevant. To strengthen the impact of these data, additional studies are warranted to determine whether this translates into any PD-relevant deficits in the mice, including motor deficits or alterations in alpha-synuclein accumulation/aggregation.

      Yes, the reduction of DA neurons in the SNpc of cDKO mice at the age of 20-24 months is modest. At 15 months of age, the number of TH+ DA neurons in the SNpc is similar between LRRK cDKO mice (10,000 ± 141) and littermate controls (10,077 ± 310, p > 0.9999). At 20 months of age, the number of DA neurons in the SNpc of LRRK cDKO mice (8,948 ± 273) is significantly reduced (-12.7%), compared to control mice (10,244 ± 220, F1,46 = 16.59, p = 0.0002, two-way ANOVA with Bonferroni’s post hoc multiple comparisons, p = 0.0041). By 24 months of age, the number of DA neurons in the SNpc of LRRK cDKO mice (8,188 ± 452) relative to controls (9,675 ± 232, p = 0.0010) is further reduced (-15.4%).
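
      The percentages quoted above follow directly from the reported group means. The short sketch below recomputes them; the dictionary layout and the function name `percent_change` are illustrative choices, and only the means themselves come from the response.

```python
# Group means of TH+ neurons in the SNpc, as reported in the response above.
# The data layout and helper name are illustrative, not the authors' code.
counts = {
    15: {"control": 10077, "cDKO": 10000},
    20: {"control": 10244, "cDKO": 8948},
    24: {"control": 9675, "cDKO": 8188},
}


def percent_change(control, mutant):
    """Signed percent change of the mutant mean relative to the control mean."""
    return 100.0 * (mutant - control) / control


# Round to one decimal place, matching the precision used in the text.
changes = {
    age: round(percent_change(group["control"], group["cDKO"]), 1)
    for age, group in counts.items()
}

for age, pct in changes.items():
    print(f"{age} months: {pct:+.1f}%")
```

      Run on these means, the 20- and 24-month values reproduce the -12.7% and -15.4% reductions cited in the text.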

      Similar results were obtained by an independent quantification by another investigator, also conducted in a genotype-blind manner, using the fractionator and optical dissector method, by which TH+ cells were quantified in 25% of the areas. These results are included as Figure 3-figure supplement 1 in the revised manuscript. Because of the more limited sampling, the quantification data are more variable, compared to quantification of TH+ cells in all areas of the SNpc, shown in Figure 3. With both methods, we quantified TH+ cells in every 10th section encompassing the entire SNpc (3D structure), as sampling using every 5th or every 10th section yielded similar results.
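
      As a rough illustration of how a fractionator count is scaled up to a whole-structure estimate, the sketch below divides the sampled count by the product of the sampling fractions. The section fraction (every 10th section) and area fraction (25%) come from the description above; the thickness fraction of 1.0, the raw count of 250, and the function name are assumptions made only for this example.

```python
def fractionator_estimate(cells_counted, section_fraction, area_fraction,
                          thickness_fraction=1.0):
    """Scale a sampled cell count up to a whole-structure estimate.

    Each fraction is the proportion of sections, of area, or of section
    thickness that was actually sampled. thickness_fraction=1.0 is an
    assumption; the response does not state it.
    """
    for fraction in (section_fraction, area_fraction, thickness_fraction):
        if not 0.0 < fraction <= 1.0:
            raise ValueError("sampling fractions must lie in (0, 1]")
    return cells_counted / (section_fraction * area_fraction * thickness_fraction)


# Hypothetical raw count, chosen only to make the arithmetic easy to follow.
estimate = fractionator_estimate(250, section_fraction=1 / 10, area_fraction=0.25)
print(round(estimate))
```

      With these inputs the scaling is 250 × 10 × 4, i.e. about 10,000 cells. Because each counted cell stands in for 40 cells, small counting differences are amplified, which is consistent with the larger variability noted for the more limited sampling.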

      We also performed behavioral analysis of LRRK cDKO mice and littermate controls at the ages of 10 and 25 months using the beam walk test (10 mm and 20 mm beam) and the pole test, which are sensitive to impairment of motor coordination. We found that LRRK cDKO mice at 10 months of age showed significantly more hindlimb errors (p = 0.0005, unpaired two-tailed Student’s t-test) and longer traversal time (p = 0.0075) in the 10 mm beam walk test, compared to control mice, though their performance is similar in the 20 mm beam walk (hindlimb slips: p = 0.0733, traversal time: p = 0.9796) and in the pole test. At 22 months of age, the performance of LRRK cDKO mice and littermate controls is more variable and worse, compared to the younger mice, and is not significantly different between the genotypic groups. These results are now included as Figure 9 of the revised manuscript.

      3) The authors demonstrate that, unlike in the germline LRRK DKO mice, they do not observe any alterations in electron-dense vacuoles via EM. Given their data showing increased apoptosis and gliosis, it remains unclear how the loss of LRRK proteins leads to DAergic neuronal cell loss. Mechanistic studies would be insightful to understand better potential explanations for how the loss of LRRK1 and LRRK2 may impair cellular survival, and additional text should be added to the discussion to discuss potential hypotheses for how this might occur.

      We agree that this phenotypic difference between germline DKO and DA neuron-specific cDKO mice is intriguing, suggesting a non-cell autonomous contribution of LRRK in age-dependent accumulation of autophagic and lysosomal vacuoles in SNpc neurons of germline LRRK DKO mice. We will discuss the phenotypic difference further in the revised manuscript. We are generating microglia-specific LRRK cDKO mice to investigate the role of LRRK in microglia and whether microglia contribute in a cell-extrinsic manner to the regulation of the autophagy-lysosomal pathway in DA neurons.

      4) The authors discuss the potential implications of the neuronal cell loss observed in cDKO mice for LRRK1 and LRRK2 for therapeutic approaches targeting LRRK2 and suggest this argues that LRRK2 variants may exert their effects through a loss-of-protein function. However, all of the data generated in this work focus on a mouse in which both LRRK1 and LRRK2 have been deleted, and it is therefore difficult to make any definitive conclusions about the consequences of specifically targeting LRRK2. The authors note potential redundancy between the two LRRK proteins, and they should soften some of their conclusions in the discussion section around implications for the effects of LRRK2 variants. Human subjects that carry LRRK2 loss-of-function alleles do not have an increased risk for developing PD, which argues against the authors' conclusion that LRRK2 variants associated with PD are loss-of-function. Additional text should be included in their discussion to better address these nuances and caution should be used in terms of extrapolating their data to effects observed with PD-linked variants in LRRK2.

      We will modify the discussion accordingly in the revised manuscript.

    1. Author Response

      eLife assessment

      This valuable paper presents a thoroughly detailed methodology for mesoscale-imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. While the examples of scientific results to be derived with this method are in the preliminary stages, they offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents.

      Authors’ Response: We thank the reviewers for the helpful and constructive comments. They have helped us plan for significant improvements to our manuscript. Our preliminary response and plans for revision are indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside this, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      -Comprehensive methodological detailing:

      The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      -Minimal movement artifacts:

      A notable strength of this study is the remarkably low movement artifacts. To further underscore this achievement, a more robust quantification across all subjects, coupled with benchmarking against established tools (such as those from suite2p), would be beneficial.

      Authors’ Response: This is a good suggestion. Since we used suite2p for our data analysis, and have records of the fast-z correction applied by the microscope, we can supply these as quantifications of movement corrections that were applied across our sample of mice. We hope to supply this information as a supplement in the revised manuscript.

      Currently, we have chosen to show that the corrected, post-Suite2p registration movement artifacts are very close to zero. We will revise the manuscript with clear descriptions of methods that we have found important, such as fully tightening all mounting devices, utilizing the air table properly, implanting the cranial window with proper, even pressure across its entire extent, and mounting the mouse so that it is not too close to or too far from the surface of the running wheel.

      -Insightful preliminary data and analysis:

      The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      -Clarification about the extent of the method in the title and text:

      The title of the paper, using the term "pan-cortical," along with certain phrases in the text, may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice. To avoid confusion, it should be explicitly stated that the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex). For instance, in line 545, the phrase "lateral cortex with our dorsal and side mount preparations" should be revised to "lateral cortex with our dorsal or side mount preparations" for greater clarity.

      Authors’ Response: We will revise the manuscript so that it is clear that we made use of two imaging configurations for the 2-photon mesoscope data and the benefits and limitations of these two preparations. The dorsal mount and the side mount each have their advantages and disadvantages, but together form a powerful tool for imaging much of the dorsal and lateral cortex in awake, behaving mice.

      -Comparison with existing methods:

      A more detailed contrast between this method and other published techniques would add value to the paper. Specifically, the lateral view appears somewhat narrower than that described in Esmaeili et al., 2021; a discussion of this comparison would be useful.

      Authors’ Response: We will modify the manuscript so that a more detailed comparison with other published techniques is included. The preparation by Esmaeili et al. 2021 has some similarities, but also differences, from our preparation. Our preliminary reading is that their through-the-skull field of view is approximately the same as our through-the-skull field of view that exists between our first (headpost implantation) and second (window implantation) surgeries, although our preparation appears to include more anterior areas both near to and on the contralateral side of the midline. We will compare these preparations more accurately in the revised manuscript.

      If you compare the imageable extent of our cranial window for mesoscale 2-photon imaging to that of their through-the-skull widefield preparation, which is a bit of an “apples to oranges” comparison, then you are likely correct that their field of view is larger than ours, if you are referring to our 10 mm radius-bend glass. However, use of our 9 mm radius bend glass (i.e. a tighter bend) allows us to image additional ventral auditory areas. We could show an example of this, perhaps, although we did not make as much use of this alternative window in the large FOV experiments, because the increased curvature of the glass relative to the 10 mm radius bend window prevents imaging of the entire preparation in a single 2-photon z-plane. With the 9 mm radius bend glass we mostly imaged in the multiple, small FOV configuration (see Fig. S2).

      Furthermore, the number of neurons analyzed seems modest compared to recent papers (50k) - elaborating on this aspect could provide important context for the readers.

      Authors’ Response: With respect to the “modest” number of neurons analyzed (between 2000 and 8000 neurons per session for our dorsal and side mount preparations with medians near 4500; see Fig. S2e) we would like to point out that factors such as use of dual-plane imaging or multiple imaging planes, different mouse lines, use of different duration recording sessions (see our Fig S2c), use of different imaging speeds and resolutions (see our Fig S2d), use of different Suite2p run-time parameters, and inclusion of areas with blood vessels and different neuron cell densities, may all impact the count of total analyzed neurons. We could provide additional documentation of these issues, but we would like to point out that, in our case, we were not trying to maximize neuron count at the expense of other factors such as imaging speed and total spatial FOV extent.

      -Discussion of methodological limitations:

      The limitations inherent to the method, such as the potential behavioral effects of tilting the mouse's head, are not thoroughly examined. A more comprehensive discussion of these limitations would enhance the paper's balance and depth.

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this situation (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, will be discussed more thoroughly in the revised manuscript.

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparison across mice indicates that side and dorsal mount mice show similar behavioral variability.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration without rotating the mouse, by rotating the objective 20 degrees to the right, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees of objective rotation), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult or impossible under these conditions because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      -Preliminary nature of results:

      The results are at a preliminary stage; for example, the B-SOiD analysis is based on a single mouse, and the validation data are derived from the training data set. The discrepancy between the maps in Figures 5e and 6e might indicate that a significant portion of the map represents noise. An analysis of variability across mice and a method to assign significance to these maps would be beneficial.

      Authors’ Response: In this methods paper, we have chosen to supply proof-of-principle examples, without a complete analysis of animal-to-animal variance. The dataset for this paper contains both neural and behavioral data for 91 sessions across 18 mice from both dorsal and side mount preparations. The complete analysis of this dataset exceeds the capacity of the present study. We will include more individual examples in the revised version, along with data showing the amount of between-session and across-mouse variance. We will include in the revised manuscript a comparison of the stability of B-SOiD measures across sessions, as a demonstration of what may be expected with this method.

      -Analysis details:

      More comprehensive details on the analysis would be beneficial for replicability and deeper understanding. For instance, the statement "Rigid and non-rigid motion correction were performed in Suite2p" could be expanded with a brief explanation of the underlying principles, such as phase correlation, to provide readers with a better grasp of the methodologies employed.

      Authors’ Response: We are revising the manuscript to give more detail without reducing readability, so as to increase clarity of presentation. Since this is a methods paper, we are modifying the manuscript to include more details and clear explanations so that the reader may replicate our methods and results.
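
      To make the reviewer's point about phase correlation concrete, here is a minimal one-dimensional sketch of rigid shift estimation. It is not Suite2p's implementation, which works on 2-D frames and includes further steps; the function names, the naive DFT, and the toy signal are all assumptions chosen so that the example runs with the Python standard library alone.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short demo."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[k] * cmath.exp(sign * 2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]
    return [v / n for v in out] if inverse else out


def phase_correlation_shift(reference, frame):
    """Estimate the circular shift s such that frame[i] ~ reference[i - s]."""
    n = len(reference)
    spec_ref = dft(reference)
    spec_frame = dft(frame)
    cross_power = []
    for a, b in zip(spec_frame, spec_ref):
        product = a * b.conjugate()
        magnitude = abs(product)
        # Keep only the phase; amplitude information is discarded.
        cross_power.append(product / magnitude if magnitude > 1e-12 else 0.0)
    correlation = [v.real for v in dft(cross_power, inverse=True)]
    peak = max(range(n), key=correlation.__getitem__)
    # Peaks past the midpoint correspond to negative shifts.
    return peak if peak <= n // 2 else peak - n


# Toy "reference image": a single bump on a flat background.
reference = [0.0] * 16
reference[5:8] = [1.0, 3.0, 1.0]
shifted_frame = [reference[(i - 3) % 16] for i in range(16)]
estimated_shift = phase_correlation_shift(reference, shifted_frame)
print(estimated_shift)
```

      The key design choice is the normalization of the cross-power spectrum: once amplitudes are removed, a pure translation appears as a sharp peak at the displacement, and the frame can then be shifted back by that amount. Non-rigid correction is typically built on the same idea applied to sub-blocks of the image.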

      Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOiD behavioral motif detection.

      Authors’ Response: Thank you for this excellent summary of the scope of our paper.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Authors’ Response: We very much support the idea that our work here will contribute to the development of standard workflows across the field including multiple approaches to large-scale neural recordings.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various problems, such as motion artifacts or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for the imaging of neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field.

      Weaknesses:

      Most of the descriptions are quite focused on a specific acquisition system, the Thorlabs Mesoscope, and the manuscript is in part highly technical making it harder to understand the motivation and reasoning behind some of the proposed implementations. A revised version would benefit from a more general description of common problems and the thought process behind the proposed solutions to broaden the impact of the work and make it more accessible for labs that do not have access to a Thorlabs mesoscope. A better introduction of some of the specific issues would also promote the development of other solutions in labs that are just starting to use similar tools.

      Authors’ Response: We will rewrite the motivation behind the study to clarify the general problems that are being addressed. As the 2-photon imaging component of these experiments was performed on a Thorlabs mesoscope, the imaging details will necessarily deal specifically with this system. We will briefly compare the methods and results from our Thorlabs system to those of other systems, based on what we are able to glean from the literature on their strengths and weaknesses.

      Reviewer #3 (Public Review):

      Summary

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community. The quality of the data is high, but it is not clear whether the data are available to the community; some datasets should be deposited. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behaviour motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be something implemented by others.

      Authors’ Response: Thank you for the kind words.

      We intend to make the data set used in making our main figures available to the public, perhaps using FigShare, so that they may check the validity of the methods and analysis. We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with a second in-depth analysis paper that is currently in preparation.

      This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other. This is described in the methods, but a visual representation would greatly benefit the readers looking to implement something similar.

      Authors’ Response: This is an excellent suggestion. We will include a workflow diagram in the revised manuscript for the methods, data collection, and analysis.

      The authors should cite sources for the claims stated in lines 449-453 and cite the claim of the mouse's hearing threshold mentioned in lines 463.

      Authors’ Response: For the claim stated in lines 449-453, “The unattenuated or native high-frequency background noise generated by the resonant scanner causes stress to both mice and experimenters, and can prevent mice from achieving maximum performance in auditory mapping, spontaneous activity sessions, auditory stimulus detection, and auditory discrimination sessions/tasks,” we can provide the following references: (i) for mice: Sadananda et al, 2008 (“Playback of 22-kHz and 50-kHz ultrasonic vocalizations induces differential c-fos expression in rat brain”, Neuroscience Letters, Vol 435, Issue 1, p 17-23), and (ii) for humans: Fletcher et al, 2018 (“Effects of very high-frequency sound and ultrasound on humans. Part I: Adverse symptoms after exposure to audible very-high frequency sound”, J Acoust Soc Am, 144, 2511-2520). We will include these references in the revised paper.

      For line 463, “i.e. below the mouse hearing threshold at 12.5 kHz of roughly 15 dB”, we can provide the following reference: Zheng et al, 1999 (“Assessment of hearing in 80 inbred strains of mice by ABR threshold analyses”, Hearing Research, Vol 130, Issues 1-2, p 94-107). We will also include this reference in the paper. Thank you for identifying these citation omissions.

      No stats are provided for the results shown in Figure 6e; it would be useful to know which of these neural densities for all areas show a clear statistical significance across all the behaviors.

      Authors’ Response: There are two statistical comparisons that we feel may be useful to add to the single session data displayed in this figure, in order to address the point that you raise. The first would allow us to assess whether for each Rastermap group, the distribution of neuron densities across CCF areas differs from a null, uniform distribution. The second would allow us to examine differences between Rastermap groups associated with different qualitative behaviors in order to know with which patterns of neural activity they are reliably associated.

      For the first comparison, we could provide a statistic similar to what we provide for Fig. S6c and f, in which for each CCF area we compare the observed mean correlation values to a null of 0, or, in this case, the population densities of each Rastermap group for each CCF area to a null value equal to the total number of recorded neurons for that group divided by the total number of CCF areas (e.g. a Rastermap group with 500 neurons evenly distributed across ~30 CCF areas would contain ~17 neurons, or ~3% of the group, per CCF area). Our current figure legend states that the maximum of the scale bar look-up value (reds) for each group ranges from ~8% to 32%. So indeed, adding these significances would be informative in this case.

      For the second comparison, we could compare the density of neurons for each CCF area across Rastermap groups for this session. For example, it may be the case that the density of neurons in primary and secondary visual areas belonging to Rastermap groups that predominate during the “walk” behavior is higher than in the Rastermap group that predominates during the “whisk” behavior, or that the density of neurons in the “whisk” and “twitch” Rastermap groups in primary and secondary motor areas is higher than in the Rastermap groups that are active during the “walk” and “oscillate” behaviors.

      Such a comparison should in fact be robust to Rastermap group variability across sessions and mice, as long as the same qualitative behaviors recur. However, our current qualitative methods for discretization of the Rastermap groups likely limit our ability to extend such an analysis accurately across our entire dataset. We are pursuing more rigorous analysis methods in this vein for our second, results-oriented paper.
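
      The first comparison described above, observed per-area counts against a uniform null, can be sketched as follows. This is only an illustration of the arithmetic: the function names and the three-area toy counts are hypothetical, and a real analysis would convert the statistic to a p-value with a statistics library and would probably account for differences in area size and sampling.

```python
def uniform_null(n_neurons, n_areas):
    """Expected count and expected fraction per area if neurons were spread evenly."""
    return n_neurons / n_areas, 1.0 / n_areas


def chi_square_statistic(observed_counts):
    """Pearson chi-square statistic of observed counts against a uniform null."""
    total = sum(observed_counts)
    expected = total / len(observed_counts)
    return sum((observed - expected) ** 2 / expected for observed in observed_counts)


# The worked example from the text: 500 neurons spread over 30 CCF areas.
expected_count, expected_fraction = uniform_null(500, 30)
print(f"{expected_count:.1f} neurons per area, {100 * expected_fraction:.1f}% of the group")

# Hypothetical counts for one Rastermap group over three areas (not real data).
toy_counts = [40, 10, 10]
statistic = chi_square_statistic(toy_counts)
print(statistic)
```

      Under the even-distribution example, each area would hold roughly 16.7 neurons, or about 3.3% of the group, so observed per-area fractions of 8% to 32% would sit well above that baseline.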

      While I understand that this is a methods paper, it seems like the authors are aware of the literature surrounding large neuronal recordings during mouse behavior. Indeed, in lines 178-179, the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why then did the authors not make an attempt at a simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc). These models are straightforward to implement, and indeed it would benefit this work if the model extracts information on par with what is known from the literature.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the current methods paper. We are following up this methods paper with an in-depth analysis of neural activity and corresponding behavior across the cortex during spontaneous and trained behaviors, but this analysis goes well beyond the scope of the present manuscript. Here, we prefer to present examples of the types of results that can be expected using our methods, and how these results compare with those obtained by others in the field.

      Specific strengths and weaknesses with areas to improve:

      The paper should include an overall cartoon diagram that indicates how the various modules are linked together for the sampling of both behaviour and mesoscale GCAMP. This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other.

      Authors’ Response: This is an excellent suggestion and will be included in the revised manuscript, so that readers can more readily follow our workflow, data collection, and analysis.

      The paper contains many important results regarding correlations between behaviour and activity motifs on both the cellular and regional scales. There is a lot of data and it is difficult to draw out new concepts. It might be useful for readers to have an overall figure discussing various results and how they are linked to pupil movement and brain activity. A simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc) may help in this regard.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the present methods paper. Such an analysis is a significant undertaking with such large and heterogeneous datasets, and we provide proof-of-principle data here so that the reader can understand the type of data to be expected using our methods. We hope to provide a more complete analysis of data obtained using our methodology in the near future in a second manuscript.

      However, we may be amenable to including preliminary linear model fit results, as supplementary material, for the two example sessions highlighted in this paper (i.e. the one dorsal mount session in Fig. 4, and the one side mount session shown in Figs. 5 and 6).
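
      For readers unfamiliar with the kind of linear model the reviewer has in mind, the sketch below fits one neuron's activity to two behavioral regressors by ordinary least squares. Everything in it is illustrative: the regressor names, the noise-free toy data, and the helper functions are assumptions, and a realistic analysis would add more regressors, temporal lags, regularization, and cross-validation.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_linear_model(regressors, activity):
    """Least-squares weights for activity[t] ~ w0 + sum_j w_j * regressors[t][j]."""
    design = [[1.0] + list(row) for row in regressors]  # prepend an intercept column
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, activity)) for i in range(p)]
    return solve(xtx, xty)


# Toy, noise-free data (not from the paper): activity depends on pupil and speed.
pupil = [0.1, 0.4, 0.3, 0.8, 0.6, 0.2]
speed = [0.0, 1.0, 0.5, 2.0, 0.0, 1.5]
activity = [0.5 + 2.0 * p + 0.3 * s for p, s in zip(pupil, speed)]

weights = fit_linear_model(list(zip(pupil, speed)), activity)
print([round(w, 3) for w in weights])
```

      Because the toy data are generated without noise, the fit recovers the intercept and the two weights used to build them; with real recordings the quantity of interest would instead be the cross-validated variance explained per neuron.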

      Previously, widefield imaging methods have been employed to describe regional activity motifs that correlate with known intracortical projections. Within the authors' data it would be interesting to perhaps describe how these two different methods are interrelated -they do collect both datasets. Surprisingly, such macroscale patterns are not immediately obvious from the authors' data. Some of this may be related to the scaling of correlation patterns or other factors. Perhaps there still isn't enough data to readily see these and it is too sparse.

      Authors’ Response: Unfortunately, we are unable to directly compare widefield GCaMP6s activity with mesoscope 2-photon GCaMP6s activity. During widefield data acquisition, animals were stimulated with visual, auditory, or somatosensory stimuli, while 2-photon mesoscope data collection occurred during spontaneous changes in behavioral state, without sensory stimulation. The suggested comparison is, indeed, an interesting project for the future.

      In lines 71-71, the authors described some disadvantages of one-photon widefield imaging including the inability to achieve single-cell resolution. However, this is not true. In recent years, the combination of better surgical preparations, camera sensors, and genetically encoded calcium indicators has enabled the acquisition of single-cell data even using one-photon widefield imaging methods. These methods include miniscopes (Cai et al., 2016), multi-camera arrays (Hope et al., 2023), and spinning disks (Xie et al., 2023).

      Cai, Denise J., et al. "A shared neural ensemble links distinct contextual memories encoded close in time." Nature 534.7605 (2016): 115-118.

      Hope, James, et al. "Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton." bioRxiv (2023).

      Xie, Hao, et al. "Multifocal fluorescence video-rate imaging of centimetre-wide arbitrarily shaped brain surfaces at micrometric resolution." Nature Biomedical Engineering (2023): 1-14.

      Authors’ Response: We will correct these statements and incorporate these, and other relevant, references. There are advantages and disadvantages to each chosen technique, such as ease of use, field of view, accuracy, speed, etc., and we will highlight a few of these without an extensive literature review.

      Even the best one-photon imaging techniques typically have ~10-20 micrometer resolution in xy (we image at 5 micrometer resolution for our large FOV configuration, but the xy point-spread function for the Thorlabs mesoscope is 0.61 x 0.61 micrometers in xy with 970 nm excitation) and undefined z-resolution (4.25 micrometers for Thorlabs mesoscope). A coarser resolution increases the likelihood that activity data from neighboring cells may contaminate the fluorescence observed from imaged neurons. Reducing the FOV and using sparse expression of the indicator lessens this overlap problem.

      We do appreciate these recent advances, however, particularly for use in cases where more rapid imaging is desired over a large field of view (CCD acquisition can be much faster than that of standard 2-photon galvo-galvo or even galvo-resonant scanning, as the Thorlabs mesoscope uses). This being said, there are few currently available genetically encoded Ca2+ sensors that are able to measure fluctuations faster than ~10 Hz, which is a speed achievable on the Thorlabs 2-photon mesoscope with our techniques using the “small, multiple FOV” method (Fig. S2d, e).

      The authors' claim of achieving optical clarity for up to 150 days post-surgery with their modified crystal skull approach is significantly longer than the 8 weeks (approximately 56 days) reported in the original study by Kim et al. (2016). Since surgical preparations are an integral part of the manuscript, it may be helpful to provide more details to address the feasibility and reliability of the preparation in chronic studies. A series of images documenting the progression of the window's optical quality would offer valuable insight.

      Authors’ Response: As you suggest, we will include images and data demonstrating the average changes in the window preparation, as well as the degree of variability and a range of outcome scenarios that we observed over the prolonged time periods of our study. We will also include methodological details that we found were useful for facilitating long term use of these preparations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study addresses how protein synthesis in activated lymphocytes keeps up with their rapid division, with important findings that are of significance to cell biologists and immunologists endeavouring to understand the 'economy' of the immune system. The work is supported by solid data but because it proposes non-conventional mechanisms, it requires additional explanation and justification to align with the current understanding in the field.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors examine the fascinating question of how T lymphocytes regulate proteome expression during the dramatic cell state change that accompanies the transition from the resting quiescent state to the activated, dividing state. Orthogonal, complementary assays for translation (RPM/RTA, metabolic labeling) are combined with polyribosome profiling and quantitative, biochemical determinations of protein and ribosome content to explore this question, primarily in the OT-I T lymphocyte model system. The authors conclude that the ratio of protein levels to ribosomes/protein synthesis capacity is insufficient to support activation-coupled T cell division and cell size expansion. The authors hint at cellular mechanisms to explain this apparent paradox, focusing on protein acquisition strategies, including emperipolesis and entosis, though these remain topic areas for future study.

      The strengths of the paper include the focus on a fundamental biological question - the transcriptional/translational control mechanisms that support the rapid, dramatic cell state change that accompanies lymphocyte activation from the quiescent to activated state, the use of orthogonal approaches to validate the primary findings, and the creative proposal for how this state change is achieved.

      The weakness of the work is that several cellular regulatory processes that could explain the apparent paradox are not explored, though they are accessible for experimental analysis. In the accounting narrative that the authors highlight, a thorough accounting of the cellular process inventory that could support the cell state change should be further explored before committing to the proposal, provocative as it is, that protein acquisition provides a principal mechanism for supporting lymphocyte activation cell state change.

      Appraisal and Discussion:

      1) relating to the points raised above, two recent review articles explore this topic area and highlight important areas of study in RNA biology and translational control that likely contribute to the paradox noted by the authors: Choi et al. 2022, doi.org/10.4110/in.2022.22.e39 ("RNA metabolism in T lymphocytes") and Turner 2023, DOI: 10.1002/bies.202200236 ("Regulation and function of poised mRNAs in lymphocytes"). These should be cited, and the broader areas of RNA biology discussed by these authors integrated into the current manuscript.

      Good suggestion. We have added these references with a short discussion.

      2) The authors cite the Wolf et al. study from the Geiger lab (doi.org/10.1038/s41590-020-07145, ref. 41) though largely to compare determined values for ribosome number. Many other elements of the Wolf paper seem quite relevant, for example, the very high abundance of glycolytic enzymes (and whose mRNAs are quite abundant as well), where (and as others have reported) there is a dramatic activation of glycolytic flux upon T cell activation that is largely independent of transcription and translation, the evidence for "pre-existing, idle ribosomes", the changes in mRNA copy number and protein synthesis rate Spearman correlation that accompanies activation, and that the efficiencies of mRNA translation are heterogeneous. These data suggest that more accounting needs to be done to establish that there is a paradox.

      As one example, what if glycolytic enzyme protein levels in the resting cell are in substantial excess of what's needed to support glycolysis (likely true) and so translational upregulation can be directed to other mRNAs whose products are necessary for function of the activated cell? In this scenario, the dilution of glycolytic enzyme concentration that would come with cell division would not necessarily have a functional consequence. And the idle ribosomes could be recruited to key subsets of mRNAs (transcriptionally or post-transcriptionally upregulated) and with that a substantial remodeling of the proteome (authors ref. 44). The study of Ricciardi et al. 2018 (The translational machinery of human CD4+ T cells is poised for activation and controls the switch from quiescence to metabolic remodeling (doi.org/10.1016/j.cmet.2018.08.009) is consistent with this possibility. That study, and the short reviews noted above, are useful in highlighting the contributions of selective translational remodeling and the signaling pathways that contribute to the cell state change of T cell activation.

      Our study focuses on the central issue of whether measured ribosome translation rates support rapid division. The abundance of glycolytic enzymes, mRNA copy numbers etc., are clearly interesting and critical to cell metabolism, but are irrelevant to measuring the overall translation rate and capacity of T cells.

      From this perspective, an alternative view can be posited, where the quiescent state is biologically poised to support activation, where subsets of proteins and mRNAs are present in far higher levels than that necessary to support basal function of the quiescent lymphocyte. In such a model, the early stages of lymphocyte activation and cell division are supported by this surplus inventory, with transcriptional activation, including ribosomal genes, primarily contributing at later stages of the activation process. An obvious analogy is the developing Drosophila embryo where maternal inheritance supports early-stage development and zygotic transcriptional contributions subsequently assuming primary control (e.g. DOI 10.1002/1873-3468.13183 , DOI: 10.1126/science.abq4835). To pursue that biological logic would require quantifying individual mRNAs and their ribosome loading states, mRNA-specific elongation rates, existing individual protein levels, turnover rates of both mRNAs and proteins, ribosome levels, mean ribosome occupancy state, and how each of these parameters is altered in response to activation. Such accounting could go far to unveil the paradox. This is a considerable undertaking, though, and outside the scope of the current paper.

      The reviewer is essentially proposing RiboSeq analysis of pre- and post-activation T cells, whereby individual mRNAs can be queried for ribosome occupancy, and where translation inhibitors could be used to quantify mRNA-specific transit rates. This is important information but would not provide a more accurate accounting of protein synthesis rates than our much more direct measurement. We note that other labs have begun to work on this exact topic, however – see both PMID: 36002234 and PMID: 32330465.

      Reviewer #2 (Public Review):

      This paper takes a novel look at the protein economy of primary human and mouse T-cells - in both resting and activated state. Their findings in primary human T-cells are that:

      1) A large fraction of ribosomes are stalled in resting cultured primary human lymphocytes, and these stalled ribosomes are likely to be monosomes.

      2) Elongation occurs at similar rates for HeLa cells and lymphocytes, with the active ribosomes in resting lymphocytes translating at a similar rate as fully activated lymphocytes.

      They then turn their attention to mouse OT-1 lymphocytes, looking at translation rates both in vitro and in vivo. Day 1 resting T-cells also show stalling - which curiously wasn't seen on freshly purified cells - I didn't understand these differences.

      This is clarified and discussed starting in the third paragraph of “Protein synthesis in mouse lymphocytes ex vivo” section. Cells cultured ex vivo for 1 day with no activation show signs of stalling, as we observed in isolated human cells. But cells immediately out of an animal show a measurable decay rate since they are obviously synthesizing proteins in vivo and are processed rapidly.

      In vivo, they show that it is possible to monitor accurate translation and measure rates. Perhaps most interestingly they note a paradoxically high ratio of cellular protein to ribosomes insufficient to support their rapid in vivo division, suggesting that the activated lymphocyte proteome in vivo may be generated in an unusual manner.

      This was an interesting and provocative paper. Lots of interesting techniques and throwing down challenges to the community - it manages to address a number of important issues without necessarily providing answers.

      Reviewer #3 (Public Review):

      This manuscript provides a more or less quantitative analysis of protein synthesis in lymphocytes. I have no issue with the data as presented, as I'm sure all measurements have been expertly done. I see no need for additional experimental work, although it would be helpful if the authors could comment on the possibility of measuring the rate of synthesis of a defined protein, say a histone, in cells prior to and after activation. The conclusion the authors leave us with is the idea that the rates of protein synthesis recorded here are incompatible with observed rates of T cell division in vivo. Indeed, in the final paragraph of the discussion, the authors note the mismatch between what they consider a requirement for cell division, and the observed rates of protein synthesis. They then invoke unconventional mechanisms to make up for the shortfall, without -in this reviewer's opinion- discussing in adequate detail the technical limitations of the methodology used.

      Points #1-3 in the Discussion relate to potential pitfalls of our analyses; in point #3 we now add further limitations of RTA based on non-random detection of nascent chains due either to bias in either puromycylation or antibody detection of puromycylated nascent chains.

      A key question is the broad interest, novelty, and extension of current knowledge, in comparison with Argüello's (reference 27) 'SunRise' method. It would be helpful for the authors to stake out a clear position as to the similarities and differences with reference 27: what have we learned that is new? The authors could cite reference 27 in the introduction of their manuscript, given the similarity in approach. That said, the findings reported here will generate further discussion.

      We did cite this reference (27) in the section “Flow RPM measures ribosome elongation rate in live cells” giving credit where credit is due. We independently devised the method in 2014, and uniquely, to our knowledge, have applied it in vivo. We now further discuss the importance of our CHX modification to limit dissociation and increase the accuracy of RTA (second and third paragraphs of “Protein synthesis in mouse lymphocytes and innate immune cells in vivo”).

      The manuscript would increase in impact if the authors were to clearly define why a particular measurement is important and then show the actual experiment/result. As an example, it would be helpful to explain to the non-expert why the distinction between monosomes, polysomes, and stalled versions of the same is important, and then explain the rationale of the actual experiment: how can these distinctions be made with confidence, and what are confounding variables?

      We believe this is addressed in the section “Resting human lymphocytes have a dominant monosome population”.

      The initial use of human cells, later abandoned in favor of the OT-1 in vitro and in vivo models, requires contextualization. If the goal is to address the relationship between rates of translation and cell division of antigen-activated T cells in vivo, then a lot of the work on the human model and the in vitro experiments becomes more of a distraction, unless properly contextualized. Is there any reason to assume that antigen-specific activation in vivo will impact translation differently than the use of the PMA/ionomycin/IL2 cocktail? The way the work is presented leaves me with the impression that everything that was done is included, regardless of whether it goes to the core of the question(s) of interest.

      Donor PBMCs are clearly the more relevant model for understanding human T cell biology, which is why we started our studies with this model. Had the manuscript strictly described mouse studies it is likely that we would be criticized for not studying human cells: Catch 22! However, as we state in the manuscript, the human cell model has a variety of technical downsides, including donor heterogeneity. PMA/ionomycin activation is also physiologically questionable, and while we could deliver a defined TCR to redirect their specificity, this is typically done after cells have been activated, since lentiviral delivery is poor in resting lymphocytes. A main point we try to make from this work is that cells derived from human blood donors show signs of ribosomal stalling by the time they are isolated and put into culture. This may limit the usefulness of studying them preactivation, although based on our mouse data, some level of stalled ribosomes may be a feature as well, priming T cells to be ready for their massive expansion. The move to the OT-I system gave us complete control over the system, including in vivo delivery of translation inhibitors.

      It would be helpful if the authors made explicit some of the assumptions that underlie their quantitative comparisons. Likewise, the authors should discuss the limitations of their methods and provide alternative interpretations where possible, even if they consider them less/not plausible, with justification. As they themselves note, improvements in the RPM protocols raised the increase in translating ribosomes upon activation from 10-fold to 15-fold. Who's to say that is the best achievable result? What about the reliability/optimization of the other measurements?

      We expanded discussion of potential pitfalls of the RPM techniques and others in the Discussion section. Regarding RPM per se, we use it as a readout of ribosome time decay, so even if further optimizations can be made, the decay rates we have made should still be accurate. In addition, for our cell accounting measurements in Figure 6, we do not use RPM data and rather calculate based on the assumption that every ribosome is used for protein synthesis at a “maximal” rate of mRNA transit.
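      The accounting assumption described above (every ribosome elongating continuously at a maximal rate) can be sketched as a lower bound on the time needed to synthesize one proteome copy. This is a minimal illustration, not the authors' calculation: the function name and the input values are hypothetical placeholders, the default of 6 amino acids per second is the approximate rate quoted in the review, and protein turnover and initiation delays are ignored.

```python
def min_proteome_copy_time_h(protein_aa_per_cell, ribosomes_per_cell,
                             elongation_aa_per_s=6.0):
    """Lower bound (hours) to synthesize one proteome copy.

    Assumes every ribosome elongates continuously at the given rate,
    with no protein degradation and no idle or stalled ribosomes.
    """
    if ribosomes_per_cell <= 0 or elongation_aa_per_s <= 0:
        raise ValueError("ribosome count and elongation rate must be positive")
    # Total amino acids polymerized per second by the whole ribosome pool.
    aa_per_second = ribosomes_per_cell * elongation_aa_per_s
    return protein_aa_per_cell / aa_per_second / 3600.0


# Hypothetical inputs chosen only to show the arithmetic (not measured values):
# 3.6e10 amino acids of protein per cell and 1e6 ribosomes per cell.
hours = min_proteome_copy_time_h(3.6e10, 1e6)
print(f"minimum time to copy the proteome: {hours:.2f} h")
```

      If the bound computed this way exceeds the observed division time, translation alone cannot account for proteome duplication under these assumptions, which is the form of the paradox discussed in the manuscript.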

      The composition of the set of proteins produced upon activation will differ from cell to cell (CD4, CD8, B, resting vs. dividing). Even if analyses are performed on fixed cells, the ability of the monoclonal anti-puromycin antibody to penetrate the matrix of the various fixed cell types may not be equal for all of them, depending on protein composition, susceptibility to fixation etc. Is it possible for puromycin to occupy the ribosome's A site and terminate translation without forming a covalent bond with the nascent chain? This could affect the staining with anti-puromycin antibodies and also underestimate the number of nascent chains.

      Yes, the method (like every other one) is imperfect. Harringtonine run-off experiments show that RPM staining only detects nascent chains. Note that reference 47 reports that 75% of translation in activated T cells is devoted to synthesizing ~250 housekeeping proteins, which are likely to be highly similar between lymphocyte subsets.

      I believe that the concept of FACS-based quantitation also requires an explanation for the nonexpert. For the FACS plots shown, the differences between the highest and lowest RPM scores for cells that divided and that have a similar CFSE score is at least 10-fold. Does that mean that divided cells can differ by that margin in terms of the number of nascent chains present? If I make the assumption that cells stimulated with PMA/ionomycin/IL2 respond more or less synchronously, why would there be a 10-fold difference in absolute fluorescence intensity (anti-puromycin) for randomly chosen cells with similar CFSE values? While the use of MFI values is standard practice in cytofluorimetry, the authors should devote some comments to such variation at the population level.

      We believe that the referee is referring to Sup Fig. 1B. In this experiment the T cells are polyclonal and represent the full range of naïve to potentially exhausted differentiation states. Looking at our initial in vivo RPM study (reference 22) and comparing Figure 2 (OTI’s) to Figure 3 (endogenous CD4s or CD8s) reveals more spread in the RPM values of polyclonal vs. monoclonal T cells (now clarified in the third paragraph of “Protein synthesis in mouse lymphocytes and innate immune cells in vivo”). Flow cytometry is by far the most accurate method for measuring fluorescence in individual cells. It is likely to be an accurate measure of the variation of nascent chains in cells in the same division cohort but likely represents the diversity of T cell activation profiles in blood of healthy donors.

      It is assumed that for cells to complete division, they must have produced a full and complete copy of their proteome and only then divide. What if cells can proceed to divide even when expressing a subset of the proteome of departure (=the threshold set required for initiation of division), only to complete synthesis of the 'missing' portion once cell division is complete? Would this obviate the requirement for an unusual mechanism of protein acquisition (trogocytosis; other)?

      There must be a steady state level of translation and proteome replenishment, though. If a cell can divide when it affords daughter cells with 90% of its G0 proteome (as an example), that daughter cell would either 1) be 10% smaller, or 2) require extra translation to make up for the missing proteome during its own division cycle. Though T cells do typically shrink slightly after an initial activation, cell size stabilizes over time. Requiring each daughter cell to make more and more missing proteome could be plausible, considering that initial bursts of division do take longer over time, but still, even in vitro activated T cells divide rapidly for weeks without large decreases in their division rates.
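      The dilution argument in this response can be illustrated with a short calculation. This is a sketch under a stated assumption: each daughter inherits a fixed fraction of the parental G0-equivalent proteome and no compensatory translation occurs. The 90% figure is the example used in the response; the function name is hypothetical.

```python
def proteome_fraction_after(divisions, inherited_fraction=0.9):
    """Fraction of the original G0 proteome per cell after repeated divisions.

    Assumes each daughter receives `inherited_fraction` of a full proteome
    and never makes up the shortfall before dividing again.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return inherited_fraction ** divisions


for n in (1, 5, 10):
    print(n, round(proteome_fraction_after(n), 3))
```

      Under this assumption the per-cell proteome falls to roughly 35% of its starting value after ten divisions, so a persistent shortfall would have to show up either as shrinking cells or as extra translation in later cycles.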

      Translation is estimated to proceed at a rate of ~6 amino acids per second, but surely there is variability in this number attributable to inaccuracies of the methods used, in addition to biological variability. Were these so-called standard values determined for a range of different tissues? It stands to reason that there might be variation depending on the availability of initiation/elongation factors, NTPs, aminoacyl tRNAs etc. What is the margin of error in calculating chain elongation rates based on the results shown here?

      We refer to all relevant studies we know of, including new in vivo estimates of elongation rates (reference 40).

      Reviewer #1 (Recommendations For The Authors):

      A "limitations of study" section would be a helpful way to detail potential contributing mechanisms that were not explored in the current study.

      We have expanded the methodological limitations in the Discussion section.

      Major:

      1) Broaden the scope of biological models that could explain the paradox.

      In the Discussion, we suggest that T cells acquire some fraction of their proteome through external sources and highlight some examples of this occurring.

      Minor:

      1) Include Mr markers for Fig. 2C.

      Done.

      2) Though commonly used interchangeably, historically the term protein synthesis was the consequence of mRNA translation. In other words, proteins are not translated.

      Good point! We have changed the text accordingly.

      3) Include more meaningful X-axis legend in polysome gradient panels i.e., Fig. S2, e.g., fraction number.

      In most experiments, fractions were not collected. Rather, the x-axis refers to time that the sample took to be queried by the detector.

      4) Figure 3A does not report polysome profiles as described in the text, pg. 5, though this is reported in Fig S2D.

      The figure callouts were correct but confusing. We now refer to each result separately to clarify.

      5) In Fig 5A, SDS-PAGE/anti-Puro blots would be more convincing and contain more information. The dot-blot is difficult to interpret.

      Disagree. To quantitate total anti-puromycin signal a dot blot is far better than immunoblotting, which is compromised by unequal transfer of different protein species.

      6) It's not clear why a degree of monosome translation is necessarily surprising (pg. 7).

      It’s surprising since for many decades it was believed that translation by monosomes is a tiny fraction of translation. But separately, with this particular mode of activation, activated T cells displayed a preponderance of monosomes during their burst of division. When the activation method was improved, polysomes dominated. But monosome translation clearly supported T cell division during activation without cognate peptide, which was interesting.

      Reviewer #2 (Recommendations For The Authors):

      1) One concern is the dose of puromycin used. My understanding is that puromycin acts as a chain termination inhibitor - but is being used here predominantly as a label for nascent polypeptide chains. My concern, therefore, is the dose being used - here at 50ug/ml - which seems high and I would be concerned that at this dose it would act as a translational inhibitor rather than just labelling nascent chains, and is therefore resulting in a lower signal/background ratio than expected. In human cell lines 0.1ug/ml is optimal and doses published (in cell lines) range between 1 and 10ug/ml so it will be interesting to understand why this high dose was used.

      Do they have a dose-response curve - is this high dose necessary because these are primary T cells? Can the authors show that 50 µg/mL of puromycin is optimal for studying protein translation in primary human T cells? A titration curve will help answer this question and could be included in Suppl Figure 1. This experiment is critical as the authors use a higher dose than previous studies (commonly between 1 and 10 µg/mL).

      The reviewer is referencing puromycin concentrations typically used in the selection of cells – for the RPM assay, puromycin is used at saturating doses to label the maximal number of nascent chains stalled by CHX or EME pretreatment.

      2) None of the figures show statistical significance.

      Statistics on relevant comparisons are now indicated on figures and in legends.

      3) The authors mention: "We performed RPM on cells labelled with CFSE to track cell division by dye dilution (Supplemental Figure 1B). On day 2, activated cells exhibited multiple populations, with nearly all divided cells showing a high RPM signal.". However, on day 2 it is hard to see any dividing cells in the dot plot included in the supplemental figure. Dividing cells only appear on day 5? Their statements make the subsequent paragraphs also difficult to follow.

      We modified the text to clarify this data – there is likely activation-induced cell death occurring which is why there are relatively few CFSE-low cells at this timepoint, and they do exhibit a fairly wide range of RPM staining. The main point is that by day 5, nearly all divided cells exhibit high RPM.

      4) "Many divided cells exhibited near baseline RPM signals, however, consistent with their return to the resting state. Interestingly, although non-activated cells did not divide, ~50% demonstrated increased RPM staining.". Again, it is hard to see the ~50% of cells with increased RPM the authors refer to in the provided supplemental figure.

      This is from quantification of the flow data and is described more fully later when we discuss ribosome stalling.

      5) The authors say "Thus, we cannot attribute the persistence of flow RPM staining in translation initiation inhibitor-treated cells to incomplete inhibition of protein synthesis." - but it's unclear what this refers to as in the previous paragraph they also say: "Initiation inhibitors, however, clearly discriminated between day 1 resting and activated cells. RPM signal was diminished by up to 80-90% on day 5 post-activation." - this is all somewhat confusing. It would be helpful to have this clarified and in the text to make more liberal use of referring to specific figures.

      Figure 1B shows that RPM is maintained at fairly high levels during treatment with EME or CHX (in contrast to the initiation inhibitors HAR/PA). To rule out that the drugs were simply not active, tritiated leucine labeling was conducted to confirm that incorporation of the radiolabeled amino acid dropped to near-baseline (Figure 1C). Therefore, we can conclude that the drugs are indeed working as intended, but EME/CHX does not decrease RPM signal to the same extent that they prevent leucine incorporation.

      6) Page 5 Fig 3A - I don't understand the difference between freshly isolated OT-1 cells - which don't stall and day 1 OT-1 cells which do. Why are freshly isolated cells not behaving like the naïve cells- isn't this what they would predict? Also - I accept that there is a move from monosome to polysome population between day 1 and 2 - the effect isn't huge - it would be helpful/interesting to know what has happened by day 5 - is the effect much more significant?

      Freshly isolated cells are harvested from animals and immediately queried, whereas day 1 cells are cultured for 24h in the absence of any activation. Presumably, the ex vivo culture without any activation causes the mouse T cell ribosomes to stall, just as we observed in cells obtained from human donors that took hours to collect and bring to the bench. The appearance of polysomes is really related to how the activation of the cells is done… refer to Figure 5B to see how significant the polysome buildup can be!

      7) Fig S3C - I don't understand how they reach the conclusion from this figure that: '~15-fold increase in translating ribosomes in activated OT-I T cells in vivo (Supplemental Figure 3C) as compared to the 10-fold increase we previously reported using the original protocol.' It would very much help the reader if these calculations could be better explained.

      These are simply quantifications of the RPM staining done in Supplemental Figure 3C compared to experiments done in the absence of the CHX-modified method.

      8) Page 7 - They conclude that the Tan paper has superior lymphocyte activation - but presumably this depends on the signal as to whether there is more activation and how this affects the shift from monosome to polysome -ie maybe a stronger activation signal affects the distribution more - perhaps their method is the more physiological? Is their conclusion fair - that 'These findings indicate that monosomes make a major contribution to translation in resting T cells but are likely to make a minor contribution in fully activated cells.'

      Yes, we believe that their published method would be more physiological with the use of the natural OT-I peptide. We conclude that although monosome translation is present (as others have published), there are relatively few monosomes in fully activated T cells. Therefore, the monosome contribution to overall translation in activated T cells appears to be minor.

      9) Contrary to observations in vitro, ribosomes are not stalled in naïve mouse T cells in vivo, as we show via RTA analysis of non-activated T cells. - yes - this seems somewhat surprising - what is the explanation?

      We presume this is due to the stress/non-native environment that ex vivo cultured cells are subjected to.

      10) Whilst I understand the point that the authors are trying to make in Figure 1D about resting T cells having high background RPM staining due to stalled ribosomes, it is intriguing that there is almost no difference (no statistical significance provided) after 2 or 5 days of activation. Isn't this finding contrary to the one provided in Figure 1A and Suppl Figure 1B?

      Figure 1A is showing the difference between no activation and activation conditions. Figure 1D is predominantly meant to show that the increase in RPM from activated cells at day 1 and day 5 are not as different as one might predict. The reason, as we describe in further experiments, is likely that cells exhibiting ribosomal stalling can incorporate puromycin, damping the “fold change” we calculate (unlike what we observe in metabolic labeling experiments in the same figure panel). Statistics have now been displayed on the graphs in Figure 1D for further clarification.

      11) "Including EME with HAR prevented decay of the RPM signal, as predicted, since EME blocks elongation while enabling (even enhancing) puromycylation (refs. 21, 26)." I find this very confusing. I understand that emetine blocks protein elongation whilst enabling puromycylation, but why does it block the effect of the protein initiation inhibitor harringtonine? Do they compete with each other?

      When ribosomes are frozen with emetine, they cannot transit mRNA and “fall off”. Therefore, the inclusion of EME in these experiments is a control to ensure that we are looking at true transit and runoff of ribosomes with harringtonine treatment (explanation in the second paragraph of “Flow RPM measures ribosome elongation rates in live cells” section).

      12) Can the authors explain why the RPM signal of activated OT-I cells (PMA/Iono) increases 20-fold compared to resting cells, but there is only a ~2-fold increase in signal in human cells? The authors previously mentioned: "We noted that the RPM signal in activated cells was only 2- to 5-fold higher than in non-activated cells. This increase is modest compared to the ~15-fold activation-induced increase in protein synthesis in original studies (refs. 10, 11). To examine this discrepancy, we incubated cells for 15 min with harringtonine (HAR) or pactamycin (PA) to block translation initiation or emetine (EME) or cycloheximide (CHX) to block elongation." Would the authors have followed the same path if they had started the paper with OT-I cells?

      Human cells are not as well activated as OT-I in our study. The last question is beyond the scope of our reasoning as empirical evidence-based scientists, but we have applied for funding from the HG Wells Foundation for a time machine to answer this question.

      13) Authors should include representative raw flow cytometry data used to perform the "Ribosome Transit Assay" (RTA) in Figures 2 and 3 as supplemental data.

      Done; now included in Supplemental Figures 1 and 3.

      14) It would be interesting to compare RPM in T cells activated with a more physiological stimulus, such as beads anti-CD3 anti-CD28 vs PMA/Iono. Particularly after showing that peptide-specific stimulation (with SIINFEKL) is more effective than PMA/Iono in activating OT-I cells and inducing polysome formation (Figures 5B and Suppl Figure 4A).

      We tried plate-bound anti-CD3 and anti-CD28 early in these studies, and they didn’t induce as much early activation.

      15) Can the authors include the gating strategy to call "activated OT-I cells" to the cells shown in Suppl Figure 3c?

      A new Supplemental Figure 3D has been added showing the exact gating strategy for the OT-I cell RTA assays described in Supplemental Figure 3C and elsewhere.

      16) In Figure 6B, the authors mention an increase in the volume of the cells based on the assumption of spherical morphology but then show an increase in diameter. It would be more consistent to show both parameters in the same graph.

      The graph was changed to show volume calculations instead of diameter for clarity. The two are linked, since volume scales with the cube of the radius.
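
      As a brief sketch of how the two measures relate, assuming the spherical morphology mentioned in the reviewer's comment (the symbols V, r, and d are introduced here only for illustration and do not appear in the manuscript):

```latex
V = \frac{4}{3}\pi r^{3} = \frac{\pi}{6} d^{3}
\quad\Longrightarrow\quad
\frac{V_2}{V_1} = \left(\frac{d_2}{d_1}\right)^{3}
```

      Under this assumption, a 2-fold increase in diameter corresponds to an 8-fold increase in volume.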

      17) The paper's main conclusion (i.e., that the ratio of proteins to ribosomes in T cells activated in vivo does not support their doubling time) is exciting. They conclude this after measuring cell volume, protein abundance, and ribosomes per cell. As no changes in cell volume and protein abundance between T cells activated in vitro vs in vivo were observed (Figures 6B and 6C), the difference is exclusively attributable to a reduced number of ribosomes per cell in T cells activated in vivo (Figure 6F). Critically, the measurement of ribosomes per cell in T cells activated in vivo (Figure 6F, "ex vivo day 2") includes only two data points. It is hard to understand how the authors calculated this figure's means and standard deviations as it is not described in the figure legend. From the dispersion observed for "day 1" and "day 2" in vitro-activated T cells, it seems that the variability of the assay to measure ribosome content could explain part of the phenotype. Additionally, there are several missing data points in Figure 6H. As this figure is just a transformation of Figures 6D and 6G, it isn't easy to understand why. Can I suggest that they include more data points for Figures 6F, G, and H in the "ex vivo day 2" category, as the two data points shown with little variability are out of keeping with the rest of the data and may be skewing their data?

      Figure 6F does not have the same number of data points as other panels because it required measurement of both protein content and ribosome number. Since the ribosome quantification method described here was developed later than some of our earlier protein measurements, not all experiments had both sets of data to properly calculate the proteins per ribosome. All data that had both values are included, though.

      Reviewer #3 (Recommendations For The Authors):

      Minor points:

      If an increase in cell diameter is recorded upon activation, why not also provide the value for the increase in volume?

      Done

      Regarding the writing, the erratic punctuation/hyphenation - or lack thereof - doesn't improve readability. One example: "....consistent with the idea that the flow RPM signal in day 1 resting lymphocytes...." Perhaps better: "... consistent with the idea that the RPM signal, obtained by flow cytometry for lymphocytes analyzed on day 1 and maintained in the absence of any activating agent,..." I understand that this can make for longer sentences, but I object to the use of 'flow' as shorthand for 'flow cytometry', and to the use of day 1 as an adverb or adjective. That works as lab jargon, it's less effective in a written text. The abbreviation 'DRiPs' is not defined. Words like 'notably', and 'surprisingly' can be eliminated.

      This work would benefit from the inclusion of a section describing 'Limitations of the study'.

      This is now expanded in the Discussion, as described above.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The association of vitamin D supplementation in reducing Asthma risk is well studied, although the mechanistic basis for this remains unanswered. In the presented study, Kilic and co-authors aim to dissect the pathway of Vitamin D-mediated amelioration of allergic airway inflammation. They use initial leads from bioinformatic approaches, which they then associate with results from a clinical trial (VDAART) and then validate them using experimental approaches in murine models. The authors identify a role of VDR in inducing the expression of the key regulator Ikzf3, which possibly suppresses the IL-2/STAT5 axis, consequently blunting the Th2 response and mitigating allergic airway inflammation.

      The major strength of the paper lies in its interdisciplinary approach, right from hypothesis generation, and linkage with clinical data, as well as in the use of extensive ex vivo experiments and in vivo approaches using knock-out mice. The study presents some interesting findings including an inducible baseline absence/minimal expression of VDR in lymphocytes, which could have physiological implications and needs to be explored in future studies. However, the study presents a potential for further dissection of relevant pathophysiological parameters using additional techniques, to explain certain seemingly associative results, and allow for a more effective translation.

      Several results in the study suggest multiple factors and pathways influencing the phenotype seen, which remain unexplored. The inferences of this study also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant. While this does not undermine the importance of this elegant study, it is essential to emphasize a holistic picture while interpreting the results.

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to advance our knowledge of how vitamin D may be protective in allergic airway disease in both adult and neonatal mouse models. The rationale and starting point are important human clinical, genetic/bioinformatic data, with a proposed role for vitamin D regulation of 2 human chromosomal loci (Chr17q12-21.1 and Chr17q21.2) linked to the risk of immune-mediated/inflammatory disease. The authors have made significant contributions to this work specifically in airway disease/asthma. They link these data to propose a role for vitamin D in regulating IL-2 in Th2 cells implicating genes associated with these loci in this process.

      Strengths:

      Here the authors draw together evidence from multiple lines of investigation to propose that amongst murine CD4+ T cell populations, Th2 cells express high levels of VDR, and that vitamin D regulates many of the genes on the chromosomal loci identified to be of interest, in these cells. The bottom line is the proposal that vitamin D, via Ikzf3/Aiolos, suppresses IL-2 signalling and reduces IL-2 signalling in Th2 cells. This is a novel concept and whilst the availability of IL-2 and the control of IL-2 signalling is generally thought to play a role in the capacity of vitamin D to modulate both effector and especially regulatory T cell populations, this study provides new data.

      Weaknesses:

      Overall, this is a highly complicated paper with numerous strands of investigation, methodologies etc. It is not "easy" reading to follow the logic between each series of experiments and also frequently fine detail of many of the experimental systems used (too numerous to list), which will likely frustrate immunologists interested in this. There is already extensive scientific literature on many aspects of the work presented, much of which is not acknowledged and largely ignored. For example, reports on the effects of vitamin D on Th2 cells are highly contradictory, especially in vitro, even though most studies agree that in vivo effects are largely protective. Similarly, other reports on adult and neonatal models of vitamin D and modulation of allergic airway disease are not referenced. In summary, the data presentation is unwieldy, with numerous supplementary additions that make the data difficult to evaluate and leave the central message lost. Whilst there are novel data of interest to the vitamin D and wider community, this manuscript would benefit from editing to make it much more readily accessible to the reader.

      Wider impact: Strategies to target the IL-2 pathway have long been considered and there is a wealth of knowledge here in autoimmune disease, transplantation, GvHD etc - with some great messages pertinent to the current study. This includes the use of IL-2, including low dose IL-2 to boost Treg but not effector T cell populations, to engineered molecules to target IL-2/IL-2R.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In the revised manuscript, the authors have addressed a significant number of concerns raised. The restructuring and incorporation of a number of discussion points have improved the readability. Moreover, the authors have also incorporated some more figures to address certain questions raised.

      However, the authors could reconsider a few more points which would improve the readability of the manuscript.

      For e.g.

      1) While it is appreciated that the authors have provided the schematic of the study design for the VDAART trial, the visualization for the RNA-seq analysis may be helpful.

      We have created a visualization of the workflow for the RNA-seq analysis as part of Figure 1 – figure supplement 1C.

      2) Quantification of images would not require any additional experiments, yet can reinforce the results with objectivity.

      We appreciate this comment. We chose to display histology images to allow a glimpse at the inflammatory condition in the lung tissue. For histological quantification, lung tissue should have been harvested and analyzed in a systematic and randomized way as well as in sufficient animal numbers to allow statistical analyses. This has not been done for these mouse models since the focus was on analyzing cytokine production by lung tissue CD4+ T cells as the driver of inflammation.

      3) The authors have not addressed the discrepancy of the sample sizes in the experiments. Some dot plots still don't match the legends, and there is a wide variation in the numbers chosen for different experiments and different groups in the same experiments.

      We appreciate the thorough screening of our manuscript and apologize for this oversight. We corrected the errors in the respective figure legends.

      The in vivo experiments comprise studies performed in (A) VDR-KO mice and (B) WT mice fed with vit-D supplemented chow.

      Sample size calculations for the mouse models of allergic airway inflammation based on BAL cell numbers revealed a minimum of n=8 per group for correct statistical analysis. In both experimental settings, the respective mouse lines were bred in the mouse facilities of MGH (A) and BWH (B). Depending on the litter sizes, additional mice were added in the HDM group, since bigger variability was expected in this group than the saline group.

      Intracellular CD4+ cytokine staining was performed for all mice, however some stainings failed and could not be reliably interpreted and were therefore excluded.

      Reviewer #2 (Recommendations For The Authors):

      The authors have largely replied to the reviewer comments, amended some noted typos & figure legend issues, as well as discussed the reviewers concerns in text and in their rebuttal.

      The data presented are novel and of significant interest, conceptually moving this field forward, but in this reviewer's opinion reflect one pathway, of likely several, linked to protective effects of vitamin D on airway disease. This reviewer recommends a further slight editing of the text to present this broader scenario.

      i) Treg cells are highly dependent on IL-2 (both Foxp3+ and IL-10+ cells, not always the same population), constitutively express the IL-2R, and there is already a significant literature regarding vitamin D and IL-10/Treg in control of immune-mediated conditions. A simple statement acknowledging this and that there is likely more than one mechanism by which vitamin D may regulate allergic airway disease (directly or indirectly) would be appreciated - this in no way detracts from the novelty and contribution of the current findings.

      We thank the reviewer for this suggestion. We have added the following statement to the manuscript (lines 623-625):

      “Additional pathways, including the induction of IL-10 production by CD4+ T cells as well as a direct induction of Foxp3+ T reg cells could have further contributed to the observed protective effect of vitamin D supplementation (PMID: 21047796; 22529297).”

      ii) More comprehensive referencing of earlier papers proposing effects of vitamin D in controlling Treg/IL-10 and dampening Th2 responses in mouse (and human) models

      (e.g. Taher, Y. A., van Esch, B. C. A. M., Hofman, G. A., Henricks, P. A. J. & van Oosterhout, A. J. M. 1alpha,25-dihydroxyvitamin D3 potentiates the beneficial effects of allergen immunotherapy in a mouse model of allergic asthma: role for IL-10 and TGF-beta. J. Immunol. 180, 5211-21 (2008). Vassiliou JE et al, 2014. Vitamin D deficiency induces Th2 skewing and eosinophilia in neonatal allergic airways disease. Allergy. DOI: 10.1111/all.12465).

      We have included the reference in the discussion section of our manuscript in lines 617-619:

      “Similar findings regarding the effects of vitamin D in controlling Treg/IL-10 and dampening Th2 responses have been reported, e.g., in (PMID: 18390702) and in offspring of mice that had been subjected to vitamin D deficiency in the third trimester of their pregnancy (PMID: 24943330).”

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The authors of the manuscript "High-resolution kinetics of herbivore-induced plant volatile transfer reveal tightly clocked responses in neighboring plants" assessed the effects of herbivory induced maize volatiles on receiver plants over a period of time in order to assess the dynamics of the responses of receiver plants. Different volatile compound classes were measured over a period of time using PTR-ToF-MS and GC-MS, under both natural light:dark conditions, and continuous light. They also measured gene expression of related genes as well as defense related phytohormones. The effects of a secondary exposure to GLVs on primed receiver plants was also measured.

      The paper addresses some interesting points, however some questions arise regarding some of the methods employed. Firstly, I am wondering why VOCs (as measured by GC-MS) were not quantified. While I understand that quantification is time consuming and requires more work, it allows for comparisons to be made between lines of the same species, as well as across other literature on the subject. Simply relying on the area under the curve and presenting results using arbitrary units is not enough for analyses like these. AU values do not allow for conclusions regarding total quantities, and while I understand that this is not the main focus of this paper, it raises a lot of uncertainty for readers (for example, the references cited show that TMTT has been found to accumulate at similar levels of caryophyllene, however the AU values reported are an order of magnitude higher for TMTT. Again, without actual quantification this is meaningless, but for readers it is confusing).

      With regards to the correlation analyses shown in figure 6, the results presented in many of the correlation plots are not actually informative. While there is a trend, I do not think that this is an appropriate way to show the data, as there are clearly other relationships at play. The comparison between plants under continuous light and normal light:dark conditions is interesting.

      This paper addresses a very interesting idea and I look forward to seeing further work that builds on these ideas.

      As mentioned in our previous response, we have added the quantification of GLVs in order to increase the comparability of our work to other studies.

      Regarding the comment about TMTT (only measured as internal pools), the purpose of including these internal pool data was simply to determine whether terpenes were accumulating in leaf tissue during the night when emissions are hindered (likely due to closed stomata). The data clearly show that internal terpene pools do not accumulate above daytime levels during darkness. This is further supported by gene expression data that show downregulation of terpene synthase genes during darkness. While quantification would certainly increase the ability to compare internal pools, it would not change the interpretation of our results. Also note that absolute quantification is challenging for compounds such as TMTT, which are not readily available.

      Regarding the comment on Figure 6, while we agree there may be interesting patterns beyond linear relationships, as stated in our previous response, the purpose of our analysis was to determine if the higher terpene burst in receiver plants on the second day may be explained by sender plants emitting more GLVs on the second day. Figure 6 shows that this is not the case. Further analyses would not provide additional significant insights into the hypothesis that we tested here.

      We thank the reviewer for their overall positive outlook on our paper and for the constructive comments.

      Reviewer #2 (Public Review):

      The exact dynamics of responses to volatiles from herbivore-attacked neighbouring plants have been little studied so far. Also, we still lack evidence whether herbivore-induced plant volatiles (HIPVs) induce or prime plant defences of neighbours. The authors investigated the volatile emission patterns of receiver plants that respond to the volatile emission of neighbouring sender plants which are fed upon by herbivorous caterpillars. They applied a very elegant approach (more rigorous than the current state-of-the-art) to monitor temporal response patterns of neighbouring plants to HIPVs by measuring volatile emissions of senders and receivers, senders only and receivers only. Different terpenoids were produced within 2 h of such exposure in receiver plants, but not during the dark phase. Once the light turned on again, large amounts of terpenoids were released from the receiver plants. This may indicate a delayed terpene burst, but terpenoids may also be induced by the sudden change in light. As one contrasting control, the authors also studied the time-delay in volatile emission when plants were just kept under continuous light. Here they also found a delayed terpenoid production, but this seemed to be lower compared to the plants exposed to the day-night-cycle. Another helpful control was now performed for the revision in which the herbivory treatment was started in the evening hours and lights were left on. This experiment revealed that the burst of terpenoid emission indeed shifted somewhat. Circadian and diurnal processes must thus interact.

      Interestingly, internal terpene pools of one of the leaves tested here remained more comparable between night and day, indicating that their pools stay higher in plants exposed to HIPVs. In contrast, terpene synthases were only induced during the light-phase, not in the dark-phase. Moreover, jasmonates were only significantly induced 22 h after onset of the volatile exposure and thus parallel with the burst of terpene release.

      An additional experiment exposing plants to the green leaf volatile (GLV) (Z)-3-hexenyl acetate revealed that plants can be primed by this GLV, leading to a stronger terpene burst. The results are discussed with nice logic and considering potential ecological consequences. All data are now well discussed.

      Overall, this study provides intriguing insights into the potential interplay between priming and induction, which may co-occur, enhancing (indirect and direct) plant defence. Follow-up studies are suggested that may provide additional evidence.

      We thank the reviewer for their positive outlook on our paper and for their constructive comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The authors did a great job with the revision. The additional experiments strengthened their conclusions. Thanks also for performing the suggested test for potential differences in induction capacity at different times of day, the new data are very interesting.

      Thank you very much.

      Line 49-52: The newly added sentence could be clarified in wording.

      We will clarify the sentence.

      Line 254-255: The newly added sentence needs to be corrected. This is no full sentence and it is not clear what the authors wanted to say here.

      We will clarify this sentence.

      Figure 6: In those instances, in which the correlation is not significant, the line should not be shown.

      We will remove the lines when correlations are not significant.

      The names of chemical compounds and terpene synthases should be written in lower case letters (see legend Fig 6, e.g. hexenal, not Hexenal; legend fig. 2: terpene synthase, not Terpene synthase)

      In the last round of revisions, I commented on Line 23: consequences on community dynamics are not investigated here, so this is a bit misleading. ... Your response was "We have deleted the sentence about community dynamics ..." which, however, in fact was not done! Please change!

      Apologies for that, we will delete mention of community dynamics in that sentence (for real).


      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study examines the effects of herbivory-induced maize volatiles on neighboring plants and their responses over time. Measurements of volatile compound classes and gene expression in receiver plants exposed to these volatiles led to the conclusion that the delayed emission of certain terpenes in receiver plants after the onset of light may be a result of stress memory, highlighting the role of priming and induction in plant defenses triggered by herbivore-induced plant volatiles (HIPVs). Most experimental data are compelling but additional experiments and accurate quantifications of the compounds would be required to confirm some of the main claims.

      Response: We thank the editors for their overall positive feedback on our MS. We have added additional experiments to quantify green leaf volatile emissions in both sender plants and synthetic dispensers (Reviewer 1) and address the importance of the precise time of day plants are induced (Reviewer 2). These additions strengthen the main conclusions of our study.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors of the manuscript "High-resolution kinetics of herbivore-induced plant volatile transfer reveal tightly clocked responses in neighboring plants" assessed the effects of herbivory-induced maize volatiles on receiver plants over a period of time in order to assess the dynamics of the responses of receiver plants. Different volatile compound classes were measured over a period of time using PTR-ToF-MS and GC-MS, under both natural light:dark conditions, and continuous light. They also measured gene expression of related genes as well as defence-related phytohormones. The effects of a secondary exposure to GLVs on primed receiver plants were also measured.

      The paper addresses some interesting points, however, some questions arise regarding some of the methods employed. Firstly, I am wondering why VOCs (as measured by GC-MS) were not quantified. While I understand that quantification is time-consuming and requires more work, it allows for comparisons to be made between lines of the same species, as well as across other literature on the subject. As experiments with VOC dispensers were also used in this experiment, I find it even more baffling that the authors didn't confirm the concentration of the emission from the plants they used to make sure they matched. The references cited justifying the concentration used (saying it was within the range of GLVs emitted by their plants) to prepare the dispenser were for either a different variety of maize (delprim versus B73) or arabidopsis. Simply relying on the area under the curve and presenting results using arbitrary units is not enough for analyses like these.

      Response: We thank the reviewer for their comment. We have now quantified the emissions of both the dispensers and maize seedlings infested with three fourth-instar Spodoptera exigua larvae. Averaged across 1 h, HAC dispensers emitted roughly 2x higher molar concentrations than the total GLV molar concentrations emitted by plants infested with three caterpillars. Of note, GLV emissions induced by caterpillars vary over time and can be more than 2-fold higher than the average during times of strong active feeding (Supplemental Fig 4). Thus, the release rate of the dispensers is well within the plant’s physiological range.

      Note that the references cited were included to support the claim of the biological activity of all three GLVs rather than to justify concentration of our dispensers. We have rephrased this sentence to reflect this (see L330-333).

      With regards to the correlation analyses shown in Figure 6, the results presented in many of the correlation plots are not actually informative. By blindly reporting the correlation coefficient important trends are being ignored, as there are clearly either bimodal relationships (e.g. upper left panel, HAC/TMTT, HAC/MNT) or even stranger relationships (e.g. upper left panel, IND/SQT, IND/MNT) that are not being well explained by a correlation plot. It is not appropriate to discuss the correlation factors presented here and to draw such strong conclusions on emission kinetics. The comparison between plants under continuous light and normal light:dark conditions is interesting, but I think there are better ways to examine these relationships, for example, multivariate analysis might reveal some patterns.

      Response: We thank the reviewer for their comment. With our analysis we aimed to test specifically whether the high release of known bioactive volatiles (GLVs and indole) by sender plants on the second day can explain the higher terpene emissions in the receiver plants. We explicitly mention this in the text (L176-L186). Indeed, under normal light conditions (light and dark phase), there are clear positive correlations between the GLV release of sender plants and the terpene release of receiver plants over time (see also Fig 1 and Fig 5). However, under continuous light conditions, GLV emissions in sender plants no longer correlate with terpene emissions in receiver plants (also apparent by comparison of Fig 4 and Fig 5). This shows that temporal variation in GLV emissions is insufficient to explain the delayed terpene burst. This is the relevant conclusion we draw from this analysis. As presented, we find the data to provide strong evidence that the delayed burst in receiver plant terpene emissions cannot be solely explained by higher availability of active signals on the second day. The priming experiment in Figure 7 then provides a direct additional test for this concept. While more complex analyses could indeed reveal additional patterns, these would not be particularly informative for the question at hand.

      In Figure 2, the elevated concentrations of beta-caryophyllene found in the control plants at 8h and 16.75h measurement timepoints are curious. Is this something that is commonly seen in B73?

      Response: We thank the reviewer for this comment. A small number of untreated plants indeed accumulated β-caryophyllene at night, which is likely the result of biological variability between samples. Our plants were soil-grown, and it is for instance possible that variation in soil biota may account for this variability. Alternatively, some plants may have been slightly stressed during handling. Note that this variability does not affect any of the conclusions in our manuscript.

      While there can be discrepancies between emissions and compounds actually present within leaf tissue, it is a little bit odd that such high levels of b-caryophyllene were found at these timepoints, however, this is not reflected in the PTR-ToF-MS measurements of sesquiterpenes. It would be beneficial to include an overview of the mechanism of production and storage of sesquiterpenes in maize leaves, which would clarify why high amounts were found only in the GC-MS analysis and not the PTR-ToF-MS analysis, which is a more sensitive analytical tool. It is possible that the amounts of b-caryophyllene present in the leaf are actually extremely low, however as the values are not given as a concentration but rather arbitrary units, it is not possible to tell. I would include a line explaining what is seen with b-caryophyllene.

      Response: Thank you for this comment. It is important to note that accumulation in maize leaves can differ substantially from emission, especially at night when stomata are closed. This has been observed before in maize leaves (Seidl-Adams et al., 2015). As the reviewer suspects, earlier work indeed found that β-caryophyllene is a minor sesquiterpene compared to β-farnesene and α-bergamotene in B73 (Block et al., 2018). The PTR-ToF-MS does not discriminate between terpenes with the same m/z and thus measures total sesquiterpene emissions. Given that sesquiterpene emissions are strongly regulated by stomatal aperture and that overall sesquiterpene accumulation in control plants is low, it is not surprising that we measure only minor amounts of sesquiterpene emissions in general, and in control plants in particular. We have now added text to the manuscript to explain these aspects (L116-L122).

      Block, A.K., Hunter, C.T., Rering, C. et al. Contrasting insect attraction and herbivore-induced plant volatile production in maize. Planta 248, 105–116 (2018).

      Seidl-Adams I, Richter A, Boomer KB, Yoshinaga N, Degenhardt J, Tumlinson JH. Emission of herbivore elicitor-induced sesquiterpenes is regulated by stomatal aperture in maize (Zea mays) seedlings. Plant Cell Environ. 38, 23-34 (2015).

      Additionally, it seems like the amounts of TMTT within the leaf are extraordinarily high (judging only by the au values given for scale), far higher than one would expect from maize.

      Response: We are unsure about the reviewer’s interpretation here. The AU values do not allow for conclusions regarding total quantities. An earlier study found that TMTT in induced B73 plants accumulates to similar amounts as β-caryophyllene (Block et al., 2018), thus it is not surprising to detect significant TMTT pools in induced maize leaves. It is important to note that the aim of the experiment here was to test the hypothesis that plants may be hyperaccumulating volatiles when the stomata are closed at night, which could potentially explain the delayed terpene burst on the second day. We do not observe such a hyperaccumulation, thus ruling out this as the primary factor responsible for the observed phenomenon. This is further supported by the continuous light experiments, where the delayed burst in terpene emission is not hindered by the lack of a dark phase.

      Block, A.K., Hunter, C.T., Rering, C. et al. Contrasting insect attraction and herbivore-induced plant volatile production in maize. Planta 248, 105–116 (2018).

      Reviewer #2 (Public Review):

      The exact dynamics of responses to volatiles from herbivore-attacked neighbouring plants have been little studied so far. Also, we still lack evidence of whether herbivore-induced plant volatiles (HIPVs) induce or prime plant defences of neighbours. The authors investigated the volatile emission patterns of receiver plants that respond to the volatile emission of neighbouring sender plants which are fed upon by herbivorous caterpillars. They applied a very elegant approach (more rigorous than the current state-of-the-art) to monitor temporal response patterns of neighbouring plants to HIPVs by measuring volatile emissions of senders and receivers, senders only and receivers only. Different terpenoids were produced within 2 h of such exposure in receiver plants, but not during the dark phase. Once the light turned on again, large amounts of terpenoids were released from the receiver plants. This may indicate a delayed terpene burst, but terpenoids may also be induced by the sudden change in light. A potential caveat exists with respect to the exact timing and the day-night cycle. The timing may be critical, i.e. at which time-point after onset of light herbivores were placed on the plants and how long the terpene emission lasted before the light was turned off. If the rhythm or a potential internal clock matters, then this information should also be highly relevant. Moreover, light on/off is a rather arbitrary treatment that is practical for experiments in the laboratory but which is not a very realistic setting. Particularly with regard to terpene emission, the sudden turning on of light instead of a smooth and continuous change to lighter conditions may trigger emission responses that are not found in nature.

      Response: We thank the reviewer for their comment. Although we did not explicitly mention it in the initial draft of the MS, we employed 15 min transition periods between the light and dark phases, with a light intensity of 60 µmol m-2 s-1 (compared to 300 µmol m-2 s-1 at full light), to achieve a more gradual transition. We have now included this information in the manuscript (L291-L292).

      As one contrasting control, the authors also studied the time-delay in volatile emission when plants were just kept under continuous light (just for the experiment or continuously?). Here they also found a delayed terpenoid production, but this seemed to be lower compared to the plants exposed to the day-night-cycle. Another helpful control would be to start the herbivory treatment in the evening hours and leave the light on. If then again plants only release volatiles after a 17 h delay, the response is indeed independent of the diurnal clock of the plant.

      Response: This is a very interesting point raised by the reviewer. We have now conducted an additional experiment under continuous light in which we started the herbivory treatment just before the start of the dark phase (ca. 20:00). We found a similar pattern: a distinct delay in the highest burst. Interestingly, however, the burst was shifted from 12-18 hr to 10-12 hr (Supplemental Fig 1). This burst aligned reasonably well with the point at which lights would normally be turned on again. In light of this, and as the herbivore additions typically started ca. 5 hr after the onset of light following a dark period (Figures 1-7), we wanted to rule out the possibility that the lack of a burst on the first day was simply due to a difference in induction capacity depending on how soon after the onset of light plants became exposed to GLVs. We therefore designed an additional experiment to examine whether exposure to GLVs immediately after the lights come on induces higher terpene emissions than exposure to GLVs ca. 5 hr after the lights come on (Supplemental Fig 2). Interestingly, emissions across the terpenes were similar, regardless of how long after the onset of light plants were exposed to GLVs. This suggests that the delayed burst is not due to the fact that, on the second day, plants are exposed to GLVs immediately after the lights come on, whereas on the first day they are only exposed 5 hr after the lights come on. Both continuous light experiments (normal timing and shifted timing) show bursts that occur slightly earlier than we observe under normal day:night light conditions (L159-L166 and L207-L211), suggesting an interaction between circadian and diurnal processes. For instance, it is possible that plants would start producing volatiles slightly earlier than the onset of the day, but that light and stomatal opening limit the exact timing of the burst under normal light:dark transitions. The additional data provide further evidence for the delayed burst as a timed response in maize plants.

      Additionally, we have added an explanation to the continuous light figure legends stating that plants were grown under normal conditions and that lights were only left on following treatment.

      Interestingly, internal terpene pools of one of the leaves tested here remained more comparable between night and day, indicating that their pools stay higher in plants exposed to HIPVs. In contrast, terpene synthases were only induced during the light-phase, not in the dark-phase. Moreover, jasmonates were only significantly induced 22 h after the onset of the volatile exposure and thus parallel with the burst of terpene release. An additional experiment exposing plants to the green leaf volatile (glv) (Z)-3-hexenyl acetate revealed that plants can be primed by this glv, leading to a stronger terpene burst. The results are discussed with nice logic and considering potential ecological consequences. Some data are not discussed, e.g. the jasmonate and gene induction pattern.

      Response: Thanks for this comment. We have added a sentence noting that the jasmonate data, in addition to providing a further layer of evidence for the observed delay, suggest that other JA-dependent defenses in maize may follow similar temporal patterns (L254-L257).

      Overall, this study provides intriguing insights into the potential interplay between priming and induction, which may co-occur, enhancing (indirect and direct) plant defence. Follow-up studies are suggested that may provide additional evidence.

      Reviewer #1 (Recommendations For The Authors):

      Could the authors please explain why they chose not to calculate concentrations for VOCs? Perhaps it is that B73 is a very unique variety in that it contains very high levels of TMTT, even in control plants? This should be clarified by the authors.

      Response: We address this comment in the public review portion

      For the legend within Figure 2, I would move it to be in the upper left or right corners of the figure. It is not easy to see in its current position.

      Response: We have moved the figure legend based on the reviewer's recommendation.

      Figures depicting PTR-ToF-MS data: add m/z values to either the figures themselves and/or the legends.

      Response: We have added m/z values to the legends and added molecular formulas of protonated compounds to each panel.

      Overall, here are some other suggestions: I am slightly wary of the term "clocked response". I'm not sure this is the correct fit for what you are trying to convey. I think "regulated" is a better term than "clocked". I understand that it is likely a stylistic choice to use this word, however, I advise reconsidering for the sake of clarity of the results.

      Response: Thank you. We find clocked to be an appropriate term, as it highlights the temporal aspect of the burst, and have thus left the title as is.

      Have another look at the references as some are not in the correct format (i.e., species not in italics).

      Response: We have checked and corrected the references

      Reviewer #2 (Recommendations For The Authors):

      Line 23: consequences on community dynamics are not investigated here, so this is a bit misleading.

      Last sentence of the abstract: It would be nice to read the answer to this long-standing question here.

      Response: We have deleted the sentence about community dynamics and provided a more concrete final sentence (L38-L40).

      Lines 48-50: The example does not fit so well with the first sentence and is not entirely clear (relation to temporal dynamics; similar to what?).

      Response: We have reworded the sentence for clarity (L49-L52)

      Line 56: "volatiles" should be plural.

      Response: Changed (L58)

      Line 58: "to be produced" rather than "to produce"

      Response: This seems to be a stylistic choice, and we have left it as is.

      End of abstract: Did you have any hypotheses? These should be stated here.

      Response: The listing of hypotheses is also a stylistic choice, which is in some cases required by journals, but not eLife. As such we have not included a discrete list of hypotheses and instead describe what we aimed to investigate and what we found.

      Line 93: "This response disappeared at night." Does this mean: "No volatiles were emitted during night"? Or was this a gradual disappearance? How many hours after the onset of light did the herbivore treatment start and how many hours after the first emission of volatiles was the light turned off?

      Response: We have added when herbivory began (L92-L93) and changed the text to ‘as soon as light was restored’ (L97-L98).

      Line 93: "as soon as the night was over" means practically rather "as soon as the light was switched on".

      Response: See above

      Line 91: "small induction" - do you mean "low amounts of xxx"?

      Response: We mean a small induction. Terpene emission is relatively low (hence small), but still induced relative to controls.

      Line 91: which mono- and sesquiterpenes were monitored?

      Response: Because the measurements were made with PTR-ToF-MS, we cannot identify individual sesquiterpenes and monoterpenes (as the compounds within each class all have the same mass), and thus group them generally.
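The point about shared masses can be illustrated with a short calculation. The following is only a minimal sketch, not part of the authors' analysis: it assumes the standard molecular formulas C15H24 for the three sesquiterpenes named earlier in this response and C10H16 for monoterpenes, uses standard monoisotopic masses, and introduces a hypothetical helper `protonated_mz` to show that isomers give one and the same protonated ion.

```python
# Illustrative sketch only: the formulas and atomic masses are standard
# reference values, not data from the manuscript.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207}  # unified atomic mass units
PROTON_MASS = 1.00727646688  # mass added on protonation, [M+H]+


def protonated_mz(n_carbon: int, n_hydrogen: int) -> float:
    """Return the m/z of the singly protonated ion of a hydrocarbon CxHy."""
    neutral = (n_carbon * MONOISOTOPIC_MASS["C"]
               + n_hydrogen * MONOISOTOPIC_MASS["H"])
    return neutral + PROTON_MASS


# Assumed formulas: all three sesquiterpenes are C15H24 isomers.
SESQUITERPENES = {
    "beta-caryophyllene": (15, 24),
    "beta-farnesene": (15, 24),
    "alpha-bergamotene": (15, 24),
}
MONOTERPENE_FORMULA = (10, 16)  # C10H16, shared by monoterpene isomers

if __name__ == "__main__":
    for name, (c, h) in SESQUITERPENES.items():
        print(f"{name}: m/z {protonated_mz(c, h):.3f}")
    print(f"monoterpenes: m/z {protonated_mz(*MONOTERPENE_FORMULA):.3f}")
```

Because the three sesquiterpenes share one formula, a mass-based measurement returns a single summed signal for them, which is why they are reported as a group.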

      Figure 1: What exactly is the "control"? And what does the vertical hatched line in the beginning represent?

      Response: We have defined the control and added a sentence describing the vertical hatched line

      "Black points represent the same but with undamaged sender plants" - what is "the same" here? I find that a bit confusing!

      Response: We have rephrased

      Line 104: how do you define an "overaccumulation"?

      Response: We have added ‘above daytime levels’ to clarify that we mean over daytime levels (L106)

      Why was the oldest developing leaf chosen? Is this the largest one when plants are two weeks old? How many leaves do they have then? Is this the leaf with the highest biomass?

      Response: We chose this leaf as it is the largest and also highly responsive to HIPVs. We have added this sentence (with a reference) in the methods section (L369-L370)

      Line 107: "started increasing after 3 hours" - they may already have started before. The following description also sounds like the dynamics were investigated here. However, instead the authors measured samples at four distinct time-points and cannot say whether something "began" or "remained" etc. The wording should be changed to a more appropriate description, describing the differences at a given time-point.

      Response: We changed the wording to ‘were marginally induced after 3 hr’ see L110

      Line 113: What do you mean by "delete BELOW NIGHTTIME levels"?

      Response: The word we used was ‘deplete’; we have changed it to ‘drop’ (L116).

      Line 114: "the expression of terpene synthases" add "in the receiver plants exposed to HIPVs."

      Response: Added

      Figure 2ff: The situation of receiver plants exposed to control plant volatiles is not explained in the method section and also not depicted in the Suppl. Fig. 1. Here, the sender plants seem to always have been induced (if the red star-like structure should resemble an induction - a legend may be helpful here).

      Response: We have changed the wording to ‘connected to undamaged sender plants’. We have additionally added a sentence to the methods section describing the controls (L300).

      Line 140: This treatment is not described in the methods section. Were the plants only kept under constant conditions for the 2 experimental days? Compared to the induction shown in Fig. 1, the amount of released volatiles seems less here.

      Response: We have added an explanation of this to the figure legends, stating that plants were grown under normal conditions and that lights were only left on following treatment.

      Another helpful control would be to start the herbivory treatment in the evening hours and leave the light on. If then again plants only release volatiles after a 17 h delay, the response is indeed independent of the diurnal clock of the plant.

      Response: See public review comment. We have added this experiment and discuss it accordingly in the MS (L159-L166 and L207-L211)

      Line 157: Check sentence/grammar

      Response: Checked and modified

      Figure 5: I suggest using a different colour for volatiles released from the sender plants, not again the green also used in the other figures for the receiver plants. This would help the reader to quickly see which plants are in focus in each figure.

      Response: We have changed the color of the figures for clarity

      Figure 6 legend: check grammar in several sentences (use of singular vs. plural)

      Response: We have made the tense uniform

      The diurnal rhythm of jasmonates (and potentially also terpene synthases?) is not considered in the discussion.

      Response: See above, and we have added a sentence to the discussion mentioning the jasmonates (L254-L257)

      Line 230-231: check grammar. Given the complexity, the response pattern may not be so predictable.

      Response: We do not understand this comment, but have checked the grammar throughout the manuscript.

      Line 235: I like the discussion on potential ecological consequences.

      While some interpretation for each experiment is already given in the results section, not all results are discussed in the discussion section. For example, the jasmonate data are not discussed. This should be added.

      Response: See above

      Line 266: To get an idea about the plant size: How many leaves do the plants have in that stage?

      Response: Added a sentence describing the size L287-L288

      Line 321: change to "as in the greenhouse"

      Response: Changed

      Line 334: How were the terpenoids identified and, in particular, quantified?

      Response: Added (L379-L380)

      Line 354: Maybe rather change to: "Plant treatments and tissue collection for phytohormone sampling were identical as described above for terpene and gene expression analysis."

      Response: Changed

      Line 357: add "material" or "leaf tissue" after "flash frozen"

      Response: Added

      Line 359: What was the source of the isotopically labelled phytohormones?

      Response: Added (L400-L403)

      Line 360: The phytohormones are "analyzed" using UPLC. The "quantification" is then done afterward. Please correct.

      Response: Corrected (L404)

      Overall: a great approach and a wonderful idea!

      Thanks

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Strengths:

      The major strength of this paper is the series of laser cutting experiments supporting that asters position via pushing forces acting both on the boundary (see below for a relevant comment) and between asters. The combination of imaging, data analysis and mathematical modeling is also powerful.

      Author Response: We thank the Reviewer for the positive comments, especially in recognising the power of our quantitative approaches.

      Weaknesses:

      This paper has weaknesses, mainly in the presentation but also in the quality of the data which do not always support the conclusions satisfactorily (this might in part be a presentation issue).

      Author Response: We address these concerns below.

      My overall suggestion for the authors is to explain better the motivation and interpretation of their experiments and also to remove some of the observations which seem to be there because they could be done rather than because they add to the main message of the paper, which I find straightforward, valuable and supported by the data in Figure 4.

      Author Response: We have extended the motivation of the study in the Introduction, and at the beginning of appropriate Results sections. We better motivate the force potential and especially the key results from Figure 4. We outline specific changes below.

      In Figure 2, it is difficult for me to understand what is being tracked. I believe that the authors track the yolk granules (visible as large green blobs) and not lipid droplets. There is some confusion between the text, legends and methods so I could not tell. If the authors are tracking yolk granules as a proxy for hydrodynamics flows it seems appropriate to cite previous papers that have used and verified these methods. More notably, this figure is somewhat disconnected with the rest of the paper. I find the analysis interesting in principle but would urge the authors to propose some interpretation of the experiments in the context of their big-picture message. At this point, I cannot understand what the Figure adds.

      Author Response: Indeed, we track the yolk droplets that move around the aster. In the extraction protocol, we likely get a mixture of lipid droplets and yolk granules; this is due to the extraction procedure involving shear forces within the pipette. We are not certain about the exact nature of these droplets, but they are likely to a large extent yolk. We have clarified the terminology in the text, the legend and the methods section. In this figure, we now show that the droplets do not move towards the aster center as the hydrodynamic pulling model would suggest. Instead, they appear to respond passively to a repulsive force that results in them streaming around the aster. We have added additional panels to the figure that illustrate the directionality of yolk granule movements (lines 159-164). We agree with the Reviewer that the context could have been clarified. The role of fluid flows in biological systems is, as the Reviewer highlights, well studied. We have added additional contextualisation in the text (lines 140-146). We also motivate the figure more clearly, as it provides evidence that the asters generate forces over a 20 µm scale (lines 159-164). This is highly relevant for one of the paper’s main conclusions – that the Drosophila blastocyst asters generate pushing forces that enable regular packing.

      In Figure 3, it is not surprising that the aster-aster interactions are different from interactions with the boundary which is likely more rigid. It is also hard to understand why the force and thus velocity should scale as microtubule length. This Figure should be better conceptualized. I think that it becomes clear at the end of the paper that the authors are trying to derive an effective potential to use in a mathematical model in Figure 5 to test their hypotheses. I think that should be told from the start, so a reader understands why these experiments are being shown.

      Author Response: We don’t claim that the force scales with microtubule length on a single microtubule. However, at larger distances from the aster, the microtubule density decreases, and hence the effective force decreases.

      The Reviewer is correct that we use these results to motivate our effective potential. We have brought this motivation forward in the manuscript to guide the reader (lines 169-171) and included a further note at the end of the section (lines 216-218).

      The experiments in Figure 4 are very nice in supporting a pushing model. However, it would help if the authors could speculate what the single aster is pushing against in this experiment. The experiments reported in Figure 1 seemed to suggest that the aster mainly pushed against the boundary. In the experiments in Figure 4 do the individual asters touch the boundary on both sides? I think that readers need more information on what the extract looks like for those experiments.

      Author Response: We now include an additional panel B in Figure 4 that shows an example of an explant during aster ablation. The distance between asters is typically less than the distance to the explant boundary. Boundary effects likely play a small role in the aster-aster separation, in terms of potentially determining the axis of separation. However, the separation of asters occurs along a straight line for a substantial period (>1 min) of separation; if boundary effects were more dominant, we would expect to see curving of the aster-aster separation trajectories as they also receive feedback from the boundary.

      Figure 4F could use some statistics. I doubt that the acceleration in the pink curves would be significant. I believe that the deceleration is and that is probably the most crucial result. Since the authors present only 3 asters pairs it is important to be sure that these conclusions are solid.

      Author Response: We agree with the Reviewer. These experiments are challenging to do, as they require carefully controlled conditions. In two out of three experiments we see a significant increase in acceleration in the pink curves. Of course, the interpretation of this must be caveated, as our number of experiments is low. These details are now provided in the revision (lines 263-267).

      Reviewer 2

      Strengths:

      This study reveals a unique aster positioning mechanics in the syncytial embryo explant, which leads to an understanding of the mechanism underlying the positioning of multiple asters associated with nuclei in the embryo. The use of explants enabled accurate measurement of aster motility and, therefore, the construction of a quantitative model. This is a notable achievement.

      Author Response: We thank the Reviewer for their review, and for highlighting how our quantitative model is a clear step forward in our understanding of aster dynamics.

      Weaknesses:

      The main conclusion that aster repulsion predominates in this system has already been drawn by the same authors in their recent study (de-Carvalho et al., Development, 2022). As the present work provides additional support to the previous study using a different experimental system, the authors should emphasize that the present manuscript adds to it (but the conceptual novelty is limited).

      Author Response: While this study is related to the previous work, there are major differences. First, here we quantitatively assess aster dynamics within a “clean” system. Such accurate measurements are not currently possible in vivo. Further, experiments like laser ablation are much better defined within the explant system. We now recognise the previous work more clearly in the Introduction and in lines 291-293 and 299-300. Combined with the different perspectives provided in these papers on the problem of aster positioning in syncytia, we believe these papers provide new and well-supported insights.

      The molecular mechanisms underlying aster repulsion remain unexplored since the authors were unable to identify specific factor(s) responsible for aster repulsion in the explant.

      Author Response: Given that the nature of the aster dynamics was not previously characterised, our work presents a major step forward. We show compelling evidence that an effective pushing force potential plays a role in aster interactions. With this critical knowledge, we can now explore the potential molecular mechanisms – but such information lies beyond the scope of the current manuscript. This is particularly challenging due to the lack of specific microtubule drug inhibitors in Drosophila. We highlight related issues in the Discussion: paragraph starting on line 340 and lines 367-370.

      Specific suggestions:

      Microtubules should be visualized more clearly (either in live or fixed samples). This is particularly important in Figure 4E and Video 4 (laser ablation experiment to create asymmetric asters).

      Author Response: This is similar to Reviewer 1's final comment above. These experiments are very challenging, and being able to see the microtubules with sufficient clarity is not straightforward. Given our controls and previous experience, we are confident we are ablating the microtubules.

      Minor points:

      1) The authors explain the roles of microtubule asters in several model systems in the first paragraph of the introduction part. Please specify the species and/or cell types in each description.

      Author Response: We have provided as suggested.

      2) In lines 164 and 172, the citing figure numbers should be modified to Supplementary Fig. 1A and 1B, respectively.

      Author Response: We thank the Reviewer for spotting this error. It has now been corrected.

      3) The authors showed in the previous study that the boundary in the explant does not have an intact cell cortex and f-actin compartments (de-Carvalho et al., Development, 2022). This important information should also be described in the current manuscript. It is also valuable to mention whether the pulling force mechanism operates in embryos where the intact cell cortex is present.

      Author Response: This is an interesting point. We have added text to the Discussion with this information (lines 324-327).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      It is somewhat speculative that the structure represents the EIIa-bound regulatory state. There's a strong enough case that it should be analyzed in the discussion, but I don't think it is firmly established. Therefore, the title of the paper should be changed.

      Our answer: Thank you for the comment. We have changed the title to “Mobile barrier mechanisms for Na+-coupled symport in an MFS sugar transporter”

      Reading through the manuscript, it was challenging to distinguish what is new in the current manuscript and what has been done previously. There were a lot of parts where it was hard for me to identify the main point of the current study among all the details of previous studies. It would also benefit from shortening. For example:

      -Page 6: Nb725 binding has already been characterized extensively in the very nice JBC paper earlier this year. It's important to test 725-4 for binding, but since it doesn't change the binding interaction, and probably wouldn't be expected to, the entire section could be written more succinctly. The main point, which is that 725-4 behaves like 725, is lost among all the details

      Our answer: Thanks for this instructive suggestion. We have shortened the description in this section.

      -Page 9-10. I don't understand what summarizing all of the results from the previous D59C studies adds to the current story. It's important because it provides an indication of the substrate binding site, but its mechanism of action does not seem relevant to the current work.

      Our answer: We have shortened the description of the sugar-binding site and moved the previous Fig. 3b to supplementary figure sFig. 11. According to your comment about showing the location of the binding sites, which is also suggested by Reviewer #2, we modified Fig. 3 and added two panels to map the location of the bound Na+ in the inward-facing structure and the bound sugar in the outward-facing structure.

      The sugar-binding site identified in the published structure is critical to construct the mobile barrier mechanism. The sugar-binding residues identified in the published structure provided essential data to support the conclusion that the sugar-binding pocket is broken in the inward-facing structure. Thus, this published structure is mechanistically relevant to the current study.

      -Page 12. Too much summary of the previous outward structure. Since this is already part of the literature, it would be more efficient to reference the previous data when it is important to interpret the new data (or show as a figure).

      Our answer: The introduction of the previous sugar-binding site is important for the detailed comparison between the two states, as discussed above, but we agree with the reviewer and have significantly shortened the paragraph by moving the detailed description into the legend of sFig. 11.

      -Instead of providing the PDB ID in figures of the current structure, just say "current work" or similar. Then it is obvious you are not citing a previous structure.

      Our answer: To clearly distinguish the new data from published results, the citation of the cryoEM structure [PDB ID 8T60] has been completely removed from the main text but kept in sTable 1.

      -An entire panel of Figure 3 is dedicated to ligand binding in a previous outward-facing structure.

      Showing it in the overlay would be sufficient.

      Our answer: This is the first time we have shown a structure with a bound Na+. Fig. 3 also illustrates the spatial relationship between the sugar-binding pocket and the cation-binding pocket, since both binding sites have now been determined. As stated above, in response to both reviewers’ comments, we have modified the figures, and Fig. 3d is the overlay.

      Please increase the size of the font in all figures. It should be 6-8 point when printed on a standard sheet of paper. Labels in Figure 3, distances in Figure 4, and everything in Figure 5 is hard to see.

      Our answer: Thank you for the comment. We have enlarged the figures and the label fonts in all figures.

      Figure 2: would be helpful to show Figure S8 in the main text, orienting the reader to the approximate location of substrate binding. What is known about the EIIA-Glc binding interface? Has anyone probed this by mutagenesis? Where are these residues on the overall structure, and are they somewhere other than the nanobody interface?

      Our answer: Thank you for this comment. We have added a panel in Figure 3c to orient readers to the substrate location in MelB. sFig. 8 focuses on the details of the Nb interactions with MelB. Our current data strongly support the notion that the Nb-bound MelBSt structure mimics EIIAGlc-bound MelB, but the latter has not been structurally resolved, so we have toned down our statement on EIIAGlc. There is one study suggesting that the C-terminal tail helix may be involved in EIIAGlc binding, which has been added to the discussion.

      Can Figure 5 be split into 2 figures and simplified?

      Our answer: Thanks for the suggestion. We have split it into Figs. 5b and 6 and also moved the peptide mapping to Fig. 5a.

      What is the difference between cartoon and ribbon rendering?

      Our answer: Ribbon: illustrating the structure; cartoon: highlighting the positions with statistically significant protection or deprotection. The statistically significant changes are implied by the ribbon representation; Sphere: not covered by labeled peptides.

      Can the panels showing the kinetic data be enlarged? I don't think they need to surround the molecule. An array underneath would be fine.

      Our answer: We have enlarged all figures and labels. The placement of selected plots around the model could clearly show the difference in deuterium uptake rates between the transmembrane domain and extra-membrane regions. We will maintain this arrangement.

      Do colors in panel A correspond with colors in panel B?

      Our answer: The color usage differs between the two panels. The two panels have now been separated.

      Do I understand correctly that in the HDX experiments, negative values indicate positions that exchange more quickly in the nanobody-free protein relative to the nanobody-bound protein?

      Our answer: Your understanding is correct.
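
      The sign convention confirmed above can be made concrete with a minimal sketch. It assumes the differential is computed as uptake in the Nb-bound state minus uptake in the Nb-free state; the peptide labels, uptake values, and the 0.5 Da threshold are invented for illustration and are not taken from the manuscript.

```python
# Hypothetical deuterium uptake (in Da) at one labeling time point.
# Peptide names and numbers are illustrative only.
uptake_free = {"pep_A": 3.5, "pep_B": 2.0, "pep_C": 4.0}
uptake_bound = {"pep_A": 2.0, "pep_B": 2.0, "pep_C": 4.5}


def differential_uptake(bound, free):
    """Per-peptide difference, assumed here to be bound minus free."""
    return {pep: bound[pep] - free[pep] for pep in free}


def interpret(delta, threshold=0.5):
    """Label a difference; the 0.5 Da threshold is an arbitrary placeholder."""
    if delta <= -threshold:
        return "protected by Nb (exchanges faster when Nb-free)"
    if delta >= threshold:
        return "deprotected by Nb (exchanges faster when Nb-bound)"
    return "no significant change"


deltas = differential_uptake(uptake_bound, uptake_free)
for pep, delta in deltas.items():
    print(pep, delta, interpret(delta))
```

      Under this assumed convention, a negative value for a peptide means it took up less deuterium with the Nb bound, which is the case the reviewer describes.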

      I assume some of this is due to the protein changing conformation, but some of it might be due to burial at the nanobody-binding interface. Can those peptides be indicated?

      Our answer: Thank you for this comment. We have marked the peptides carrying the Nb-binding residues on the uptake plots in Fig. 6 and Extended Fig. 1. Only three Nb-binding residues (Asp137, Lys138, and Arg141) are covered, and they are present in 8 overlapping labeled peptides. The others are either not carried by the labeled peptides (Tyr205, Ser206, and Ser207) or show insignificant changes (Pro132 and Thr133).

      A few positions that are buried in the outward-facing state are expected to be solvent-exposed in the inward-facing state; unfortunately, in the inward-facing state they are buried by Nb binding.

      Make figure legends easier to interpret by removing non-essential methods details (like buffer conditions).

      Our answer: We removed the detailed method descriptions in most figure legends. Thank you.

      Check throughout for typos.

      ie page 9 Lue Leu

      Page 9 like likely

      Our answer: We have corrected them. Thank you!

      Reviewer #2 (Recommendations For The Authors):

      I have mostly minor questions/remarks.

      • Why not do the hdx-ms experiments in the presence of sugar? That would give a proper distinction between two conformational states, instead of an ensemble of states vs one state.

      Our answer: MelB in the presence of sugar also populates multiple states, most likely outward-facing states and occluded intermediate states. This is also supported by the new finding of an inward state with low sugar affinity. The ideal design would compare one inward and one outward state to understand the inward-outward transition. We have not identified an outward-facing mutant, whereas we can obtain the inward state with the Nb. WT MelBSt with bound Na+ favors the outward-facing state. Although our design is not ideal, we do have one state vs a predominantly outward-facing WT with bound Na+.

      Minor comments:

      • Fig 5 is misleading as the peptide number does not match with the amino acid sequence. I would suggest putting a heat map with coverage on top. Or showing deuterium uptake per peptide. See examples below.

      Our answer: The peptide number should not match with sequence number. We have 155 overlapping peptides that cover the entire amino acid sequence including the 10-His tag, and there are 60 residues with no data because they are not covered by a labeled peptide. The residue positions that are covered by peptides are estimated by bars on the top. The cylinder length does not correspond to the length of the transmembrane helix, just for mapping purposes.

      • Can the authors explain how they found that the Nbs bind to the cytoplasmic side (before obtaining the structure)?

      Our answer: Our in vivo two-hybrid assay between the Nb and MelBSt indicated their interaction on the cytoplasmic surface of MelBSt, which is further confirmed by the melibiose fermentation and transport assay, where the transport activities were completely inhibited by intracellularly coexpressed Nb and MelBSt. Thanks for raising this question.

      • The authors use the word "substrate" indifferently for sugar and Na+ binding, which is a bit confusing. Technically, only sugar is the substrate and Na+ is a ligand, or cotransported-ion, that powers the reaction of transport. This might sound like nit-picking but it can lead to misunderstandings (at some point I thought two sugars were transported, and then I was looking for the second Na+ binding site).

      Our answer: We used to call the sugar and Na+ co-substrates, but we agree with this comment.

      We have changed the wording to use substrate for the cargo sugar and coupling cation for the driving cation.

      • Abstract "only the inner barrier" - the is missing.

      Thanks. We have corrected this.

      • p.3 intro "and identified that the positive cooperativity of cation and melibiose, " something is missing.

      Thanks again. We missed the “as the core symport mechanism”.

      • P.6 Nb275_4 instead of Nb725_4

      Thank you very much for your careful reading.

      • P.7. Also, affinity affinities

      Thank you very much. We changed to “; and also, the α-NPG affinity decreased by 21~32-fold for both Nbs”

      • P.8 " contains 417 MelBSt residues (positions 2-210, 219-355, and 364-432). This does not sum up to 417 residues.

      Thanks for your critical reading. We changed 364-432 to 262-432.
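
      The reviewer's arithmetic can be checked with a short sketch that counts residues in inclusive ranges. It uses only the three ranges quoted in the reviewer's comment; the function name is a hypothetical helper introduced for illustration.

```python
def segment_length(start, end):
    """Number of residues in the inclusive range start..end."""
    return end - start + 1


# Ranges as quoted in the reviewer's comment.
segments = [(2, 210), (219, 355), (364, 432)]
lengths = [segment_length(start, end) for start, end in segments]
total = sum(lengths)

print(lengths)  # [209, 137, 69]
print(total)    # 415
```

      The quoted ranges therefore add up to 415 residues rather than 417, which is the discrepancy the reviewer points out.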

      • p.9 Lue 54

      We have corrected it to Leu54.

      • I find fig.3 hard to read. Can the authors show the Na+ binding pockets and sugar binding pockets within the structure? Especially figure 3b. why are the residues in different colors?

      Our answer: We have moved Fig 3b into sFig. 11. We colored the residues in the previous Fig 3B to match the hosting helices. We have added two panels to show the location of both sugar and Na+ in the molecule. Thank you for your comments.

      • Fig4 bcef. Colored circles at the end of the helices. What are they for?

      Our answer: We revised the legend. “The paired helices involved in either barrier formation were highlighted in the same colored circles.”

      • 86% coverage includes the his-tag - it would be good to clarify that.

      Our answer: Yes, it includes the 10-His tag.

      • Fig.7 - anti clockwise cycle of transport is counter-intuitive.

      Our answer: We have re-arranged it. Our model was originally constructed to explain efflux because of limited information at the earlier stage. More data are now available, allowing us to explain inflow and active transport.

      • Where are all the uptake plots per peptide for the HDX-MS data?

      Our answer: We have added the course raw data and prepared all uptake plots for all 71 peptides with statistically significant changes as an Extended Fig. 1.

      • P.22 protein was concentrated to 50 mg/mL. Really? That is a lot.

      This is correct. We can even concentrate MelBSt protein to greater than 50 mg/ml.

      • Have the authors looked into the potential role of lipids in regulating the conformational transition? Since the structure was obtained in nanodiscs, have they observed some unexplained densities? The role of lipid-protein interactions in regulating such transitions was observed for several transporters including MFS (Gupta K, et al. The role of interfacial lipids in stabilizing membrane protein oligomers. Nature. 2017 10.1038/nature20820. Martens C, et al. Direct protein-lipid interactions shape the conformational landscape of secondary transporters. Nat Commun. 2018 10.1038/s41467-018-06704-1.). Furthermore, I see the authors have already observed lipid specific functional regulation of MelB (ref: Hariharan, P., et al BMC Biol 16, 85 (2018). https://doi.org/10.1186/s12915-018-0553-0). A few words about this previous work, and even commenting on the absence of lipid-protein interactions in this current work is worthwhile.

      Our answer: Thanks for this very relevant comment. We paid attention to the unmodelled densities. There is one with potential but it is challenging to model it. We have added a sentence “There is no unexplained density that can be clearly modeled by lipids.” in the method to address this concern.

      Reviewer #3 (Recommendations For The Authors):

      1) In the following sentence, the authors report high errors for the Kd value. The anti-Fab Nb binding to NabFab was two-fold poorer than Nb725_4 at a Kd value of 0.11 ± 0.16 μM. The figure however indicates that the error value is 0.016 µM. Pls correct.

      Our answer: Thank you. You are correct. The value has been corrected to 0.16 ± 0.02 µM. In this revised manuscript, we present the data in nM units.

      2) Is the stoichiometry of the MelB:Na+ symport clearly known in this transporter? It can be mentioned in the discussion with appropriate references.

      Our answer: Yes, the stoichiometry of unity has been clearly determined, which was included in the second paragraph of the previous version.

      3) In the last section of results, the authors seem to suggest a greater movement within their C-terminal helical bundle compared to N-terminal helices. Is there evidence to suggest an asymmetry in the rocker switch between the two states of the transporter?

      Our answer: Our structural data revealed that the C-terminal bundle is more dynamic than the N-terminal bundle, which hosts the residues for specific binding of galactoside and Na+. The HDX data showed that the most dynamic regions are the C-terminal tail, which is structurally unresolved by either method, the conserved tail helix, and the middle-loop helix. Transmembrane helices are relatively less dynamic, with similar distributions on both transmembrane bundles. Since the most dynamic regions are peripheral elements associated with the C-terminal domain, this might give a wrong impression. With regard to symmetric or asymmetric movement, which will certainly affect the dynamic interactions between the transporter and the lipids, we favor the notion that MelBSt performs a symmetric movement during the rocker switch between inward and outward states, at the least cost for the protein-lipid interactions.

      4) Figure 1. Are the thermograms exothermic or endothermic? clarify

      Our answer: In our thermograms, all positive peaks are exothermic due to the direct detection of the heat release by the TA instrument. We clarified this in the Methods and now stress it in the figure legends to avoid confusion.

      5) Figure 4a,d. Please put in a membrane bilayer and depict cytosolic and extracellular compartments for clarity.

      Thank you. We have added a bilayer and labeled the sidedness in this figure and other related figures.

      6) Fig 7. Melibiose symport cannot be referred to as Melibiose efflux transport in the legend as the latter refers to antiport. Pls rectify.

      Our answer: Influx and efflux are conventionally used to describe the direction of movement of a substrate. The use of symport and antiport indicates the directions of the coupling reaction for the cargo and cation. For the symporter MelB, melibiose efflux means that sugar with the coupled cation moves out, which is driven by the melibiose concentration. During the steady state of melibiose active transport, efflux rate = influx rate.

      7) Page 11 "A common feature of carrier transporters". The authors can use either carriers or transporters. Need not use both simultaneously.

      Sorry for overlooking this. We have deleted carriers. Thank you very much for your time.

      8) Several typos were noticed in this manuscript. some are listed below. pls correct.

      Page 4- last paragraph "Furthermore"

      We have corrected it. Thank you again!

      Page 7 - second para one repharse "affinity reduced by 21~32 fold/units.." pls clarify

      Added 21~32 fold.

      Page 9 - "so it is highly likely that inward-open conformation" pls correct.

      We have corrected to “likely”.

      Fig. S9c - correct the spelling "Distance".

      We have corrected to “Distance”

    1. Author Response

      eLife assessment

      In this valuable study, the authors investigate the transcriptional landscape of tuberculous meningitis, revealing key molecular differences contributed by HIV co-infection. Whilst some of the evidence presented is compelling, the bioinformatics analysis is limited to a descriptive narrative of gene-level functional annotations, which are somewhat basic and fail to define aspects of biology very precisely. Whilst the work will be of broad interest to the infectious disease community, validation of the data is critical for future utility.

      Response: We appreciate eLife’s positive assessment, although we challenge the conclusion that we ‘fail to define aspects of biology very precisely’. Our stated objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis and the eLife assessment affirms we have investigated ‘the transcriptional landscape of tuberculous meningitis’. To more precisely define aspects of the biology will require another study with different design and methods. Therefore the criticism seems unnecessarily harsh given the limitations of our stated objective.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tuberculous meningitis (TBM) is one of the most severe forms of extrapulmonary TB. TBM is especially prevalent in people who are immunocompromised (e.g. HIV-positive). Delays in diagnosis and treatment could lead to severe disease or mortality. In this study, the authors performed the largest-ever host whole blood transcriptomics analysis on a cohort of 606 Vietnamese participants. The results indicated that TBM mortality is associated with increased neutrophil activation and decreased T and B cell activation pathways. Furthermore, increased angiogenesis was also observed in HIV-positive patients who died from TBM, whereas activated TNF signaling and down-regulated extracellular matrix organisation were seen in the HIV-negative group. Despite similarities in transcriptional profiles between PTB and TBM compared to healthy controls, inflammatory genes were more active in HIV-positive TBM. Finally, 4 hub genes (MCEMP1, NELL2, ZNF354C, and CD4) were identified as strong predictors of death from TBM.

      Strengths:

      This is a really impressive piece of work, both in terms of the size of the cohort which took years of effort to recruit, sample, and analyse, and also the meticulous bioinformatics performed. The biggest advantage of obtaining a whole blood signature is that it allows an easier translational development into a test that can be used in the clinic with a minimally invasive sample. Furthermore, the data from this study has also revealed important insights into the mechanisms associated with mortality and the differences in pathogenesis between HIV-positive and HIV-negative patients, which would have diagnostic and therapeutic implications.

      Weaknesses:

      The data on blood neutrophil count is really intriguing and seems to provide a very powerful yet easy-to-measure method to differentiate survival vs. death in TBM patients. It would be quite useful in this case to perform predictive analysis to see if neutrophil count alone, or in combination with gene signature, can predict (or better predict) mortality, as it would be far easier for clinical implementation than the RNA-based method. Moreover, genes associated with increased neutrophil activation and decreased T cell activation both have significantly higher enrichment scores in TBM (Figure 9) and in mortality (Figure 8). While I understand the basis of selecting hub genes in the significant modules, they often do not represent these biological pathways (at least not directly associated in most cases). If genes were selected based on these biologically relevant pathways, would they have better predictive values?

      Response: Blood neutrophil count was not found to be a predictor for TBM mortality in our previous studies. We agree it could be useful to perform predictive analysis with neutrophil count, as suggested by the reviewer. Regarding hub genes versus genes representative of the biological pathways, we cannot know which have better predictive values without performing variable selection on the set of all genes, including both hub genes and pathway-representative genes; this is an additional analysis that we will undertake.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the analysis of blood transcriptomic data from patients with TB meningitis, with and without HIV infection, with some comparison to those of patients with pulmonary tuberculosis and healthy volunteers. The objectives were to describe the comparative biological differences represented by the blood transcriptome in TBM associated with HIV co-infection or survival/mortality outcomes and to identify a blood transcriptional signature to predict these outcomes. The authors report an association between mortality and increased levels of acute inflammation and neutrophil activation, but decreased levels of adaptive immunity and T/B cell activation. They propose a 4-gene prognostic signature to predict mortality.

      Strengths:

      -Biological evaluations of blood transcriptomes in TB meningitis and their relationship to outcomes have not been extensively reported previously.

      -The size of the data set is a major strength and is likely to be used extensively for secondary analyses in this field of research.

      Weaknesses:

      The bioinformatic analysis is limited to a descriptive narrative of gene-level functional annotations curated in GO and KEGG databases. This analysis can not be used to make causal inferences. In addition, the functional annotations are limited to 'high-level' terms that fail to define biology very precisely. At best, they require independent validation for a given context. As a result, the conclusions are not adequately substantiated. The identification of a prognostic blood transcriptomic signature uses an unusual discovery approach that leverages weighted gene network analysis that underpins the bioinformatic analyses. However, the main problem is that authors seem to use all the data for discovery and do not undertake any true external validation of their gene signature. As a result, the proposed gene signature is likely to be overfitted to these data and not generalisable. Even this does not achieve significantly better prognostic discrimination than the existing clinical scoring.

      Response: As explained in response to the eLife assessment, our objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis. We agree that ‘This analysis can not be used to make causal inferences’: that would require different study design and approaches. The proposed gene signature has higher AUC values than the existing clinical model. We agree that validation of the gene signature in an independent sample set will be a crucial next step.
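
      The AUC comparison referred to above can be illustrated with a minimal sketch. It uses a rank-based AUC, that is, the probability that a randomly chosen patient who died scores higher than a randomly chosen survivor, with ties counted as one half. The outcomes and both sets of scores are invented for illustration; they are not study data, and this is not presented as the analysis pipeline used in the manuscript.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died, 0 = survived) and two hypothetical risk scores.
died = [1, 1, 0, 0, 0, 1]
signature_score = [0.9, 0.8, 0.3, 0.4, 0.2, 0.7]
clinical_score = [0.9, 0.3, 0.4, 0.2, 0.5, 0.8]

print("signature AUC:", auc(signature_score, died))
print("clinical AUC:", auc(clinical_score, died))
```

      An AUC computed on the same samples used to derive a signature tends to be optimistic, which is why evaluation in an independent sample set, as acknowledged above, is needed before the two values can be compared fairly.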

    1. Author Response

      Author responses to the original review:

      The data we produce are not criticized as such and thus, do not require revision; the criticisms concern our interpretation of them. General themes of the reviews are that i) genetic signatures do not matter for defining neuronal types (here sympathetic versus parasympathetic); ii) that a cholinergic postganglionic autonomic neuron must be parasympathetic; and iii) that some physiology of the pelvic region would deserve the label “parasympathetic”. We answered the latter argument in (Espinosa-Medina et al., 2018) to which we refer the interested reader; and we fully disagree with the first two. Of note, part of the last sentence of the eLife assessment is misleading and does not reflect the referees’ comments. Our paper analyses genetic differences between the cranial and sacral outflow and uses them to argue that they cannot be both parasympathetic. The eLife assessment acknowledges the “genetic differences” but concludes that, somehow, they don’t detract from a common parasympathetic identity. We take issue with this paradox, of course, but it is coherent with the referee’s comments. On the other hand, the eLife assessment alone pushes the paradox one step further by stating that “functional differences” between the cranial and sacral outflows can’t either prevent them from being both parasympathetic. We would also object to this, but the only “functional differences” used by the referees to dismiss our diagnostic of a sympathetic-like character (rather than parasympathetic) for the sacral outflow are between noradrenergic and cholinergic, and between sympathetic and parasympathetic (and we also disagree with those, see above, and below) —not between cranial and sacral.

      We will thus use the opportunity offered by eLife to keep the paper as it is (with a few minor stylistic changes). We respond below to the referees’ detailed remarks and hope that the publication, as per eLife's new model, of the paper, the referees’ comments and our response will help move the field forward.

      Public review by Referee #1

      “Consistently, the P3 cluster of neurons is located close to sympathetic neuron clusters on the map, echoing the conventional understanding that the pelvic ganglia are mixed, containing both sympathetic and parasympathetic neurons”.

      The greater closeness of P3 than of P1/2/4 to the sympathetic cluster can be used to judge P1/2/4 less sympathetic than P3 (and more… something else), but not more parasympathetic. There is no echo of the “conventional understanding” here.

      “A closer look at the expression showed that some genes are expressed at higher levels in sympathetic neurons and in P2 cluster neurons ” [We assume that the referee means “in sympathetic neurons and in P3 cluster neurons”] but much weaker in P1, P2, and P4 neurons such as Islet1 and GATA2, and the opposite is true for SST. Another set of genes is expressed weakly across clusters, like HoxC6, HoxD4, GM30648, SHISA9, and TBX20.

      These statements are inaccurate. The classification is not based on an impression from visual inspection of the heatmap, but on calculations using thresholds. Admittedly, the thresholds have an arbitrary aspect, but the referee can verify (by visual inspection of the heatmap) that genes which we calculate as being at “higher levels in sympathetic neurons and in P3 cluster neurons, but much weaker in P1, P2, and P4 neurons” or vice versa, i.e. noradrenergic or cholinergic neurons (genes from groups V and VI, respectively), show a much bigger difference than those cited by the referee, and indeed are quasi-absent from the weaker clusters or ganglia. In addition, even by subjective visual inspection:

      Islet is equally expressed in P4 and sympathetics.

      SST is equally expressed in P1 and sympathetics.

      Tbx20 is equally expressed in P2 and sympathetics.

      HoxC6, HoxD4, GM30648, SHISA9 are equally expressed in all clusters and all sympathetic ganglia.

      “Since the pelvic ganglia are in a caudal body part, it is not surprising to have genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa (to have genes expressed in sphenopalatine ganglia, but not in pelvic ganglia), according to well recognized rostro-caudal body patterning, such as nested expression of hox genes.”

      We do not simply show “genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa”, i.e. a genetic distance between pelvic and sphenopalatine, but many genes expressed in all pelvic cells and sympathetic ones, i.e. a genetic proximity between pelvic and sympathetic. This situation can be deemed “unsurprising”, but it can only be used to question the parasympathetic nature of pelvic cells (as we do), or considered irrelevant (as the referee does, because genes would not define cell types, see our response to an equivalent stance by Referee#2). Concerning Hox genes, we do take them into account, and speculate in the discussion that their nested expression is key to the structure of the autonomic nervous system, including its division into sympathetic and parasympathetic outflows.

      It is much simpler and easier to divide the autonomic nervous system into sympathetic neurons that release noradrenaline versus parasympathetic neurons that release acetylcholine, and these two systems often act in antagonistic manners, though in some cases, these two systems can work synergistically. It also does not matter whether or not pelvic cholinergic neurons could receive inputs from thoracic-lumbar preganglionic neurons (PGNs), not just sacral PGNs; such occurrence only represents a minor revision of the anatomy. In fact, it makes much more sense to call those cholinergic neurons located in the sympathetic chain ganglia parasympathetic.

      This “minor revision of the anatomy” would make spinal preganglionic neurons which are universally considered sympathetic (in the thoraco-lumbar cord) synapse onto large numbers of parasympathetic neurons (in the paravertebral chains for sweat glands and periosteum, and in the pelvic ganglion), robbing these terms of any meaning.

      Thus, from the functionality point of view, it is not justified to claim that "pelvic organs receive no parasympathetic innervation".

      There never was any general or rigorous functional definition of the sympathetic and parasympathetic nervous systems — it is striking, almost ironic, that Langley, creator of the term parasympathetic and the ultimate physiologist, provides an exclusively anatomic definition in his Autonomic Nervous System, Part I. Hence, our definition cannot clash with any “functionality point of view”. In fact, as we briefly say in the discussion and explore in (Espinosa-Medina et al., 2018), it is the “sacral parasympathetic” paradigm which is unjustified from a functionality point of view, for implying a functional antagonism across the lumbo-sacral gap, which has been disproven repeatedly. It remains to be determined which neurons are antagonistic to which on the blood vessels of the external genitals; antagonism within one division of the autonomic nervous system would not be without precedent (e.g. there exist both vasoconstrictor and vasodilator sympathetic neurons, and both, inhibitor and activator enteric motoneurons). The way to this question is finally open to research, and as referee#2 says “it is early days”.

      Public review by Referee #2

      This work further documents differences between the cranial and sacral parasympathetic outflows that have been known since the time of Langley - 100 years ago.

      We assume that the referee means that it is the “cranial and sacral parasympathetic outflows” which “have been known since the time of Langley”, not their differences (that we would “further document”): the differences were explicitly negated by Langley. As a matter of fact, the sacral and cranial outflows were first likened to each other by Gaskell, 140 years ago (Gaskell, 1886). This anatomic parallel (which is deeply flawed (Espinosa-Medina et al., 2018)) was inherited wholesale by Langley, who added one physiological argument (Langley and Anderson, 1895) (which has been contested many times (Espinosa-Medina et al., 2018) and references within).

      In addition, the sphenopalatine and other cranial ganglia develop from placodes and the neural crest, while sympathetic and sacral ganglia develop from the neural crest alone.

      Contrary to what the referee says, the sphenopalatine has no placodal contribution. There is no placodal contribution to any autonomic ganglion, sympathetic or parasympathetic (except an isolated claim concerning the ciliary ganglion (Lee et al., 2003)). All autonomic ganglia derive from the neural crest as determined a long time ago in chicken. For the sphenopalatine in mouse, see our own work (Espinosa-Medina et al., 2016).

      One feature that seems to set the pelvic ganglion apart is […] the convergence of preganglionic sympathetic and parasympathetic synapses on individual ganglion cells (Figure 3). This unusual organization has been reported before using microelectrode recordings (see Crowcroft and Szurszewski, J Physiol (1971) and Janig and McLachlan, Physiol Rev (1987)). Anatomical evidence of convergence in the pelvic ganglion has been reported by Keast, Neuroscience (1995).

      Contrary to what the referee says, we do not provide in Figure 3 any evidence for anatomic convergence, i.e. for individual pelvic ganglion cells receiving dual lumbar and sacral inputs. We simply show that cholinergic neurons figure prominently among targets of the lumbar pathway. This said, the convergence of both pathways on the same pelvic neurons, described in the references cited by the referee, is another major problem in the theory of the “sacral parasympathetic” (as we discussed previously (Espinosa-Medina et al., 2018)).

      It should also be noted that the anatomy of the pelvic ganglion in male rodents is unique. Unlike other species where the ganglion forms a distributed plexus of mini-ganglia, in male rodents the ganglion coalesces into one structure that is easier to find and study. Interestingly the image in Figure 3A appears to show a clustering of Chat-positive and Th-positive neurons. Does this result from the developmental fusion of mini ganglia having distinct sympathetic and parasympathetic origins?

      The clustering of Chat-positive and Th-positive cells could arise from a number of developmental mechanisms, that we have no idea of at the moment. This has no bearing on sympathetic and parasympathetic.

      In addition, Brunet et al dismiss the cholinergic and noradrenergic phenotypes as a basis for defining sympathetic and parasympathetic neurons. However, see the bottom of Figure S4 and further counterarguments in Horn (Clin Auton Res (2018)).

      The bottom of Figure S4 simply indicates which cells are cholinergic and adrenergic. We have already expounded many times that noradrenergic and cholinergic do not coincide with sympathetic and parasympathetic. Henry Dale (Nobel Prize 1936) demonstrated this. Langley himself devoted several pages of his final treatise to this exception to his “Theory on the relation of drugs to nerve system” (Langley, 1921) (p43) (which was actually a bigger problem for him than it is for us, for reasons that are too long to recount here; it is as if the theoretical difficulties experienced by Langley had been internalized to this day in the form of a dismissal of the cholinergic sympathetic neurons as a slightly scandalous but altogether forgettable oddity). (Horn, 2018) reviews the evidence that the thoracic cholinergic sympathetic phenotype is brought about by a secondary switch upon interaction with the target and argues that this would be a fundamental difference with the sacral “parasympathetic”. But in fact the secondary switch is preceded by co-expression of ChAT and VAChT with Th in most sympathetic neurons (reviewed in (Ernsberger and Rohrer, 2018)); and we have no idea of the dynamic in the pelvic ganglion. It may also be mentioned in this context that target-dependent specification of neuronal identity has also been demonstrated for other types of sympathetic neurons (Furlan et al., 2016).

      What then about neuropeptides, whose expression pattern is incompatible with the revised nomenclature proposed by Brunet et al.?

      There was never any neuropeptide-inspired criterion for a nomenclature of the autonomic nervous system.

      Figure 1B indicates that VIP is expressed by sacral and cranial ganglion cells, but not thoracolumbar ganglion cells.

      Contrary to what the referee says, there are VIP-positive cells in our sympathetic data set and even strongly positive ones, except they are scattered and few (red bars on the UMAP). They correspond to cholinergic sympathetics, likely sudomotor, which are known to contain VIP (e.g. (Anderson et al., 2006) (Stanke et al., 2006)). In other words, VIP is probably part of what we call the cholinergic synexpression group (but was not placed in it by our calculations, probably because of a low expression level in sympathetic noradrenergic cells).

      The authors do not mention neuropeptide Y (NPY). The immunocytochemistry literature indicates that NPY is expressed by a large subpopulation of sympathetic neurons but never by sacral or cranial parasympathetic neurons.

      Contrary to what the referee says, Keast (Keast, 1995) finds 3.7% of pelvic neurons double stained for NPY and VIP in male rats, and says (Keast, 2006) that in females “co-expression of NPY and VIP is common” (thus in cholinergic neurons that the referee calls “parasympathetic”). Single cell transcriptomics is probably more sensitive than immunochemistry, and in our dichotomized data set (table S1), NPY is expressed in all pelvic clusters and all sympathetic ganglia. In other words, it is one more argument for their kinship. It does not appear in the heatmap because it ranks below the 100 top genes.

      Answer to the original recommendations by Referee #2

      Introduction - the use of the words 'consensual' and 'promiscuity' are not clear and rather loaded in the context of the pelvic ganglia. Pick alternative words.

      There is no sexual innuendo inherent in “promiscuity”: “condition of elements of different kinds grouped or massed together without order” (Oxford English Dictionary). We replaced “never consensual” by “never generally accepted”.

      Results - Page 2 - what sex were the mice? Previous works indicate significant sexual dimorphism in the pelvic ganglion.

      The mice included both males and females, and male and female cells are represented in all ganglia and clusters. This is now mentioned in the Material and Methods. Thus, however unsuited to analyze sexual dimorphism, our data set ensures that all the cell types we describe are qualitatively present in both sexes.

      Results line 3 - the celiac and mesenteric ganglia are prevertebral ganglia and not part of the sympathetic chain. The chain refers to the paravertebral ganglia.

      We replaced “part of the prevertebral chain” by “belonging to prevertebral ganglia”. This said, there are precedents for “prevertebral chain ganglia” to designate the rostro-caudal series of prevertebral ganglia. Rita Levi-Montalcini, for example, who devoted her glorious career to sympathetic ganglia, writes in 1972 “The nerve cell population of para- and prevertebral chain ganglia is reduced to 3–5% of that of controls”. (10.1016/0006-8993(72)90405-2).

      Page 3 - "as the current dogma implies". Dogma often refers to opinion or church doctrine. The current nomenclature is neither. Pick another word.

      There is little in science that is proven to the point of eliminating any element of opinion. “Dogma” refers to “that which is held as a principle or tenet […], especially a tenet authoritatively laid down by […] a school of thought” (OED). And “dogma” is used in science to designate tenets better experimentally supported than the “sacral parasympathetic”, such as the “central dogma of molecular biology”.

      Page 3 - "To give justice" implies the classical notion is unjust. How about, 'to further explore previous evidence indicating that ....'

      The term is indeed not proper English for the meaning intended, and the right expression is “to do justice”, to mean: “to treat [a subject or thing] in a manner showing due appreciation, to deal with [it] as is right or fitting” (OED). We have corrected the paper accordingly.

      Page 4 top - the convergence indicated by Figure 3 does not justify excluding cholinergic and noradrenergic genes from the analysis.

      Contrary to what the referee says, Figure 3 does not show any “convergence”, see our answer to Referee#1. What Figure 3 shows is that cells that are targeted by the lumbar pathway (a pathway universally deemed “sympathetic”) are cholinergic in massive proportion. Therefore, by an uncontroversial criterion, the pelvic ganglion contains lots of sympathetic cholinergic neurons. The only other option is to declare that sympathetic preganglionic neurons synapse onto parasympathetic postganglionic ones (which is what Referee#1 proposes, and considers “much simpler”. We beg to differ).

      Our justification for excluding cholinergic and noradrenergic genes from the definition of “sympathetic” and “parasympathetic” is simply that sympathetic neurons can be cholinergic (to sweat glands and periosteum and, as we show in Figure 3, to many targets of the lumbar pathway). One can also note that anywhere else in the nervous system, classifying cell types as a function of neurotransmitter phenotype would lead to nonsensical descriptions, such as putting together pyramidal cells and cerebellar granules, or motor neurons and basal forebrain cholinergic neurons. Indeed Referee#1 proposes such a revolutionary revision, by calling all cholinergic autonomic neurons “parasympathetic” (see our answer above).

      Keast (1995) did similar experiments and used presynaptic lesions to draw a different conclusion indicating preferential innervation pelvic subpopulations.

      Keast found “preferential” innervation of pelvic subpopulations based on lesion experiments; nevertheless, she concluded (at the time) that “the correct definition of these two components of the nervous system is based on neuroanatomy rather than chemistry” (Keast, 2006).

      Page 4 - "In the aggregate, the pelvic ganglion is best described as a divergent sympathetic ganglion devoid of parasympathetic neurons" The notion of a divergent ganglion is completely unclear!

      We take “divergent” in a developmental or evolutionary meaning: related to sympathetic ganglia, yet somewhat differing from them. Elsewhere we use the word “modified”. Importantly (and as cited in the paper), a similar situation emerges from the single cell transcriptomic analysis of the lumbar and sacral preganglionics (by other research groups).

      Granted, it is devoid of neurons having the signature of cranial parasympathetics, but that is insufficient to conclude that they are not parasympathetics.

      If a genetic signature which is not only un-parasympathetic, but sympathetic-like remains compatible with some version of the label “parasympathetic”, we get dangerously close to dismissing the molecular make-up of a neuron as a definition of its type. This goes against any contemporary understanding of neuron types (take (Zeisel et al., 2018) among hundreds of other examples).

      Page 4 - "the entire taxonomy of autonomic ganglia could be a developmental readout of Hox genes." This reader completely agrees! We appreciate this would be difficult to test but it helps to explain possible differences along the rostro-caudal axis. Consider making this a key implication of the study!

      If the reader agrees, then his/her previous points become mysterious: we speculate that the Hox code determines the structure of the autonomic nervous system, i.e. the array, along the rostrocaudal axis, of a bulbar parasympathetic, a thoracolumbar sympathetic and lumbo-sacral “pelvo-sympathetic”. The existence of caudal parasympathetic neurons, on the contrary, would subvert any role for Hox genes: similar neurons (similar enough to be called by the same name) would arise at completely different rostro-caudal levels, i.e. with a different Hox code.

      Page 5 - "It is thus remarkable ...that we uncover in no way contradicts the physiology." Not really. The 'classical' sympathetic system innervates the limbs, and the skin and it participates in thermoregulation and in cardiovascular adjustments to exercise. The parasympathetic system does none of these things. Reclassing the pelvic outflow as pseudo-sympathetic contradicts this physiology.

      We do not say that the sacral outflow is classically sympathetic; we go all the way to proposing the special name “pelvo-sympathetic”; and we insist that these special sympathetic-like neurons have special targets (detrusor muscle, helicine arteries…): there is no contradiction. Not only is there no contradiction, but we remove the mind-twister of an anatomical/genetic/cell type-based “sacral parasympathetic” combined with a lack of physiological lumbosacral antagonism (we provide a short history of this dissonance in (Espinosa-Medina et al., 2018)), which led Wilfrid Jänig to write (Jänig, 2006)(p. 357): “Thus, functions assumed to be primarily associated with sacral (parasympathetic) are well duplicated by thoracolumbar (sympathetic) pathways. This shows that the division of the spinal autonomic systems into sympathetic and parasympathetic with respect to sexual functions is questionable”. We could not agree more: this division is questionable in terms of physiology and nonexistent in terms of cell types. In other words, we reconcile cell types with physiology (but “it is early days”).

      Answer to the novel recommendations by Referee #2

      In addition to my original comments, important anatomical and functional distinctions are not explained by the data in this paper. ANATOMY- Sympathetic ganglia are located in close proximity to major branches of the aorta. Cranial and sacral parasympathetic ganglia are located next to or within the structures they innervate (e.g. eye, lung, heart, bladder).

      The pelvic ganglion, including some of its cholinergic neurons that the referee insists are parasympathetic, is further removed from one of its major targets (the helicine arteries of the external genitals) than the sympathetic prevertebral ganglia are from some of theirs (like the gut or kidney). We discussed this issue in (Espinosa-Medina et al., 2018).

      FUNCTION- The sympathetic system controls state variables (e.g. body temperature, blood pressure, serum electrolytes and fluid balance), parasympathetic neurons do not.

      Even in the classical view, the sympathetic system controls the blood vessels of the external genitals or the size of the pupil, for example, which are not state variables.

      […] The data in the paper are a useful next step in defining the genetic diversity of autonomic neurons but do not justify or improve upon existing nomenclature. The future challenge is to understand distinctions between subsets of autonomic ganglion cells that innervate different targets and the principles that govern the integrative function of the autonomic motor system that controls behavior.

      We thank the referee for finding our data useful; and we fully agree with the latter statement. However, neurons, like many other cell types, are hierarchically organized (Zeng and Sanes, 2017), i.e. subsets of neurons belong to sets, with defining traits. Our data argue that there is no parasympathetic neuronal set that includes any pelvic ganglionic neuron. In contrast, there is a ganglionic sympathetic set (defined by our analysis of gene expression) which includes all of them, just as there is a preganglionic sympathetic set that includes sacral preganglionics (Alkaslasi et al., 2021; Blum et al., 2021) (although the direct comparison with cranial preganglionics is yet to be made).

      References

      Anderson, C. R., Bergner, A. and Murphy, S. M. (2006). How many types of cholinergic sympathetic neuron are there in the rat stellate ganglion? Neuroscience 140, 567–576.

      Alkaslasi, M. R., Piccus, Z. E., Hareendran, S., Silberberg, H., Chen, L., Zhang, Y., Petros, T. J. and Le Pichon, C. E. (2021). Single nucleus RNA-sequencing defines unexpected diversity of cholinergic neuron types in the adult mouse spinal cord. Nat Commun 12, 2471.

      Blum, J. A., Klemm, S., Shadrach, J. L., Guttenplan, K. A., Nakayama, L., Kathiria, A., Hoang, P. T., Gautier, O., Kaltschmidt, J. A., Greenleaf, W. J., et al. (2021). Single-cell transcriptomic analysis of the adult mouse spinal cord reveals molecular diversity of autonomic and skeletal motor neurons. Nat Neurosci 24, 572–583.

      Ernsberger, U. and Rohrer, H. (2018). Sympathetic tales: subdivisions of the autonomic nervous system and the impact of developmental studies. Neural Dev 13, 20.

      Espinosa-Medina, I., Saha, O., Boismoreau, F., Chettouh, Z., Rossi, F., Richardson, W. D. and Brunet, J.-F. (2016). The sacral autonomic outflow is sympathetic. Science 354, 893–897.

      Espinosa-Medina, I., Saha, O., Boismoreau, F. and Brunet, J.-F. (2018). The “sacral parasympathetic”: ontogeny and anatomy of a myth. Clin Auton Res 28, 13–21.

      Furlan, A., La Manno, G., Lübke, M., Häring, M., Abdo, H., Hochgerner, H., Kupari, J., Usoskin, D., Airaksinen, M. S., Oliver, G., et al. (2016). Visceral motor neuron diversity delineates a cellular basis for nipple- and pilo-erection muscle control. Nat Neurosci 19, 1331–1340.

      Gaskell, W. H. (1886). On the Structure, Distribution and Function of the Nerves which innervate the Visceral and Vascular Systems. J Physiol 7, 1-80.9.

      Horn, J. P. (2018). The sacral autonomic outflow is parasympathetic: Langley got it right. Clin Auton Res 28, 181–185.

      Jänig, W. (2006). The Integrative Action of the Autonomic Nervous System: Neurobiology of Homeostasis. Cambridge: Cambridge University Press.

      Keast, J. R. (1995). Visualization and immunohistochemical characterization of sympathetic and parasympathetic neurons in the male rat major pelvic ganglion. Neuroscience 66, 655–662.

      Keast, J. R. (2006). Plasticity of pelvic autonomic ganglia and urogenital innervation. Int Rev Cytol 248, 141–208.

      Langley, J. N. (1921). The Autonomic Nervous System (Pt. I). Cambridge: Heffer & Sons Ltd.

      Langley, J. N. and Anderson, H. K. (1895). The Innervation of the Pelvic and adjoining Viscera: Part II. The Bladder. Part III. The External Generative Organs. Part IV. The Internal Generative Organs. Part V. Position of the Nerve Cells on the Course of the Efferent Nerve Fibres. J Physiol 19, 71–139.

      Lee, V. M., Sechrist, J. W., Luetolf, S. and Bronner-Fraser, M. (2003). Both neural crest and placode contribute to the ciliary ganglion and oculomotor nerve. Developmental biology 263, 176–190.

      Stanke, M., Duong, C. V., Pape, M., Geissen, M., Burbach, G., Deller, T., Gascan, H., Parlato, R., Schütz, G. and Rohrer, H. (2006). Target-dependent specification of the neurotransmitter phenotype:cholinergic differentiation of sympathetic neurons is mediated in vivo by gp130 signaling. Development 133, 141–150.

      Zeisel, A., Hochgerner, H., Lönnerberg, P., Johnsson, A., Memic, F., van der Zwan, J., Häring, M., Braun, E., Borm, L. E., La Manno, G., et al. (2018). Molecular Architecture of the Mouse Nervous System. Cell 174, 999-1014.e22.

      Zeng, H. and Sanes, J. R. (2017). Neuronal cell-type classification: challenges, opportunities and the path forward. Nat Rev Neurosci 18, 530–546.

    1. Author Response

      Reviewer #2 (Public Review):

      Manassaro et al. present an extensive three-session study in which they aimed to change defensive responses (skin conductance; SCR) to an aversively conditioned stimulus by targeting medial prefrontal cortex (their words) using repetitive TMS prior to retrieval. They report that stimulating mPFC using TMS abolishes SCR responses to the conditioned stimulus, and that this effect is specific for the stimulated region and the specific CS-US association, given that SCR responses to a different modality US are not changed.

      I like how the authors have clearly attempted to control for several potential confounds by including multiple stimulation sites, measured SCR responses to several unconditioned stimuli, and applied the experiment in multiple contexts. However, several conceptual and practical issues remain that I think limit the value of potential conclusions drawn from this work.

      The first issue that I have with this study concerns the relationship between the TMS manipulation and the theoretical background the authors present in their rationale. In the introduction the authors sketch that what they call 'mPFC' is involved in regulation of threat responses. They make a convincing case; however, almost all of the evidence they present concerns the ventromedial part of the prefrontal cortex (refs 18-25). The authors then mention that no one has ever studied the effects of 'mPFC'-TMS on threat memories. That is not surprising given that stimulating vmPFC with TMS is very difficult, if not impossible. Simulation of the electrical field that develops as a consequence of the authors' manipulation (using the same TMS coil and positioning the authors use) shows that vmPFC (or mPFC for that matter) is not stimulated. The authors then continue in the methods section stating that the region they aimed for was BA10. This region they presumably do stimulate; however, that does not follow logically from their argument. BA10 is anatomically, cytoarchitectonically and functionally a wholly different area than vmPFC and I wonder if their rationale would hold given that they stimulate BA10.

      We would like to thank the Reviewer for highlighting this very important point. The Reviewer is right in stating that Brodmann area 10 (BA 10) is anatomically, cytoarchitectonically, and functionally distinct from the ventromedial PFC. As we reported in the Methods section, placing the coil over the frontopolar midline electrode (Fpz) according to the international 10‒20 EEG coordinate system focused the stimulation directly over the medial portion of BA 10. In the literature, the anterior prefrontal cortex (aPFC) is also known as the “frontopolar cortex” or the “rostral frontal cortex” and encompasses the most anterior portion of the prefrontal cortex, which corresponds to BA 10. In line with this observation, we have replaced “medial prefrontal cortex” (mPFC) with “medial anterior prefrontal cortex” (aPFC) throughout the manuscript. We have also corrected the theoretical background and the rationale in the Introduction section by mentioning several studies that: i) reported the involvement of the aPFC in emotional down-regulation (Volman et al., 2013; Koch et al., 2018; Bramson et al., 2020); ii) traced anatomical connections between the medial/lateral aPFC and the amygdala (Peng et al., 2018; Folloni et al., 2019; Bramson et al., 2020); iii) detected functional connections between the aPFC and the vmPFC during fear down-regulation (Klumpers et al., 2010); iv) found hypoactivation, reduced connectivity, and altered thickness of the aPFC in PTSD patients (Lanius et al., 2005; Morey et al., 2008; Sadeh et al., 2015; Sadeh et al., 2016); v) revealed that strong activation of the aPFC may promote higher resilience against PTSD onset (Kaldewaij et al., 2021) and that enhanced aPFC activity and potentiated aPFC-vmPFC connectivity are detectable after effective therapy in PTSD patients (Fonzo et al., 2017). Furthermore, we discussed our results in light of this evidence in the Discussion section. We sincerely thank the Reviewer for prompting this key improvement of our study.

      The second concern I have is that although I think the authors should be praised for including both sham and active control regions, the controls might not be optimally chosen to control for the potential confounds of their condition of interest (mPFC-TMS). Namely, TMS on the forehead can be unpleasant, if not painful, whereas sham-TMS or TMS applied to the back of the head or even over dlPFC is not (or less so at the very least). Given that the SCR results after mPFC TMS show exactly the same temporal pattern as the sham-TMS but with a lower starting point, one could wonder whether a painful stimulation prior to the retrieval might have already caused habituation to painful stimulation observed in SCR in consequent CS presentations. A control region that would have been more obvious to take is the lateral part of BA10, by moving the TMS coil several centimeters to the left or right, circumventing all things potentially called medial but giving similar unpleasant sensations (pain etc).

      We would also like to thank the Reviewer for bringing this issue to light and allowing us to strengthen our results. The Reviewer is right in pointing out that rTMS application over the forehead can be subjectively perceived as unpleasant, relative to other head coordinates or sham stimulation. The question of whether an unpleasant stimulation prior to the retrieval might provoke habituation to discomfort sensations and lead to weaker SCRs in the subsequent CS presentations is valid and reasonable. We also thank the Reviewer for advising us to stimulate the lateral part of BA 10 as an active control site. However, given the potential involvement of the lateral BA 10 in the fear network (see previous point) and the potential risks due to the anatomical proximity of lateral BA 10 to the temporal lobe, we decided to adopt an alternative approach to investigate whether “a painful stimulation prior to the retrieval might have already caused habituation to painful stimulation observed in SCR in consequent CS presentations”. We repeated the entire experiment in one further group (ctrl discomfort, n = 10) by replacing the rTMS procedure with a 10-min discomfort-inducing procedure over the same site of the forehead (Fpz) to mimic the rTMS-evoked unpleasant sensations in the absence of neural stimulation effects (see the new version of the Methods section). The electrical stimulation intensity was individually calibrated through a staircase procedure (0 = no discomfort; 10 = high discomfort). The shock amplitude was set at the current level corresponding to a mean rating of ‘4’ on the subjective scale because, in the new experiments that we performed targeting the aPFC with rTMS (n = 9), we collected participants’ rTMS-induced discomfort ratings and obtained a mean rating of 3.833 ± 0.589 SEM on the same scale. We found that CS-evoked SCR levels were not significantly different from those of the sham group during the test session as well as during the follow-up session, suggesting that the discomfort experienced during the rTMS procedure did not contribute to the reduction of electrodermal responses observed in the aPFC group. We reported the results of this experiment in the Results section and Figure 2-figure supplement 2.

      My final concern is that the main analyses are performed on single trials of SCR responses, which is a relatively noisy measure to use on single trials. This is also done in relatively small groups (n=21). I would have liked to see both the raw or at least averaged timeseries SCR data plotted, and a rationale explaining how the authors decided on the current sample sizes; if that was based on a power analysis, one must have expected quite strong effects.

      Following the Reviewer’s suggestion, we decided to remove the analysis on single trials, and we apologize for not including SCR timeseries. To quantify the size of the effect induced by the rTMS protocol, we have now added within-group comparisons (through 2 × 2 mixed ANOVAs) that show, for each group, the amount of change in CS-evoked SCRs from the conditioning phase to the test phase, as well as from the conditioning phase to the follow-up phase. Furthermore, to depict these changes directly and simply, in addition to dot plots, we have also represented them with line charts (Figs. 2C, 2H, 4C, 4H, 5C, 5H). To estimate the sample size, we had previously performed a power analysis with G*Power 3.1.9.2, which had resulted in n = 21 per experimental group. However, after correcting the data pre-processing procedures (in accordance with Reviewer 1), we obtained data that were not normally distributed. Thus, we decided to enlarge our sample size by re-performing a power analysis (with the newly suggested statistical analyses) and then repeating the experiments. For the main statistics, i.e. mixed ANOVA (within-between interaction) with two groups and two measurements, with the following input parameters: α equal to 0.05, power (1-β) equal to 0.95, and a hypothesized effect size (f) equal to 0.25, the new estimated sample size was n = 30 per experimental group.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the Authors implement a delayed feedback control method and use it for the first time in biological neuronal networks. They extend a well-established computational theory and expand it into the biological realm. With this, they obtain novel evidence, never considered before, that showcases the difference between simulated neuronal networks and biological ones. Furthermore, they optimize the DFC method to achieve optimal results in the control of cell excitability in the context of biological neuronal networks, taking advantage of a closed-loop stimulation setup that, by itself, is not trivial to build and operate and that will certainly have a positive impact on the fields of cellular and network electrophysiology.

      Regarding the results, it would be very constructive if the Authors could share the code for the quasi-real-time interface with the Multichannel Systems software (current and older hardware versions), as this likely represents a bottleneck preventing more researchers from implementing such an experimental paradigm.

      On the data focusing on the effects of the DFC algorithms on neuronal behavior, the evidence is very compelling, although more care should be devoted to the statistical analyses, since some of the applied statistical tests are not appropriate. In a more biological sense, further discussion and clarification of the experimental details would improve this manuscript, making it more accessible and clearer for researchers across disciplines (i.e., ranging from computational to experimental Neuroscience) and increasing the impact of this research.

      In summary, this work represents a necessary bridge between recent advances in computational neuroscience and the biological implementation of neuronal control mechanisms.

      Regarding sharing the control code, our application for closed-loop stimulation using aDFC, DFC and Poisson is now available on GitHub (https://github.com/NCN-Lab/aDFC). This was, in fact, our initial intention following the reviewing process. With this application, the user can run the developed algorithms with the MEA2100-256 System from Multi Channel Systems MCS GmbH.

      The same applies to the data. The dataset with the spike data from all experiments is now publicly available on Zenodo at https://doi.org/10.5281/zenodo.10138446.

      Regarding the improvements in the statistical analysis, the tests are now performed following Reviewer #1's suggestions. It is important to emphasize that this did not change the results or conclusions of the work.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Grove and colleagues analyzes the role of TEAD1 transcription factors in all events regulating PNS myelin formation, maintenance, and regeneration. Throughout the manuscript, the authors compare the results obtained to those they previously described in YAP/TAZ double knockout mice. A strength of the manuscript is the combination of in vivo analyses, generating mutants constitutively lacking TEAD1 expression in myelinating Schwann cells (P0Cre//TEAD1f/f mice: cKO) and mutants in which TEAD1 expression can be ablated after tamoxifen-mediated recombination in myelinating Schwann cells (PlpCreER//TEAD1f/f mice: iKO). Using this approach the authors were able to assess the role of TEAD1 in all aspects related to PNS myelin: formation as well as maintenance and remyelination after injury. By exploiting these models, they were able to define the role of TEAD1 in regulating Schwann cell proliferation as well as in the cholesterol biosynthetic pathway. Collectively, their data indicate that TEAD1 has a composite role in PNS myelination, being required for developmental myelination but dispensable for myelin maintenance. Further, they also describe a role for TEAD1 in promoting PNS remyelination after an injury event.

      Despite these strengths, there are some weaknesses that should be addressed by the authors:

      1) The manuscript would benefit from better and more detailed analysis of the role of the other TEAD transcription factors, as they are likely redundant in function to TEAD1. For example, since in cKO mice some fibers can escape the sorting defect and eventually myelinate, albeit at a lower level, could they determine whether TEAD2-4 transcription factors might compensate for TEAD1 absence in this setting?

      We speculate that other TEADs, most likely both TEAD2 and TEAD3, compensate for TEAD1 in myelinating some developing axons. We also speculate that TEAD4 counteracts TEAD1, resulting in excessive proliferation of Schwann cells in Tead1 cKO. Unfortunately, because, unlike for TEAD1, floxed/congenic alleles and IHC-compatible antibodies are not yet available for TEAD2-4, it is difficult to determine their roles. We attempted to knock down TEAD2-4 by injecting AAV-shRNAs into the sciatic nerves of WT and Tead1 iKO, but this intervention was not successful. Our future studies will determine compensatory and/or opposing roles of other TEADs during development and homeostasis and after nerve injury.

      2) A striking result of the study is the morphological defects observed in the process of axonal sorting and in the Remak fibers formation of TEAD1 cKO mice. To explain the sorting defect, the authors correctly analyze Schwann cell proliferation. However, since axonal sorting is mediated by the interaction between the extracellular matrix and intracellular cytoskeleton rearrangement, they should address also these two aspects. As per the Remak bundles and the poly-axonal myelination they observe, it is difficult to reconcile this "abnormal" myelination with the fact that TEAD1 cKO mice have a very severe myelinating phenotype, which is persistent in adulthood.

      It is noteworthy that we found radial sorting to be delayed, but not blocked, in Tead1 cKO, as we had previously reported for Yap/Taz cDKO mice in our earlier publication (Grove et al., eLife 2017). The primary reason that myelin development fails in Schwann cells lacking YAP/TAZ (or TEAD1 in the present report) is that they do not initiate myelination of sorted axons, not that radial sorting is defective. We showed that radial sorting was delayed in Schwann cells lacking YAP/TAZ because of their late S phase entry (Figure 4 in Grove et al., eLife 2017). In addition, our earlier report demonstrated that the key laminin receptor, integrin α6, is strongly downregulated but axons are nevertheless sorted out by Schwann cells in Yap/Taz cDKO (Figure 4-figure supplement 2 in Grove et al., eLife 2017). Our current view, therefore, is that extracellular matrix may contribute to reducing Schwann cell proliferation (Berti et al., 2011; Pellegatta et al., 2013; Yu, Feltri, Wrabetz, Strickland, & Chen, 2005), which helps to delay radial sorting, but that it is not required for Schwann cells lacking YAP/TAZ (or TEAD1) to sort axons (see the author response #2 in Grove et al., eLife 2017). Based on this information, we disagree with the reviewer that it is essential for us to address the role of extracellular matrix in delaying radial sorting in Tead1 cKO.

      Regarding Remak bundles, ‘thinly’ myelinated Remak bundles are only ‘occasionally’ observed in Tead1 cKO mice. Given that some large axons are still myelinated in Tead1 cKO mice, likely due to compensation by other TEADs, we speculate that Remak bundles are occasionally myelinated by other TEADs in Tead1 cKO. We have clarified our description and expanded our discussion of TEAD1 regulation of Remak bundles, including abnormal polyaxonal myelination.

      3) In the analyses of the cholesterol biosynthetic pathway, TEAD1 seems to be only partly involved. Again, which is the role of any of the other TEADs?

      Examining cholesterol biosynthesis pathways (SREBP1 and 2) and their target enzymes (SCD1, HMGCR, FDPS, IDI1) in Tead1 cKO and Yap/Taz cDKO, we showed that TEAD1 is required for upregulating FDPS and IDI1. These data suggest that TEAD1 plays a major role in mediating YAP/TAZ-driven cholesterol synthesis by upregulating FDPS and IDI1. It is also important to note that FDPS and IDI1 levels are reduced in Tead1 cKO as ‘greatly’ as in Yap/Taz cDKO (Figure 5). We therefore speculate that other TEADs compensate for TEAD1 modestly, if at all, in upregulating FDPS and IDI1. We do not rule out the possibility, however, that other TEADs fully compensate for TEAD1 in ‘maintaining’ cholesterol synthesis in adult Schwann cells. We will address these important questions in the future when the key resources mentioned above become available to study TEAD2-4.

      4) Why do cKO mice die before P60?

      In accordance with IACUC guidelines, we humanely euthanized Tead1 cKO mice before P60 because, like Yap/Taz cKO mice, they develop severe peripheral neuropathy.

    1. Author Response

      Reviewer #2 (Public Review):

      In this paper, the authors discover that postsynaptic mitochondria in C. elegans govern glutamate receptor trafficking dynamics. The core results are two-fold. For one, they find that loss or inhibition of mcu-1 - the C. elegans mitochondrial calcium uniporter - increases GLR-1 glutamate receptor accumulation at the postsynaptic dendritic sites and enhances its trafficking dynamics. The authors hypothesize that this effect on glutamate receptors may have something to do with mitochondrial ROS production. This is because ROS is a by-product of normal oxidative phosphorylation, downstream of calcium import. Indeed, the generation of artificially high amounts of mitochondrial ROS has the opposite effect of mcu-1 loss: decreased glutamate receptor subunit accumulation. Collectively, the results support the idea that mitochondrial function can control receptor dynamics at synaptic sites. This is interesting because tight control of synaptic function likely combines several mitochondrial functions: energy production, calcium buffering, and (here) ROS signaling.

      STRENGTHS

      • The C. elegans genetic model is a strength because the authors are able to make refined conclusions by classical loss-of-function mutants (e.g., mcu-1) along with an impressive cytological toolkit to examine GLR-1 dynamics.

      • The use of pharmacology as a second means to test those genetic conclusions is a strength.

      • The authors' careful reagent verification of reporters (Ca2+, ROS, etc.) is a strength.

      • The ability to link fundamental mitochondrial processes to GLR-1 exocytosis will expand how the field thinks about mitochondrial synapse function.

      WEAKNESSES

      For the most part, the data in the paper support the conclusions, and the authors were careful to try experiments in multiple ways. But please see below:

      • (Main Point) The data are good, but they fall short of mechanism (e.g., Line 322). Figure 6 is accurate as drawn. But calcium and ROS are not abstract signals. They are likely exerting affirmative actions on specific targets. The Discussion does acknowledge this in terms of ROS and it speculates on possible targets.

      We thank the reviewer for their analytical review of our manuscript. We agree that all molecular players involved in the proposed mechanism were not identified by the data presented, so we modified the text to remove overstatements. We also agree that Ca2+ and ROS signaling is not abstract. Rather, there are specific and diverse targets of both Ca2+ and ROS signaling. Follow-up experiments are underway to identify and provide evidence for the necessity of potential ROS/Ca2+ targets in this proposed mechanism. For the current manuscript, we have modified our verbiage in an attempt to not mislead or overstate what our results suggest (e.g., changes/additions to the beginning of the ‘Discussion’, lines 365-377 and 385-388) and updated the illustration of the proposed model to include dashed lines that, as mentioned in the figure legend, indicate indirect action by ROS and Ca2+ (see revised Figure 7).

      The general idea seems to be that mitochondria import calcium through MCU-1 (and interacting factors). As a result, oxidative phosphorylation successfully occurs and mitochondrial ROS is a signaling by-product that signals glutamate receptors not to undergo exocytosis. But there are other interpretations of what might happen in between. In fact, if OXPHOS is disrupted, it is known that this can generate a lot more mitochondrial ROS than the normal by-product levels.

      We do agree that an alternative explanation could be that genetic or pharmacological inhibition of mitochondrial Ca2+ uptake disrupts oxidative phosphorylation, and as a result, inefficiencies or uncoupling in the electron transport chain would lead to an even greater increase in mitochondrial ROS production. Although oxidative phosphorylation was not directly measured, one of our post hoc analyses of GLR-1 transport suggests ATP levels are comparable between controls, mcu-1 mutants, and with Ru360 treatment: the velocity of GLR-1 transport is unchanged between these experimental groups. The processivity of molecular motors (which dictates transport velocity) is highly sensitive to relative ATP abundance. Thus, if ATP levels were dramatically decreased in mcu-1 mutants or following Ru360 treatment, then one would expect a detectable change in GLR-1 transport velocities, but we observed no change (see revised Figure S2E and related discussion at lines 183-190). Although these results do not directly indicate whether ATP production is altered with loss or inhibition of MCU-1, they do suggest that basal ATP levels remain sufficient to support the metabolic demands of GLR-1 transport.

      This reviewer wonders if excess ROS would cause an extreme response. Or alternatively, if scavenging ROS via pharmacological scavengers or SOD expression would reverse the effects.

      These are good points, and we have previously published experiments that address each of them. First, we have seen that globally increasing ROS with various concentrations of H2O2 within the physiological range (<100 nM) decreased GLR-1 transport to a similar extent (PMID: 32847966) indicating that there is not a dose-dependent decrease in GLR-1 transport. We have also assessed GLR-1 transport after treatment with concentrations of H2O2 well above the physiological range (e.g., 500 nM), but these high concentrations obliterated all GLR-1 transport. Contrary to what one may expect, we showed that decreasing ROS via pharmacological or genetic means (probably below physiological range) decreased GLR-1 transport (PMID: 35622512) via a Ca2+ independent mechanism. In other words, ROS scavenging did not have the opposite effect on GLR-1 transport, but we have not combined ROS scavenging with optical induction of ROS production (e.g., via KillerRed) nor have we assessed the potential influence of ROS scavenging on synaptic recruitment. Although we agree that these are important follow-up experiments, they will require a more sensitive ROS indicator because current genetically encoded in vivo ROS sensors cannot detect decreases in ROS levels below the physiological range (< 10 nM) (PMID: 31586057).

      Small Points

      • 33.3 mHz - just making sure, do the authors mean once every 30 seconds? That would be more straightforward.

      Yes, we do mean a 1-second pulse of light every 30 seconds. We have clarified this in the manuscript text (line 115).
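
      The unit conversion behind this exchange can be checked with a short sketch. The function names below are illustrative and do not come from the manuscript; the only inputs taken from the text are the 1-second pulse and the 30-second repetition period.

```python
def period_to_millihertz(period_s: float) -> float:
    """Convert a repetition period in seconds to a frequency in millihertz."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_s


def duty_cycle(pulse_s: float, period_s: float) -> float:
    """Fraction of each period during which the light is on."""
    return pulse_s / period_s


# Values stated in the response: a 1 s pulse delivered every 30 s.
stim_frequency_mhz = period_to_millihertz(30.0)
stim_duty = duty_cycle(1.0, 30.0)

print(f"{stim_frequency_mhz:.1f} mHz, duty cycle {stim_duty:.1%}")
```

      Stating the protocol as "one pulse every 30 seconds" and stating it as a frequency in mHz are therefore equivalent; the former simply avoids the conversion.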

      • Figure 2 is confusing. The text says that the mcu-1 mutants have a GLR-1::GFP FRAP rate that is comparable to controls (Lines 165-167). But Figure 2E suggests that it is markedly less, which is the opposite result of the slight increase in rate resulting from Ru360 treatment. And is the explanation why the GLR-1::GFP results differ from the SEP::GLR-1 results a difference between total GFP vs. surface GFP?

      The confusion is due to an incorrect statement in the results text. We have corrected this error and appreciate the reviewer for bringing it to our attention (lines 173-174).

      • I could not watch Video 2 (not sure if it is the file or just the copy I downloaded).

      We thank the reviewer for bringing this to our attention and we believe we have remedied the issue.

      • It is good that the authors tried both optical stimulation and mechanical stimulation (dropping culture plates to stimulate the worms, Figure 3). Why was the mechanical stimulation set aside for further tests in the paper?

      Mechanical stimulation consisted of dropping culture plates containing 2-3 C. elegans onto a lab bench every 30 seconds for 5 or 10 minutes. This mechanical stimulation paradigm was technically cumbersome and was less effective at inducing changes in mito-roGFP fluorescence than optical stimulation. This is likely due to habituation to the mechanical stimulus, which has been well-characterized in C. elegans. The optical stimulation was therefore used as it is a more reliable and repeatable method for stimulating the AVA neuron.

      • Does this process affect all kinds of transport, or is it just the glutamate receptors? Was anything else examined?

      Transport of other proteins has not been examined in the context of mitoROS signaling. Our attempts at visualizing and quantifying the transport, synaptic delivery and exocytosis of other synaptic proteins in vivo have proven to be more technically challenging, likely due to relatively lower expression in the C. elegans neurons suitable for transport analysis.

      Reviewer #3 (Public Review):

      Reactive oxygen species (ROS) have been previously shown to regulate glutamate receptor phosphorylation, long-distance transport, and delivery of glutamate receptors to synapses, however, the source of ROS is unclear. In this study, the authors test if mitochondria act as a signaling hub and produce ROS in response to neuronal activity in order to regulate glutamate receptor trafficking. The authors use a variety of optogenetic tools including the calcium reporter mitoGCaMP and the ROS reporter mito-roGFP to monitor changes in calcium and ROS, respectively, in mitochondria after activating neurons with ChRimson in the genetic model organism C. elegans. Repeated stimulation of interneurons called AVA with ChRimson leads to increased calcium uptake into mitochondria in dendrites and increased mitochondrial ROS production. The mitochondrial calcium uniporter mcu-1 is required for these effects because mcu-1 genetic loss of function or treatment with Ru360, a drug that inhibits mcu-1, inhibits the uptake of calcium into mitochondria and ROS production after neuronal activation. Mcu-1 genetic loss of function is correlated with an increase in exocytosis of glutamate receptors but a decrease in glutamate receptor transport and delivery to dendrites. This study suggests that mitochondria monitor neuronal activity by taking up calcium and downregulating glutamate receptor trafficking via ROS, as a means to negatively regulate excitatory synapse function.

      Strengths

      -The use of multiple optogenetic tools and approaches to monitor mitochondrial calcium, reactive oxygen species, and glutamate receptor trafficking in live organisms.

      -Identifying a novel signaling role for dendritic mitochondria which is to monitor neuronal activity (via calcium uptake into mitochondria) and generate a signal (reactive oxygen species) that regulates glutamate receptors at synapses.

      Weaknesses

      -Although the use of KillerRed to generate ROS downstream of mcu-1 is a clever approach, the fact that activation of KillerRed results in reduced GLR-1 exocytosis, delivery, and transport raises the concern that KillerRed is generating a high level of ROS that might be toxic to cellular processes. Experiments showing that other cellular processes are not affected by KillerRed activation and testing if reduced ROS production mimics the effects of blocking mcu-1 would strengthen the conclusions in this study.

      We thank the reviewer for their careful analyses of our findings. It is plausible that KillerRed could cause toxic levels of ROS; in fact, it was originally used to instigate oxidative stress-induced apoptosis to achieve cell-specific ablation. These cell ablation protocols required 20+ minutes of KillerRed activation with substantially higher levels of irradiation (e.g., 3.8 mW/mm2 [PMID: 24209746] vs. our light dosage of 25 µW/mm2). Additionally, our transgenic C. elegans strains expressing KillerRed were designed to have a relatively low KillerRed expression and were screened for low expression based on KillerRed’s fluorescence. Using these strains, we were able to minimally activate KillerRed in the AVA neuron resulting in ROS elevations at mitochondria that were comparable to neuronal activity-induced increases in mitochondrial ROS as measured by mito-roGFP. Specifically, we found that 10 minutes of mechano-stimulation and 5 minutes of ChRimson stimulation increased the fluorescence ratio (Fratio) of mito-roGFP nearly two-fold (Figure 4A-B and 4C-E). A 15-second pulse of light focused on a small region activating mitoKR in the AVA neurite also caused a similar two-fold increase in the mito-roGFP Fratio (Figure 4C-E), comparable to what neuronal activity induced. Our 5-minute global KillerRed activation less effectively increased the mito-roGFP Fratio at mitochondria in the AVA neurite compared to neuronal activity (revised Figure 4B and 4H) but was sufficient in decreasing GLR-1 transport (revised Figure 5G-H). So, we decided to do all experiments with 5 minutes of global KillerRed activation since lower activation levels of KillerRed were more likely to achieve non-toxic, signaling levels of ROS. Since we strongly agree that these data are important for tool validation, we have reorganized the manuscript such that these data are now a primary figure (see revised Figure 4 and new results sub-section starting at line 252).

      Additionally, we added supplemental transport velocity data. This data shows that local photoactivation as well as whole-cell activation of KillerRed does not alter transport velocity of GLR-1 vesicles within the neurite (revised Figure S4A and S4B and lines 272-276 and 287-289), which would be the case if ATP, microtubules, or actin dynamics were affected. This supports that our local and whole-cell activation protocol does not cause toxic levels of ROS production.

      Lastly, the reviewer questions whether decreasing ROS alters GLR-1 transport, synaptic delivery and exocytosis in a similar fashion to loss or inhibition of mcu-1, and if so, would further support the proposed mechanism. We have decreased ROS via genetic (catalase overexpression) and pharmacological (using the mitochondria-targeted antioxidant MitoTEMPO) means and seen that diminished ROS levels decrease GLR-1 transport albeit to a lesser degree than that caused by loss/inhibition of mcu-1 (PMID: 35622512). To determine if decreased GLR-1 transport during diminished ROS levels involves mcu-1, we would need to assess GLR-1 transport in mcu-1 mutants while ROS is decreased (e.g., using MitoTEMPO treatment) to see if their combined effect phenocopies the effect of mcu-1(lf) or decreased ROS alone. However, as mentioned previously, we are unable to measure ROS levels below the sensitivity of roGFP but within physiological range so we cannot currently calibrate or validate our methods for scavenging ROS in vivo. This is why we have not yet analyzed synaptic delivery or exocytosis rates of GLR-1 in the context of decreased ROS, but these would be interesting follow-up experiments that may further support our model once more sensitive ROS sensors are available.

      Reviewer #4 (Public Review):

      Using optogenetic stimulation, the authors presented compelling evidence that neuronal activity increases mitochondrial calcium levels, facilitated by the mitochondrial uniporter MCU-1. Through ratiometric measurements, they showed that mitochondrial ROS levels also increase due to neuronal activity via MCU-1. Subsequent FRAP studies were employed to investigate the trafficking of the AMPA receptor, GLR-1. By integrating genetic and pharmacological methodologies, the recovery rate of GLR-1 was assessed. The authors concluded that increased mitochondrial ROS due to neuronal activity reduces the trafficking and exocytosis of AMPA receptors. They proposed that mitochondrial ROS serves as a homeostatic mechanism regulating AMPA receptor trafficking and abundance, thus maintaining synaptic strength. This research is crucial as it provides a direct link between mitochondrial signaling and AMPA receptor trafficking.

      However, there are several significant concerns regarding the methodologies and quantifications employed in this manuscript. The authors utilized GLR-SEP to label surface AMPA receptors and relied on the "FRAP rate" as an indicator of the exocytosis rate. The absence of direct visualization of exocytosis using GLR-SEP, and the lack of direct measurements of exocytosis events, casts doubt on the conclusions about ROS's impact on AMPA receptor exocytosis. Furthermore, the "FRAP rate" determined in this study is a combination of recovery rates (incorporating both endosomal trafficking and diffusion) with the mobile fractions of AMPA receptors, potentially weakening interpretations of the findings. A more comprehensive discussion addressing the conflicting effects of MCU-1 and ROS on GLR-GFP FRAP recovery and dendritic trafficking would enable readers to grasp the intricate roles of mitochondrial calcium and ROS in modulating synaptic receptors.

      We appreciate the reviewer’s attention to detail while reviewing our article. Their major concern about directly visualizing exocytosis events is valid since changes in exocytosis and endocytosis would dictate the amount of SEP::GLR-1 at the synaptic membrane. However, streaming imaging of SEP in vivo is technically difficult, shows only a few exocytosis events, and provides short “snapshots” (1-2 minutes; longer streaming imaging causes photobleaching and photo-toxicity) which must be extrapolated to longer time frames. Our 16-minute SEP::GLR-1 FRAP protocol allows us to capture all plasma membrane recruitment and quantify the relative balance between exo- and endocytosis. It also allows for longer observational periods during which we can detect changes in GLR-1 recruitment to and retention at the synaptic membrane in genetic mutants and with drug treatments. In addition, our photobleaching approach involves photobleaching a ~40-60 µm region proximally and distally to the imaging region, which limits the influence of receptor diffusion on the FRAP rate. The reviewer makes a valid point that receptor endocytosis rates would also influence the SEP::GLR-1 FRAP rate. We have now changed the text in the results and discussion to include this information (lines 155-161, and changing “exocytosis” to “synaptic recruitment” throughout the manuscript when discussing SEP::GLR-1 FRAP results [e.g., at lines 169, 208, and 321]).

    1. Author Response

      Reviewer #1 (Public Review):

      Payne et al. have investigated the neural basis of VOR adaptation with the goal of constraining sites and mechanisms of plasticity supporting cerebellar learning. This has been an area of intense debate for decades; previous competing models have argued extensively about the sites of plasticity and the strength of eye velocity feedback/ efference copy signals to Purkinje cells has been central to the debate. This paper nicely explores the consequences of varying the strength of this feedback and in so doing, provides a potential explanation for why Purkinje cell responses during VOR cancellation could exhibit stronger responses following learning, despite net depression of the strength of their vestibular inputs. In that sense it provides some reconciliation of existing models. The work appears to be well done and the paper is well written. The manuscript could be improved and the significance of the work clarified and enhanced by contextualizing the work more appropriately within the existing literature in this area.

      We thank the reviewer for the nice summary of this work’s contribution to the long-standing debate regarding sites and mechanisms of plasticity underlying cerebellar learning.

      We have revised the manuscript to address several key points raised by the reviewer. We now emphasize that the main evidence for weak feedback arises from interpreting our model in the context of the existing experimental evidence for plasticity rules in the cerebellar cortex, and we have clarified the commonalities and differences from the Miles-Lisberger model. Several missing references are now included. Additionally, we clarify the comparison of our model to data after learning, and explain how altered signaling through the visual pathways drives paradoxical changes in neural activity without requiring plasticity in the visual pathways. We hope that these changes better situate the work to be interpreted appropriately in the context of the existing literature.

      Reviewer #2 (Public Review):

      Payne et al. use a computational approach to predict the sites and directions of plasticity within the vestibular cerebellum that explain an unresolved controversy regarding the basis of VOR learning. Specifically, Miles and Lisberger (1981) concluded that vestibular inputs onto Purkinje cells (PCs) must potentiate, rather than depress (as in the Marr/Albus/Ito model), following gain-increase learning because when the VOR is cancelled, PC firing increases rather than decreases. Payne et al. provide a novel model solution that recapitulates the results of Miles and Lisberger but, paradoxically, uses plasticity in the cerebellar cortex that weakens PC output rather than strengthens it. However, the model only succeeds when efference copy feedback to the cerebellar cortex is relatively weak, thereby allowing a second feedback pathway to drive PC activity during VOR cancellation to counteract the learned change in gain. Because the model is biologically constrained, the findings are well supported. This work will likely benefit the field by providing a number of potentially experimentally testable conclusions. The findings will be of interest to a wider audience if the results can be extrapolated to other cerebellar-dependent learning behaviors rather than just VOR gain-increase learning. Overall, the manuscript is very well written with clearly delineated results and conclusions.

      We appreciate the reviewer’s comments that the model is well-constrained and provides a solution to the long-standing debate surrounding sites and directions of plasticity underlying VOR learning.

      The reviewer raises an important question: do our results generalize across the cerebellum? We note first that we are studying the cerebellum to illustrate a core problem in modeling systems throughout the brain, namely, how to disambiguate plasticity in the face of ubiquitous feedback loops, both within the brain and between the brain and the environment. Within the cerebellum, we focused on VOR learning due to the wealth of experimental data available. While the specific effect of feedback strength on plasticity will depend on the details of the relevant cerebellar circuit, our general approach can be applied to other areas, given sufficient data, in order to determine how plasticity is distributed in the face of potential feedback loops. Importantly, error-driven LTD of the parallel fiber-Purkinje cell synapse is a fundamental hypothesized mechanism for cerebellar learning which has been generally accepted elsewhere in the cerebellum, but was called into question for VOR learning in the flocculus by the Miles-Lisberger model. Thus, our study of VOR learning has broad implications for reconciling plasticity mechanisms across the cerebellum.

      We also note that, even within the VOR circuit, the direction of plasticity and the relative dependence on plasticity at each site may depend on the timescale of learning. On longer timescales, there is thought to be consolidation of learning from a cerebellar cortical site to a brainstem site. Such consolidation from a faster-learning site to a slower-learning site is known as systems consolidation and has been shown theoretically to mitigate the ‘plasticity-stability dilemma’ of having fast learning without over-writing longer-term learning. Our model is compatible with both error-driven plasticity in the cerebellar cortex and a site of plasticity in the brainstem, with brainstem plasticity potentially mediating consolidation of earlier learned changes in the cerebellar cortex. We have now updated the text significantly to discuss the broader implications of the results and to address the reviewer’s specific comments.

      Reviewer #3 (Public Review):

      Summary: In this study, the authors attempt to determine what is the role (and strength) of feedback in a closed-loop (cerebellar) system.

      Strengths:

      1) By combining extensive data fitting of cerebellar experimental observations this study provides deep insights into existing questions and more broadly on the role of feedback and what are the limitations when inferring feedback in (plastic) neural circuits.

      2) Another strength of this study is the gradual build-up of evidence by using models of different complexities to help build the argument that weak feedback is sufficient to explain experimental observations.

      3) The paper is well-written and structured.

      Weaknesses:

      1) In principle feedback can (i) drive dynamics or/and (ii) drive learning directly. Throughout the paper, the authors refer to only the first case (i.e. dynamics). However, the role of feedback in learning is already implicitly assumed by the authors when jointly fitting the model before and after learning. Note that the general conclusion that feedback (in general) is weak may be to the first view (i.e. dynamics), but not the second. Given that a key conclusion of the paper is that no feedback is sufficient to explain the data, this suggests that feedback may instead be used for learning/plasticity.

      We fully agree with the reviewer that our conclusions do not preclude an important role for many other types of feedback, including as an instructive signal for learning. Instead of explicitly considering feedback for learning in our model, we consider static snapshots before and after learning to infer plasticity, while remaining agnostic to the neural algorithm used to achieve such plasticity. A widely held hypothesis is that motor error signals carried by climbing fibers instruct LTD at co-active parallel fiber inputs to Purkinje cells; this is indeed a form of feedback, operating on a slower timescale than “feedback for dynamics.” This “feedback for learning” is not modeled here but is fully consistent with our results, as discussed in a new paragraph of our Discussion (end of Section 3.4.1 “Pathways undergoing plasticity”).

      2) There are some potential limitations of the conclusions drawn due to the model inference methods used. The methods used (fmincon) can easily get stuck in local minima and more importantly they do not provide an overview of the likelihood of parameters given the data. A few studies have now shown that it is important to apply more powerful inference techniques both to infer plasticity (Bykowska et al. Frontiers 2019) and neural dynamics (Gonçalves et al. eLife 2020). As highlighted by Costa et al. Frontiers 2013 using more standard fitting methods can lead to misleading interpretations. Given the large range of experimental data used to constrain the model, this may not be an issue, but it is not explicitly shown.

      The reviewer correctly points out that we used a deterministic model-fitting procedure. To address this concern, we complemented the full dynamic model with a simple analytic model (Figure 5) for which we could fully derive the cost function landscape and analytically show that there is a line of parameters corresponding to a perfect degeneracy in the model. Thus, the challenge in the model we analyze is that there are too many solutions, rather than it being difficult to find a solution. Given this degeneracy, we chose to fix the level of efference copy feedback and then find the (now non-degenerate) solutions, and to then compare these different solutions with regards to their implications for the correlated strengths and changes in strengths of different pathways. We have edited the relevant section of the Discussion for clarity on this topic, and have added references to the additional strategies for model inference mentioned above, in Section 3.3 “Relation to other sloppy models”.
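
      The kind of degeneracy described here can be illustrated with a deliberately simplified steady-state sketch. This is not the authors' analytic model from Figure 5: the two equations, the parameter names (w, b, k), and the numerical "observations" below are all assumptions chosen only to show how a feedback weight and a feedforward weight can trade off along a line while reproducing the same data.

```python
def steady_state(w: float, b: float, k: float) -> tuple[float, float]:
    """Toy closed loop for unit head velocity H = 1 (all names hypothetical).

    Purkinje cell activity:  P = w*H + k*E   (w: vestibular weight, k: feedback)
    Eye velocity command:    E = -b*H + P    (b: direct brainstem weight)
    Solving the two equations simultaneously gives the expressions below.
    """
    if k >= 1.0:
        raise ValueError("positive feedback gain k must be < 1 for a stable loop")
    p = (w - k * b) / (1.0 - k)
    e = (w - b) / (1.0 - k)
    return p, e


def degenerate_weight(p_obs: float, e_obs: float, k: float) -> float:
    """Vestibular weight w that reproduces the observations for a chosen k."""
    return p_obs - k * e_obs


# Hypothetical "observed" steady-state responses (not data from the paper).
p_obs, e_obs = 0.2, -0.9
b = p_obs - e_obs  # the direct pathway is pinned down by the observations

for k in (0.0, 0.3, 0.6, 0.9):
    w = degenerate_weight(p_obs, e_obs, k)
    p, e = steady_state(w, b, k)
    print(f"k={k:.1f}  w={w:.2f}  ->  P={p:.3f}, E={e:.3f}")
```

      In this toy, every choice of k has a matching w that fits the two observations exactly, so an optimizer has no basis for preferring one point on the line over another. Fixing the feedback strength first, as the response describes, removes that freedom.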

      3) There is some lack of clarity on how the feedback pathways as currently presented should be interpreted in the brain.

      We interpret this comment as referring to the questions of (1) whether our model includes a pathway for learning through feedback, (2) what is the anatomical implementation of the efference copy feedback pathway and visual pathways, and (3) how should the positive weights on the efference copy feedback pathway k_PE be interpreted. We address these below.

      (1) Feedback for learning was discussed in point 1 above.

      (2) Anatomical implementation of efference copy pathway: We have edited the Discussion to clarify that there is anatomical evidence for efference copy input to the cerebellum, but that a key aspect of ‘feedback’ is that activity functionally loops back onto itself. Instead, neurons carrying eye movement commands (such as in the vestibular nucleus) could send signals to the cerebellum, without receiving output from the same cerebellar neurons – this would correspond to a ‘spiraling’ pathway that does not form a closed feedback loop (Figure 8). Thus we argue that the existence of the gross anatomical pathways does not necessitate a role for strong, functional, efference copy feedback (Discussion, Section 3.1, lines 481-491).

      Anatomical implementation of visual pathway: The visual feedback pathways considered here are those that would receive visual motion information from the environment. This visual feedback is itself changed by eye movements, thus providing a net overall negative feedback loop that helps to stabilize gaze. This pathway has been proposed to involve cortical regions such as MST (discussed in Materials and Methods, Model Implementation, lines 769-774).

      (3) Interpretation of positive feedback loop: In our model, the efference copy feedback filter, k_PE, has positive weight. This corresponds to the positive net sign of the Purkinje cell to brainstem to Purkinje cell feedback loop. Specifically, the Purkinje cell to brainstem pathway is inhibitory (because Purkinje cells are inhibitory), the brainstem to eye velocity command pathway is inhibitory (to achieve counter-rotation of the eyes in response to head turns), and the feedback of this eye velocity command back to Purkinje cells (k_PE) is positive. Thus this loop in our model represents positive feedback. This is now clarified in Materials and Methods, Model Implementation, line 748.
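
      The sign argument can be written out as a product over the three stages of the loop. The symbols s with subscripts are shorthand introduced here for illustration and are not notation from the manuscript; only the signs themselves are taken from the text above.

```latex
% Net sign of the loop = product of the signs of its three stages
% (Purkinje cell -> brainstem, brainstem -> eye velocity command,
%  eye velocity command -> Purkinje cell via k_{PE}).
\[
  s_{\text{loop}}
  = \underbrace{s_{\text{PC}\to\text{brainstem}}}_{-1}
    \cdot \underbrace{s_{\text{brainstem}\to\text{eye}}}_{-1}
    \cdot \underbrace{s_{\text{eye}\to\text{PC}}}_{+1}
  = (-1)(-1)(+1) = +1 .
\]
```

      Two inhibitory stages cancel, so a positive k_PE closes a loop with net positive sign.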

      4) The functional benefits of having (or not) feedback could be better discussed (related to point 1 above).

      Related to point 1 above, it is certainly the case that feedback is necessary for learning. We do not explicitly model the climbing fiber feedback thought to be involved in learning/plasticity of the parallel fiber pathway.

      We instead focus on the role of efference copy feedback, and how it functionally impacts the required sites and signs of plasticity in the circuit. As shown in the paper, if the efference copy pathway is strong, then this is most consistent with learned changes in eye movements being driven primarily by plasticity in the brainstem pathway (as in the Miles-Lisberger hypothesis), whereas if the efference copy pathway is weak, then this is most consistent with learned changes in eye movements being driven by net depression in the parallel fiber to Purkinje cell pathway (as in the classic Marr-Albus-Ito model and as suggested by most cellular and molecular studies of parallel fiber-Purkinje cell plasticity), in addition to a role of plasticity in the brainstem pathway. We also note that, in the ‘Strong Feedback’ model, the feedback is so strong that the system is on the brink of instability – this has been argued to have the functional benefit of providing ‘inertia’ to eye movements that could help to maintain eye movements during smooth pursuit when a target goes behind an occluder, but it also has the disadvantage of placing the system at a level of positive feedback near the brink of instability. We also note that the visual feedback pathway through the environment, emphasized in this work, serves as a negative feedback loop that reduces deviations between the eye and target velocity. We have extensively re-written the first section of the Discussion (Section 3.1), in order to more clearly lay out the implications of each model for circuit plasticity and feedback.

      5) Some of the key conclusions of the work are not described in the abstract, namely that feedback is weak in the cerebellar system.

      Thank you for raising this point. We have added this key conclusion to the end of the abstract: “Our results address a long-standing debate regarding cerebellum-dependent motor learning, suggesting a reconciliation in which error-driven plasticity of synaptic inputs to Purkinje cells is compatible with seemingly oppositely directed changes in Purkinje cell activity. More broadly, the results demonstrate how learning-related changes in neural activity can appear to contradict the sign of the underlying plasticity when either internal feedback or feedback through the environment is present.”

      Claims:

      The argument is well-built throughout the paper, but there are some potential caveats with the general interpretation (see weaknesses).

      Impact:

      This work has the potential to bring important messages on how best to interpret and infer the role of feedback in neural systems. For the field of the cerebellum, it also proposes solutions to long-standing problems.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Cyclic Nucleotide Binding (CNB) domains are pervasive structural components involved in signaling pathways across eukaryotes and prokaryotes. Despite their similar structures, CNB domains exhibit distinct ligand-sensing capabilities. The manuscript offers a thorough and convincing investigation that clarifies numerous puzzling aspects of nucleotide binding in Trypanosoma.

      Strengths:

      One of the strengths of this study is its multifaceted methodology, which includes a range of techniques including crystallography, ITC (Isothermal Titration Calorimetry), fluorimetry, CD (Circular Dichroism) spectroscopy, mass spectrometry, and computational analysis. This interdisciplinary approach not only enhances the depth of the investigation but also offers a robust cross-validation of the results.

      Weaknesses:

      None noticed.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript clearly shows that Trypanosoma PKA is controlled by nucleoside analogues rather than cyclic nucleotides, which are the primary allosteric effectors of human PKA and PKG. The authors demonstrate that the inosine, guanosine, and adenosine nucleosides bind with high affinity and activate PKA in the tropical pathogens T. brucei, T. cruzi and Leishmania. The underlying determinants of nucleoside binding and selectivity are dissected by solving the crystal structure of T. cruzi PKAR(200-503) and T. brucei PKAR(199-499) bound to inosine at 1.4 Å and 2.1 Å resolution and through comparative mutational analyses. Of particular interest is the identification of a minimal subset of 2-3 residues that controls nucleoside vs. cyclic nucleotide specificity.

      Strengths:

      The significance of this study lies not only in the structure-activity relationships revealed for important targets in several parasite pathogens but also in the understanding of CNB's evolutionary role.

      Weaknesses:

      The main missing piece is the model for activation of the kinetoplastid PKA which remains speculative in the absence of a structure for the trypanosomatid PKA holoenzyme complex. However, this appears to be beyond the scope of this manuscript, which is already quite dense.

      We fully agree that insight into the activation mechanism and its possible deviation from the mammalian paradigm requires a holoenzyme structure revealing the details of the R-C interaction. We have attempted Cryo-EM with LEXSY-produced holoenzyme, yet upscaling the purification procedures described in this manuscript has repeatedly failed in spite of numerous protocol changes and optimizations. Much more work is required to achieve this.

      Reviewer #2 (Recommendations For The Authors):

      Some minor points to consider for enhancing the impact of this interesting manuscript:

      1) The nucleoside affinities measured are mainly for the regulatory subunits unbound to the kinase domain. How would nucleoside affinities change when the regulatory subunits are bound to the kinase domain, which is presumably the case under resting conditions? An estimation of this change in affinity is important because it more closely relates to the variations in cellular nucleoside concentrations needed for activation.

      This is an important question, and we have given an indirect answer in the manuscript, although not very explicitly. The EC50 values for kinase activation of the purified holoenzyme complexes are very similar or almost identical to the kD values measured by ITC with free regulatory subunits. By inference, the binding kD for the holoenzyme and for the free R-subunit cannot be very different. In addition, we have recently determined the EC50 for PKA activation in vivo in trypanosomes using a bioluminescence complementation reporter assay. The values fit perfectly to the values obtained with purified holoenzyme (Wu et al., in preparation). A sentence in Results (lines 201-203) has been added.

      2) The authors should point out that a major implication of nucleoside vs. cyclic nucleotide activation is in terms of signal termination. If phosphodiesterases (PDEs) are responsible for cAMP/cGMP signal termination, what terminates nucleoside-dependent signaling? Although the answer to this question may not be known at this stage, it is important to highlight this critical implication of the authors' study.

      The mechanism of signal termination is indeed unknown so far. We speculate that some enzymes of the purine salvage pathways are differentially localized in subcellular compartments and thereby able to establish microdomains that enable nucleoside signaling. In addition, PKA subunit phosphorylations/dephosphorylations and/or protein turnover may also regulate signal termination. As an example, free PKAC1 is rapidly degraded upon depletion of the PKAR subunit by RNAi. We have now mentioned signal termination in Discussion and have revised the last part of Discussion (lines 567-602). A possible approach to monitor compartmentalized signaling would be using the FluoSTEPs technology (Tenner et al., Sci. Adv. 2021; 7: eabe4091), but adapting this to the trypanosome system will not be a short-term task.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The investigators have performed a state-of-the-art systematic review and meta-analysis of studies that may help to answer the research question of whether administration of multiple antibiotics simultaneously prevents antibiotic resistance development in individuals. The number of studies eligible for analysis is very low, and within that low number, there is huge variability in bug-drug combinations studied and most studies had a high risk of bias, further limiting the capability of meta-analysis to answer the research question. In addition, based on I² values there is also huge statistical heterogeneity between outcomes of studies compared, further limiting the predictive value of meta-analysis. In fact, the only 2 studies meeting all eligibility criteria addressed the treatment of Mycobacterium tuberculosis, for which the research question is hardly applicable. The authors, therefore, conclude that "our analysis could not identify any benefit or harm of using a higher or a lower number of antibiotics regarding within-patient resistance development." Apart from articulating this knowledge gap, the findings will not have consequences for patient care, but may stimulate the scientific community to better address this research question in future studies.

      Strengths:

      The systematic and rigorous approach for the review and meta-analysis.

      Weaknesses:

      None identified.

      We thank the reviewer for this thoughtful and positive appraisal of our work.

      Reviewer #2 (Public Review):

      Summary:

      The authors performed a systematic review and meta-analysis to investigate whether the frequency of emergence of resistance is different if combination antibiotic therapy is used compared to fewer antibiotics. The review shows that there is currently insufficient evidence to reach a conclusion due to the limited sample size. High-quality studies evaluating appropriate antimicrobial resistance endpoints are needed.

      Strengths:

      The strengths of the manuscript are that the article addresses a relevant research question that is often debated. The article is well-written and the methodology used is valid. The review shows that there is currently insufficient evidence to reach a conclusion due to the limited sample size. High-quality studies evaluating appropriate antimicrobial resistance endpoints are needed. I have several comments and suggestions for the manuscript.

      Weaknesses:

      Weaknesses of the manuscript are the large clinical and statistical heterogeneity and the lack of clear definitions of acquisition of resistance. Both these weaknesses complicate the interpretation of the study results.

      We thank the reviewer for the positive comments and pointing out where our work can be improved.

      Major comments:

      My main concern about the manuscript is the extent of both clinical and statistical heterogeneity, which complicates the interpretation of the results. I don't understand some of the antibiotic comparisons that are included in the systematic review. For instance the study by Paul et al (50), where vancomycin (as monotherapy) is compared to co-trimoxazole (as combination therapy). Emergence (or selection) of co-trimoxazole in S. aureus is in itself much more common than vancomycin resistance. It is logical and expected to have more resistance in the co-trimoxazole group compared to the vancomycin group, however, this difference is due to the drug itself and not due to co-trimoxazole being a combination therapy. It is therefore unfair to attribute the difference in resistance to combination therapy. Another example is the study by Walsh (71) where rifampin + novobiocin is compared to rifampin + co-trimoxazole. There is more emergence of resistance in the rifampin + co-trimoxazole group but this could be attributed to novobiocin being a different type of antibiotic than co-trimoxazole instead of the difference being attributed to combination therapy. To improve interpretation and reduce heterogeneity my suggestion would be to limit the primary analyses to regimens where the antibiotics compared are the same but in one group one or more antibiotic(s) are added (i.e. A versus A+B). The other analyses are problematic in their interpretation and should be clearly labeled as secondary and their interpretation discussed.

      We acknowledge the presence of statistical and clinical heterogeneity in our overall analysis. The decision to pursue this comprehensive examination was predefined in our previously published study protocol (PROSPERO CRD42020187257) and driven by our interest in whether, despite some differences, we could either identify an overarching effect of combination therapy on resistance or identify factors that explain potential differences of the effect of combination therapy across pathogens/drugs. We do indeed find that heterogeneity is high; however, identifying the driving factors of this heterogeneity is difficult, as evidence is limited.

      We carried out several subgroup analyses, e.g. explicitly focusing on specific pathogen groups and medical conditions or exploring heterogeneity in treatment arms (figure 3, supplementary materials section 6). However, it is important to highlight that the number of studies available for these subgroup analyses was low. Additionally, recognizing the high heterogeneity within treatment arms, we performed a subgroup analysis focusing solely on resistances of antibiotics common to both arms (supplementary material section 6.1.8; which would avoid comparisons such as the one between vancomycin and co-trimoxazole raised by the reviewer). Unfortunately, this also revealed substantial heterogeneity. While we aimed to address heterogeneity through these subgroup analyses, limitations arose due to the number of studies meeting specific criteria and the nature of data provided by these studies.

      Moreover, regarding the concern about the interpretation of co-trimoxazole as combination therapy, we acknowledge the confusion surrounding its classification as one or two antibiotics. Despite the common contemporary view of co-trimoxazole as a single antibiotic, we chose to consider it as two antibiotics due to historical practices, as observed in Black et al. (1982), where trimethoprim was compared to trimethoprim and sulfamethoxazole. We recognize that this decision may lead to confusion, and we are considering a further sensitivity analysis in a future version of this manuscript, exploring the possibility of treating co-trimoxazole as a single antibiotic. We agree that the slight trend of fewer antibiotics performing better observed for MRSA should not be overinterpreted, as this is driven by the two studies Walsh et al. 1993 and Paul et al. 2015, as pointed out by the reviewer. In lines 183-186 we discuss the issue that, for better evaluation of antibiotic combination therapy, more studies that use identical antibiotics (i.e. A versus A+B) are needed. We will try to clarify and highlight this in the future version of the manuscript.

      Another concern is about the definition of acquisition of resistance, which is unclear to me. If for example meropenem is administered and the follow-up cultures show Enterococcus species (which is intrinsically resistant to meropenem), does this constitute acquisition of resistance? If so, it would be misleading to determine this as an acquisition of resistance, as many people are colonized with Enterococci and selection of Enterococci under therapy is very common. If this is not considered as the acquisition of resistance please include how the acquisition of resistance is defined per included study.

      Thank you for pointing out this potential ambiguity. Our definition of “acquisition of resistance” is agnostic to bacterial species and hence intrinsically resistant species can be included if they were only detected during the follow-up culture by the studies. We will clarify this in the definition of “acquisition of the resistance” in the manuscript (see l. 259-260). However, it was not always clear from the studies which pathogens were acquired or whether intrinsically resistant species were not reported. Therefore, we rely on the studies' specifications of resistant and non-resistant without further classifying data into intrinsic and non-intrinsic resistance. The outcome “acquisition of resistance” can be seen more of a risk assessment for having any resistant bacterium during or after treatment. In contrast, the outcome “emergence of resistance” is more rigorous, demanding the same species to be measured as more resistant during or after treatment.

      Table S1 is not sufficiently clear because it often only contains how susceptibility testing was done but not which antibiotics were tested and how a strain was classified as resistant or susceptible.

      In Table S1, we omitted the listing of antibiotics for which susceptibility testing was performed, as this information is already presented in the main text (Table 1). However, we agree that linking this information better in a future version would benefit the understanding. Given the variability in methods used to assess resistance and the variability in drugs, the comparability of breakpoints is limited. Hence, we decided not to provide further details on this aspect so far.

      Line 85: "Even though within-patient antibiotic resistance development is rare, it may contribute to the emergence and spread of resistance."

      Depending on the bug-drug combination, there is great variation in the propensity to develop within-patient antibiotic resistance. For example, within-patient development of ciprofloxacin resistance in Pseudomonas is fairly common, while within-patient development of methicillin resistance in S. aureus is rare. Based on these differences, large clinical heterogeneity is expected and it is questionable whether these studies should be pooled.

      We agree that our formulation neglects differences in prevalence of within-host resistance emergence depending on bug-drug combinations. We will correct this in our upcoming version. (i.e. we will correct our statement to: “Within-patient antibiotic resistance development, even if rare, can contribute to the emergence and spread of resistance.”)

      Line 114: "The overall pooled OR for acquisition of resistance comparing a lower number of antibiotics versus a higher one was 1.23 (95% CI 0.68 - 2.25), with substantial heterogeneity between studies (I² = 77.4%)"

      What consequential measures did the authors take after determining this high heterogeneity? Did they explore the source of this large heterogeneity? Considering this large heterogeneity, do the authors consider it appropriate to pool these studies?

      Thank you for highlighting this lack of clarity. In our upcoming version, we will emphasize the sub-analyses conducted to explore heterogeneity (i.e., figure 3 and supplementary materials section 6). Nevertheless, these analyses faced limitations due to the scarcity of evidence and the data provided by the studies. Given the lack of appropriate evidence, it is hard to identify the source of heterogeneity. The decision to pool all studies was pre-specified in our previously published study protocol (PROSPERO CRD42020187257) and was motivated by the question of whether there is a general effect of combination therapy on resistance development, or whether factors can be identified that explain potential differences in the effect of combination therapy across bug-drug combinations.
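For readers unfamiliar with how a pooled odds ratio and an I² value of this kind are obtained, the following is a minimal sketch of DerSimonian-Laird random-effects pooling. It is not the analysis code of the study; the function names and the 2x2 counts are hypothetical and chosen only to make the example run.

```python
import math


def log_or_and_var(a, b, c, d):
    """Log odds ratio and its variance from a 2x2 table.

    a, b: events / non-events in one arm; c, d: events / non-events in the other.
    A 0.5 continuity correction is applied if any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled, se, Q, I2 in percent, tau2)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, se, q, i2, tau2


# Hypothetical 2x2 tables (resistant / not resistant in each arm).
tables = [(4, 46, 2, 48), (10, 90, 3, 97), (1, 29, 5, 25)]
effects, variances = zip(*(log_or_and_var(*t) for t in tables))
pooled, se, q, i2, tau2 = pool_random_effects(effects, variances)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled OR {math.exp(pooled):.2f} (95% CI {low:.2f} - {high:.2f}), I2 = {i2:.1f}%")
```

I² expresses the share of total variation across studies attributed to heterogeneity rather than chance, which is why a value near 77% prompts the question of whether pooling is appropriate.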

    1. Author Response

      We are grateful to the reviewers for their positive feedback and for their comments and suggestions on the manuscript. Reviewer 1 has indicated two weaknesses and Reviewer 2 has none. With this provisional reply, we address the two concerns of Reviewer 1:

      1) Data obtained from a single aminoacyl-tRNA (D-Tyr-tRNATyr) have been generalized to imply that what is relevant to this model substrate is true for all other D-aa-tRNAs. This is not a risk-free extrapolation. Why do the authors believe that the length of the amino acid side chain will not matter in the activity of DTD2?

      We thank the reviewer for bringing up this important point. We wish to clarify that only a few of the aminoacyl-tRNA synthetases are known to charge D-amino acids and only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009). D-Tyr-tRNATyr is used as a model substrate to test the DTD activity in the field because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007). Moreover, we have previously shown that it recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro (Mazeed M. et al., Science Advances, 2021). We have earlier shown that DTD1, another conserved chiral proofreader across bacteria and eukaryotes, acts via a side chain independent mechanism (Ahmad S. et al., eLife, 2013). Considering the action on multiple side chains with different chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side chain independent manner.

      2) While the use of EFTu supports that the ternary complex formation by the elongation factor can resist modifications of L-Tyr-tRNATyr by the aldehydes or other agents, in the context of the present work on the role of DTD2 in plants, one would want to see the data using eEF1alpha. This is particularly relevant because there are likely to be differences in the way EFTu and eEF1alpha may protect aminoacyl-tRNAs (for example see description in the latter half of the article by Wolfson and Knight 2005, FEBS Letters 579, 3467-3472).

      We thank the reviewer for bringing up another important point. We analysed the aa-tRNA bound elongation factor structures from both bacteria (PDB id: 1TTT) and mammals (PDB id: 5LZS) and found that the amino acid binding site is highly conserved, with the side chain of the amino acid projecting outward. Modelling of a D-amino acid in the same site shows serious clashes, indicating D-chiral rejection during aa-tRNA binding by the elongation factor. In addition, the amino group of the amino acid is tightly selected by the main chain atoms of the elongation factor, leaving no space for aldehydes to enter and modify the L-aa-tRNAs and Gly-tRNAs. Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce amino acid specific binding differences. However, those changes will have no influence when a D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site. We will present a sequence and structural conservation analysis to clarify this important point in our revised manuscript. Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by the elongation factor across life forms and therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of the elongation factor in general across species.

      In our revised manuscript, we will provide a thorough point-by-point response to the above as well as all the specific reviewer comments. We also intend to include new analysis with updated data that would address the key questions raised by the reviewers.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This nice study by Miyano combines slice electrophysiology and superresolution microscopy to address the role of RBP2 in Ca2+ channel clustering and neurotransmitter release at hippocampal mossy fiber terminals. While a number of studies demonstrated a critical role for RBPs in clustering Ca2+ channels at other synapses and some provided evidence for a role of the protein in molecular coupling of Ca2+ channels and release sites, the present study targets another key synapse that is an important model for presynaptic studies and offers access to a microdomain controlled synaptic vesicle (SV) release mechanism with low initial release probability.

      Summarizing a large body of high-quality work, the authors demonstrate reduced Ca2+ currents and a reduced release probability. They attribute the latter to the reduced Ca2+ influx and can restore release by increasing Ca2+ influx. Moreover, they propose an altered fusion competence of the SVs, which is not so strongly supported by the data in my view.

      The effects are relatively small, but I think the careful analysis of the RBP role at the mossy fiber synapse is an important contribution.

      We thank the reviewer for the careful assessment of the paper. We agree that while the reduced Ca influx in KO is relatively straightforward, the evidence for impaired priming is somewhat indirect and remains a suggestion. We also noted that Moser and colleagues have analyzed the function of RIM-BP2 at hair cell synapses and also showed reduced Ca influx. In cortical synapses, there has been no study using direct presynaptic recording. In the revision, we carefully cited previous studies and tried to be fair. We hope that the current revision is much improved.

      Reviewer #2 (Public Review):

      The proper expression and organization of CaV channels at the presynaptic release sites are subject to coordinative and redundant control of many active zone-specific molecules including RIM-BPs. Previous studies have demonstrated that ablation of RIM-BPs in various mammalian synapses causes significant impairment of synaptic transmission, either by reducing CaV expression or decoupling CaV from synaptic vesicles. The mechanisms remain unknown.

      In the manuscript, Sakaba and colleagues aimed to examine the specific role of RIM-BP2 at the hippocampal mossy fiber-CA3 pyramidal cell synapse, which is well-characterized by low initial release probability and strong facilitation during repetitive stimulation. By directly recording Ca2+ currents and capacitance jumps from the MF boutons, which is very challenging but feasible, they showed that depolarization-evoked Ca2+ influx was reduced significantly (~39%) by KO of RIM-BP2, but no impacts on Ca-induced exocytosis and RRP (measured by capacitance change). They used STED microscopy to image the spatial distribution of the CaV2.1 cluster but found no change in the cluster number with a slight decrease in cluster intensity (~20%). They concluded that RIM-BP2 functions in tonic synapses by reducing CaV expression and thus differentially from phasic synapses by decoupling CaV-SV.

      In general, they provide solid data showing that RIM-BP2 KO reduces Ca influx at MF-CA3 synapse, but the phenotype is not new as Moser and colleagues have also used presynaptic recording and capacitance measurement and shown that RIM-BP2 KO reduces Ca2+ influx at hair cell active zone (Krinner et al., 2017), although at different synapse model expressing CaV1.3 instead of CaV2.1. Further, the concept that RIM-BP2 plays diverse functions in transmitter release at different central synapses has also been proposed with solid evidence (Brockmann et al., 2019).

      We thank the reviewer for the careful reading of the ms. We agree that previous studies have shown reduced Ca influx at hair cells, and that diverse functions of RIM-BP2 at different central synapses have been proposed by Brockmann et al. The new point of this study is that we firmly and quantitatively show the reduced Ca currents using direct presynaptic recording, which has not been done in mossy fiber synapses or cortical synapses in general. Quantitative and time-resolved measurements of the presynaptic currents cannot be done by other methods so far. In this revision, we point this out carefully.

      Reviewer #1 (Recommendations For The Authors):

      The MS is overall carefully prepared and I have only a few minor comments to help with further improving the manuscript.

      Abstract:

      I think the notion of different RBP function at tonic and phasic synapses is not so well founded. The reduced number of Ca2+ channels and their altered topography have been shown in multiple synapses that also include those with phasic release. Quantitative structural and functional analysis of presynaptic Ca2+ channels of RBP-2 and RBP1-2 DKO deficient AZs closely related to the present study has e.g. been provided for auditory synapses (e.g. hair cells, endbulb/calyx of Held synapses) that provide both phasic and sustained release.

      In the abstract, we have omitted the description of phasic vs tonic synapses, because it is not well founded, as the reviewer pointed out. Specifically, in the abstract (Line 13~):

      “Synaptic vesicles dock and fuse at the presynaptic active zone (AZ), the specialized site for transmitter release. AZ proteins play multiple roles such as recruitment of Ca2+ channels as well as synaptic vesicle docking, priming and fusion. However, the precise role of each AZ protein type remains unknown. In order to dissect the role of RIM-BP2 at mammalian cortical synapses having low release probability, we applied direct electrophysiological recording and super-resolution imaging to hippocampal mossy fiber terminals of RIM-BP2 KO mice. By using direct presynaptic recording, we found the reduced Ca2+ currents. The measurements of EPSCs and presynaptic capacitance suggested that the initial release probability was lowered because of the reduced Ca2+ influx and impaired fusion competence in RIM-BP2 KO. Nevertheless, larger Ca2+ influx restored release partially. Consistent with presynaptic recording, STED microscopy suggested less abundance of P/Q-type Ca2+ channels at AZs deficient in RIM-BP2. Our results suggest that the RIM-BP2 regulates both Ca2+ channel abundance and transmitter release at mossy fiber synapses.”

      Intro:

      Line 48: consider adding Butola et al., 2021 /endbulb of Held to reference which concurs on the notion made for Calyx. However, a contrasting finding was made for another synapse with tight coupling: RBP2 deletion did not alter tight coupling in hair cells (Krinner et al., 2017). Line 51: RBP-DKO/lack of additional effect of RBP1 deletion: suggest adding Krinner et al., 2021 to reference, which concurs with the notion made for hair cells.

      We cited Butola et al., 2021 (Line 49) and Krinner et al., 2021 (Line 52), as the reviewer suggested.

      Results:

      STED microscopy: I am concerned with two aspects of the analysis/presentation. I) I recommend replacing density with abundance as the authors do not resolve single channels. II) I appreciate the note of caution about the fact that STED nanoscopy due to the non-linear nature of the depletion process should/could not be easily used to quantify copy numbers based on immunofluorescence. I would recommend the authors perform 2D Gaussian fitting to at least the Cav2.1 immunofluorescent spots neighboring Munc13-1 spots and report the short and long axis estimates as well as potentially the area. Should the authors have confocal Cav2.1 and Cav2.2 immunofluorescent data co-acquired with STED of Munc13-1, this would be very valuable additional information, but I do not think the experiment is essential for the sake of publication if it was not done already, given the large body of high-quality physiology data.

      I) We have changed the term from density to abundance as the reviewer suggested throughout the manuscript.

      II) As the reviewer suggested, we have carried out 2D Gaussian fitting of Cav2.1 spots. The length, width, and area of Cav2.1 clusters in the AZ were not different between WT and RIM-BP2 KO terminals (Line 431-433, Figure 7-figure supplement 4). The spatial resolution of STED, especially at mossy fiber synapses in the tissue, and a small difference between WT and KO (~30 % expected from electrophysiology) could prevent detection of the difference, unlike ribbon synapses and fly NMJ where release sites and Ca channel clusters are well defined. We should also note that the intensity was calculated similarly to previous studies (integral of signal intensity, Krinner et al., 2017), and not as absolute peak intensity.
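To illustrate what long-axis, short-axis and area estimates of a fluorescent spot mean, the sketch below uses intensity-weighted second moments, a simpler stand-in for least-squares 2D Gaussian fitting and not the fitting procedure used in the manuscript. For a well-isolated, background-subtracted Gaussian spot the two approaches give the same axes. The function name and the synthetic image are hypothetical; results are in pixel units.

```python
import math


def spot_axes_from_moments(image):
    """Estimate long/short axis FWHM and ellipse area of one bright spot.

    image: list of rows of non-negative, background-subtracted intensities.
    The intensity-weighted covariance equals the Gaussian covariance when
    the spot is an isolated 2D Gaussian fully contained in the image.
    """
    total = sum(sum(row) for row in image)
    if total <= 0:
        raise ValueError("image has no signal")
    cx = sum(v * x for row in image for x, v in enumerate(row)) / total
    cy = sum(v * y for y, row in enumerate(image) for v in row) / total
    sxx = syy = sxy = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            sxx += v * (x - cx) ** 2
            syy += v * (y - cy) ** 2
            sxy += v * (x - cx) * (y - cy)
    sxx, syy, sxy = sxx / total, syy / total, sxy / total
    # Eigenvalues of the 2x2 covariance matrix: variances along principal axes.
    mean = (sxx + syy) / 2
    diff = math.hypot((sxx - syy) / 2, sxy)
    var_long, var_short = mean + diff, max(mean - diff, 0.0)
    k = 2 * math.sqrt(2 * math.log(2))  # converts sigma to FWHM
    fwhm_long, fwhm_short = k * math.sqrt(var_long), k * math.sqrt(var_short)
    area = math.pi * (fwhm_long / 2) * (fwhm_short / 2)
    return fwhm_long, fwhm_short, area


# Synthetic spot: sigma 3 px along x and 1.5 px along y on a 41 x 41 grid.
spot = [[math.exp(-((x - 20) ** 2 / 18.0 + (y - 20) ** 2 / 4.5))
         for x in range(41)] for y in range(41)]
print(spot_axes_from_moments(spot))
```

With real STED data, residual background and neighbouring spots bias moment estimates, which is one reason an explicit Gaussian fit is usually preferred.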

      As the reviewer suggested, we have added confocal data (Line 434-436, Figure 7-figure supplement 5). We have determined the AZ area from the Munc13-1 STED data, and Munc13-1, Cav2.1 and Cav2.2 intensities were quantified. As shown in the figure, only Cav2.1 intensity was reduced in KO, consistent with the STED data.

      Nevertheless, we should be cautious about interpretation of the intensity as the reviewer suggested, and are aware that the data are just consistent with electrophysiology. From imaging, we only see a qualitative rather than quantitative difference between WT and KO.

      Discussion:

      I think the focus on alterations of presynaptic Ca channels could be further strengthened along with the discussion of the relevant previous studies.

      Thank you for the suggestion. We have added a paragraph as shown below in the discussion (Line 531~).

      “By using direct presynaptic patch clamp recordings, we here observed a decrease of Ca2+ current amplitudes (~30%) in RIM-BP2 KO mice (Fig. 1). Consistently, STED microscopy supported reduced abundance of P/Q-type Ca2+ channels (Cav2.1) in the mutant mossy fiber terminal (Fig. 7). Interestingly, this observation is similar to that at Drosophila NMJ and hair cell synapses (Liu et al., 2011; Krinner et al., 2017), but not that at other synapses (Acuna et al., 2015; Grauel et al., 2016; Butola et al., 2021), suggesting that the functional role of RIM-BP2 in recruiting Ca2+ channels differs among synapse types.”

      Reviewer #2 (Recommendations For The Authors):

      Minor questions:

      1) The title is misleading as it only shows RIM-BP2 regulates CaV expression but not clustering.

      This has been pointed out by the 1st reviewer, too. We have adopted the term “abundance” as suggested by the 1st reviewer and changed to “RIM-BP2 regulates Ca2+ channel abundance and neurotransmitter release at hippocampal mossy fiber terminals.”

      2) Figure 7 legend. Again, RIM-BP2 only changes the intensity of CaV2.1 clusters but not the density.

      Changed Figure 7 title from “RIM-BP2 deletion alters the density …” to “RIM-BP2 deletion alters the signal intensity …”.

      3) Line 31: "Ca2+ influx through voltage-gated Ca2+ channels triggers neurotransmitter release from synaptic vesicles within a millisecond" is not correct. Ca-evoked transmitter release can only occur with such fast speed at very specialized synapses such as the calyx of Held but not at general chemical synapses.

      We changed “within a millisecond” to “within milliseconds” (Line 30).

      4) Line 44-46: In Drosophila NMJs and at Drosophila NMJs are redundant.

      We eliminated “at Drosophila NMJs”.

      5) The authors should use the verb tense consistently throughout the manuscript, such as "In RIM-BP1,2 DKO mice, the coupling between Ca2+ channels and synaptic vesicles became loose, and action potential-evoked neurotransmitter release was reduced at the calyx of Held synapse (Acuna et al., 2015). At hippocampal CA3-CA1 synapses, RIM-BP2 deletion alters Ca2+ channel localization at the AZs without altering total Ca2+ influx. Besides, RIM-BP1,2 DKO has no additional effect...".

      We changed verb tenses in Line 46-49, Line 55-58, and Line 62-67. We also checked the ms once more. Thank you for pointing this out.

      6) Line 59: technically difficulty should be technical difficulty.

      Fixed.

      7) Figure 4A-B are representative traces of 0.5 mM EGTA (black) or 5 mM EGTA (red) recorded from the same terminals or from different terminals but simply superimposed?

      Representative traces were recorded from different terminals. We describe this point in the figure legend (Fig 4A). We are very sorry for the confusion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      Receptor tyrosine kinases such as ALK play critical roles during appropriate development and behaviour and are nodal in many disease conditions, through molecular mechanisms that weren't completely understood. This manuscript identifies a previously unknown neuropeptide precursor as a downstream transcriptional target of Alk signalling in Clock neurons in the Drosophila brain. The experiments are well designed with attention to detail, the data are solid and the findings will be useful to those interested in events downstream of signalling by receptor tyrosine kinases.

      Authors response: We thank the reviewers for this assessment of our Manuscript. We are happy to accept the current eLife assessment of our manuscript. In our revised manuscript we have addressed all of the major reviewer comments, including additional experiments suggested by the reviewers, which have significantly strengthened the revised version.

      Reviewer #1 (Public Review):

      Sukumar et al build on a body of work from the Palmer lab that seeks to unravel the transcriptional targets of Alk signaling (a receptor tyrosine kinase). Having uncovered its targets in the mesoderm in an earlier study, they seek to determine its targets in the central nervous system. To do this, they use Targeted DamID (TaDa) in the wild-type and Alk dominant negative background and identify about 1700 genes that might be under the control of Alk signalling. Using their earlier data and applying a set of criteria - upregulated in gain-of-Alk, downregulated in loss-of-Alk, and co-expressed with Alk positive cells in single cell datasets - they arrive upon a single gene, Sparkly, which is predicted to be a neuropeptide precursor.

      They generate antibodies and mutants for Sparkly and determine that it is responsive to Alk signalling and is expressed in many neuroendocrine cells, as well as in clock neurons. Though the mutants survive, they have reduced lifespans and are hyperactive. In summary, the authors identify a previously unidentified transcriptional target of Alk signalling, which is likely cleaved into a neuropeptide and is involved in regulating circadian activity.

      The data support claims made, are generally well presented and the manuscript clearly written. The link between circadian control of Alk signalling in Clock neurons > Spar expression > ultimately controlling circadian activity, however, was not clear.

      Authors response: We thank the reviewer for this thorough reading of our manuscript and for kindly highlighting the important takeaways from the study. The role of Alk signalling in activity, circadian rhythm and sleep has previously been reported by other groups in the following studies (Bai and Sehgal, 2015; Weiss et al, 2017; Gouzi, Bouraimi et al 2018), which we have discussed in our manuscript. We have also identified a hyperactivity phenotype in our Alk CNS-specific loss-of-function allele, AlkRA, which is similar to the Spar loss-of-function mutant phenotype. We hypothesize that one of the ways in which Alk signalling regulates fly activity is through regulating Spar gene expression in neuroendocrine cells. This is supported by our data showing Alk expression in Clock neurons, as well as by the new experimental data showing an activity phenotype in flies expressing Spar RNAi driven by the Clk678-Gal4 driver.

      Reviewer #2 (Public Review):

      This manuscript illustrates the power of "combined" research, incorporating a range of tools, both old and new to answer a question. This thorough approach identifies a novel target in a well-established signalling pathway and characterises a new player in Drosophila CNS development.

      Largely, the experiments are carried out with precision, meeting the aims of the project, and setting new targets for future research in the field. It was particularly refreshing to see the use of multi-omics data integration and Targeted DamID (TaDa) findings to triage scRNA-seq data. Some of the TaDa methodology was unorthodox (and should be justified/caveats mentioned in the main text), however, this does not affect the main finding of the study.

      Their discovery of Spar as a neuropeptide precursor downstream of Alk is novel, as well as its ability to regulate activity and circadian clock function in the fly. Spar was just one of the downstream factors identified from this study, therefore, the potential impact goes beyond this one Alk downstream effector.

      Authors response: We thank the reviewer for the positive comments highlighting the strengths of our study. TaDa was used as a semi-quantitative readout of the transcriptional activity in an Alk loss-of-function background with an emphasis on relative differences in peaks close to GATC sites, providing an important dataset for integration with bulk and single cell RNAseq. As the reviewer points out, there are important considerations when interpreting these data, and we have now added sentences in the discussion to inform readers of possible caveats of our TaDa dataset.

      Reviewer #3 (Public Review):

      Summary:

      The receptor tyrosine kinase Anaplastic Lymphoma Kinase (ALK) in humans is nervous system expressed and plays an important role as an oncogene. A number of groups have been studying ALK signalling in flies to gain mechanistic insight into its various roles. In flies, ALK plays a critical role in development, particularly embryonic development and axon targeting. In addition, ALK was also shown to regulate adult functions including sleep and memory. In this manuscript, Sukumar et al. used a suite of molecular techniques to identify downstream targets of ALK signalling. They first used targeted DamID, a technique that involves fusing a DNA methylase to RNA polymerase II, so that GATC sites in close proximity to PolII binding sites are marked. They performed these experiments in wild-type and ALK loss of function mutants (using an Alk dominant negative, AlkDN), to identify Alk responsive loci. Comparing these loci with a larval single-cell RNAseq dataset identified neuroendocrine cells as an important site of Alk action. They further combined these TaDa hits with data from RNA seq in Alk Loss and Gain of Function manipulations to identify a single novel target of Alk signalling - a neuropeptide precursor they named Sparkly (Spar) for its expression pattern. They generated a mutant allele of Spar, raised an antibody against Spar, and characterised its expression pattern and mutant behavioural phenotypes including defects in sleep and circadian function.

      Strengths:

      The molecular biology experiments using TaDa and RNAseq were elegant and very convincing. The authors identified a novel gene they named Spar. They also generated a mutant allele of Spar (using CrisprCas technology) and raised an antibody against Spar. These experiments are lovely, and the reagents will be useful to the community. The paper is also well written, and the figures are very nicely laid out making the manuscript a pleasure to read.

      Weaknesses:

      My main concerns were around the genetics and behavioural characterisation which is incomplete. The authors generated a novel allele of Spar - Spar ΔExon1 and examined sleep and circadian phenotypes of this allele. However, they have only one mutant allele of Spar, and it doesn't appear as if this mutant was outcrossed, making it very difficult to rule out off-target effects. To make this data convincing, it would be better if the authors had a second allele, perhaps they could try RNAi?

      Further, the sleep and circadian characterisation could be substantially improved. In Fig 8 E-F it appears as if sleep was averaged over 30 days! This is a little bizarre. They then bin the data as day 1 - 12 and 12-30. This is not terribly helpful either. Sleep in flies, as in humans, undergoes ontogenetic changes - sleep is high in young flies, stabilises between day 3-12, and shows defects by around 3 weeks of age (cf Shaw et al., 2000 PMID 10710313). The standard in the sleep field is to average over 3 days or show one representative day. The authors should reanalyse their data as per this standard, and perhaps show data from 3-10 day old flies, and if they like from 20-30 day old flies. Further, sleep data is usually analysed and presented from lights on to lights on. This allows one to quantify important metrics of sleep consolidation including bout lengths in day and night, and sleep latency. These metrics are of great interest to the community and should be included.

      The authors also claim there are defects in circadian anticipatory activity. However, these data, as presented are not solid to me. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827). Further, circadian period could also be evaluated. There are several free software packages to perform these analyses so it should not be hard to do.

      Authors response: We thank the reviewer for the thorough reading of our manuscript and for generously praising the positives as well as pointing out the weakness of our study. We have now addressed the highlighted weaknesses in behavioural experiments. In particular, we have reanalysed our data according to the reviewer’s suggestions. In addition, we provide experimental data, driving Spar RNAi in Clock neurons, that support our Spar mutant analysis.

      Point-by-point response to the reviewers’ concerns:

      Point 1. “My main concerns were around the genetics and behavioural characterisation which is incomplete. The authors generated a novel allele of Spar - Spar ΔExon1 and examined sleep and circadian phenotypes of this allele. However, they have only one mutant allele of Spar, and it doesn't appear as if this mutant was outcrossed, making it very difficult to rule out off-target effects. To make this data convincing, it would be better if the authors had a second allele, perhaps they could try RNAi?”

      Authors response: As per the reviewer's suggestion, we conducted a targeted knockdown of Sparkly specifically in clock neurons (Clk-Gal4 > Spar-RNAi) and assessed the circadian phenotypes. Flies were monitored for 5 days in LD followed by a shift to DD, similar to our previous LD-DD experiments. The results revealed a significant disruption in both activity and sleep during the DD transition period upon knockdown of Spar in circadian clock neurons. These findings strongly align with the expression pattern of Spar in clock neurons (Figure 7i-l’’). We have now included a new main figure (Figure 9) together with several supplementary figures (Figure 9 – figure supplements 1 and 2) and discussed these experiments on pages 17-18 of the results section of the revised manuscript.

      Point 2. “Further, the sleep and circadian characterisation could be substantially improved. In Fig 8 E-F it appears as if sleep was averaged over 30 days! This is a little bizarre. They then bin the data as day 1 - 12 and 12-30. This is not terribly helpful either. Sleep in flies, as in humans, undergoes ontogenetic changes - sleep is high in young flies, stabilises between day 3-12, and shows defects by around 3 weeks of age (cf Shaw et al., 2000 PMID 10710313). The standard in the sleep field is to average over 3 days or show one representative day. The authors should reanalyse their data as per this standard, and perhaps show data from 3–10-day old flies, and if they like from 20–30-day old flies.”

      Authors response: We have reanalysed these data according to the reviewer's suggestions and revised the sleep data presented. Specifically, we have focused on two 3-day periods, days 5-7 as well as days 20-22. By averaging the sleep mean during these time points, we observed a significant decrease in average sleep duration in the SparΔExon1 and Alk ΔRA mutant flies at a younger age (Figure 8h-h’, Figure 8 – figure supplement 2). However, no significant effect was observed in older flies (Figure 8h-h’, Figure 8 – figure supplement 2). We have incorporated this new data into Figure 8 and provided a detailed description in the results section (page 16) of the revised manuscript.

      Point 3. “Further, sleep data is usually analysed and presented from lights on to lights on. This allows one to quantify important metrics of sleep consolidation including bout lengths in day and night, and sleep latency. These metrics are of great interest to the community and should be included.”

      Authors response: We have now reanalysed these data as per the reviewer's suggestion. From the raw data collected over a span of 3 days, we specifically selected the lights on-lights on data and examined the average sleep duration. Notably, we observed a significant downregulation of average sleep in SparΔExon1 and AlkΔRA flies, but only at a younger age (Figure 8h-h’, Figure 8 – figure supplement 2). Furthermore, we assessed the number of sleep bouts using this data and found a significant increase in the number of bouts in younger SparΔExon1 and AlkΔRA flies, with no changes observed at an older age (Figure 8 – figure supplement 2). Additionally, we evaluated the number of bouts in flies that were initially monitored in LD and then shifted to DD, observing a significant decrease in the number of sleep bouts in SparΔExon1 flies following the transition to DD (Figure 9d). This new data is described in detail in the results section (pages 16-18) of the revised manuscript.
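
      For readers unfamiliar with these metrics, the following is a minimal sketch of how sleep bouts and sleep latency can be counted from per-minute activity data. It is an illustration only and not the analysis code used for the manuscript. It assumes the widely used criterion that a bout is at least 5 consecutive minutes without activity; the function names and the input layout (one activity count per minute, starting at lights on) are hypothetical.

```python
def sleep_bouts(activity_per_min, min_bout=5):
    """Return (start_minute, length) for every run of zero activity
    lasting at least min_bout minutes (assumed sleep criterion)."""
    bouts = []
    run_start = None
    for i, count in enumerate(activity_per_min):
        if count == 0:
            if run_start is None:
                run_start = i  # a new inactive run begins here
        else:
            if run_start is not None and i - run_start >= min_bout:
                bouts.append((run_start, i - run_start))
            run_start = None
    # Close a run that is still open at the end of the recording.
    if run_start is not None and len(activity_per_min) - run_start >= min_bout:
        bouts.append((run_start, len(activity_per_min) - run_start))
    return bouts


def sleep_latency(activity_per_min, min_bout=5):
    """Minutes from the start of the record to the first sleep bout,
    or None if the fly never sleeps in this window."""
    bouts = sleep_bouts(activity_per_min, min_bout)
    return bouts[0][0] if bouts else None
```

      Counting bouts separately for the light and dark phases only requires slicing the per-minute series at the lights-off minute before calling these functions.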

      Point 4. “The authors also claim there are defects in circadian anticipatory activity. However, these data, as presented are not solid to me. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827).”

      Authors response: We appreciate the valuable suggestion provided by the reviewer. In accordance with the referenced paper by Harrisingh et al. (2007), we calculated the "anticipation score", defined as the percentage of activity in the 6-hour period preceding the lights-on or lights-off transition that occurs in the 3-hour window just before the transition. To analyse the mean activity of the flies, we selected the data corresponding to the 6 hours before lights-on and the 6 hours before lights-off, averaged over a 14-day period under normal LD conditions. Interestingly, we observed a significant increase in the mean activity of SparΔExon1 flies during both morning anticipation (a.m. anticipation) and evening anticipation (p.m. anticipation) (Figure 8f). Furthermore, we analysed this parameter for flies entrained in DD and found that SparΔExon1 flies exhibited lower mean activity during both morning and evening anticipation (Figure 8g). We have incorporated this new data into Figure 8 and provided a detailed description in the results section (pages 16-18) of the revised manuscript.
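
      The definition of the anticipation score quoted above can be written out as a short sketch. This is an illustration of the stated definition only, not the authors' analysis code; the function name and the assumption of six hourly activity bins (oldest first, ending at the transition) are hypothetical choices made for the example.

```python
def anticipation_score(hourly_counts):
    """Percentage of the activity in the 6 h before a light transition
    that falls within the final 3 h before that transition.

    hourly_counts: six activity totals, one per hour, oldest first
    (assumed binning; finer bins would work the same way).
    """
    if len(hourly_counts) != 6:
        raise ValueError("expected six hourly bins")
    total = sum(hourly_counts)
    if total == 0:
        return 0.0  # no activity at all: score is reported as 0 here
    return 100.0 * sum(hourly_counts[-3:]) / total
```

      Under this definition a fly with evenly spread activity scores 50, and values above 50 indicate activity ramping up ahead of the transition.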

      Point 5. Further, circadian period could also be evaluated. There are several free software packages to perform these analyses so it should not be hard to do.

      Authors response: We have now evaluated the circadian period as suggested by the reviewer, generating a chi-square periodogram for each fly to calculate the free-running period, both for the flies that were under normal LD conditions and for those that were entrained in DD. We calculated the percentage of flies that had a shorter or longer period than 1440 min (24 h) and observed that w1118 and SparΔExon1 flies have a longer circadian period (Figure 8 – figure supplement 4) but following the shift to DD, they tend to have a shorter circadian period (Figure 9 – figure supplement 3). This new data is described in the results (pages 16-18).
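
      As background, a chi-square periodogram can be sketched in a few lines. The version below follows the classic Sokolove-Bushell formulation and is given as an illustration under stated assumptions, not as the software used for the manuscript: the function name is hypothetical, the example works on hourly bins rather than minutes, and it simply reports the test period with the largest statistic without applying a significance threshold.

```python
import math


def chi_square_periodogram(series, periods):
    """Return {P: Qp} for each test period P (in samples).

    The series is folded into K complete rows of length P; Qp compares
    the variance of the P column means with the total variance.
    """
    result = {}
    for p in periods:
        k = len(series) // p      # number of complete cycles of length p
        if k < 2:
            continue              # need at least two cycles to fold
        n = k * p
        data = series[:n]
        mean = sum(data) / n
        total_ss = sum((x - mean) ** 2 for x in data)
        if total_ss == 0:
            result[p] = 0.0       # flat record: no rhythm to detect
            continue
        col_means = [sum(data[h::p]) / k for h in range(p)]
        between_ss = sum((m - mean) ** 2 for m in col_means)
        result[p] = k * n * between_ss / total_ss
    return result


# Synthetic example: ten days of hourly counts with a 24 h rhythm.
activity = [10 + 5 * math.sin(2 * math.pi * t / 24) for t in range(240)]
qp = chi_square_periodogram(activity, range(20, 29))
dominant_period = max(qp, key=qp.get)
```

      In practice each Qp is compared with a chi-square critical value with P - 1 degrees of freedom before a period is called significant.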

      Recommendations for the authors:

      There are two major concerns that we recommend the authors address:

      1) The behaviour: There are a number of unconventional representations of the behavioural data in this manuscript. We recommend that the authors revisit their data representation to adhere to conventions in the field - specific suggestions are in the reviews. We also suggest an additional experiment - an RNAi/different allele/rescue experiment to ensure that the phenotypes the authors observe are not due to off-target effects of the mutant they have generated.

      Authors response: In the revised manuscript, we have reanalysed the behavioural data according to the reviewers’ recommendations (included in Figures 8 and 9 of the revised version). In addition, we have performed a targeted Spar RNAi experiment in clock neurons (included in Figure 9 of the revised version), identifying a hyperactive behavioural phenotype similar to that of Spar mutants. The inclusion of these new analyses and data strengthens the manuscript and supports the conclusion that Spar plays a role in regulation of behaviour.

      2) TaDa analyses: We were concerned that the authors might be picking up false positives with the way they have analysed their data. While this may not matter for this study, it will be useful to reason out their approach and keep this in mind for any other targets they choose from these data for further studies.

      Authors response: In line with the reviewers’ concerns we have now highlighted the potential caveats and drawbacks of our TaDa dataset in the discussion section of the revised manuscript (detailed in response to Reviewer #2 below).

      Reviewer #1 (Recommendations For The Authors):

      Though generally well written, I felt that some sections could be written in more detail. For example, the text around Figure 5 was not very informative. Many of the other approaches to the analyses and details of datasets used were glossed over. Since the manuscript uses a lot of previously published data, it would be nice to give more details about them in the context of the results.

      Authors response: We thank the reviewer for this recommendation. We have now added additional information about peptidomics analysis in the results and in the legend of Figure 5. We have also included a table in the Methods that summarises the datasets used in this study, including the Dataset name, brief description and reference.

      In the panels where co-localisations have been represented, it would be nice to include enlarged insets depicting the co-labelling. It is not always obvious in the way the figures have currently been represented. For example, in Fig 2G, Alk stain appears to be everywhere, but the authors make the point that it is enriched in neuroendocrine cells (as labelled by dimmed), but the co-localisation isn't evident. Similar issues come up with the sparkly colocalisations.

      Authors response: As suggested by the reviewer, we have now added additional panels to complement the stainings in Figure 2G. These new data are included as Figure 2 – figure supplement 1 (Alk/Dimm-Gal4>UAS-GFPcaax staining) and as Figure 4 – figure supplement 1 (Alk/Spar staining), which indicate colocalization in the central brain and ventral nerve cord prosecretory cells with enlarged panels.

      Supplementary figures S3C and 3F appear garbled to me? Maybe it didn't upload properly?

      Authors response: Unfortunately, this issue is not apparent to us. However, we have now re-uploaded these Figures.

      Sparkly's responsiveness to Alk signalling: Visually, there does not seem to be an increase or decrease in spar levels in the images in Fig 4F-H. How was the quantification done? I would suggest a more detailed interpretation of their results related to spar's responsiveness to Alk signalling - at the mRNA vs protein levels and the GOF vs LOF conditions.

      Authors response: We thank the reviewer for this constructive recommendation. In the revised manuscript, we have now repeated this experiment with increased numbers of larval CNS followed by blinded image analysis. These results also show an increased fluorescence intensity as measured by corrected total cell fluorescence (CTCF), confirming our previous observation of increased Spar protein expression in Alk gain-of-function conditions compared to controls. In this analysis, changes in Spar levels in Alk loss-of-function remained non-significant compared to control, in agreement with our previous data. As suggested by the reviewer, we have now included several additional sentences discussing the possible reasons for these observations. The following text is now included on Page 11 of the results section:

      “While our bulk RNA-seq and TaDa datasets show a reduction in Spar transcript levels in Alk loss-of-function conditions, this reduction is not reflected at the protein level. This observation may reflect additional uncharacterised pathways that regulate Spar mRNA levels as well as translation and protein stability. Taken together, these observations confirm that Spar expression is responsive to Alk signaling in CNS, although Alk is not critically required to maintain Spar protein levels.” We have also added an additional Image analysis method section explaining the methodology of the CTCF fluorescent intensity quantification on Page 28.
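
      For reference, corrected total cell fluorescence is commonly computed from ImageJ-style measurements as the integrated density of the selected region minus its area multiplied by the mean background intensity. The sketch below illustrates that common definition; whether the manuscript's Methods use exactly this form is an assumption, and the function and parameter names are hypothetical.

```python
def corrected_total_cell_fluorescence(integrated_density, area, background_means):
    """CTCF = integrated density - (region area * mean background).

    integrated_density: summed pixel intensity inside the region of interest
    area: region area, in the same pixel units used for the density
    background_means: mean intensities of nearby signal-free regions
    """
    if not background_means:
        raise ValueError("at least one background reading is required")
    mean_background = sum(background_means) / len(background_means)
    return integrated_density - area * mean_background
```

      Averaging several background readings taken next to each region reduces the influence of uneven illumination on the corrected value.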

      Reviewer #2 (Recommendations For The Authors):

      It was surprising to see that the authors did not use Dam-only controls. This is to control for background methylation by Dam (i.e. accessible chromatin). This does not invalidate the main results of the manuscript, however, there could be false positives in the dataset for genes that are seen to be up-regulated in the mutant condition (e.g. if accessibility is increased in the mutant but not transcription, then it would look like increased Pol II binding, when it isn't). As the study was focusing on genes down-regulated in the mutant, this is less of an issue, as it is very unlikely to see an increase in transcription with a decrease in accessibility (that could provide a false positive). The authors should explain their rationale for not using Dam-only controls, and the associated caveats, in the manuscript.

      Authors response: We agree with the reviewer’s comment on the possibility of identifying false positive candidates from our TaDa dataset, especially if one is seeking to find a gene with increased Pol II occupancy in an Alk dominant negative condition. However, our analysis only focuses on genes which are responsive to Alk-manipulation, namely, genes which are downregulated in the Alk dominant negative condition. One of the rationales for not using a Dam-only control was that in our previous Mendoza-Garcia et al, 2021 study, we employed a similar method and were able to successfully identify already known and novel targets of Alk signalling in embryonic mesoderm comparing the Dam-Pol II versus Dam-Pol II; Alk Dominant negative conditions. In the current version of the manuscript, we have expanded our discussion of these caveats as follows (Discussion, Page 19-20):

      “A potential drawback of our TaDa dataset is the identification of false positives, due to non-specific methylation of GATC sites at accessible regions in the genome by Dam protein. Hence, our experimental approach likely more reliably identifies candidates which are downregulated upon Alk inhibition. In our analysis, we have limited this drawback by focusing on genes downregulated upon Alk inhibition and integrating our analysis with additional datasets, followed by experimental validation. This approach is supported by the identification of numerous previously identified Alk targets in our TaDa candidate list.”

      Related to this, could the authors make it clear/justify why they chose to use peak-based analysis of the Dam-Pol II data rather than looking at signals across whole transcripts? For example, this could result in false positives if a gene switches from having no Pol II to having paused Pol II.

      Authors response: In our opinion, a peak-based analysis is dependable in this context. We chose to prioritize peaks close (+/- 1 kb) to transcription start sites (TSS) to increase the chances of finding true Pol II occupancy peaks. Also, during bioinformatics analysis using the Damid-seq pipeline (Maksimov et al, 2016), fragments not aligning to GATC borders are excluded. Therefore, a whole transcript Pol II occupancy peak analysis may not always be feasible. We agree with the reviewer that a paused Pol II will result in false positives; however, it will only result in an increase of a specific peak, and in our case, we are seeking to identify peaks with lower Pol II occupancy as a result of Alk knockdown. Furthermore, we depend on integration with additional relevant datasets to minimise false positive candidates for detailed analysis. In the current version of the manuscript these caveats have been mentioned and discussed (see point above).
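
      The TSS-proximity criterion mentioned above can be illustrated with a small filter. This is a hypothetical sketch, not part of the Damid-seq pipeline or the authors' workflow: the function names, the tuple layout for peaks, and the per-chromosome TSS dictionary are all assumptions made for the example.

```python
def distance_to_nearest_tss(start, end, tss_list):
    """Smallest gap between the interval [start, end] and any TSS;
    0 if a TSS lies inside the peak, None if no TSS is known."""
    best = None
    for tss in tss_list:
        if start <= tss <= end:
            gap = 0
        elif tss < start:
            gap = start - tss
        else:
            gap = tss - end
        if best is None or gap < best:
            best = gap
    return best


def peaks_near_tss(peaks, tss_by_chrom, window=1000):
    """Keep peaks (chrom, start, end) lying within `window` bp of a TSS."""
    kept = []
    for chrom, start, end in peaks:
        gap = distance_to_nearest_tss(start, end, tss_by_chrom.get(chrom, []))
        if gap is not None and gap <= window:
            kept.append((chrom, start, end))
    return kept
```

      With a 1 kb window, a peak at 2L:1500-1800 is retained for a TSS at position 1000, whereas a peak at 2L:5000-5200 is discarded.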

      Do the authors have any theories about the mode of action of Spar? Or ideas about how this might be followed up? If so, that could be included in the Discussion.

      Authors response: Other than identifying modified Spar-derived peptides, which suggest a target receptor, possibly a GPCR, we currently have no other data that allow us to speculate further on the mode of action of Spar. We are currently working hard to try to identify a receptor, but this is a challenging and ongoing process. In the discussion we speculate regarding the identity of the Spar receptor, as well as its location, which is likely in the CNS and body muscle; however, these are open questions that we can hopefully answer in a future study.

      Reviewer #3 (Recommendations For The Authors):

      Spar protein expression was unchanged in Alk loss of function. This is a curious result as the authors used RNA seq data from Alk loss of function to identify Spar. This could be commented on in the discussion.

      Authors response: We thank the reviewer for this comment, and they are correct in noticing this. We have also thought about this, and reviewer #1 also commented. To confirm this result, we repeated this experiment with increased numbers of larval CNS followed by blinded image analysis for the revised version. These results also show an increased fluorescence intensity as measured by corrected total cell fluorescence (CTCF), confirming our previous observation of increased Spar protein expression in Alk gain-of-function conditions compared to controls. In this analysis, changes in Spar levels in Alk loss-of-function remained non-significant compared to control, in agreement with our previous data. As suggested by reviewer #1, we have now included several additional sentences discussing the possible reasons for these observations. The following text is now included on Page 11 of the results section:

      “While our bulk RNA-seq and TaDa datasets show a reduction in Spar transcript levels in Alk loss-of-function conditions, this reduction is not reflected at the protein level. This observation may reflect additional uncharacterised pathways that regulate Spar mRNA levels as well as translation and protein stability. Taken together, these observations confirm that Spar expression is responsive to Alk signaling in CNS, although Alk is not critically required to maintain Spar protein levels.”

      Pg 19: Spar is expressed in the Mushroom Bodies (MBs). Do they mean in Kenyon Cells (KCs)? I don't see this expression in the figures. Maybe this could be highlighted in the figure. It would definitely be of interest if this were true.

      Authors response: We agree with the reviewer that this would be interesting. We have not performed detailed staining of the mushroom bodies at this point, however, Spar mRNA expression in a transcriptomics analysis performed by Crocker et al, 2016, identifies Spar in all cell types, including Kenyon cells. We have now included this and cited this reference in the discussion.

      Spar is also expressed in multiple potential sleep regulatory sites including clock neurons, the PI, AstA cells and so on. Some of these might be arousal-promoting and some sleep-promoting. Taking out Spar in both sleep and arousal-promoting subsets might have complex effects. The authors might want to knock down Alk in different subsets of neurons to make more targeted manipulations.

      Authors response: We thank the reviewer for this suggestion regarding interesting experiments to further investigate Spar function. We are planning to follow up and study the role of Alk signalling in different neuronal subsets, with a specific interest in neuroendocrine/prosecretory cells.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer No.1 (public)

      The authors present a study focused on addressing the key challenge in drug discovery, which is the optimization of absorption and affinity properties of small molecules through in silico methods. They propose active learning as a strategy for optimizing these properties and describe the development of two novel active learning batch selection methods. The methods are tested on various public datasets with different optimization goals and sizes, and new affinity datasets are curated to provide up-to-date experimental information. The authors claim that their active learning methods outperform existing batch selection methods, potentially reducing the number of experiments required to achieve the same model performance. They also emphasize the general applicability of their methods, including compatibility with popular packages like DeepChem.

      Strengths:

      Relevance and Importance: The study addresses a significant challenge in the field of drug discovery, highlighting the importance of optimizing the absorption and affinity properties of small molecules through in silico methods. This topic is of great interest to researchers and pharmaceutical industries.

      Novelty: The development of two novel active learning batch selection methods is a commendable contribution. The study also adds value by curating new affinity datasets that provide chronological information on state-of-the-art experimental strategies.

      Comprehensive Evaluation: Testing the proposed methods on multiple public datasets with varying optimization goals and sizes enhances the credibility and generalizability of the findings. The focus on comparing the performance of the new methods against existing batch selection methods further strengthens the evaluation.

      Weaknesses:

      Lack of Technical Details: The feedback lacks specific technical details regarding the developed active learning batch selection methods. Information such as the underlying algorithms, implementation specifics, and key design choices should be provided to enable readers to understand and evaluate the methods thoroughly.

      Evaluation Metrics: The feedback does not mention the specific evaluation metrics used to assess the performance of the proposed methods. The authors should clarify the criteria employed to compare their methods against existing batch selection methods and demonstrate the statistical significance of the observed improvements.

      Reproducibility: While the authors claim that their methods can be used with any package, including DeepChem, no mention is made of providing the necessary code or resources to reproduce the experiments. Including code repositories or detailed instructions would enhance the reproducibility and practical utility of the study.

      Suggestion 1:

      Elaborate on the Methodology: Provide an in-depth explanation of the two active learning batch selection methods, including algorithmic details, implementation considerations, and any specific assumptions made. This will enable readers to better comprehend and evaluate the proposed techniques.

      Answer: We thank the reviewer for this suggestion. Following this comment, we have extended the text in Methods (in Section: Batch selection via determinant maximization and Section: Approximation of the posterior distribution) and in Supporting Methods (Section: Toy example). We have also included the pseudocode for the Batch optimization method.
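
      As a rough illustration of the idea behind batch selection via determinant maximization, the sketch below greedily builds a batch whose covariance submatrix has a large log-determinant, which favours candidates that are both uncertain and mutually uncorrelated. This is not the ALIEN implementation: the function name `greedy_max_det_batch`, the `jitter` term and the toy covariance matrix are assumptions made for this example, and a greedy search is only one possible way to approximate the maximization.

```python
import numpy as np

def greedy_max_det_batch(cov, batch_size, jitter=1e-8):
    """Greedily pick candidate indices whose covariance submatrix
    has a large log-determinant (hypothetical helper, illustration only)."""
    n = cov.shape[0]
    selected = []
    for _ in range(batch_size):
        best_idx, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            # Submatrix of the joint covariance for the tentative batch;
            # the jitter keeps it numerically positive definite.
            sub = cov[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
            sign, logdet = np.linalg.slogdet(sub)
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:
            break
        selected.append(best_idx)
    return selected

# Toy covariance (assumed values): candidates 0 and 1 are highly
# correlated, candidate 2 is independent of both.
cov = np.array([
    [1.0, 0.95, 0.0],
    [0.95, 1.0, 0.0],
    [0.0, 0.0, 0.8],
])
batch = greedy_max_det_batch(cov, batch_size=2)
print(batch)
```

      In this toy case the second pick is the independent candidate rather than the near-duplicate of the first one, even though the near-duplicate has the larger individual variance.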

      Suggestion 2:

      Clarify Evaluation Metrics: Clearly specify the evaluation metrics employed in the study to measure the performance of the active learning methods. Additionally, conduct statistical tests to establish the significance of the improvements observed over existing batch selection methods.

      Answer: Following this comment we added to Table 1 details about the way we computed the cutoff times for the different methods. We also provide more details on the statistics we performed to determine the significance of these differences.

      Suggestion 3:

      Enhance Reproducibility: To facilitate the reproducibility of the study, consider sharing the code, data, and resources necessary for readers to replicate the experiments. This will allow researchers in the field to validate and build upon your work more effectively.

      Answer: This is something we already included with the original submission. The code is publicly available. In fact, we provide a Python library, ALIEN (Active Learning in data Exploration), which is published on the Sanofi GitHub (https://github.com/Sanofi-Public/Alien). We also provide details on the public data used and expect to provide the internal data as well. We included a small paragraph on code and data availability.

      Reviewer No.2 (public)

      Suggestion 1:

      The authors presented a well-written manuscript describing the comparison of active-learning methods with state-of-the-art methods for several datasets of pharmaceutical interest. This is a very important topic since active learning is similar to a cyclic drug design campaign such as testing compounds followed by designing new ones which could be used for further tests and a new design cycle and so on. The experimental design is comprehensive and adequate for the proposed comparisons. However, I would expect to see a comparison regarding other regression metrics and considering the applicability domain of models, which are two essential topics for the drug design modelers community.

      Answer: We want to thank the reviewer for these comments. We provide a detailed response to the specific comments below. 

      Reviewer No.1 (Recommendations For The Authors)

      Recommendation 1:

      The description provided regarding the data collection process and the benchmark datasets used in the study raises some concerns. The comment specifically addresses the use of both private (Sanofi-owned) and public datasets to benchmark the various batch selection methods. Lack of Transparency: The comment lacks transparency regarding the specific sources and origins of the private datasets. It would be crucial to disclose whether these datasets were obtained from external sources or if they were generated internally within Sanofi. Without this information, it becomes difficult to assess the potential biases or conflicts of interest associated with the data.

      Answer: We would like to thank the reviewer for this comment. As mentioned in the paper, the public github page contains links to all the public data and we expect also to the internal Sanofi data. We also now provide more information on the specific experiments that were internally done by Sanofi to collect that data.

      Potential Data Accessibility Issues: The utilization of private datasets, particularly those owned by Sanofi, may raise concerns about data accessibility. The lack of availability of these datasets to the wider scientific community may limit the ability of other researchers to replicate and validate the study’s findings. It is essential to ensure that the data used in research is openly accessible to foster transparency and encourage collaboration.

      Answer: Again, as stated above we expect to release the data collected internally on the github page.

      Limited Information on Dataset Properties: The comment briefly mentions that the benchmark datasets cover properties related to absorption, distribution, pharmacokinetic processes, and affinity of small drug molecules to target proteins. However, it does not provide any specific details about the properties included in the datasets or how they were curated. Providing more comprehensive information about the properties covered and the methods used for curation would enhance the transparency and reliability of the study.

      To address these concerns, it is crucial for the authors to provide more detailed information about the data sources, dataset composition, representativeness, and curation methods employed. Transparency and accessibility of data are fundamental principles in scientific research, and addressing these issues will strengthen the credibility and impact of the study.

      Answer: We agree with this comment and believe that it is important to be explicit about each of the datasets and to provide information on the new data. We note that we already discuss the details of each of the experiments in Methods and, of course, provide links to the original papers for the public data. We have now added text to Supporting Methods that describes the experiments in more details as well as providing literature references for the experimental protocols used. As noted above, we expect to provide our new internal data on the public git page. 

      Recommendation 2:

      Some comments on the modeling example Approximation of the posterior distribution. Lack of Methodological Transparency: The comment fails to provide any information regarding the specific method or approach used for approximating the posterior distribution. Without understanding the methodology employed, it is impossible to evaluate the quality or rigor of the approximation. This lack of transparency undermines the credibility of the study.

      Answer: We want to thank the reviewer for pointing this out. Based on this comment we added more information to Section: Approximation of the posterior distribution. Moreover, we now provide details on the posterior approximation in Section: Two approximations for computing the epistemic covariance.

      Questionable Assumptions: The comment does not mention any of the assumptions made during the approximation process. The validity of any approximation heavily depends on the underlying assumptions, and their omission suggests a lack of thorough analysis. Failing to acknowledge these assumptions leaves room for doubt regarding the accuracy and relevance of the approximation.

      Answer: We are not entirely sure which assumptions the reviewer is referring to here. The main assumption we can think of that we have used is the fact that getting within X% of the optimal model is a good enough approximation. We have specifically discussed this assumption and tested multiple values of X. While it would have been great to have X = 0 this is unrealistic for retrospective studies. For Active Learning the main question is how many experiments can be saved to obtain similar results and the assumptions we used are basically ’what is the definition of similar’. We now added this to Discussion.
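
      The "within X% of the optimal model" criterion can be made concrete with a small sketch. The function name `experiments_to_reach`, the learning-curve values, the batch size of 30 and the reference RMSE of 0.60 are all hypothetical and chosen only for illustration; they are not results from the manuscript.

```python
def experiments_to_reach(rmse_curve, optimal_rmse, tolerance_pct, batch_size):
    """Number of labelled samples needed before the RMSE first falls
    within tolerance_pct percent of optimal_rmse (None if never reached)."""
    threshold = optimal_rmse * (1.0 + tolerance_pct / 100.0)
    for iteration, rmse in enumerate(rmse_curve, start=1):
        if rmse <= threshold:
            return iteration * batch_size
    return None

# Hypothetical RMSE after each active-learning iteration (made-up numbers).
random_curve = [1.00, 0.90, 0.80, 0.70, 0.62, 0.60]
active_curve = [1.00, 0.80, 0.65, 0.61, 0.60, 0.60]

n_random = experiments_to_reach(random_curve, 0.60, tolerance_pct=5, batch_size=30)
n_active = experiments_to_reach(active_curve, 0.60, tolerance_pct=5, batch_size=30)
saved = n_random - n_active
print(n_random, n_active, saved)
```

      Varying `tolerance_pct` corresponds to testing multiple values of X; setting it to 0 demands that the curve actually reaches the reference value.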

      Inadequate Validation: There is no mention of any validation measures or techniques used to assess the accuracy and reliability of the approximated posterior distribution. Without proper validation, it is impossible to determine whether the approximation provides a reasonable representation of the true posterior. The absence of validation raises concerns about the potential biases or errors introduced by the approximation process.

      Answer: We sincerely appreciate your concern regarding the validation of the approximated posterior distribution. We acknowledge that our initial submission might not have clearly highlighted our validation strategy. It is, of course, very hard to determine the accuracy of the distribution our model learns, since such a distribution cannot be directly inferred using experiments (no ’ground truth’). Instead, we use an indirect method to determine the accuracy. Specifically, we conducted retrospective experiments using the learned distribution. In these experiments, we indirectly validated our approximation by measuring the error with the respective method. The results from these retrospective experiments provided evidence for the accuracy and reliability of our approximation in representing the true posterior distribution. We now emphasize this in Methods.

      Uncertainty Quantification: The comment does not discuss the quantification of uncertainty associated with the approximated posterior distribution. Properly characterizing the uncertainty is crucial in statistical inference and decision-making. Neglecting this aspect undermines the usefulness and applicability of the approximation results.

      Answer: Thank you for pointing out the importance of characterizing uncertainty in statistical inference and decision-making, a sentiment with which we wholeheartedly agree. In our work, we have indeed addressed the quantification of uncertainty associated with the approximated posterior distribution. Specifically, we utilized Monte Carlo Dropout (MC Dropout) as our method of choice. MC Dropout is a widely recognized and employed technique in the neural networks domain to approximate the posterior distribution, and it offers an efficient way to estimate model uncertainty without requiring any changes to the existing network architecture [1, 2]. In the revised version, we provide a more detailed discussion on the use of Monte Carlo Dropout in our methodology and its implications for characterizing uncertainty.
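
      A minimal sketch of how Monte Carlo Dropout yields an uncertainty estimate is given below. It is not the authors' code: the tiny two-layer network, its random weights (standing in for a trained model), the dropout probability and the function name `mc_dropout_predict` are assumptions for illustration. Each stochastic forward pass uses one dropout mask shared across all inputs, so the spread of predictions over passes gives a joint covariance across candidates.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, drop_prob=0.2, n_samples=200, seed=0):
    """Predictive mean and covariance from repeated stochastic forward
    passes with dropout left switched on (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        hidden = np.maximum(x @ w1, 0.0)  # ReLU hidden layer
        # One mask per forward pass, shared by every input point, so that
        # predictions for different points are correlated within a pass.
        mask = rng.random(w1.shape[1]) >= drop_prob
        hidden = hidden * mask / (1.0 - drop_prob)  # inverted dropout scaling
        preds.append((hidden @ w2)[:, 0])
    preds = np.stack(preds)  # shape: (n_samples, n_points)
    mean = preds.mean(axis=0)
    cov = np.atleast_2d(np.cov(preds, rowvar=False))
    return mean, cov

# Random inputs and weights as placeholders for real features and a trained model.
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))
w1 = rng.normal(size=(3, 16))
w2 = rng.normal(size=(16, 1))
mean, cov = mc_dropout_predict(x, w1, w2)
print(mean.shape, cov.shape)
```

      The diagonal of `cov` is the per-candidate predictive variance, and the off-diagonal entries describe how strongly the predictions for two candidates move together across dropout samples.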

      Comparison with Gold Standard: There is no mention of comparing the approximated posterior distribution with a gold standard or benchmark. Failing to provide such a comparison leaves doubts about the performance and accuracy of the approximation method. A lack of benchmarking makes it difficult to ascertain the superiority or inferiority of the approximation technique employed.

      Answer: As noted above, it is impossible to find gold standard information for the uncertainty distribution. It is not even clear to us how such a gold standard could be experimentally determined, since it is a function of a specific model and data. If the reviewer is aware of such a gold standard, we would be happy to test it. Instead, in our study, we opted to benchmark our results against state-of-the-art batch active learning methods, which also rely on uncertainty prediction (such uncertainty prediction is at the heart of any active learning method, as we discuss). Results clearly indicate that our method outperforms prior methods, though we agree that this is only an indirect way to validate the uncertainty approximation.

      Reviewer No.2 (Recommendations For The Authors)

      Recommendation 1:

      The text is kind of messy: there are two results sections, for example. It seems that part of the text was duplicated. Please correct it.

      Answer: We want to thank the reviewer for pointing this out. These were typos and we fixed them accordingly.

      Recommendation 2:

      Text in figures is very small and difficult to read. Please redraw the figures, increasing the font size: 10-12pt is ideal in comparison with the main text.

      Answer: We want to thank the reviewer for this comment and we have made the graphics larger.

      Recommendation 3: Please, include specific links to data availability instead of just stating it is available at the Sanofi-Public repository.

      Answer: We want to thank the reviewer for this comment and added the links and data to the Sanofi Github page listed in the paper.

      Recommendation 4:

      What are the descriptors used to train the models?

      Answer: We represented the molecules as molecular graphs using the MolGraphConvFeaturizer from the DeepChem library. We now explicitly mention this in Methods.

      Recommendation 5:

      Regarding the quality of the models, I strongly suggest two approaches instead of using only RMSE as the metric of model performance. I recommend using as many metrics as possible, as reported by Gramatica (https://doi.org/10.1021/acs.jcim.6b00088). I also recommend comparing the increase in dataset diversity according to the employed descriptors (applicability domain) as a measure of suitability for further applications on unseen molecules.

      Answer: We want to thank the reviewer for these great suggestions. As suggested, we added new comparison metrics to the Supplement.

      • Distribution plot for the range of the Y values: Figure 8

      • Clustering of the data sets represented as fingerprints: Supplementary material, Figures 5 and 6

      • Retrospective experiments with Spearman correlation coefficient: Supplementary material, Figures 2, 3 and 4
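
      For readers less familiar with the Spearman correlation coefficient mentioned in the list above, the sketch below computes it in plain Python as the Pearson correlation of average ranks. The helper names `average_ranks` and `spearman` and the example values are assumptions for illustration; in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up measured and predicted values: monotonic but not linear.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 8.0, 27.0, 64.0]
rho = spearman(measured, predicted)
print(rho)
```

      Unlike RMSE, this metric only rewards getting the ordering of compounds right, which is why it complements error-based metrics when ranking candidates matters more than absolute values.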

      I suggest also a better characterization of datasets including the nature and range of the Y variable, the source of data in terms of experimentation, and chemical (structural and physicochemical) comparison of samples within each dataset.

      Answer: As noted above in response to a similar comment by Reviewer 1, we have added more detailed information about the different experiments we tested to Supporting Methods.

      References

      [1] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1050–1059, New York, New York, USA, 20–22 Jun 2016. PMLR.

      [2] N.D. Lawrence. Variational Inference in Probabilistic Models. University of Cambridge, 2001.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their work, and the very useful comments.

      Public reviews:

      Reviewer #2

      1) The authors discussed possible reasons for the different results of the RRP sizes between this study and Alten et al., 2021. One of them is how the hypertonic solution is applied. The authors thought that the long application of hypertonic solution in Alten et al., 2021 caused an overlapping release of RRP and upstream vesicle pools because Alten et al., 2021 measured 10-fold larger RRP size than what was measured in this study. However, Alten et al., 2021 measured RRP from IPSCs and a single inhibitory vesicle fusion causes larger charge transfer than an excitatory vesicle. The authors need to take this into consideration and 10-fold is likely an overestimate.

      Answer: Thank you for pointing out this important difference. We have modified the text in the Discussion accordingly and we no longer refer to the 10-fold difference.

      2) Statistical tests should be performed for protein expression levels (Fig 2A and Fig 10A) and in vitro fusion assays (Fig 8D,E and Fig 9 B,C).

      Answer: We inserted new panels B and C in Fig. 2 and Fig. 10 showing all the Western Blot data and performed statistical tests (none were significant). For the in vitro fusion assays, we have inserted statistical tests in panels 8E and 9C. The quantities in those panels (subdivided into “Pre Ca2+”, “post Ca2+” and “end fusion”) are based on the data in Figure 8D and 9B. We have therefore not inserted separate statistical tests in Figures 8D and 9B.

      Reviewer #1 (Recommendations For The Authors):

      It would be quite interesting for future studies to address how these three mutations in SNAP-25 behave in the Syt1 null background in their electrophysiological experiments. Does the I167N allele block the enhanced spontaneous release in the Syt1 null? Do the V48F and D1667 alleles synergize with Syt1 to enhance spontaneous release to even higher levels? By examining how different components interact to shape the energy landscape for priming and fusion, these types of approaches should be quite revealing.

      Answer: We agree with the reviewer that these future studies would be interesting. Unfortunately, they are beyond our current capacities.

      Reviewer #2 (Recommendations For The Authors):

      1) In the introduction, when discussing haploinsufficiency of Munc18-1 causes a decrease in release, additional references should be included, for example, the studies in flies (Wu et al., 1998, EMBO), human neurons (Patzke et al., 2015 JCI), and mouse neurons (Toonen et al., 2006 PNAS; Chen et al., 2020 eLife).

      Answer: Thank you for the suggestion. We have rewritten the text and added additional references.

      2) The authors may consider introducing additional motivations and significance of this study. For example, the evoked EPSCs cannot be properly measured in the cultures of Alten et al., 2021, but was properly studied here.

      Answer: We agree and have added additional motivations in the Introduction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weinberger et al. use different fate-mapping models, the FIRE model and PLX-diet to follow and target different macrophage populations and combine them with single-cell data to understand their contribution to heart regeneration after I/R injury. This question has already been addressed by other groups in the field using different models. However, the major strength of this manuscript is the usage of the FIRE mouse model that, for the first time, allows specific targeting of only fetal-derived macrophages. The data show that the absence of resident macrophages is not influencing infarct size but instead is altering the immune cell crosstalk in response to injury, which is in line with the current idea in the field that macrophages of different origins have distinct functions in tissues, especially after an injury. To fully support the claims of the study, specific targeting of monocyte-derived macrophages or the inhibition of their influx at different stages after injury would be of high interest. In summary, the study is well done and important for the field of cardiac injury. But it also provides a novel model (FIRE mice + RANK-Cre fate-mapping) for other tissues to study the function of fetal-derived macrophages while monocyte-derived macrophages remain intact.

      Response from the authors: We thank the reviewer for the thorough review and the positive feedback, and we agree that the Csf1r-FIRE mice represent an interesting model for studying the role of resident embryo-derived macrophages in different tissues and pathologies.

      Recent work of the Cochain lab demonstrated by combined CITE-seq analysis and CCR2 antibody treatment that monocyte depletion does not affect levels of resident tissue macrophages after myocardial infarction (REF Rizzo et al PMID: 35950218), supporting the concept to specifically investigate the role of resident and recruited macrophages. While previous work has addressed the effects of broad CCR2-mediated monocyte depletion, information on differential macrophage subsets derived from blood monocytes has been lacking. We agree with the reviewer that targeting subsets of monocyte-derived macrophages, such as for example Ly6Chi monocytes, MHCII+Il1b+ macrophages, and Isg15hi populations (REF Rizzo et al PMID: 35950218), or interference with their recruitment at different time-points after myocardial infarction would be of interest and could help to decipher their functions in the different stages of cardiac healing. However, these studies would go beyond the scope of the current analysis and will be addressed in a separate project.

      Reviewer #2 (Public Review):

      In this study Weinberger et al. investigated cardiac macrophage subsets after ischemia/reperfusion (I/R) injury in mice. The authors studied a ∆FIRE mouse model (deletion of a regulatory element in the Csf1r locus), in which only tissue resident macrophages might be ablated. The authors showed a reduction of resident macrophages in ∆FIRE mice and characterized its macrophage populations via scRNAseq at baseline conditions and after I/R injury. 2 days after the I/R protocol, ∆FIRE mice showed an enhanced pro-inflammatory phenotype in the RNAseq data and differential effects on echocardiographic function 6 and 30 days after I/R injury. Via flow cytometry and histology the authors confirmed existing evidence of increased bone marrow-derived macrophage infiltration to the heart, specifically to the ischemic myocardium. Macrophage populations in ∆FIRE mice after I/R injury were only changed in the remote zone. Further RNAseq data on resident or recruited macrophages showed transcriptional differences between both cell types in terms of homeostasis-related genes and inflammation. Depleting all macrophages using a Csf1r inhibitor resulted in reduced cardiac function and increased fibrosis.

      Strengths

      1) The authors utilized robust methodology encompassing state of the art immunological methods, different genetic mouse models and transcriptomics.

      2) The topic of this work is important given the emerging role of tissue resident macrophages in cardiac homeostasis and disease.

      Response from the authors: We thank the reviewer for pointing out the strengths of our study, and putting the findings in context of the current view of the role of resident macrophages.

      Weaknesses:

      1) Specificity of ∆FIRE mouse model for ablating resident macrophages.

      The study builds on the assumption that only resident macrophages are ablated in ∆FIRE mice, while bone marrow-derived macrophages are unaffected. While the effects of the ∆FIRE model is nicely shown for resident macrophages, the authors did not directly assess bone marrow-derived macrophages. Moreover, in the immunohistological images in Fig. 1D nearly all macrophages appear to be absent. It would be helpful to further address the question of whether recruited macrophages are influenced in ∆FIRE mice. Evaluation of YFP positive heart and blood cells in ∆FIRE mice crossed with Flt3CreRosa26eYFP mice could clarify whether bone marrow-derived cardiac macrophages are influenced in ∆FIRE mice. This would be even more relevant in the I/R model where recruitment of bone marrow-derived macrophages is increased. A more direct assessment of recruited macrophages in ∆FIRE mice could also help to discuss potential similarities or discrepancies to the study of Bajpai et al, Circ Res 2018, which showed distinct effects of resident versus recruited macrophages after myocardial infarction. Providing the quantification of flow cytometry data (fig. 1E-F) would be supportive.

      Response from the authors: We thank the reviewer for these comments. The reviewer addresses the specificity of the ∆FIRE mouse model for ablating resident macrophages and its potential effects on bone marrow-derived macrophages. Our single-cell sequencing data support the specificity of the ∆FIRE model regarding embryo-derived resident macrophages in two ways. First, the ∆FIRE mice are characterized by the specific reduction of embryo-derived macrophage clusters (e.g. homeostatic macrophages as well as antigen-presenting macrophages) in baseline conditions, while the abundance of recruited macrophages (e.g. Ccr2hiLy6chi macrophages, Cx3Cr1hi macrophages) is not altered (Fig. 2B-D). Second, transcriptomic analysis of bone marrow-derived macrophage clusters (e.g. Ccr2hiLy6chi macrophages, Cx3Cr1hi macrophages) and of monocytes revealed no differences in ∆FIRE compared to control mice. On the other hand, we found substantial transcriptome differences in clusters that were mainly of embryonic origin (e.g. homeostatic macrophages as well as antigen-presenting macrophages) (Fig. 2 and Fig. S4). These findings indicate that the ∆FIRE model mainly induces changes in embryo-derived macrophages.

      We agree with this reviewer that crossbreeding of ∆FIRE mice with Flt3CreRosa26eYFP mice would be of interest, and we have been working hard to establish this line. However, our breeding efforts have thus far been in vain, which is probably due to the necessity to keep a CBA/Ca background for the FIRE model (as reported by JAX: https://www.jax.org/strain/032783) and requires further backcrossing of Flt3CreRosa26eYFP mice with the respective CBA strain. In future work, we plan to carry out this experiment and also to specifically target monocyte-derived macrophages.

      The reviewer further asks about the modality to quantify cardiac macrophages, and suggests flow cytometry to quantify their number and not only use immunohistology. The quantification of cardiac immune cells shown in Fig. 1D (formerly 1C) was in fact performed by flow cytometry. We apologize for the lack of clarity. We rearranged the figure and added this information to the figure legend. We also added quantification by immunohistology, which is now shown in Fig. 1G.

      2) Limited adverse cardiac remodeling in ∆FIRE mice after I/R.

      The authors suggested an adverse cardiac remodeling in ∆FIRE mice. However, the relevance of a <5% reduction in ejection fraction/stroke volume within an overall normal range in ∆FIRE mice is questionable. Moreover, 6 days after I/R injury ∆FIRE mice were protected from the impairment in ejection fraction and had a smaller viability defect. Based on the data few questions may arise: Why was ablation of resident macrophages beneficial at earlier time points? Are recruited macrophages affected in ∆FIRE mice (see above)? Overall, the manuscript could benefit if the claim of an adverse remodeling in ∆FIRE mice would be discussed more carefully.

      Underlying mechanisms:

      The study did not functionally evaluate targets from transcriptomics to provide further mechanistic insights. It would be helpful if the authors discussed potential mechanisms of the differential effects of macrophages after ischemia in more detail.

      Response from the authors: The reviewer raises the question why the ablation of resident macrophages trends towards a beneficial effect at earlier time points after I/R injury. Further, the reviewer questions the relevance of a <5% reduction in ejection fraction/stroke volume over time in the light of an otherwise modestly reduced ejection fraction.

      In this study we used the experimental mouse model of ischemia-reperfusion injury with transient (1h) coronary artery occlusion. The potential disadvantage of this model is the smaller infarct size and smaller effects on cardiac function. However, it better represents the clinical picture and pathology of myocardial infarction in human patients with timely reperfusion by percutaneous coronary intervention. Infarct size after I/R was approx. 25% in control animals, indicating relevant cardiac injury. Further, infarct size was reduced to approx. 16% in ∆FIRE mice 6 days after infarction; however, the difference did not reach statistical significance. In line with this, the ejection fraction was numerically reduced on d6 after infarction in the control group, however with no statistical significance. In the chronic phase after infarction, the ejection fraction improved over time in the control group by approx. 5% and decreased in ∆FIRE mice by 4%, which resulted in a difference (delta) of 9% in the change of ejection fraction. This indicated adverse remodeling in ∆FIRE mice.

      We agree that the different impact of the absence of resident cardiac macrophages during the course of myocardial healing after injury is of great interest to the field. We discuss potential mechanisms of the differential effects of resident macrophage ablation in lines 290-314 in the revised manuscript. However, to decipher the influence of embryo-derived macrophages at different time points after infarction, an inducible model for specific depletion of this macrophage population would be necessary, which to our knowledge does not exist.

      In the revised manuscript, we now discuss the effects on cardiac healing in ∆FIRE and also the limitations more thoroughly.

      Other:

      • It is unclear why the authors performed RNAseq experiments 2 days after I/R (fig. 5/6), while the proposed functional phenotype occurred later.

      • A sample size of 2 animals per group appears very limited for RNAseq in ∆FIRE mice (fig. 6).

      Response from the authors: We chose a time point in the “late early phase” of myocardial infarction (= day 2 post I/R) as we were also interested in the effect of resident macrophage depletion on other immune cell subsets (e.g. neutrophils) which could only be captured in this time period.

      We aimed to analyse 10000 cells per condition. The applied sample size allowed us to analyse 13452 CD45+ cells from ∆FIRE mice and 9152 cells from control mice in infarct condition.

      Lines 299-324 "Ablation of resident macrophages altered macrophage crosstalk to non-macrophage immune cells, especially lymphocytes and neutrophils. This was characterized by a proinflammatory gene signature, such as neutrophil expression of inflammasome-related genes and a reduction in anti-inflammatory genes like Chil3 and Lcn2. Interestingly, inflammatory polarization of neutrophils has also been associated with poor outcome after ischemic brain injury (Cuartero et al, 2013). Clinical trials in myocardial infarction patients showed a correlation of inflammatory markers with the extent of myocardial damage (Sanchez, 2006) and with short- and long-term mortality (Mueller, 2002).

      Our study provides evidence that the absence of resident macrophages negatively influences cardiac remodeling in the late postinfarction phase in ∆FIRE mice indicating their biological role in myocardial healing. In the early phase after I/R injury, absence of resident macrophages had no significant effect on infarct size or LV function. These observations potentially indicate a protective role in the chronic phase after myocardial infarction by modulating the inflammatory response, including adjacent immune cells like neutrophils or lymphocytes.

      Deciphering in detail the specific functions of resident macrophages is of considerable interest but requires both cell-specific and temporally-controlled depletion of respective immune cells in injury, which to our knowledge is not available at present. These experiments could be important to tailor immune-targeted treatments of myocardial inflammation and postinfarct remodelling."

      Reviewer #1 (Recommendations For The Authors):

      1) Fetal-derived macrophages are often involved in organ development and function during steady-state. The authors should show heart morphology/function before I/R injury to make sure that the cause for a worsened outcome in FIRE mice is not due to a developmental/functional defect.

      Response from the author: We conducted a gross analysis of cardiac morphology by histology and did not detect differences from littermate controls. However, we have not conducted a detailed investigation of cardiac development, since this was beyond the scope of this study. Further, our study mainly shows differences in cardiac healing between d6 and d30, which are unlikely to be influenced by developmental defects.

      2) Line 164: The authors state that they have analysed macrophages via flow cytometry, but Figure 4a only shows IF. Quantification of different macrophage subsets via flow cytometry should be included in this model.

      Response from the author: The sentence “To gain a deeper understanding of the inflammatory processes taking place in the infarcted heart, we quantified macrophage distribution by immunofluorescence and flow cytometry analysis of ischemic and remote areas after I/R.” beginning line 164 describes the entire figure 4 and not only 4a. Here we show IF as well as flow cytometry to describe numbers but also different subpopulations of macrophages (BM-derived vs. resident).

      3) Lines 254-255 (now starting 267): it is not entirely true that the heart does not harbor BM-derived macrophages under steady state. Of course, there are many more after I/R injury, but the authors should take also their own data into account (Figure 1c, e showing a clear reduction but not complete absence of macrophages) and not claim a "scarce" population. See also Dick et al (PMID: 30538339), where both, the Ccr2-Tim4- and Ccr2+ populations are (slowly) replaced by BM monocytes.

      Response from the author: We thank the reviewer for this comment. We changed “scarce population” to “small population”.

      4) Lines 269-273 (now starting line 283): The point that DT-mediated depletion of cells causes inflammation that may have an impact on macrophages is compelling. However, the approach of combining and correlating data from PLX diet and FIRE mice is not proof that the significant increase in infarct size and deterioration of left ventricular function after I/R injury is driven by monocyte-derived macrophages. The authors could use Ccr2KO mice or injection of Ly6C antibody to show the specific functions of recruited macrophages.

      Response from the author: In this study we combine a specific genetic depletion of resident macrophages (FIRE) with a pharmacological depletion of all macrophage populations (Csf1r inhibition with PLX5622). We did not aim to specifically deplete monocyte-derived macrophages, which has been addressed previously by Bajpai et al. (PMID: 30582448) using the CCR2-DTR mouse line. Addressing the functions of recruited macrophages would go beyond the scope of the manuscript.

      Along these lines: the authors discuss that neutrophils may have been targeted in the Ccr2-DTR model. However, the egress of neutrophils in the CCR2 KO model is not affected and should be a good model to look at the impact of monocyte-derived macrophages after I/R injury in the heart.

      Response from the author: We agree with the reviewer that CCR2 under steady state conditions might not be important for the egress of neutrophils. However, after ischemic injury CCR2-inhibition has been shown to impair neutrophil egress as well as neutrophil recruitment to ischemic tissue in an ischemia-reperfusion injury model (PMID: 28670376).

      5) Line 299 (now line 332): Reference is missing for Ccr2-DTR mice study

      Response from the author: We added the respective reference.

      6) Can the authors also take the timing of treatment/cell depletion into account in their discussion? Incoming monocytes may be required in the first days after injury to promote the regeneration process, so that targeting them before the onset of the injury may be detrimental while targeting them during the chronic phase may be beneficial.

      Response from the author: We thank the reviewer for this comment. We added the following sentence to the manuscript (Lines 343-346):

      “An explanation of this controversy might be the timing and duration of macrophage depletion. Bajpai et al. depleted recruited macrophages only in the initial phase of myocardial infarction which improved cardiac healing (Bajpai et al., 2019), while depletion of macrophages over a longer period of time, as shown in our study, is detrimental for cardiac repair.”

      7) Figure 6E, F: Why are the outgoing signals pooled? The data has the strength of distinguishing between distinct populations. This data should be used and exploited to work out distinct pathways of distinct macrophage populations in more detail. From the representation, it remains unclear which pathways are active and distinct between Ctrl and FIRE mice besides the few chosen ones (inflammasome). Also, legends are missing (what is red/blue?)

      Response from the author: We thank the reviewer for this comment. The aim of this analysis was to evaluate the effect of the FIRE ko on communication of immune cells in infarct conditions. To address changes in all populations which are affected by the FIRE ko we pooled the respective clusters (e.g. homeostatic, antigen-presenting and Ccr2loLy6clo Mø clusters). We provided the detailed analysis of the individual clusters in the new Supplemental Figure 9. Further, we added the respective legend to the Figure.

      8) The methods part mentioned CD169-DTR mice, however, there are no experiments shown in the manuscript. Further, how did the authors breed the FIRE mice? It is known in the field that they have big developmental issues and behavioural deficits if kept on a B6 background, which was likely the case in the study, at least for the fate-mapping approach.

      Response from the author: We removed the CD169-DTR reference from the methods part. FIRE mice were kept on a CBA/Ca background. As mentioned by the reviewer, this was not the case for the experiment where reporter mice were bred with FIRE mice (Csf1rΔFIRE/+RankCreRosa26eYFP), as these mice are on a C57Bl6 background. All experiments evaluating cardiac function and outcome after infarction in FIRE mice were performed on mice kept on a CBA/Ca background.

      Reviewer #2 (Recommendations For The Authors):

      • Please provide the sample size for Fig. 5.

      We described the sample size in the methods part (lines 448-450: “Cell sorting was performed on a MoFlo Astrios (Beckman Coulter) to obtain cardiac macrophages from CD45.2; Mx1CreMybflox/flox after BM-transplantation of CD45.1 BM (n=3 for 2 days after I/R injury) for bulk sequencing,..“). We added the sample size also to the figure legend.

      • Please state in the methods how the normality of data was tested.

      We added the respective normality test to the methods part: “The Shapiro-Wilk test was used to test normality.”

      • How did the authors ensure a standardized infarct size?

      We ensured a standardized infarct size in mice following myocardial infarction through a carefully controlled experimental protocol. We employed the well-established I/R procedure for inducing myocardial infarction in mice by ligation of the LAD for 1h to mimic the transient blockage of blood flow to the anterior wall of the heart. Success of the LAD ligation and the induction of ischemia was confirmed by the pale color of the myocardium after ligation, and the success of reperfusion by the return of color after removing the suture. The surgical technique was consistently performed by the same highly trained veterinarian in a blinded fashion to minimize variability.

    1. Author Response

      We are grateful to the three reviewers and the editors who have provided comments about our manuscript, "Formation of malignant, metastatic small cell lung cancers through overproduction of cMYC protein in TP53 and RB1 depleted pulmonary neuroendocrine cells derived from human embryonic stem cells.”

      We are pleased that the reviewers recognized the importance of the problem we have addressed – namely, the need for better models of small cell lung cancer, a relatively common and refractory cancer. We also appreciate their acknowledgement of the significance of our major finding: that addition of an efficiently expressed CMYC transgene to neuroendocrine cells derived from human embryonic stem cells in which the RB1 and TP53 genes have been suppressed serves to drive aggressive growth and metastatic spread, rendering this system an appealing one for future studies of this recalcitrant cancer. Further, we acknowledge that more work needs to be done to more fully characterize and better understand the mechanistic features of this model system and to exploit it for therapeutic purposes.

      More specifically, we agree with the reviewers that this manuscript would be stronger if it included: (i) tests of other oncogenes, especially other members of the MYC gene family, to serve as drivers of tumor growth and metastasis and tests of orthotopic implantation of cells into the lung; (ii) descriptions of how such tumors with various genotypes respond to therapeutic approaches, both established and novel; and (iii) a more complete assessment of the contribution of abundant MYC proteins to physiological changes in tumor cells, such as growth, apoptosis, and invasion.

      While we wish we could provide such information, it is unrealistic to believe that it will be generated by the current constellation of authors in the foreseeable future. Data in the present manuscript has been generated over nearly five years, mostly in the early phases of that interval. Since then, some of us have moved from one institution to another, and some have shifted the focus of our studies. Further delays in publishing the main messages in this paper will only delay the pursuit of further studies, most likely by others. Indeed, one of the strongest justifications for the novel publication policies at eLife is to return control of the time for dissemination of results to the hands of the authors. Our situation illustrates the wisdom of that approach.

      We also note that the reviewers have raised a few issues that we aim to clarify by revisions of the current manuscript, thereby creating an improved Version of Record, within the next few weeks. We acknowledge here the significance of those issues and the ambiguities noted by the reviewers.

      The issues include the following point noted by more than one reviewer: our claim that expression of the CMYC oncogene increases the neuroendocrine character of the tumors. We recognize that this observation may be influenced by the nature of the analysis (single cell or bulk RNA sequencing), the choice of lineage markers (eg, NEUROD1 or ASCL1 or others), and the statistical evaluation of the data. We will review these aspects of the problem and make appropriate changes in the text to be submitted as the Version of Record.

      Reviewer 1 also makes a good point about the possible effects of CMYC on the differentiation of hESC-derived lung progenitors (LPs). In this paper, we examine this issue only in LPs in which the tumor suppressor genes, RB1 and TP53, have been suppressed. Further studies of the effect of CMYC on differentiation of LPs with various combinations of functional tumor suppressor genes might well prove valuable in exploring the origins of SCLC.

      Finally, we wish to note that a topic discussed by Reviewer 1 (and by us) about the still poorly understood relationship between cancer genotypes and cell lineages has been partially addressed in a paper from our group that has been accepted for publication in Science.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      1) A single biomarker seems very unlikely to be of much help in the detection of glaucoma due to the etiological heterogeneity of the disease, the existence of different subtypes, and the genetic variability among patients. Rather, a panel of biomarkers may provide more useful information for clinical prediction, including better sensitivity and specificity. The inclusion of additional metabolites already identified in the study, in combination, may provide more reliable and correct assignment results.

      The authors’ answer: Thank you for your comment. We recognize the constraints of using single biomarkers for diagnosis. In upcoming research, we aim to incorporate multiple biomarkers to improve diagnostic accuracy and will consider adding more metabolites as suggested.

      2) The number of samples in the supplementary phase is low, larger sample sizes are mandatory to confirm the diagnostic accuracy.

      The authors’ answer: Thank you for your comment. Collecting aqueous humor is invasive, making samples scarce. We acknowledge the small sample size limitation. In future studies, we plan to use larger samples to verify the biomarker's diagnostic accuracy. Your feedback emphasizes the need for thorough validation in our next study.

      3) Cohorts from different populations are needed to verify the applicability of this candidate biomarker.

      The authors’ answer: Thank you for the suggestion. We agree on the need to test the biomarker's relevance across varied populations. Reports from other groups will help confirm and broaden our results.

      4) Sex hormones seem to be associated also with other types of glaucoma, such as primary open-angle glaucoma (POAG), although the molecular mechanisms are unclear (see doi:10.1167/iovs.17-22708). The inclusion of patients diagnosed with other subtypes of glaucoma, like POAG, may contribute to determining the sensitivity and specificity of the proposed biomarker. Androstenedione levels should be determined in POAG, NTG, or PEXG patients.

      The authors’ answer: I agree with your comment and thank you for your suggestion. PACG is a major cause of irreversible blindness in Asians. While this study centers on PACG, the link between sex hormones and other glaucoma subtypes, like POAG, merits investigation. Future studies will include POAG and other subtypes to further assess androstenedione's diagnostic relevance.

      5) In addition, the levels of androstenedione were found significantly altered during other diseases as described by the authors or by conditions like polycystic ovary syndrome, limiting the utility of the proposed biomarker.

      The authors’ answer: Thank you for your advice. Androstenedione levels also change in conditions like polycystic ovary syndrome, which could affect the biomarker's specificity. We plan to further study androstenedione's unique changes in glaucoma versus other conditions to clarify its diagnostic value.

      6) Uncertainty of the androstenedione levels compromises its usefulness in clinical practice.

      The authors’ answer: The uncertainty surrounding androstenedione levels and its impact on clinical applicability is a valid concern. We plan to delve deeper into understanding the variability and determinants of androstenedione levels to better assess its clinical relevance.

      Reviewer #2 (Public Review):

      The "predict" part is on much less solid ground. The visual field progression and association with serum androstenedione within the current experimental design alludes to a correlation. It truly cannot be stated as predictive. To predict, one needs to add the substance where none is present and demonstrate that the desired endpoint is reached. Conversely, the substance (androstenedione) can be removed to show that the condition regresses. Neither of these is possible without model system experiments, which have not been done. The authors could put some additional details in the methods, such as: 1) how much sample was collected, 2) whether equal serum volume for analysis had equal serum proteins (or cells). They have used an LC-MS/MS and a Chemiluminescence method, but another independent method such as GC-MS/MS or NMR to detect androstenedione for a subset of patients with different stages of visual field defect would be desirable.

      The authors’ answer: We acknowledge your constructive critique concerning our use of the term "predict". In the present study, we elucidated a discernible correlation between visual field progression and serum androstenedione concentrations. We are cognizant of the critical distinction between correlation and causation, and we concur that our application of the term “predict” may have been overly assertive in this context.

      Your emphasis on the imperative of employing model system experiments to unequivocally ascertain causative relationships is well-received. The experimental approach of modulating the substance, androstenedione in this case, to empirically observe its consequential impact on the condition, is a pivotal direction that warrants exploration in subsequent research endeavors. With regard to the variability of serum protein concentrations across participants, we adopted a methodological standardization by ensuring that the analyzed serum volume remained consistent across samples. This was implemented to enhance the reliability and generalizability of our findings.

      Your recommendation to consider alternative detection methodologies, specifically GC-MS/MS or NMR, is duly noted. Although our choice of LC-MS/MS and Chemiluminescence was predicated on available resources, we recognize the scientific merit in leveraging multiple analytical techniques. In future investigations, we endeavor to incorporate a broader spectrum of detection methodologies for androstenedione, particularly when assessing patients with varied visual field defect stages, thereby bolstering the robustness and validity of our conclusions.

      Reviewer #1 (Recommendations for The Authors):

      1) POAG is the leading cause of irreversible blindness worldwide (see reference #4). The prevalence of PACG is highest in Asia, but the major form of glaucoma is still POAG. The authors should modify the abstract and background sections accordingly (see line 30 and lines 61-62).

      The authors’ answer: Thank you for your suggestion, and we apologize for this mistake. The sentence “Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness worldwide” has been changed to “Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness in Asia”. (Page 2, line 33; Page 3, lines 62-64)

      2) Line 69, please change the sentence "the He et al. taught us..." to the following "the He et al. study taught us.".

      The authors’ answer: Thank you for your comment. The sentence "the He et al. taught us..." has been changed to "the He et al. study taught us.". (Page 3, lines 72)

      3) I suggest including the name of the identified candidate biomarker in the title of the manuscript. The title must be straightforward.

      The authors’ answer: We agree with your comment and thank you for your suggestion. The title “Metabolomics Identifies and Validates Serum Novel Biomarker for Diagnosing Primary Angle Closure Glaucoma and Predicting the Visual Field Progression” has been changed to “Metabolomics Identifies and Validates Serum Androstenedione as Novel Biomarker for Diagnosing Primary Angle Closure Glaucoma and Predicting the Visual Field Progression”. (Page 1, line 1)

      4) Line 88, please change "normal subjects" to "control individuals".

      The authors’ answer: Thank you for your comment. We have changed "normal subjects" to "control individuals". (Page 4, line 91)

      5) Line 95 and so on along the manuscript, avoid the term "normal controls" or "normal" and use only the term "controls".

      The authors’ answer: Thank you for your advice. "normal subjects" has been changed to "controls". (Page 4, line 113; Page 5, lines 118, 120, 124, 128, 133)

      6) In the participants section, indicate the ocular treatments of PACG patients. For example, on line 141, which "treatment" are you referring to?

      The authors’ answer: Thank you for your comment. We apologize for this vague statement. Treatment included medical treatment and surgical treatment. We have revised it in the manuscript. (Page 5, line 142)

      7) The entire section 2.4 is confusing. According to Figure S2, untargeted metabolomics was conducted with a mixed sample containing "all" serum extracts in order to obtain an in-house database with molecular features present in serum by LCHRMS. Then, this database was used for targeted metabolomics in individual serum samples using LCQQQ. However, as it is described in the manuscript, it seems that first, an untargeted metabolomics analysis was carried out to identify altered metabolites, then targeted metabolomics was carried out to validate the untargeted analysis and finally, a profiling analysis was carried out to construct the database. The workflow must be clearly discussed and amended to be understandable.

      The authors’ answer: Thank you for your comment. We have revised the description of the experimental method section 2.4. (Page 7, lines 195-198)

      8) Please, briefly explain what widely-targeted metabolomics is and how it works in this study (see section 2.4).

      The authors’ answer: Thank you for your comment. For widely targeted metabolome detection, a local database was first established from the standard database, and ion pair information was obtained by scanning the ion pairs of the mixed (QC) samples with Q-TOF. A wide range of metabolites was identified qualitatively by comparison with the local self-built database, and the metabolites of each sample were then measured qualitatively and quantitatively in the MRM scanning mode of a triple quadrupole (QQQ) instrument. In this project, the database built from untargeted scanning against public databases was combined with the widely targeted local database to form a new database. The samples of this project were then scanned with Q-TOF against this database, and the metabolites of each sample were detected qualitatively and quantitatively in MRM mode. (Figure S2)

      9) On Table 1, indicate the number of patients and controls with cataracts.

      The authors’ answer: We excluded people with cataracts from both the glaucoma group and the control group. This is described in the inclusion and exclusion criteria in the supplementary materials. (Inclusion and exclusion criteria)

      10) On "Sample processing" section, lines 152 and 153: Have you used cold methanol to ensure metabolic quenching? If not, how metabolite quenching was carried out?

      The authors’ answer: Thank you for your comment. We used cold methanol to extract metabolites, and the blood samples had been stored in a -80°C freezer beforehand, so that the whole process was kept at low temperature to ensure metabolic quenching. (Page 6, line 196)

      11) On the same "Sample processing" section, have you used internal standards during metabolite extraction? If yes, which ones? If not, why?

      The authors’ answer: Thank you for your comment. During metabolite extraction, the same internal standard was added to each sample, and the same volume of serum (50 μL) was extracted from each. The name of the internal standard has been added to the "Sample processing" section. (Page 6, lines 153-155)

      12) Lines 161-163, I suggest including in the supplementary material the worklist of the entire experiment run by LC-MS, including analytical replicates and QCs.

      The authors’ answer: Thank you for your comment. Worklist for mass spectrometry can be found in supplementary sheet1. (Page 6, lines 165)

      13) The title of the section "Detection method" does not seem appropriate, please change it to "Analytical methods "or something similar.

      The authors’ answer: Thank you for your advice. "Detection method" has been changed to "Analytical methods". (Page 6, line 168)

      14) Section 2.4.1, I suggest changing "Untargeted detection conditions" to "Untargeted metabolomics analysis".

      The authors’ answer: Thank you for your comment. "Untargeted detection conditions" has been changed to "Untargeted metabolomics analysis". (Page 6, lines 169)

      15) Lines 170-172, the column used is compatible with 100% water, why start with 5% acetonitrile?

      The authors’ answer: Thank you for your comment. If the gradient starts at 0% acetonitrile, many water-soluble substances elute at once and can easily clog the column, so we start with a 5% organic phase.

      16) Section 2.4.1, the chromatographic conditions (mobiles phases) were the same in both positive and negative ion mode? It is desirable to change or adjust a basic pH when working in negative, so please amend and clarify it.

      The authors’ answer: Thank you for your comment. In the negative ion mode, the peak shape of the chromatogram under the acidic system is better than that under the alkaline system, so we choose the acidic system.

      17) I am not able to clearly understand what is "widely targeted conditions" (see section 2.4.2). What is the difference with the conventional targeted metabolomics analysis? In my view, widely-targeted metabolomics refers to the combination of untargeted metabolomics and targeted metabolomics. This must be clarified and simplified.

      The authors’ answer: Thank you for your suggestion. Metabolites in this study were characterized using an untargeted database and a self-built database. Untargeted metabolites were characterized in mixed samples and then combined with the laboratory's self-built database to form a new metabolome database for this study. In section 2.4.2, "widely targeted" refers to using the MWDB standard self-built database to characterize metabolites and then the QQQ MRM mode to quantify them. To describe the detection process clearly, this part of the methods has been modified. (Figure S2)

      18) Line 199, please, indicate the normalization carried out.

      The authors’ answer: We agree with your comment and thank you for your suggestion. The description of the normalization was missing from the data processing steps and has been added to the manuscript. (Page 7, line 203)

      19) How many instrumental replicates have you carried out both in untargeted and targeted metabolomics? Please, indicate it.

      The authors’ answer: Thank you for your advice. In this project, a mixture of all samples was used as the QC sample, which was injected repeatedly during the run (one QC sample was inserted between every 10 samples). The correlation between repeated QC injections was more than 99%, which ensured the stability of sample testing. (Sheet1)
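
      The QC scheme described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `build_worklist` and `pearson`, the sample labels, and the peak intensities are hypothetical. Only the spacing (one QC after every 10 samples) and the 99% correlation criterion come from the response.

```python
import math

def build_worklist(samples, qc_label="QC", every=10):
    """Return the injection order with one pooled QC after every `every` samples."""
    worklist = []
    for i, sample in enumerate(samples, start=1):
        worklist.append(sample)
        if i % every == 0:
            worklist.append(qc_label)
    return worklist

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical run of 25 serum samples: QCs land after samples 10 and 20.
samples = [f"S{i:02d}" for i in range(1, 26)]
worklist = build_worklist(samples)

# Hypothetical peak intensities of four metabolites in two QC injections.
qc_run_1 = [1200.0, 340.0, 5600.0, 89.0]
qc_run_2 = [1185.0, 352.0, 5570.0, 92.0]
r = pearson(qc_run_1, qc_run_2)
print(f"QC injections: {worklist.count('QC')}, correlation: {r:.4f}, stable: {r > 0.99}")
```

      In practice the correlation would be computed over all detected features and for every pair of QC injections; the sketch uses a single pair to keep the idea visible.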

      20) Line 267, why did you select a fold changes threshold greater than 1.15 (or lower 0.85)? In metabolomics, it would be desirable to have a minimum of 1.5-fold change considering the variability of data.

      The authors’ answer: Thank you for your comment. We lowered the FC threshold to widen the pool of potential candidate metabolites; the changes could be reproduced in three batches. The screening threshold follows the method used in "Blood metabolomics uncovers inflammation-associated mitochondrial dysfunction as a potential mechanism underlying ACLF".
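
      To make the effect of the threshold concrete, the following Python sketch applies both cut-offs to the group means for androstenedione reported in the response to point 27 (discovery sets 1 and 2). The helper names `fold_change` and `passes_threshold` are hypothetical, and taking the lower bound of the stricter filter as 1/1.5 is an assumption made here for illustration. The fold change is computed as the ratio of group means, which may differ from the exact statistic used in the study.

```python
def fold_change(mean_case, mean_control):
    """Ratio of the group means (case over control)."""
    return mean_case / mean_control

def passes_threshold(fc, upper=1.15, lower=0.85):
    """True if the fold change lies outside the [lower, upper] band."""
    return fc > upper or fc < lower

# Mean androstenedione levels quoted in the response to point 27.
fc_set1 = fold_change(42852, 33987)  # discovery set 1, PACG vs. control
fc_set2 = fold_change(37934, 31559)  # discovery set 2, PACG vs. control

for name, fc in (("discovery set 1", fc_set1), ("discovery set 2", fc_set2)):
    relaxed = passes_threshold(fc)                             # 1.15 / 0.85
    strict = passes_threshold(fc, upper=1.5, lower=1 / 1.5)    # assumed 1.5-fold
    print(f"{name}: FC = {fc:.3f}, passes 1.15: {relaxed}, passes 1.5: {strict}")
```

      With these means the fold changes are roughly 1.26 and 1.20, so androstenedione is retained by the 1.15 threshold but would be excluded by a 1.5-fold cut-off.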

      21) Please include the molecular formula of androstenedione somewhere in the manuscript.

      The authors’ answer: I agree with your comment and thank you for your suggestion. We have added the molecular formula of androstenedione to the supplementary material. (Page 17, lines 475)

      22) Line 290 is not Figure 4B and 4C, you may refer to Figure 3B and 3C.

      The authors’ answer: Thank you for your advice. We apologize for this mistake. Figure 4B and 4C have been changed to Figure 3B and 3C.

      23) Figure S3 was lost from Supplementary material, please include it.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. There was an error in the ordering of the supplementary figures. Figure 3 is redundant, and we have corrected this in the supplementary materials.

      24) Figure 4 B, indicate in the text the average and uncertainty of androstenedione levels in both control and PACG groups.

      The authors’ answer: Thank you for your comment. In the manuscript, we have added the mean ± standard deviation of androstenedione levels in the control group and the disease group. (Page 11, lines 311-312)

      25) Section 3.6. please include the average and uncertainty of androstenedione levels in males and females in both control and PACG groups.

      The authors’ answer: Thank you for your advice. For 3.6 section, we supplemented the mean ± standard deviation of androstenedione levels in the control and disease groups. (Page 13, lines 350-356)

      26) Figure S9 seems missing.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. Figure S9 has been added to the Supplementary material.

      27) Lines 345-346, indicate the levels obtained for the metabolite in the compared groups.

      The authors’ answer: Thank you for your suggestion. The levels of androstenedione in each group are seen in “The results from both discovery set 1 (Figure S9A, Mild:32600±17011, Moderate:33215±17855, Severe:46060±21789) and discovery set 2 (Figure S9B, Mild:27866±19873, Moderate:27057±13166, Severe:43972±19234) indicated that the mean serum androstenedione levels were significantly higher in the severe PACG group compared to the moderate and mild PACG groups (P<0.001). These findings were further validated in both validation phase 1 (Figure S9C, Mild:75726±45719, Moderate:65798±30610, Severe:94348±30858) and validation phase 2 (Figure S9D, Mild:1.121±0.3143 ng/ml, Moderate:1.461±0.4391 ng/ml, Severe:2.147±0.6476 ng/ml).” and “Notably, the level of androstenedione was found to be significantly higher in PACG patients than in normal subjects in both discovery set 1 (Figure 4B, P=0.0081, Normal:33987±11113, PACG:42852±20767) and discovery set 2 (Figure 4C, P=0.0078, Normal:31559±10975, PACG:37934±18529).”

      28) Line 368, you don't need to indicate the PACG abbreviation again.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. We have changed "patients with PACG" to "patients". (Page 13, line 377)

      29) Figure 6, panels A and B are not labeled (i.e., commented) in the body text of the manuscript.

      The authors’ answer: Thank you for your suggestion. We’re very sorry for this mistake. Figure 6, panels A and B have been labeled in the manuscript. (Page 13, lines 377-379)

      30) Section 3.7., when you indicate "after therapy" are you referring to surgical treatment? Please, clarify.

      The authors’ answer: Thank you for your comment. We apologize for this vague statement. Blood samples were taken before and three months after surgery. “Therapy” has been changed to “surgical treatment” in the manuscript. (Page 13, line 377)

      31) Line 370, "97th patient" should be replaced by "nine patients"?

      The authors’ answer: Thank you for your advice. We apologize for this mistake. "97th patient" has been changed to "nine patients". (Page 13, lines 378-379)

      32) Lines 370-372, it difficult to understand, please clarify why these findings indicate that severity is related to increased PACG according to Figure 6B.

      The authors’ answer: Thank you for your comment. We’re very sorry for this vague statement. The sentence “These findings showed that the levels of androstenedione that were tightly connected with PACG severity rose dramatically as PACG progressed.” has been removed.

      33) Line 447, the word "corrected" should be changed to "correlated"?

      The authors’ answer: Thank you for your comment. "corrected" has been changed to "correlated". (Page 16, lines 453,456)

      34) According to the literature, the levels found in control subjects are within the range of the "normal" values, i.e., are they comparable?

      The authors’ answer: Thank you for your advice. Androstenedione ranges from 0.4 to 2 in the normal population. The mean ± standard deviation of androstenedione in the normal population was 1.552 ± 0.4859.

      35) Lines 471-474, why "steroid hormone biosynthesis appears to be the critical node to high-match PACG pathophysiological concepts" while the high enrichment was observed in the "metabolic pathways"?

      The authors’ answer: Metabolic pathways encompass a series of chemical reactions within a cell that enable the synthesis or breakdown of molecules to maintain the cell's energy balance. Steroid hormone biosynthesis is one of these metabolic pathways, and its products, steroid hormones, participate in a wide range of physiological processes, including metabolism, immune response, and the regulation of inflammation. In a different context, a study related to fatigue during Androgen Deprivation Therapy (ADT) showed a significant difference in metabolite levels within the steroid hormone biosynthesis pathways, emphasizing the role these pathways play in metabolic alterations. The mentioned findings suggest that steroid hormone biosynthesis and metabolic pathways are intertwined. (Page 17, lines 481-488)

      36) Figure S13 and Figure S14A are the same.

      The authors’ answer: Thank you for your comment. Figure S14A has been removed.

      37) On lines 476-485, it would be interesting to discuss whether alterations of this metabolite could be a cause or consequence of PACG.

      The authors’ answer: Based on the literature found, androstenedione is a naturally occurring steroid hormone produced by the gonads and adrenal glands, and serves as an intermediate in testosterone biosynthesis (Androstenedione (a Natural Steroid and a Drug Supplement): A Comprehensive Review of Its Consumption, Metabolism, Health Effects, and Toxicity with Sex Differences). Early events in the pathobiology of glaucoma involve oxidative, metabolic, or mechanical stress acting on retinal ganglion cells (RGCs), leading to their rapid release of danger signals such as extracellular ATP, thus triggering microglial and macroglial activation as well as neuroinflammation (Immune Responses in the Glaucomatous Retina: Regulation and Dynamics). However, one might speculate that since androstenedione is a steroid hormone, it could potentially impact the inflammatory and metabolic stress observed in the pathophysiological processes of glaucoma (Adaptive responses to neurodegenerative stress in glaucoma). Metabolic and anti-inflammatory avenues might be crucial in understanding the relationship between alterations in androstenedione levels and the severity of glaucoma. Nevertheless, more research and literature analysis would be necessary to better understand the precise relationship and its underlying mechanisms between these two entities.

      38) I suggest sending the MS and MS/MS into a publicly available repository.

      The authors’ answer: Thank you for your suggestion. Further research will necessitate the utilization of the raw mass spectrometry data. We anticipate making this raw data available in a public repository upon the conclusion of subsequent experiments.

      Reviewer #2 (Recommendations for The Authors):

      The authors should aim to describe methods in greater detail.

      The authors could improve the writing to accurately describe their results and their interpretation and state what else could be done to make the result truly "predictive".

      The authors’ answer: (1) Detail Enhancement in the Methods section: We have expanded the description of methods such as sample pre-processing, mass spectrometry detection, and result analysis in the study to provide more detailed information about the procedures, equipment, and materials used. (2) Improvement in Writing Quality: We have engaged a scientific editor to review our manuscript for clarity, coherence, and consistency to ensure that the results and interpretations are accurately and clearly conveyed. Terminologies and phrases have been revised to better reflect the findings and interpretations. (3) Limitation supplement: We have included a discussion on the limitations of our study and suggested additional studies and analyses that could be conducted to enhance the predictive value of our findings. We sincerely appreciate the constructive feedback from the reviewer, which has greatly contributed to improving the quality and rigor of our manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Issue 1: The relevance is somewhat unclear. High cysteine levels can be achieved in the laboratory, but, is this relevant in the life of C. elegans? Or is there physiological relevance in humans, e.g. a disease? The authors state "cells and animals fed excess cysteine and methionine", but is this more than a laboratory excess condition? SUOX nonfunctional conditions in humans don't appear to tie into this, since, in that context, the goal is to inactivate CDO or CTH to prevent sulfite production. The authors also mention cancer, but the link to cysteine levels is unclear. In that sense, then, the conditions studied here may not carry much physiological relevance.

      Response 1: We set out to answer a fundamental question: what pathways regulate the function of cysteine dioxygenase, a highly conserved enzyme in sulfur amino acid metabolism? In an unbiased genetic screen that sampled millions of EMS-generated mutations across all ~20,000 C. elegans genes, we discovered loss-of-function/null mutations in egl-9 and rhy-1, two negative regulators of the hypoxia inducible transcription factor (hif-1). Genetic ablation of the egl-9 or rhy-1 loci is likely not relevant to the life of a C. elegans animal, i.e. this is not representative of a natural state. Yet, this extreme genetic intervention has taught us a new fundamental truth about the interaction between EGL-9/RHY-1, HIF-1, and the transcriptional activation of cdo-1. Similarly, the high cysteine levels used in our assays may or may not be representative of a state in nature; we do not know (nor do we make any claims about the environmental relevance of our choice of cysteine concentrations). It seems very plausible that pathological states exist where cysteine concentrations may rise to levels comparable to those in our experimental system. More importantly, we have started with cysteine in excess of physiological levels to elicit a clear response that we can study in the lab. Similar strategies established the cysteine-induction phenotype of CDO1 in mammalian systems. For instance, in Kwon and Stipanuk 2001, hepatocytes are cultured in media supplemented with 2mmol/L cysteine to promote a ~4-fold increase in CDO1 mRNA.

      Issue 2: The pathway is described as important for cysteine detoxification, which is described to act via H2S (Figure 6). Much of that pathway has already been previously established by the Roth, Miller, and Horvitz labs as critical for the H2S response. While the present manuscript adds some additional insight such as the additional role of RHY-1 downstream of HIF-1 in promoting toxicity, this study therefore mainly confirms the importance of a previously described signalling pathway, essentially adding a new downstream target rhy-1 -> cysl-1 -> egl-9 -> hif-1 -> sqrd-1/cdo-1. The impact of this finding is reduced by the fact that cdo-1 itself isn't actually required for survival in high cysteine, suggesting it is merely a marker of the activity of this previously described pathway.

      Response 2: We agree that the primary impact of our manuscript is the establishment of a novel intersection between the H2S-sensing pathway (largely worked out by Roth, Miller, and Horvitz) and our gene of interest, cysteine dioxygenase. We believe that the connection between these two pathways is exciting as it suggests a logical homeostatic circuit. High cysteine yields enzymatically produced H2S. This H2S may then act as a signal promoting HIF-1 activity (via RHY-1/CYSL-1/EGL-9). High HIF-1 activity increases cdo-1 transcription and activity promoting the degradation of the high-cysteine trigger. As pointed out by the reviewer, cdo-1(-) loss of function alone does not cause cysteine sensitivity at the concentrations tested. Given that cysl-1(-) and hif-1(-) mutants are exquisitely sensitive to high levels of cysteine, we propose that HIF-1 activates the transcription of additional genes that are required for high cysteine tolerance. However, our genetic data show that cdo-1 is more than simply a marker of HIF-1 transcription. Our genetic data in Table 1 demonstrate that HIF-1 activation (caused by egl-9(-)) is sufficient to cause severe sickness in a suox-1 hypomorphic mutant which cannot detoxify sulfites, a critical product of cysteine catabolism. This severe sickness can be reversed by inactivating hif-1, cth-2, or cdo-1. These data demonstrate a functional intersection between the established H2S-sensing pathway and cysteine catabolism governed by cdo-1.

      Reviewer #2 (Public Review):

      Issue 3: First, the authors show that the supplementation of exogenous cysteine activates cdo-1p::GFP. Rather than showing data for one dose, the author may consider presenting dose-dependency results and whether cysteine activation of cdo-1 also requires HIF-1 or CYSL-1, which would be important data given the focus and major novelty of the paper in cysteine homeostasis, not the cdo-1 regulatory gene pathway.

      Response 3: We agree with the reviewer and have performed the suggested dose-response curve for expression of Pcdo-1::GFP in wild-type C. elegans. We observe substantial activation of the Pcdo-1::GFP transcriptional reporter beginning at 100µM supplemental cysteine (Figure 3C). Higher doses of cysteine do not elicit a substantially stronger induction of the Pcdo-1::GFP reporter. Thus, we find that 100µM supplemental cysteine strikes the right balance between strongly inducing the Pcdo-1::GFP reporter while not inducing any toxicity or lethality in wild-type animals (Figure 3E).

      We further agree that testing for induction of the Pcdo-1::GFP reporter in a hif-1(-) or cysl-1(-) mutant background is a critical experiment. However, we have not been able to identify a cysteine concentration that induces Pcdo-1::GFP and is not 100% lethal for hif-1(-) or cysl-1(-) mutant C. elegans. The remarkable sensitivity of hif-1(-) or cysl-1(-) mutant C. elegans to supplemental cysteine demonstrates the critical role of these genes in promoting cysteine homeostasis. But because of this lethality, we could not assay the Pcdo-1::GFP reporter in the hif-1(-) or cysl-1(-) mutant animals. But the lethality to excess cysteine demonstrates that this cysteine response is salient. To get at how cysteine might be interacting with the HIF-1-signaling pathway, we performed new additivity experiments by supplementing 100µM cysteine to wild type, egl-9(-), and rhy-1(-) mutant C. elegans expressing the Pcdo-1::GFP reporter. Surprisingly, we found that cysteine had no significant impact on Pcdo-1::GFP expression in an egl-9(-) mutant background but significantly increased the Pcdo-1::GFP expression in a rhy-1(-) background (Figure 3A,B). These data suggest that cysteine acts in a pathway with egl-9 and in parallel to rhy-1. These data have been incorporated into Figure 3A,B and are included in the Results section of the manuscript.

      Issue 4: While the genetic manipulation of cdo-1 regulators yields much more striking results, the effect size of exogenous cysteine is rather small. Does this reflect a lack of extensive condition optimization or robust buffering of exogenous/dietary cysteine? Would genetic manipulation to alter intracellular cysteine or its precursors yield similar or stronger effect sizes?

      Response 4: We agree that the induction of the Pcdo-1::GFP reporter by supplemental cysteine is not as dramatic as the induction caused by the egl-9 or rhy-1 null alleles. We believe our Response 3 and new Figure 3C demonstrate that this phenomenon is not due to lack of condition optimization, but likely reflects some biology. As pointed out by the reviewer, C. elegans likely buffers exogenous cysteine and this (perhaps) prevents the impressive Pcdo-1::GFP induction observed in the egl-9(-) and rhy-1(-) mutant animals. We have now mentioned this possible interpretation in the Results section. Furthermore, we like the idea of using genetic tricks to promote cysteine accumulation within C. elegans cells and tissues and will consider these approaches in future studies.

      Issue 5: Second, there remain several major questions regarding the interpretation of the cysteine homeostasis pathway. How much specificity is involved for the RHY-1/CYSL-1/EGL-9/HIF-1 pathway to control cysteine homeostasis? Is the pathway able to sense cysteine directly or indirectly through its metabolites or redox status in general? Given the very low and high physiological concentrations of intracellular cysteine and glutathione (GSH, a major reserve for cysteine), respectively, there is a surprising lack of mention and testing of GSH metabolism.

      Response 5: Future studies are required to determine the specificity of the RHY-1/CYSL-1/EGL-9/HIF-1 pathway for the control of cysteine homeostasis. Our proposed mechanism, that H2S activates the HIF-1 pathway is based largely on the work of the Horvitz lab (Ma et al. 2012). They demonstrate that H2S promotes a direct inhibitory interaction between CYSL-1 and EGL-9, leading to activation of HIF-1. These findings align nicely with our genetic and pharmacological data. However, our work does not provide direct evidence as to the cysteine-derived metabolite that activates HIF-1. We propose H2S as a likely candidate.

      We have added a note to the introduction regarding the role of GSH as a reservoir of excess cysteine and agree that future studies might find interesting links between CDO-1, GSH metabolism, and HIF-1.

      Issue 6: In addition, what are the major similarities and differences of cysteine homeostasis pathways between C. elegans and other systems (HIF dependency, transcription vs post-transcriptional control)? These questions could be better discussed and noted with novel findings of the current study that are likely C. elegans specific or broadly conserved.

      Response 6: We have included a new section in the Discussion highlighting the nature of mammalian CDO1 regulation. We propose the hypothesis that a homologous pathway to the C. elegans RHY-1/CYSL-1/EGL-9/HIF-1 pathway might operate in mammalian cells to sense high cysteine and induce CDO1 transcription. Importantly, all proteins in the C. elegans pathway have homologous counterparts in mammals. However, this hypothesis remains to be tested in mammalian systems.

      Reviewer #3 (Public Review):

      Major weaknesses of the paper include:

      Issue 7: the over-reliance on genetic approaches.

      Response 7: This is a fair critique. Our expertise is genetics. Our philosophy, which the reviewers may not share, is that there is no such thing as too much genetics!

      Issue 8: the lack of novelty regarding prolyl hydroxylase-independent activities of EGL-9.

      Response 8: We believe the primary novelty of our work is establishing the intersection between the H2S-sensing HIF-1 pathway and cysteine catabolism governed by cysteine dioxygenase. Our demonstration that cdo-1 regulation operates largely independent of VHL-1 and EGL-9 prolyl hydroxylation is a mechanistic detail of this regulation and not the critical new finding, although we believe it does suggest where pathway analyses should be directed in the future. We also believe that our homeostatic feedback model for the regulation of HIF-1 (and cdo-1) by cysteine-derived H2S is new and exciting and provides insight into the logic of why HIF-1 might respond to H2S and promote the activity of cdo-1. Our work suggests that one reason for this intersection of hif-1 and cdo-1 is to sense and maintain cysteine homeostasis when cysteine is in excess.

      Issue 9: the lack of biochemical approaches to probe the underlying mechanism of the prolyl hydroxylase-independent activity of EGL-9.

      Response 9: While not the primary focus of our current manuscript, we agree that this is an exciting area of future research. To uncover the prolyl hydroxylase-independent activity of EGL-9, we agree that a combination of approaches will be required, including biochemical, structure-function, and genetic approaches.

      Major Issues We Feel the Authors Should Address:

      Issue 10: One particularly glaring concern is that the authors really do not know the extent to which the prolyl hydroxylase activity is (or is not) impacted by the H487A mutation in egl-9(rae276). If there is a fair amount of enzymatic activity left in this mutant, then it complicates interpretation. The paper would be strengthened if the authors could show that the egl-9(rae276) eliminates most if not all prolyl hydroxylase activity. In addition, the authors may want to consider doing RNAi for egl-9 in the egl-9(rae276) mutant as a control, as this would support the claim that whatever non-hydroxylase activity EGL-9 may have is indeed the causative agent for the elevation of CDO-1::GFP. Without such experiments, readers are left with the nagging concern that this allele is simply a hypomorph for the single biochemical activity of EGL-9 (i.e., the prolyl hydroxylase activity) rather than the more interesting, hypothesized scenario that EGL-9 has multiple biochemical activities, only one of which is the prolyl hydroxylase activity.

      Response 10: We have two lines of evidence that suggest the egl-9(rae276)-encoded H487A variant eliminates prolyl hydroxylase activity. First, Pan et al. 2007 (reference 57) demonstrate that when the equivalent histidine (H313) is mutated in the human protein, that protein lacks detectable prolyl hydroxylase activity. Second, egl-9(rae276) and the vhl-1 null allele, ok161, cause similar phenotypes. Both alleles cause nearly identical activation of the Pcdo-1::GFP reporter transgene (Fig. 5C,D), and similarly impact the growth of the suox-1(gk738847) hypomorphic mutant (Table 1). This phenotypic overlap is highly relevant as the established role of VHL-1 is to recognize the hydroxyl mark conferred by the EGL-9 prolyl hydroxylase domain and promote the degradation of HIF-1. If EGL-9[H487A] had residual prolyl hydroxylase activity, we would expect the vhl-1(-) null mutant C. elegans to display more dramatic phenotypes than their egl-9(rae276) counterparts. This is not the case.

      Issue 11: The authors observed that EGL-9 can inhibit HIF-1 and the expression of the HIF-1 target cdo-1 through a combination of activities that are (1) dependent on its prolyl hydroxylase activity (and subsequent VHL-1 activity that acts on the resulting hydroxylated prolines on HIF-1), and (2) independent of that activity. This is not a novel finding, as the authors themselves carefully note in their Discussion section, as this odd phenomenon has been observed for many HIF-1 target genes in multiple publications. While this manuscript adds to the description of this phenomenon, it does not really probe the underlying mechanism or shed light on how EGL-9 has these dual activities. This limits the overall impact and novelty of the paper.

      Response 11: See response to Issue #8.

      Issue 12: Cysteine dioxygenases like CDO-1 operate in an oxygen-dependent manner to generate sulfites from cysteine. CDO-1 activity is dependent upon availability of molecular oxygen; this is an unexpected characteristic of a HIF-1 target, as its very activation is dependent on low molecular oxygen. Authors neither address this in the text nor experimentally, and it seems a glaring omission.

      Response 12: We agree this is an important point to raise within our manuscript. However, despite its induction by HIF-1, there is no evidence that cdo-1 transcription is induced by hypoxia. In fact, in a genome-wide transcriptomic study, cdo-1 was not found to be induced by hypoxia in C. elegans (Shen et al. 2005, reference 71).

      We have newly commented on the use of molecular oxygen as a substrate by both EGL-9 and CDO-1 in our Discussion section. The mammalian oxygen-sensing prolyl hydroxylase (EGLN1) has been demonstrated to have a high Km value for O2 (high µM range). This likely allows EGLN1 to be poised to respond to small decreases in cellular oxygen from normal oxygen tensions. Clearly, CDO-1 also requires oxygen as a substrate; however, the Km of CDO-1 for O2 is likely to be much lower, preventing sensitivity of cysteine catabolism to physiological decreases in O2 availability. To our knowledge, however, the CDO1 Km value for O2 has not been experimentally determined. We have added a new Discussion section where we address the conundrum about low oxygen inducing HIF-1 but oxygen being needed by CDO-1/CDO1.

      Issue 13: The authors determined that the hypodermis is the site of the most prominent CDO-1::GFP expression, relevant to Figure 4. This claim would be strengthened if a negative control tissue, in the animal with the knockin allele, were shown. The hypodermal specific expression is a highlight of this paper, so it would make this article even stronger if they could further substantiate this claim.

      Response 13: Our claim that the hypodermis is the critical site of cdo-1 function is based on i) our hands-on experience looking at Pcdo-1::GFP, Pcdo-1::CDO-1::GFP, and CDO-1::GFP (encoded by cdo-1(rae273)) and our reporting of these expression patterns in multiple figures throughout the manuscript, and ii) the functional rescue of cdo-1(-) phenotypes by a cdo-1 rescue construct expressed by a hypodermal-specific promoter (col-10). We agree that providing negative control tissues would modestly improve the manuscript. However, we do not think that adding these controls will substantially alter the conclusions of the paper. Importantly, we acknowledge this limitation of our work with the sentence, “However, we cannot exclude the possibility that CDO-1 also acts in other cells and tissues as well.”

      Minor issues to note:

      Issue 14: Mutants for hif-1 and cysl-1 are sensitive to exogenous cysteine levels, yet loss of CDO-1 expression is not sufficient to explain this phenomenon, suggesting other targets of HIF-1 are involved. Given the findings the authors (and others) have had showing a role for RHY-1 in sulfur amino acid metabolism, shouldn't the authors consider testing rhy-1 mutants for sensitivity to exogenous cysteine?

      Response 14: To test the hypothesis that rhy-1(-) C. elegans might be sensitive to supplemental cysteine, we cultured wild type and rhy-1(-) animals on 0, 100, and 1000µM supplemental cysteine. At 0 and 100µM supplemental cysteine, neither wild-type nor rhy-1(-) animals display any lethality, suggesting rhy-1 is not required for survival in the face of excess cysteine (Fig. 3D,E). We also cultured these same strains on 1000µM supplemental cysteine, a concentration that is highly toxic to wild-type animals (100% lethality). rhy-1(-) animals were resistant to 1000µM supplemental cysteine, with a substantial fraction of the population surviving overnight exposure to this lethal dose of cysteine. Similarly, egl-9(-) mutant C. elegans were also resistant to 1000µM supplemental cysteine. We propose that loss of egl-9 or rhy-1 activates HIF-1-mediated transcription, which primes these mutants to cope with the lethal dose of cysteine. These data are now presented in Figure 3D-F and described in the Results section.

      Issue 15: The cysteine exposure assay was performed by incubating nematodes overnight in liquid M9 media containing OP50 culture. The liquid culture approach adds two complications: (1) the worms are arguably starving or at least undernourished compared to animals grown on NGM plates, and (2) the worms are probably mildly hypoxic in the liquid cultures, which complicates the interpretation.

      Response 15: We agree that it is possible that animals growing overnight in liquid culture are undernourished and mildly hypoxic. However, we are confident in our data interpretation as all our experiments are appropriately controlled. That is, control and experimental groups were all grown under the same liquid culture conditions. Thus, these animals would all experience the same stressors that come with liquid culture. Importantly, we never make comparisons between groups that were grown under different culture conditions (i.e. solid media vs. liquid culture).

      Issue 16: An easily addressable concern is the wording of one of the main conclusions: that cdo-1 transcription is independent of the canonical prolyl hydroxylase function of EGL-9 and is instead dependent on one of EGL-9's non-canonical, non-characterized functions. There are several points in which the wording suggests that CDO-1 toxicity is independent of EGL-9. In their defense, the authors try to avoid this by saying, "EGL-9 PHD," to indicate that it is the prolyl hydroxylase function of EGL-9 that is not required for CDO-1 toxicity. However, this becomes confusing because much of the field uses PHD and EGL-9/EGLN as interchangeable protein names. The authors need to be clear about when they are describing the prolyl hydroxylase activity of EGL-9 rather than other (hypothesized) activities of EGL-9 that are independent of the prolyl hydroxylase activity.

      Response 16: We appreciate the reviewer alerting us to this practice within the field. To avoid confusion, we have removed the “PHD” abbreviation from our manuscript and explicitly referred to the “prolyl hydroxylase domain” where relevant.

      Issue 17: The authors state in the text, "the egl-9; suox-1 double mutants are extremely sick and slow growing." We appreciate that their "health" assay, based on the exhaustion of food from the plate, is qualitative. We also appreciate that it is a functional measure of many factors that contribute to how fast a population of worms can grow, reproduce, and consume that lawn of food. However, unless they do a lifespan assay and/or measure developmental timing and specifically determine that the double mutant animals themselves are developing and/or growing more slowly, we do not think it is appropriate to use the words "slow growing" to describe the population. As they point out, the rate of consumption of food on the plate in their health assay is determined by a multitude and indeed a confluence of factors; the growth rate is one specific one that is commonly measured and has an established meaning.

      Response 17: We see how the phrase ‘slow growing’ might imply a phenotype that we have not actually assessed with this assay. Therefore, we have removed all claims about “slow growth” of the strains presented in Table 1 and have highlighted the assay more overtly in the results section. For example: “While egl-9(-) and suox-1(gk738847) single mutant animals are healthy under standard culture conditions, the egl-9(-); suox-1(gk738847) double mutant animals are extremely sick and require significantly more days to exhaust their E. coli food source under standard culture conditions (Table 1).”

      Reviewer #1 (Recommendations For The Authors):

      Issue 18: Relevance could be addressed further in the text.

      Response 18: We have added additional context for our work in the Discussion section. Please see our response to Issues #5, 6, 12, and 24.

      Issue 19: Better appreciation and integration of the manuscript's findings with published studies would be appropriate.

      Response 19: We have added additional context for our work in the Discussion section. Please see our response to Issues #5, 6, 12, and 24.

      Issue 20: It might be perhaps relevant to test whether cdo-1 is relevant for hypoxia resistance since it appears to be a key target for hif-1.

      Response 20: We agree that this is an interesting future direction; however, given that cdo-1 mRNA is not induced by hypoxia (Shen et al. 2005), we have not prioritized these experiments for the current manuscript.

      Issue 21: "egl-9 inhibits cdo-1 transcription in a prolyl-hydroxylase and VHL-1-independent manner" should be tempered. vhl-1 mutants and egl-9 hydroxylase point mutant still have significant induction of the reporter.

      Response 21: Thank you for identifying this oversight. We have modified the Figure 5 legend title to read, “egl-9 inhibits cdo-1 transcription in a largely prolyl-hydroxylase and VHL-1-independent manner.”

      Issue 22: Please use line numbers in the future for easier tracking of comments.

      Response 22: We shall.

      Issue 23: Abstract and elsewhere, "high cysteine activates...", should be rephrased to "high levels of cysteine".

      Response 23: We have made this change throughout the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Issue 24: The authors discuss CDO1 in the context of tumorigenesis, as well as the potential regulation between cysteine and the hypoxia response pathway. Thus, I was surprised that there was no mention of the foundational Bill Kaelin paper (Briggs et al 2016) showing how the accumulation of cysteine is related to tumorigenesis, and that cysteine is a direct activator of EglN1. Puzzling that CDO1 is a tumor suppressor: you lose it, cysteine can accumulate and activate EglN1, causing HIF1 turnover. How do the authors reconcile their results with this paper? I was also surprised that there was no mention in the Discussion of the role of hydrogen sulfide, cysteine metabolism, and CTH and CBS in oxygen sensation in the carotid body given the role they play there. Seems important to discuss this issue.

      Response 24: We have added new sections to our Discussion that consider the relationship between our work and Briggs et al. 2016 as well as mentioned the role of CTH and H2S in the mammalian carotid body.

      Issue 25: The abstract has a variety of contradictory statements. For example, the authors state that "HIF-1-mediated induction of cdo-1 functions largely independent of EGL-9," but then go on to conclude in the final sentence that cysteine stimulates H2S production, which then activates EGL-9 signaling, which then increases HIF-1-mediated transcription of cdo-1. A quick reading of the abstract leaves the reader uncertain whether EGL-9 is or is not involved in this regulation of cdo-1 expression. In addition, the conclusion sentence implies that activation of the EGL-9 pathway increases HIF-1-mediated transcription, yet it is well established that EGL-9 is an inhibitor of HIF-1. The abstract fails to deliver a clear summary of the paper's conclusions. Perhaps consider this alternative (changes in capital letters):

      The amino acid cysteine is critical for many aspects of life, yet excess cysteine is toxic. Therefore, animals require pathways to maintain cysteine homeostasis. In mammals, high cysteine activates cysteine dioxygenase, a key enzyme in cysteine catabolism. The mechanism by which cysteine dioxygenase is regulated remains largely unknown. We discovered that C. elegans cysteine dioxygenase (cdo-1) is transcriptionally activated by high cysteine and the hypoxia inducible transcription factor (hif-1). hif-1-dependent activation of cdo-1 occurs downstream of an H2S-sensing pathway that includes rhy-1, cysl-1, and egl-9. cdo-1 transcription is primarily activated in the hypodermis where it is sufficient to drive sulfur amino acid metabolism. EGL-9 and HIF-1 are core members of the cellular hypoxia response. However, we demonstrate that the mechanism of HIF-1-mediated induction of cdo-1 IS largely independent of EGL-9 prolyl hydroxylASE ACTIVITY and the von Hippel-Lindau E3 ubiquitin ligase. We propose that the REGULATION OF cdo-1 BY HIF-1 reveals a negative feedback loop for maintaining cysteine homeostasis. High cysteine stimulates the production of an H2S signal. H2S then ACTS THROUGH the rhy-1/cysl-1/egl-9 signaling pathway DISTINCTLY FROM THEIR ROLE IN HYPOXIA RESPONSE TO INCREASE HIF-1-mediated transcription of cdo-1, promoting degradation of cysteine via CDO-1.

      Response 25: We agree that the abstract could be clearer. We believe this concern stems from the fact that we did not discuss our initial screen in the abstract. Thus, we failed to establish a role for egl-9 in the regulation of cdo-1. To remedy this, we have modified the abstract as suggested by the reviewer and added additional context. We believe that these changes improve the clarity of the Abstract substantially.

      Issue 26: An easily addressable concern involves the "dark" microscopy controls showing lack of fluorescence from a nematode. In these dark negative control micrographs, the authors should draw dotted outlines around where the worms are or include a brightfield image next to the fluorescence image. On a computer screen, it is in fact possible to make out the worms. Yet, when printed out, the reader must assume there are worms in the dark images. Additionally, we realize that adjusting fluorescence so that wild-type CDO-1 expression can be seen will result in oversaturation of the egl-9 and rhy-1; cdo-1 doubles; however, this would be a useful figure to add into the supplement to both provide a normal reference of CDO-1 low-level expression and a demonstration of just how bright it is in the mutant backgrounds. It would also be useful for you to please report your exposure settings for purposes of reproducibility.

      Response 26: As suggested, we have added dotted lines around the location of the C. elegans animals in all images where GFP expression is low or basal. We have also reported the exposure times for each image in the appropriate figure legends.

      Issue 27: This title is quite generic and doesn't even mention the main players (CDO-1 and sulfite metabolism).

      Response 27: We have updated our title to call attention to cysteine dioxygenase. The improved title is: “Hypoxia-inducible factor induces cysteine dioxygenase and promotes cysteine homeostasis in Caenorhabditis elegans”

      Issue 28: The authors mention two disorders in which CDO-1 plays a pathogenic role: MoCD and ISOD. We recommend switching the order in which the authors mention these, as the remainder of the paragraph is about MoCD. Also, they should write out the number "2" in the first sentence of that paragraph.

      Response 28: We have made the suggested changes.

      Issue 29: The authors state in the main text, "...to ubiquitinate HIF-1, targeting it for degradation by the proteosome." Here, they should refer to the pathway in Figure 5a.

      Response 29: We have made the suggested change.

      Issue 30: The authors state in the main text, "Elements of the HIF-1 pathway have emerged..." which is vague and confusingly worded. Change to, "Members of the HIF-1 pathway and its targets have emerged from C. elegans genetic studies."

      Response 30: We have made the suggested change.

      Issue 31: Clarify in the figure legends that supplemental cysteine did not affect the mortality of worms that were imaged.

      Response 31: We have added this note to Figure 3A and Figure S3A.

      Issue 32: Figure 1b. "the cdo-1 promoter is shown..." Add: "as a straight line" to the end of this phrase.

      Response 32: We have made the suggested change.

      Issue 33: The authors should consider changing the red text in Figure 1 to magenta, which tends to be more readable for people who have limited color vision.

      Response 33: We have adjusted the colors in Figure 1 as suggested.

      Issue 34: Figure 2, legend title. Consider changing "hif-1" to "HIF-1," as well as rhy-1, cysl-1, and egl-9. In this case, they are talking about proteins, not mutants or genes. This will make the paper easier to follow for readers who lack a C. elegans background.

      Response 34: We have made the suggested change.

      Issue 35: Figure 5, caption text. "...indicates weak similarity." Add, "amongst species compared."

      Response 35: We have made the suggested change.

      Issue 36: It is starting to become a standard for showing the datapoints in bar graphs. Although this is done in many graphs in the paper, it should also be done for Figure S1 and Figure 4C.

      Response 36: We have made the suggested change.

      Issue 37: An extensive ChIP-seq and RNA-seq analysis of C. elegans HIF-1 was recently published (Vora et al, 2022), which the authors should reference in support of the regulation of CDO-1 transcription by HIF-1 in their description of published expression studies of the pathway (Results section, page 4). Indeed, Vora et al were key generators of the ChIP-seq data cited in Warnhoff et al but not included as authors in the ModERN/ModENCODE publication: their contributions were published separately in Vora et al and should be acknowledged equivalently.

      Response 37: We appreciate the reviewer pointing this detail out and we have added the correct citation as indicated.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Some suggestions:

      1) It's obviously concerning that your GWAS results are not at all robust to the approach used (Fig S3). Did you try something non-parametric, like a Kruskal-Wallis test?

      We used both GWAS and crosses (F2) to validate the presence of the QTL, so the evidence does not come from GWAS alone. We did not use non-parametric tests because such approaches make it difficult to account for population structure and relatedness. Our GWAS approach is certainly somewhat underpowered, given the number of individuals we used and the likely polygenic nature of the root growth traits. However, the F2 crosses allow us to place more weight on some of the regions we identified with GWAS.

      2) You don't explain what you do with heterozygotes, nor discuss the level of inbreeding in general.

      We are dealing with inbred lines, but indeed they are not completely fixed. The remaining heterozygous sites were randomly fixed to one or the other allele. The median heterozygosity value was low, at 5.6%. We clarified this point in the Materials and Methods.

      3) The finding that over 30% of RNA-seq reads don't seem to have an annotated home should give you pause. Do they map anywhere? At least discuss what is going on. Also, note that you likely have enormous errors in SNP-calling due to cryptic structural variation - think about what this might do?

      We agree with reviewer #1. We added a few sentences in the result section to clarify this point: “When further analyzed, 15.15% of the unmapped reads (with no correspondence to predicted CDS) were found not to match the reference genome. These might correspond either to unsequenced regions or to genotype-specific genomic regions that are not present in the reference line. The remaining unmapped reads corresponded to either rRNA and tRNA genes (40.28% of the unmapped reads) or to non-annotated genes or non-coding RNAs (44.57% of the unmapped reads).” As we used the same reference genome for mapping the RNAseq reads, some genes might not be present in our analysis for the two lines we studied.

      4) Did you consider moving PgGRXC9 into Arabidopsis?

      This is a great suggestion. In fact, we plan to explore more how some GRXs regulate root growth and how this is conserved in plants in a follow up project. This is however beyond the scope of this manuscript.

      Minor suggestions:

      1) Why not calculate H^2 simply as line variance divided by total?

      Heritability estimated on single individuals in a population, the approach generally used in human genetics and animal breeding, corresponds directly to line variance divided by total phenotypic variance.

      But in plant breeding (or plant science), we generally work with replicated genotypes in different blocks or experimental repetitions, so we estimate the heritability of the mean phenotype of each genotype. There is ample literature on this (Nyquist, 1991; Holland et al. 2003; for a very clear and well-written explanation, see the introduction of this PhD thesis: http://opus.uni-hohenheim.de/volltexte/2020/1720/pdf/20200221_PhD_Thesis_Publikationsversion.pdf). When calculating the heritability of the mean phenotype, the phenotypic variance (the denominator) should take into account the number of replicates per genotype (n): we do not have a single plant, but several replicate plants of each inbred line. The meaning of the formula is that the error in the model is inflated because we have n replicate plants per genotype, so to estimate the heritability of the genotype mean we have to take this inflated error variance into account.
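
      The distinction between the two definitions can be sketched numerically. The snippet below is a minimal illustration, assuming the common entry-mean form in which the residual variance is divided by the number of replicates n; the function name and the variance components are hypothetical and are not taken from the manuscript, whose exact model is not given here.

```python
def entry_mean_heritability(var_genotype: float, var_error: float, n_reps: int) -> float:
    """Broad-sense heritability of genotype means (entry-mean basis).

    Assumed form: H2 = var_G / (var_G + var_e / n). With n_reps = 1 this
    reduces to line variance divided by total phenotypic variance.
    """
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    if var_genotype < 0 or var_error < 0:
        raise ValueError("variance components must be non-negative")
    phenotypic_var_of_mean = var_genotype + var_error / n_reps
    if phenotypic_var_of_mean == 0:
        raise ValueError("phenotypic variance is zero; heritability is undefined")
    return var_genotype / phenotypic_var_of_mean


# Made-up variance components, for illustration only.
single_plant = entry_mean_heritability(2.0, 4.0, 1)
mean_of_four = entry_mean_heritability(2.0, 4.0, 4)
```

      With the same variance components, the heritability of a mean over four replicates is higher than the single-plant value, which is why the two definitions should not be compared directly.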

      2) While the paper overall is well-written, the captions need further proof-reading.

      We corrected all the captions.

      Reviewer #2 (Recommendations For The Authors):

      Major suggestions:

      1) The experimental support for the mutant phenotype of roxy19 needs to be further substantiated. Current methods available for CRISPR mutagenesis make it relatively easy to generate additional alleles. Alternatively, the authors could complement the mutant with a wild-type copy of the gene. These approaches represent the standard of the field and should be used here as well.

      We agree with rev #2. We added some sentences in the discussion to stress the limitations of our study in linking the QTL to PgGRXC9.

      As stated above we’d like to explore more how some GRXs regulate root growth and how this is conserved in plants. We plan to generate new single and multiple mutants in ROXY19 and its closest homologues (using CRISPR). This is, however, beyond this manuscript.

      2) The authors may want to state more clearly what the hypothesis is for how redox levels might contribute to root length differences and more clearly state what the limits of their current study are.

      We modified the discussion to try to clearly indicate the limitations of our study.

      3) Differences in root growth can be the consequence of a number of different parameters that contribute to root elongation and the authors need to more clearly define which of these are likely affected in their different genotypes.

      We agree with Reviewer #2. However, as stated before, we plan to further explore the molecular and cellular mechanisms responsible for the phenotype we observe in Arabidopsis. This will need extra work and is beyond the scope of this manuscript.

      4) Page 13, first paragraph. The authors provide an overly strong statement that suggests they have determined the molecular basis for the difference in PgGRXC9: " Altogether, our results suggest that PgGRXC9 is a positive regulator of root growth and that a polymorphism in the promoter region of PgGRXC9 associated with changes in its expression level appeared responsible for a quantitative difference in root growth between the two lines."

      While their results suggest the PgGRXC9 locus is associated with root growth variation, they have not directly tested the effect of the polymorphisms in the promoter on gene expression and this statement needs to be weakened.

      We changed the text to: “Altogether, our results suggest that PgGRXC9 is a positive regulator of root growth and that a polymorphism in the promoter region of PgGRXC9 might led to changes in its expression level and ultimately to a quantitative difference in root growth between the two lines. However, the effect of the polymorphisms in the promoter on gene expression need to be tested to validate this hypothesis.”

      We also changed the title of the manuscript to better reflect our results.

      Minor suggestions:

      1) Page 4: "FTSW below 0.3 was considered a stressful condition." It was not specified how this threshold was determined.

      This value corresponds to the measured FTSW value at which pearl millet genotypes subjected to a dry down generally start to reduce their transpiration rate (see Fig. 1 of Kholová et al, 2010; https://doi.org/10.1093/jxb/erp314). At FTSW values above 0.3, transpiration is not affected. At FTSW values around 0.3, the water supply from pearl millet roots cannot fully support transpiration. The plant enters a drought stress responsive phase and progressively closes its stomata to reduce water losses and decrease plant productive functions to match water supply. We have clarified this in the manuscript.

      2) Page 6: Figure 1; footnote: at the end of the description of panel A, a comma is missing between "red" and "blue."

      Thanks for pointing that out. This was corrected.

      3) The root growth data determined by X-ray imaging is not significant (Fig S4B), yet the authors describe the result in the main text without qualification. The authors should clarify this in the text.

      We added some text to clarify this.

      4) Page 9: Figure 2C; It would be better to enlarge these images and annotate them to indicate what specific anatomical features have been measured. Currently, only an expert in the field would be able to interpret these images.

      While we understand the point made by Reviewer #2, Fig2C was meant to illustrate differences in the root tip of the two lines.

      5) Page 9: Figures 2D and E; the number of biological samples measured is not indicated (what is "n"?).

      Thanks again for pointing this out. This was added to the figure legend.

      6) Page 14: Figure 4B; scale bar needs to be included.

      Scale bars were added to the pictures.

      7) Page 14: Figure 4; I recommend adding confocal images or DIC of cleared root apex tissues to easily compare the RAM size and cell lengths in both WT and roxy19 mutant.

      Once again, we plan to have a follow up study on the molecular and cellular mechanisms of action of ROXY19 and its closest homologues on root development. We believe a thorough analysis of differences in phenotype could be illustrated in a future manuscript.

      8) Page 18: main text; "we propose that redox regulation in the root meristem is responsible for a root growth QTL in pearl millet." This statement is ambiguous in the description of the mechanism. The authors do not clarify if the role they propose for PgGRXC9 is in the meristematic or elongation zone. Likely the authors are not able to know precisely where the gene is acting at this point, and so the presented hypothesis needs to more clearly state what limitations there are in assigning a mode of action for the PgGRXC9 and ROXY19 genes in root growth.

      We rewrote this paragraph to clarify the current gap in our understanding of the putative PgGRXC9 function.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the Editor and the referees for their questions and remarks. In this document we provide a point-by-point response to revisions requested by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Jafarinia et al. have made an interesting contribution to unravelling the molecular mechanisms underlying pathological phenotypes of repeat expansion of the C9orf72 gene. The repeat expression leads to the expression of polyPR proteins. Using coarse-grained molecular dynamics simulations, the authors identify putative binding partners involved in nucleocytoplasmic transport (NCT), and conjecture that polyPR affects essential processes by binding to NCT-related proteins. The results are well-reported, but only putative, and need experimental support to be more conclusive. Also, a comparison with results from all-atom MD simulations in explicit water could help verify the results. But even without these, the work is very useful as a first step to unravel the role of polyPR and related peptides.

      We greatly appreciate the reviewer's positive assessment of our work and the suggestions. We acknowledge the need for more experimental validation of the binding behavior of some of the transport components. Our results coincide with the experimental findings of Hutten et al. [1] ([16] in our paper) for example regarding the binding of polyPR to Kapβs and Impαs, but experimental validation of additional transport components, especially for RanGAP, would be valuable. We hope that our work will inspire colleagues from the field to actually perform such experiments.

      We also agree with the reviewer's suggestion that all-atom simulations can provide further details on the molecular conformations at the local NTR-PR binding regions. Nonetheless, such simulations for all transport components, particularly for interactions involving large conformational flexibility of longer polyPR chains such as PR50, would require significant computational expenses. In a recent publication (Jafarinia et al. [2]) we reported on the close resemblance in binding behavior between our coarse-grained MD data and the all-atom MD simulations of (Nanaura et al. [3]), both showing polyPR binding to a negatively-charged cavity of Kapβ2. We expect future MD simulations to elucidate more atomistic detail with the continuously increasing power of high-performance computing clusters.

      Reviewer #2 (Public Review):

      This study used coarse-grained molecular dynamics simulation to explain how the binding of polyPR might interfere with distinct stages of the transport cycle. This finding shows that the interaction between polyPR and transport components is driven by electrostatic interactions and is correlated with the salt concentration and the length of polyPR, providing an important basis for subsequent exploration of the impact of C9orf72 R-DPRs on NCT disruption.

      We appreciate the reviewer's positive feedback and the recognition of the significance of our work.

      Reviewer #3 (Public Review):

      Onck and co-workers present in this work the identification of binding partners and sites of polyPR on various nuclear transport components and elucidate how polyPR might potentially influence the transport process. It's interesting to note that some interaction sites on transport components also serve as their inherent/functional binding sites. The difference in the effects between short polyPR (PR7) and long polyPR (PR50) is also evident, although the authors might need to clarify the mechanisms better. Overall, the manuscript is well organized and concisely written, and it would greatly enhance our understanding of the toxicity induced by polyPR. In general, the 1-bead per atom force field model used in the study is well-tuned for studying the interactions between polyPR and proteins, as the essential cation-pi interactions (between Arg and Phe/Tyr/Trp) were included using an 8-6 LJ model.

      We thank the reviewer for recognizing the suitability of our 1-bead-per-amino-acid force field for studying R-DPRs' interactions with transport components and for acknowledging our work's contribution to understanding polyPR toxicity mechanisms. Below we comment on the mechanisms describing the difference between short and long polyPR molecules.

      Recommendations for the authors:

      1) Regarding Figure 2 (also see below for more specific comments), there is a major concern that the dipole moment is not included in Fig 2b (as the correlation is better with f=0), but the authors still conclude that this is generally important (lines 258-261). As a minimum, this needs to be discussed more carefully. Is f (i.e. the importance of dipole moment for binding) dependent on the specific binding partner, or what is going on? Maybe, there is a good explanation?

      Indeed, the significance of the dipole moment depends on the specific type of transport component involved. Our analysis reveals that for Kapβs, see figure 2b, the best-fit is obtained with f=0, indicating that the separation of charge within Kapβs has a relatively minor effect on their interaction with polyPR. Instead, the primary determinant for polyPR-Kapβ interaction appears to be the net charge per residue (NCPR), with a more negative NCPR leading to stronger interactions.

      We attribute this behavior to the structural characteristics of Kapβs, particularly the superhelical structure which features inner and outer surfaces with differing charge distributions. Importantly, this structural arrangement creates an inner surface characterized by a negative electrostatic potential. As demonstrated in our previous work, polyPR predominantly binds to this negatively charged cavity within Kapβs. Consequently, the separation of charges on the Kapβ surface becomes less influential compared to the overall charge. Other transport components, however, depicted in figure 2a, do not share this feature and the distribution of charges over the surface becomes a more critical factor in polyPR interactions. We have now added this explanation to page 6, and emphasized in the conclusion section that the effect of dipole moment is only observed for the transport components in figure 2a.

      2) Write out nucleoporin, Nup, at first appearance (line 51).

      We have changed it in line 51.

      3) Fig 1: a (representative) CG structure of polyPR (PR7,PR20 and PR70) would be very useful.

      We have added a CG representation of PR7 and PR20 to figure 1.

      4) Please use chi-square, not R-square, to evaluate the fit, as chi-square takes experimental errors into account.

      We use R-square as a standard measure to assess the quality of the fit in the simulations, as it considers the summation of residuals. This choice aligns with the methodology we have used in our previous publications, and we therefore prefer to use this measure here as well.

      5) Please use a dot (not a full stop) for multiplication in line 151 and Figure 2 legend.

      We made the adjustment in line 151, the caption of figure 2, and the y-axis label of figure S2.

      6) 330: it is very unconventional to plot half the std dev as an error bar. Please plot the std dev (standard error) of the mean.

      We made the suggested change and now the error bars in figure 2 are standard errors of the mean (SEM) calculated from block averaging with three blocks at equilibrium. We also amended the caption of figure 2 and the Methods section.
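
      The block-averaging estimate described here can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the blocks are taken as equal-length contiguous segments of the equilibrated series, and trailing samples that do not fill a block are dropped. The authors' actual analysis scripts are not shown in the response.

```python
import math
import statistics


def block_average_sem(samples, n_blocks=3):
    """Return (mean, SEM) from contiguous, equal-length blocks of a time series."""
    if n_blocks < 2:
        raise ValueError("at least two blocks are needed to estimate a spread")
    block_len = len(samples) // n_blocks
    if block_len == 0:
        raise ValueError("not enough samples for the requested number of blocks")
    # Mean of each contiguous block; leftover samples at the end are ignored.
    block_means = [
        statistics.fmean(samples[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    mean = statistics.fmean(block_means)
    # Standard error of the mean: sample std dev of block means over sqrt(blocks).
    sem = statistics.stdev(block_means) / math.sqrt(n_blocks)
    return mean, sem


# Made-up contact counts per frame, for illustration only.
series = [1, 1, 1, 2, 2, 2, 3, 3, 3]
mean_contacts, sem_contacts = block_average_sem(series, n_blocks=3)
```

      With only three blocks the SEM itself is a rough estimate, so it is best read as an indication of run-to-run spread rather than a precise confidence interval.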

      7) Please write an explicit equation for the linear relation that is plotted in Figure 2. Something like: C_t = a(NCPR - fM/Rg)+b ? That would make it easier to read.

      We have now added the linear equation of the fit to a new table S4, and included a reference to it in the caption of figure 2.
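
      For readers who want to reproduce this kind of fit, a minimal sketch is given below. It assumes the form suggested by the reviewer, C_t = a(NCPR - f·M/R_g) + b, and reads M as the dipole moment and R_g as the radius of gyration, which is an interpretation rather than something stated in this response. All names and numbers are hypothetical, and the fitted coefficients reported in table S4 are not reproduced here.

```python
def descriptor(ncpr, dipole, rg, f):
    """Combined electrostatic descriptor NCPR - f*M/R_g (assumed form)."""
    return ncpr - f * dipole / rg


def fit_line(x, y):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values are all identical; R^2 is undefined")
    return a, b, 1.0 - ss_res / ss_tot


# Made-up values for three hypothetical transport components.
ncpr_values = [-0.10, -0.05, 0.0]
dipoles = [2.0, 1.0, 3.0]       # arbitrary units
rg_values = [4.0, 4.0, 4.0]     # arbitrary units
contacts = [3.0, 2.0, 1.0]      # normalized contact numbers

# f = 0 drops the dipole term, as reported for the Kapβ panel (figure 2b).
x_f0 = [descriptor(n, m, r, 0.0) for n, m, r in zip(ncpr_values, dipoles, rg_values)]
slope, intercept, r_squared = fit_line(x_f0, contacts)
```

      Scanning f over a range of values and keeping the one with the highest R-square would be one way to choose it, though whether the authors selected f this way is not stated here.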

      8) Fig 2: why is the fit to PR7 not reported/shown?

      The fits for PR7 resulted in R2 values of 0.89 (a) and 0.83 (b) for 200 mM and of 0.7 (a) and 0.59 (b) for 100 mM. Because of the low R2 values for 100 mM, the fits for PR7 are not shown. We have added this explanation to the caption of figure 2.

      9) Fig 4: isn't the blue shape KapB (and not importin)?

      We changed "importin" to "Kapβ Imp" for consistency.

      10) In the interest of reproducibility, a recommendation is to make the scripts for setting up, running, and analyzing the simulations freely available, e.g. at GitHub. This will increase reproducibility and transparency.

      At the moment we do not have the scripts available on GitHub. However, codes can be provided by the authors upon reasonable request, as also mentioned in the data availability statement in the paper.

      11) Can the authors explain the salient advances in this article versus the one published last year?

      In our previous work, we showed that polyPR binds to the Kapβ family of nuclear transport receptors (NTRs), consistent with experimental findings. While this provided valuable insights, it was essential to broaden our investigation as C9orf72 toxicity not only affects the Kapβ family of NTRs but also disrupts other key regulators of NCT. For instance, recent literature (see lines 87-91 in our paper) showed that Ran and its regulators RanGAP and RanGEF are mislocalized in cells expressing R-DPRs, and genetic screening studies have identified several nucleocytoplasmic transport genes as modifiers of R-DPR-mediated toxicity.

      In the present study, we therefore delved deeper into the underlying mechanisms of polyPR-modification of NCT. We focused on exploring whether polyPR directly interacts with Impα isomers, CAS/Cse1, RanGEF, RanGAP, Ran, and NTF2. By doing so, we unveiled a network of direct interactions between polyPR and a remarkably wide range of NCT components. This newfound insight is valuable for interpreting existing experimental findings, such as the mislocalization of RanGAP. We also demonstrate that polyPR binding is influenced not only by factors such as the net charge per residue and the polyPR chain length, as previously observed for Kapβs, but also by the spatial separation of charges, incorporated by an additional dependence on dipole moments in influencing the total number of contacts with polyPR. This sheds new light on how polyPR interacts with numerous targets within the cellular environment, providing a valuable reference for future (experimental) investigations of R-DPR-compromised nuclear transport. These points are explained in the last paragraph of the introduction and paragraphs 2,3 of the conclusion section. Paragraph 2 of the conclusion is also modified for clarification.

      12) In Figure 2(a), the vertical coordinates of the first graph do not match the others.

      We have now modified figure 2a left panel to match the others.

      13) When the polyPR length is large enough, it seems that the binding of polyPR to RanGEF and NTF2 is not significantly improved.

      The binding behavior depends on polyPR length, as well as on the net charge per residue and the dipole moment (expressed as NCPR-fM/R_g). We note that the number of contacts in figure 2 is normalized by the polyPR length so that for both NTF2 and RanGEF the total number of contacts increases with length (PR7 to PR20) when binding occurs. Specifically, for RanGEF, especially at lower ion concentrations (100 mM), PR7 and PR20 exhibit a similar number of contacts per unit length of polyPR. This implies that the absolute number of contacts between PR20 and RanGEF is higher than that of PR7. However, as we extend the polyPR length to PR50, there is a reduction in the number of contacts per unit length of polyPR. This phenomenon indicates that the more extended PR50 has regions that make little to no contact with RanGEF, resulting in a smaller number of contacts per unit length for PR50. Lines 188-195 are now modified to put more emphasis on the difference between number of contacts and number of contacts normalized by polyPR length.

      14) The representation of the mechanism in Figure 4 is not intuitive enough and the color scheme still needs to be improved.

      We have tried to improve clarity by including the names of each transport component next to their schematic representations.

      15) Figure 3 shows that the longer polyPR exhibits a higher contact probability with individual residues compared to a shorter polyPR, is this result in conflict with Figure 2?

      We re-iterate here that the number of contacts in figure 2 is normalized by the polyPR length, while the results in Fig. 3 are not.

      Figure 3 and figure S4 demonstrate that as the length of polyPR increases, the contact probability of individual residues of transport components for interaction with polyPR also increases.

      In figure 2, we have normalized the time-averaged number of contacts by the length of polyPR. For example, in the top-right panel of figure 2a, when comparing results for PR7 with PR50 interaction with RanGAP, a higher value for PR7 indicates that PR7 makes more contacts per unit of its length with RanGAP. In terms of absolute number of contacts, however, the PR50 chain makes more contacts with RanGAP, resulting in a higher contact probability. We now added a sentence (see lines 188-189) for clarification.

      In summary, when a short polyPR strongly binds to a transport component (evidenced by a relatively large number of contacts), it makes more contacts per unit length than a long polyPR. This occurs because for shorter polyPRs most of the residues come into contact with the target protein. In contrast, for longer polyPRs, only certain parts of the chain are in contact with the transport components, while other regions make fewer or no contacts. This is explained in lines 188-195.
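
      The difference between absolute and length-normalized contact numbers can be made concrete with a small sketch. The contact counts below are invented purely to mirror the trend described above, and measuring chain length in PR repeats (rather than residues) is an assumption; neither comes from the simulation data.

```python
def contacts_per_unit_length(total_contacts, chain_length):
    """Time-averaged number of contacts divided by polyPR chain length."""
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    return total_contacts / chain_length


# Hypothetical time-averaged contact counts with one transport component.
pr7 = {"length": 7, "contacts": 14.0}
pr50 = {"length": 50, "contacts": 60.0}

pr7_normalized = contacts_per_unit_length(pr7["contacts"], pr7["length"])
pr50_normalized = contacts_per_unit_length(pr50["contacts"], pr50["length"])

# The longer chain makes more contacts in total but fewer per unit length.
longer_has_more_total = pr50["contacts"] > pr7["contacts"]
shorter_has_more_per_length = pr7_normalized > pr50_normalized
```

      In this made-up case both statements hold at once, which is how a higher contact probability per residue of the transport component (figure 3) can coexist with a lower normalized contact number (figure 2).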

      16) In S2 and S3, does the data require an error bar?

      NCPR, defined as total charge divided by sequence length of the transport components, is a constant and therefore figure S3 does not require an error bar.

      In figure S3 we have added error bars (standard deviation) for the dipole moment calculated from 2.5 µs simulations of the isolated transport components.
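
      Because NCPR depends only on the sequence, it can be computed without any simulation, as in the sketch below. The charge assignment used here (Asp and Glu as -1, Lys and Arg as +1, all other residues including His as neutral, termini ignored) is an assumption for illustration; the charges used in the authors' coarse-grained force field may differ.

```python
# Assumed integer charges per one-letter amino-acid code.
RESIDUE_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def net_charge_per_residue(sequence):
    """Total charge divided by sequence length (NCPR)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total_charge = sum(RESIDUE_CHARGE.get(res, 0) for res in sequence.upper())
    return total_charge / len(sequence)


# PR7: seven proline-arginine repeats, one positive charge per repeat.
pr7_ncpr = net_charge_per_residue("PR" * 7)
```

      Since the value is fixed by the sequence, it has no statistical spread, whereas the dipole moment fluctuates with conformation during a trajectory.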

      17) What is the physiological significance when the salt concentration is 100 mM?

      We conducted simulations at two different salt concentrations: 200 mM, which aligns with in vitro conditions as reported in Hutten et al. [1], and a lower 100 mM salt concentration. The inclusion of the 100 mM salt concentration enables us to assess the significance of salt concentration, and to confirm the dominance of electrostatic interactions in polyPR binding. We also note that this range of salt concentration is commonly used in in vitro experiments [1, 4, 5].

      18) Please introduce abbreviation NLS in the abstract.

      We added the full name of NLS to the abstract.

      19) Given the high number of Arg residues in its sequence, polyPR should interact with many proteins. It would be beneficial to discuss the frequency of binding/non-binding interactions of polyPR with nuclear transport components in comparison to general proteins.

      We appreciate the reviewer's comment. While such a comparison is indeed interesting, our study primarily focused on elucidating the interactions between polyPR and crucial nuclear transport components, aiming to provide insights into potential defects in nucleocytoplasmic transport. The broader comparison of polyPR interactions with different protein classes in the proteome is a promising direction for future research, but it is outside the scope of the current manuscript.

      20) The authors should provide a convergence check to determine whether the 2.5 µs simulations are sufficient for sampling the interaction modes, particularly with the long PR50.

      We have included a new figure (figure S5) and additional text in the Methods section to verify that extending the simulation duration does not alter the contact probabilities (which are indicators of binding modes) presented in figure 3a, confirming convergence of our computations.

      21) In reference to Figure 4, the upper panel merely summarizes the known transport mechanisms, while the lower part (A-H) provides potential novel insights from this study. Unfortunately, these novel insights are not sufficiently detailed. It is recommended to include more details to make these relevant plots clearer by expanding the corresponding discussions (currently, only the last paragraph in the Results section addresses these). If possible, the authors should also carry out some CG simulations of the most relevant processes to further elucidate the interference caused by polyPR.

      We have taken the reviewer's feedback into consideration and made the suggested revisions. Specifically, we have expanded the last paragraph of the discussion to provide more detailed explanations of the insights derived from our computational model. For each mechanism, we begin by presenting the reader with the baseline understanding of normal function of the transport component. Subsequently, we discuss how the findings presented in figures 2 and 3 offer insights into polyPR's potential interference with the function of NCT components. Furthermore, we have made improvements to the schematic representation of mechanisms in figure 4 to enhance clarity.

      At the moment, accurately capturing the binding of NCT components to their native binding targets and the competition with polyPR are best resolved by all-atom molecular dynamics simulations, which come with significant computational demands. This level of detail and computation-intensive analyses is beyond the scope of the current study, but we hope that our results will provide the groundwork for future, more detailed investigations.

      References

      1. Hutten, S., et al., Nuclear Import Receptors Directly Bind to Arginine-Rich Dipeptide Repeat Proteins and Suppress Their Pathological Interactions. Cell Rep., 2020. 33(12): p. 108538.

      2. Jafarinia, H., E. Van der Giessen, and P.R. Onck, Molecular basis of C9orf72 poly-PR interference with the β-karyopherin family of nuclear transport receptors. Sci. Rep., 2022. 12(1): p. 21324.

      3. Nanaura, H., et al., C9orf72-derived arginine-rich poly-dipeptides impede phase modifiers. Nat Commun, 2021. 12(1): p. 5301.

      4. Brady, J.P., et al., Structural and hydrodynamic properties of an intrinsically disordered region of a germ cell-specific protein on phase separation. Proceedings of the National Academy of Sciences, 2017. 114(39): p. E8194-E8203.

      5. Fisher, R.S. and S. Elbaum-Garfinkle, Tunable multiphase dynamics of arginine and lysine liquid condensates. Nat. Commun., 2020. 11(1): p. 4628.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. provide valuable data for understanding molecular features of the human spinal cord. The authors made considerable efforts to acknowledge and objectively address the limitations of Visium while attempting to overcome them by utilizing single-nucleus RNA sequencing (snRNA-seq) from the same tissue. By mapping snRNA-seq clusters to Visium data, they offer spatial information, complemented by RNA-ISH and immunofluorescence (IF) validation. They also discuss gender-related differences and the similarities between human and mouse data, aiming to establish a crucial foundation for experimental research. However, I have some comments below.

      1) The observation of gender-related differences is interesting. The authors reported that SCN10A, associated with nociceptors, exhibited stronger expression in females. While they intend to validate this finding through IF, the quantitative difference is not clearly observed in the IF data (Figure 5f). It would be essential to provide validation through DAPI-based cell counts, demonstrating the difference in CHAT/SCN10A co-expression.

      Thank you for this important question! We have added panel G to Figure 5, which provides a quantitative analysis of the percentage of CHAT neurons expressing SCN10A in the male and female spinal cord.

      2) It is meritorious that in novel features of the transcriptomic study, the authors considered gender-related differences and similarities between humans and mice. Nevertheless, despite the extensive bioinformatics-based analyses performed, the results mostly confirm what has been previously reported (Nguyen et al. 2021; Yadav et al. 2023; Jung et al. 2023).

      Thank you! In addition to confirming the findings of previous studies, our results also provide new information on the differences between humans and mice. For example, we found that PVALB and SST showed broader expression across human DRG neuronal clusters than in mice, suggesting that these genes are more selectively expressed in mouse than in human DRGs. Moreover, we identified several pain-associated genes that were differentially expressed in motor neurons between the sexes.

      3) The study did not perform snRNA-seq in the DRG. The limitations of Visium in cell type separation are acknowledged, and the authors are aware that Visium alone has limitations in describing cell expression patterns. The authors need to validate their findings via analyses of public DRG snRNA-seq data (Jung et al. 2023 Ncom; Nguyen et al. 2021eLife) before drawing broad conclusions.

      Thank you for this critical question! It is true that snRNA-seq offers higher resolution than spatial transcriptomics in describing cell expression patterns. We acknowledge the limitation that we performed only spatial transcriptomics, without snRNA-seq, in human DRG. Nevertheless, our spatial transcriptomics results in human DRG were similar to previously published snRNA-seq data from human DRG, suggesting the feasibility of using spatial transcriptomics in human DRG.

      4) Figure 7's comparison between human Visium spot data and Renthal et al.'s mouse snRNA-seq may have limitations as Visium spot data could not provide a transcriptional profile at the single cell resolution. The authors need to clarify this point.

      Thank you! We have clarified this in the limitation section.

      5) Recent findings indicate that type 2 cytokines can directly stimulate sensory neurons. This includes the expression of IL-4RA, IL31RA, and IL13RA in DRG. These findings support the role of JAK kinase inhibitors in mediating chronic itch. Demonstrating the expression of these itch receptors in DRG would be valuable.

      We have provided the expression patterns of IL-4RA, IL31RA, and IL13RA in human and mouse DRG (Figure 7-figure supplement 4), and cited the relevant paper.

      6) Given that juxtacrine and paracrine signals operate from 0 to 200 um, spatial information is vital to understanding intercellular communication. The presentation of spatial information using Visium is meaningful, and more comprehensive analyses of potential interaction based on distance should be provided, beyond the top 10 interactions (Figure 8).

      Thank you for this good question! In this study, we focused on the putative projections from DRG to spinal neuronal types, which may be an important future direction for research on sensory transduction. It will be interesting to determine the intercellular communication in the spinal spot using the spatial transcriptomics data in future studies.

      7) The gender-related differences are interesting and, if possible, it would be interesting to explore whether age-related differences or degeneration-related factors exist. Using public data could allow the examination of age-related changes.

      We agree with the reviewer that it is of great importance to identify the age-related differences using spatial transcriptomics and scRNA-seq data of human spinal cord. However, it is currently difficult to obtain comprehensive results due to the limited human spinal cord datasets regarding different ages.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors generated a comprehensive dataset of human spinal cord transcriptome using single-cell RNA sequencing and the Visium spatial transcriptomics platform. They employed Visium data to determine the spatial orientation of each cell type. Using single-cell RNA sequencing data, they identified differentially expressed genes by comparing human and mouse samples, as well as male and female samples.

      Strengths:

      This study offers a thorough exploration of both cellular and spatial heterogeneity within the human spinal cord. The resulting atlas datasets and analysis findings represent valuable resources for the neuroscience community.

      Weaknesses:

      The analysis of spatial transcriptomics data was conducted as if it were single-cell RNA-seq data. However, there are established tools for effectively integrating these two types of data. The incorporation of deconvolution methods could enhance the characterization of each spot's cell type composition.

      Thank you very much for your positive comments and suggestions! Indeed, we have used deconvolution methods to integrate the spinal snRNA-seq and spatial transcriptomics data.

      Reviewer #3 (Public Review):

      Summary:

      Zhang et al sought to use spatial transcriptomics and single-nucleus RNA sequencing to classify human spinal cord neurons. The authors reported 17 clusters on 10x Visium slides (6 donors) and 21 clusters by single-nucleus sequencing (9 donors). The authors tried to compare the results to those reported in mice and claimed similar patterns with some differing genes.

      Strengths:

      The manuscript provides a valuable database for the molecular and cellular organization of adult human spinal cords in addition to published datasets (Andersen, et al. 2023; Yadav, et al. 2023).

      Weaknesses:

      The results are largely observational and lack quantitative analysis. Moreover, the assertions regarding the sex differences in motor neurons and the potential interactions between DRG and spinal cord neuronal subclusters appear preliminary and necessitate more rigorous validation.

      Thank you very much! We have provided the quantitative analysis of the differential expression of SCN10A in male and female spinal cord motor neurons. Our sequencing data revealed putative projections from DRG to spinal neuronal types, which may be an important future direction for research on sensory transduction. We did not use animal models to verify these interactions between DRG and spinal cord neuronal subclusters, which is a major limitation in our study. Nevertheless, our analysis results will provide an important resource for future research to investigate the molecular mechanism underlying spinal cord physiology and diseases.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents important findings for the field of Alzheimer's disease, especially for the electrophysiology subfield, by investigating the temporal evolution of different disease stages typically reported using M/EEG markers of resting-state brain activity. The evidence supporting the conclusions is solid and the methodology as well as the descriptions of the processes are of high quality, although a separation of individuals who are biomarker positive versus negative would have strengthened the interpretability of the results and the conclusions of the study.

      Response: Thank you for the positive assessment of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to infer the trajectories of long range and local neuronal synchrony across the Alzheimer's disease continuum, relative to neurodegeneration and cognitive decline. The trajectories are inferred using event-based models, which infer a set of data-driven disease stages from a given dataset. The authors develop an adapted event-based modelling approach, in which they characterise each stage as a particular biomarker increasing by a particular z-score deviation from controls. Fitting infers the optimal set of z-scores to use for each biomarker and the order in which each biomarker reaches each z-score. The authors apply this approach to data from 148 individuals (70 cognitively unimpaired older adults and 78 individuals with mild cognitive impairment or Alzheimer's disease), identifying trajectories in which long-range (amplitude-envelope correlation) and local (regional spectral power) neuronal synchrony in the alpha and beta bands becomes abnormal prior to neurodegeneration (measured as the volume of the parahippocampal gyrus) and cognitive decline (measured using the mini-mental state examination).

      Strengths:

      • The main strength is that the authors assess two models. In the first they derive a staging system based only on the volume of the parahippocampal gyrus and mini-mental state examination score. They then investigate how neuronal synchrony metrics change compared to this staging system. In the second they derive a staging system that also includes an average (combined long-range and local) neuronal synchrony metric and investigate how long-range and local synchrony metrics change relative to this staging system. This is a strength as the first model provides confidence that there is not overfitting to the neuronal synchrony data, and the second provides more detailed insights into the dynamics of the early neuronal synchrony changes.

      • Another strength is that the authors automatically infer the optimal z-scores to choose, rather than having to pre-select them manually, as in previous approaches.

      Response: Thank you for the positive comments and a succinct summary of the paper and its strengths.

      Weaknesses:

      • The dataset is small and no external validation is performed.

      Response: We agree that future validation studies of the predictions are necessary. We now include the related sentences in the last paragraph of the limitations section in the revised manuscript.

      • A high proportion of the data is from controls (nearly 50%) with no biomarker evidence of Alzheimer's disease, and so the changes may be driven by aging or other non-Alzheimer's effects.

      Response: We would like to clarify that the z-scores of the metrics used in the EBMs were computed using age-adjusted values. All our controls were recruited from an ongoing longitudinal study of healthy aging. Among the 70 controls, 39 had confirmed A-beta-negative PET scans, 8 had confirmed A-beta-positive PET scans, and for the remaining 23 no biomarker data are available. However, we conducted a comprehensive neuropsychological assessment in all controls (see Appendix 1—table 1 in the revised supplementary file), and based on these data we can be quite confident about their lack of clinical deficits, and we have a very high degree of confidence that none of the controls have any neurodegeneration (AD-related or otherwise). Consistent with this assessment, in our EBM analyses, most of the control participants were indeed assigned to the preclinical stages.

      • Inferring the optimal z-scores is a strength, however as different sets of z-scores are allowed per biomarker, there is a concern that the changes reflected are mainly driven by the choice of z-score, rather than the markers themselves (e.g. if lower z-scores are selected for one marker than another, then changes in that marker will appear to be detected earlier, even if both markers change at the same time).

      Response: Indeed, the biomarker sequence depends on the choice of the z-scores per biomarker. However, please note that our choice of z-scores is based on maximizing the sequence likelihood. Therefore, other values of the z-scores will have by construction a smaller likelihood of sequence occurrence compared to the results shown.

      • In equation 2 it is unclear why the gaussian is measured based on a sum over I. The more obvious choice would be to use a multivariate gaussian with no covariance, which would mean taking the product rather than the sum over I.

      Response: We thank the reviewer for pointing this out, and we now clarify this point. In this revision, we no longer use the term ‘multivariate’. Indeed, the model likelihood assumes independent priors for each metric, and hence is the product of each metric’s univariate Gaussian probability distribution. This can be seen in equations 1 and 2 of the revised manuscript (section titled ‘Event-based sequencing modeling’). The assumption of independent priors is similar to the one used in the original event-based model (see equation (2) in A. L. Young et al., Nature Comm. 9.1 (2018): 4273).
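
      As a minimal illustration of the factorized likelihood described in this response, the sketch below multiplies univariate Gaussian densities, one per metric. It is not the manuscript's equations 1 and 2; the function names, the stage-specific expected z-scores, and the unit standard deviations are all hypothetical values chosen for the example.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Univariate Gaussian probability density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def independent_likelihood(observed, expected, sds):
    """Joint likelihood of one subject's metrics under independence.

    With independent metrics, the joint density is the product
    (not the sum) of the per-metric univariate densities.
    """
    likelihood = 1.0
    for x, mu, sd in zip(observed, expected, sds):
        likelihood *= gaussian_pdf(x, mu, sd)
    return likelihood


# Hypothetical z-scores for two metrics (e.g. PHG volume loss, MMSE decline).
observed = [1.2, 0.4]
expected = [1.0, 0.0]  # hypothetical expected z-scores at some stage
sds = [1.0, 1.0]       # hypothetical standard deviations
joint = independent_likelihood(observed, expected, sds)
```

      In practice such products are usually accumulated as sums of log-densities to avoid numerical underflow when many metrics or subjects are combined.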

      • In the original event-based model, k is a hidden variable. Presumably that is also the case here, however the notation k=stage(j) makes it seem like each subject is assigned a stage during the sequence optimisation.

      Response: We would like to clarify that the posterior probability of each stage for every subject is estimated during the sequence optimization. To clarify the notation, we have now deleted the term “stage” and use “tj” to denote stages for each subject j. The sequence optimization was performed with the assumption of a uniform prior distribution p(tj=k) = 1/(N+1) for each stage k. Then, the posterior probability p(tj=k|Zj,S), i.e., the probability that subject j belongs to stage k, given the metrics and the sequence, was computed during the sequence optimization procedure.
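
      The stage posterior described above follows from Bayes' rule with the uniform prior p(tj=k) = 1/(N+1). The sketch below is only an illustration under that assumption: the per-stage likelihood values are invented, the function name is hypothetical, and stages are indexed from 0 to N for convenience.

```python
def stage_posterior(likelihoods):
    """Posterior p(t_j = k | Z_j, S) for one subject.

    likelihoods[k] is p(Z_j | t_j = k, S) for stages k = 0..N.
    A uniform prior 1/(N+1) is assumed, as stated in the response.
    """
    n_stages = len(likelihoods)          # this is N + 1
    prior = 1.0 / n_stages
    joint = [prior * lik for lik in likelihoods]
    evidence = sum(joint)                # normalizing constant
    return [value / evidence for value in joint]


# Hypothetical per-stage likelihoods for one subject (N = 3).
likelihoods = [0.02, 0.10, 0.30, 0.08]
posterior = stage_posterior(likelihoods)
most_likely_stage = posterior.index(max(posterior))
```

      Because the prior is uniform, it cancels in the normalization, so the posterior is simply the per-stage likelihoods rescaled to sum to one.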

      • Typically for event-based modeling, positional variance diagrams are created from the markov chain monte carlo samples of the event sequence, enabling visualisation of the uncertainty in the sequence, but these are not included in the study.

      Response: In the revised supplementary file, we have now included positional uncertainty diagrams for the optimal set of z-score events that were created from 50,000 MCMC samples. Please see Appendix 1—figure 2 for the AC-EBM and Appendix 1—figure 9 for the SAC-EBMs.
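
      For readers unfamiliar with positional uncertainty (variance) diagrams, the sketch below shows one common way to tabulate them: for each event, count the fraction of MCMC samples that place it at each position in the sequence. The event labels and the four sample sequences are hypothetical, and this is not the authors' code.

```python
def positional_variance(samples, events):
    """Return matrix[e][p]: fraction of samples with event e at position p."""
    n_positions = len(events)
    index = {event: i for i, event in enumerate(events)}
    counts = [[0] * n_positions for _ in events]
    for sequence in samples:
        for position, event in enumerate(sequence):
            counts[index[event]][position] += 1
    n_samples = len(samples)
    return [[count / n_samples for count in row] for row in counts]


# Hypothetical z-score events and a handful of sampled orderings.
events = ["AEC z=1", "PHG z=1", "MMSE z=1"]
samples = [
    ["AEC z=1", "PHG z=1", "MMSE z=1"],
    ["AEC z=1", "PHG z=1", "MMSE z=1"],
    ["AEC z=1", "MMSE z=1", "PHG z=1"],
    ["PHG z=1", "AEC z=1", "MMSE z=1"],
]
matrix = positional_variance(samples, events)
```

      Each row of the matrix sums to one; a row concentrated on a single position indicates an event whose place in the sequence is well determined.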

      • Many of the figures in the manuscript (e.g. Figure 1E/G, Figure 2A/B, Figure 3A/B/E/F/I/J, Figure 4 A/B/E/F/I/J) are based on averages in both the x and the y axis. In the x dimension, individuals have a weighted contribution to the value on the y axis, depending on their stage probability. In the y dimension, the values are averages across those individuals, and the error bars represent the standard error rather than the standard deviation. Whilst the trajectories themselves are interesting, they may not be discriminative at the individual level and may be more heterogeneous than it appears.

      Response: In the current study, the predictions of trajectories are intended at the cohort level. Individual level investigations will be the topic of future investigations.

      • The bootstrapped statistical analyses comparing metrics between the stages do not consider the variability in the sequence.

      Response: Please see the response above. The positional uncertainty diagrams are included in the revised supplementary file.

      Reviewer #2 (Public Review):

      Summary:

      This work presented by Kudo and colleagues is of great importance to strengthen our understanding of electrophysiological changes in the course of AD. Although the main conclusions regarding functional connectivity and spectral power change through the course of the disease are not new and have been largely studied and theorised on, this article offers an innovative approach that certainly consolidates previous knowledge on the topic. Not only that, this article also broadens our knowledge presenting useful and important details on the specificity of frequency and cortical distribution of these early alterations. The main take-home message of this work is the early disruption of electrophysiological signatures that precedes detectable alterations in other more commonly used pathology markers (i.e. gray matter atrophy and cognitive impairment). More specifically, these signatures include long-range connectivity in the alpha and beta bands, and local synchrony (spectral power) in the same frequency bands.

      Response: Thank you for the positive comments and for providing a nice succinct summary.

      Strengths:

      The present work has some major strengths that make it paramount for the advance of our understanding of AD electrophysiology. It is a very well written manuscript that, despite the complexity of the analyses employed, runs the reader through the different steps of the analysis in a pedagogic and clever way, making the points raised by the results easy to grasp. The methodology itself is carefully chosen and appropriate to the nature of the question posed by the researchers, as event-based models are well-suited for cross-sectional data.

      The quality of the figures is outstanding; not only are they aesthetic but, more importantly, the figures convey information exceptionally well and facilitate comprehension of the main results.

      The conclusions of the paper are, in general, well described and discussed, and consider the state-of-the-art works of AD electrophysiology. Furthermore, even though the conclusions themselves are not groundbreaking at all (synaptic damage preceding structural and cognitive impairment is one of the epitomes of the pathological cascading model proposed by Jack in 2010), this article is innovative and groundbreaking in the way it addresses the question, with clever analyses in a relatively large sample for neuroimaging standards.

      Response: Thank you for the positive comments of the strengths of the paper.

      Weaknesses:

      The main limitation of the work revolves around sample definition and inclusion criteria that are somewhat confusing, obscuring some of the points of the analyses. Firstly, it is not clear why the purely clinical approach is employed to diagnose the "probable Alzheimer's Disease" for the 78 participants in the "AD group". In the same paragraph, it is stated that 67 out of the 78 participants show biomarker positivity, thus allowing a more biologically guided diagnosis that is preferred according to current NIA-AA criteria. This would avoid highly possible mixing of different subtypes of dementia etiologies. One might wonder, why would those 11 participants be included if we have strong indications that their symptoms are not due to AD? Furthermore, the real pathological status of the control group is somewhat questionable. The authors do not specify whether common AD biomarkers are available for this subgroup. In that case, it would have highly increased the clarity and interpretability of the results if this group was subdivided in a preclinical and completely healthy control group. This would be particularly interesting since a significant proportion of the control group is labeled as belonging to stages 2,3,4 (MCI) and even 5 (mild dementia). This raises the question of whether these participants are true healthy controls mislabeled by the EBM model, or actual cognitive controls with actual underlying AD pathology well identified by the model proposed.

      Response: Please see the responses above to a similar comment from R1. To clarify, all our controls were recruited from an ongoing longitudinal study of healthy aging. Among the 70 controls, 39 had confirmed A-beta-negative PET scans, 8 had confirmed A-beta-positive PET scans, and for the remaining 23 no biomarker data are available. The biomarker positivity rates in our control cohort are consistent with the prevalence of A-beta positivity in cognitively healthy individuals and are within a normal biological continuum for amyloid beta (Jansen WJ et al. 2015). We conducted a comprehensive neuropsychological assessment in all controls (see Appendix 1—table 1 in the revised supplementary file), and based on these data we can be quite confident about their lack of clinical deficits, and we have a high degree of confidence that none of the controls have any neurodegeneration (AD-related or otherwise). We include these details in the revision (see the revised ‘Participants’ section in the Materials and methods).

      Jansen WJ et al., 2015 JAMA; 313(19):1924-1938.

      On this note, Figure 2 (C and D) and Figure 3 (C, G and K) show a cortical surface depicting the mean difference of each stage vs the control group, which again, is formed by subjects that can be included (and in fact, are included) in all those stages, obscuring the meaning and interpretability of these cortical distributions.

      Response: We would like to clarify that these figures depict the regional maps of each metric for each stage of AD progression, not the contrast against a control group.

      Reviewer #1 (Recommendations For The Authors):

      • If possible, perform independent validation of the results.

      Response: This is something we indeed intend to examine in our future investigations.

      • Repeat the analysis in the subset of individuals that are amyloid positive.

      Response: Among the 78 AD patients, 20 had autopsy-confirmed AD neuropathology, an additional 41 had molecular pathology identified by Abeta-PET, and a further 9 had fluid biomarker (CSF) confirmation of amyloid and tau levels consistent with an AD diagnosis. The remaining eight patients had a diagnosis of AD with high certainty, based on clinical presentation, neurological assessment, and cortical atrophy on MRI. Given that only eight patients had a clinical diagnosis of AD with no biomarkers, and given the comprehensive clinical characterization of all the AD patients in our cohort (Appendix 1—table 1), we do not believe that any subgroup analysis is warranted.

      • When inferring the optimal z-scores, select the same set of z-scores per biomarker, or include diagrams of stage vs z-score that include all of the markers so that it is easy to see how one marker changes relative to the others (overlay Figure 1G on Figure 2A and 2B).

      Response: How the neural synchrony metrics, PHG volume and MMSE scores change relative to each other is exactly what we show in Figures 3 B/F/J and 4 B/F/J. Since each EBM model optimizes the z-score thresholds, sequence likelihood and posterior probability of each stage for each subject, the EBM framework provides the most likely estimate for each metric at every stage. Therefore, the SAC-EBM model gives the most accurate description of the relative differences in these metrics over the AD progression stages. The reviewer’s suggestion to overlay Figure 1G (now figure 1F, based on optimized z-scores for PHG volume and MMSE scores) on Figures 2A and 2B will be inaccurate, as the neural synchrony measures plotted in figures 2A and 2B are not for optimized z-scores.

      • Change equation 2 to use a multivariate gaussian.

      Response: We now clarify that we use a factorized multivariate form that reflects independent priors for each metric which are Gaussian.

      • Clarify whether k is a hidden variable and possibly change the notation.

      Response: We now clarify that in our notation, k is a label for the stage [k=1,..,7 (when I=2) or k=1,...,10 (when I =3)] and is indeed a hidden variable and not observed (but inferred from the EBM). Specifically, the posterior probability for each subject j belonging to stage k was estimated as part of the sequence optimization procedure.

      • Generate positional variance diagrams of the MCMC samples.

      Response: We use MCMC to obtain the most likely sequence. We have now included positional variance diagrams of the optimal set of z-score events in Appendix 1—figure 2 and Appendix 1—figure 9 in the revised supplementary file.

      • It would be interesting to study whether the stages are predictive of conversion or look at longitudinal data, if available.

      Response: This is something we indeed intend to examine in our future investigations.

      • Also look at statistics across MCMC samples of the sequence.

      Response: Thank you for this suggestion. In the Appendix 1—figure 10, we now include an example of the MCMC samples for an SAC-EBM including the alpha-band AEC. We then derived the positional variances for each metric that are now shown in Appendix 1—figure 2 and Appendix 1—figure 9.

      Reviewer #2 (Recommendations For The Authors):

      Some really minor changes are suggested on two specific points that somewhat confused me as a reader and got me stuck in the reading process to try to get the meaning of what I was seeing/reading:

      1. It is not specified (or at least I was unable to find it) what are you comparing exactly for the group comparison in the long-range synchrony metric (AEC) before creating your scalar metric. Are you comparing individual links (in which case you would have 93 link values for each ROI to compare)? Or are you comparing the strength for each ROI (thus, one value -the individual links sum- for each ROI)? I guess it should be the latter for what I see in the figures but it could be useful to specify it.

      Response: The reviewer is correct. We compare the strength of each ROI, i.e., averaging over edges of the symmetric AEC matrix of functional connectivity. We now clarify this in the Amplitude-envelope correlation section and the caption of the revised Appendix 1—figure 6.
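
      The ROI-strength computation described in this response can be sketched as follows: each ROI's value is the mean of its row in the symmetric AEC matrix. The 3-ROI matrix is a toy example, the function name is hypothetical, and excluding the diagonal (self-connection) is an assumption of this sketch rather than something stated in the response.

```python
def roi_strength(aec):
    """Mean connectivity of each ROI to all other ROIs.

    aec is a symmetric square matrix (list of lists); the diagonal is
    skipped here, which is an assumption of this illustration.
    """
    n = len(aec)
    return [
        sum(aec[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Toy symmetric AEC matrix for three hypothetical ROIs.
aec = [
    [1.0, 0.2, 0.4],
    [0.2, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
strength = roi_strength(aec)  # one scalar per ROI
```

      This reduces the full connectivity matrix to one scalar per ROI, which is what makes a region-by-region group comparison possible.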

      2. In Figure 1 (which, by the way, is exceptionally aesthetic, congratulations for that!) I got stuck for a relatively long time in a really small detail and I am not completely sure if I came to the right conclusion. It is regarding the X axis of the histograms in panels B and D. They are expressed as "PHG volume loss" and "MMSE decline". So I supposed those histograms were showing some kind of subtraction, (maybe from stage X to stage Y, or from group X to group Y). I was trying to understand the histogram and rereading methods to see if I overlooked any description of that graphic and then just realized they might be just the Z-score itself for each group (control and AD) with respect to the whole population. If that is the case I would suggest changing the X-label to "PHG z-score" and "MMSE z-score" avoiding the reference to "loss" and "decline" as they are just reflecting the direct transformation to z-score.

      Response: Thank you. We would like to clarify that the z-score for PHG volume and MMSE scores were sign-inverted so that higher values denote “PHG Volume loss” and “MMSE decline”, respectively. We now clarify this point in the revised text and legend for the revised figure 1.
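
      The sign inversion described in this response can be illustrated with a short sketch. It assumes z-scores are computed against the control group's mean and standard deviation and omits the age adjustment mentioned elsewhere in the response; the MMSE values and the function name are hypothetical.

```python
import statistics


def inverted_z_scores(values, control_values):
    """Z-scores relative to controls, sign-inverted.

    After inversion, a value below the control mean (e.g. a lower MMSE
    score) yields a positive number, read as "decline" or "loss".
    """
    mean = statistics.mean(control_values)
    sd = statistics.stdev(control_values)
    return [-(value - mean) / sd for value in values]


# Hypothetical MMSE scores.
control_mmse = [30, 29, 28, 29, 29]
patient_mmse = [24, 29]
mmse_decline = inverted_z_scores(patient_mmse, control_mmse)
```

      With this convention, a patient scoring below the control mean receives a positive "MMSE decline", while a score equal to the control mean maps to zero.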

      Lastly, regarding the point I raised in the limitations section of the public review, I understand it might fall out of the scope of eLife reviewing process as it would require a more extensive change of the current manuscript, which is great as it is. But as a reader and researcher in the field, I would have recommended using biomarkers to divide the control group (if available) thus including in the models only those belonging to the AD continuum according to their biomarker status, and leaving those control without any biomarker positivity as the reference group for the figures I mention in that section (those showing differences for each stage in the cortical surface with respect to the control group).

      Response: Please see our response to a similar comment from R1. Among the 70 controls, 39 had confirmed A-beta-negative PET scans, only 8 had confirmed A-beta-positive PET scans, and for the remaining 23 no biomarker data are available. We conducted a comprehensive neuropsychological assessment in all controls (see Appendix 1—table 1 in the revised supplementary file), and based on these data we can be quite confident about their lack of clinical deficits, and we have a high degree of confidence that none of the controls have any neurodegeneration (AD-related or otherwise). Since only 8 participants in the control group were confirmed as amyloid positive, and this sample size is small, we do not conduct the recommended re-analysis in this manuscript.

    1. Author Response

      We appreciate your comments and thank the reviewers for providing valuable feedback and recommendations. We will respond to most of the recommendations in the revised version, which will provide more information to help readers understand and apply the study. For some of the recommendations, we can give quick responses as follows:

      Reviewer #2 (Public Review):

      The differences between passive and active immunolabeling, as well as photobleaching data, should be addressed for a comprehensive understanding.

      In passive immunolabeling, antibodies penetrate the tissue and reach their targets by diffusion alone, without any additional force. In contrast, active immunolabeling uses an external force, such as pressure or electrophoresis, to facilitate antibody penetration and thereby significantly speed up the staining process (i.e., one day vs. 2 months for a whole mouse brain). The samples in our study were centimeter-sized; therefore, we employed only active electrophoretic immunolabeling (details provided in Materials and Methods). However, laboratories that lack suitable devices or that handle small specimens can employ passive immunolabeling instead. As for the photobleaching data, we will provide it in the revised version.

      The compatibility of MOCAT with genetically encoded fluorescent proteins remains unclear and warrants further investigation.

      We agree with the possibility that the encoded fluorescent proteins will be affected. Since there is evidence that fluorescence can be quenched by xylene and alcohol, which are two organic solvents used in paraffin processing, we think boost immunolabeling is necessary for observing genetically encoded fluorescent proteins. We also pointed out this limitation in the Discussion:

      “Fourth, endogenous fluorescence—such as GFP, YFP, and tdTomato—may be quenched during paraffin processing and thus need to be visualized by means of additional immunolabeling.”

      However, the extent to which endogenous fluorescence will be quenched during the paraffin processing and MOCAT procedure, and how much boost labeling can rescue, is worth investigating for broadening the application of MOCAT. We will provide it in the revised version.

      The composition of NFC1 and NFC2 solutions for refractive index matching should be provided.

      Since NFC1 and NFC2 are commercial products from Nebulem (Taiwan), the composition is non-disclosable. However, the refractive index of NFC1 and NFC2 is 1.47 and 1.52, respectively.

    1. Author Response:

      Update, January 11, 2024:

      During the course of our careful revising of the paper, we discovered an inconsistency in the way we presented data for figures 5 and 6. Specifically, we used optogenetics to induce ataxia in mice. However, "ataxia", as a phenotype, can be initiated by a spectrum of cell dysfunctions as revealed by previous studies. We systematically explored this with optogenetics in this current work. Our error is that we presented one stimulation paradigm to show ataxic cell firing (2 ms on / 11 ms off square wave) and then presented a slightly different paradigm to show ataxic animal behavior (10 ms on / 10 ms off square wave). We note that our ataxia paradigms do not affect the outcomes of the dystonia and tremor stimulations. Importantly, the choice of ataxia paradigm does not change the conclusions of the paper. Regardless, for clarity we are actively working to make the stimulation parameters that we present consistent between figures 5 and 6.
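      To make the difference between the two ataxia paradigms concrete, the following minimal sketch computes the repetition frequency and duty cycle of an on/off square wave from the on and off durations quoted above. The function name `square_wave_stats` is illustrative and is not taken from the manuscript.

```python
def square_wave_stats(on_ms, off_ms):
    """Return (frequency in Hz, duty cycle as a fraction) for an on/off square wave."""
    period_ms = on_ms + off_ms          # one full cycle, in milliseconds
    frequency_hz = 1000.0 / period_ms   # cycles per second
    duty_cycle = on_ms / period_ms      # fraction of each cycle with the light on
    return frequency_hz, duty_cycle


# Paradigm used to show ataxic cell firing: 2 ms on / 11 ms off
firing_paradigm = square_wave_stats(2, 11)

# Paradigm used to show ataxic animal behavior: 10 ms on / 10 ms off
behavior_paradigm = square_wave_stats(10, 10)

print(firing_paradigm, behavior_paradigm)
```

      Under these definitions the 2 ms / 11 ms paradigm repeats at roughly 77 Hz with a duty cycle of about 15%, whereas the 10 ms / 10 ms paradigm repeats at 50 Hz with a 50% duty cycle.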

      October 10, 2023:

      We would like to thank all three reviewers for providing excellent suggestions that will enable us to strengthen our manuscript and enhance the impact of our findings. We plan on addressing the comments by altering the text, providing additional data, revising the figures as requested, and most importantly by providing an improved classifier model. Where relevant, we will also provide the reviewers with a response to specific questions that they raised. We will respond to the reviewers’ comments in a point-by-point manner when we submit a revised manuscript. Below, we include an outline of the main points that we intend to address.

      Although we will respond in full to all comments and suggestions in the revised documents, here we outline only the major areas in order to provide context for our revisions. 1) The major point of concern raised by the reviewers is the strength of the classifier model. We agree with the reviewers that we should put forward the strongest model possible as this forms a core component of our paper. We are planning on retraining our model using the suggestions put forward by the reviewers in the public and author-directed comments. Importantly, given the healthy discussion about our model, our revised manuscript will now also include additional clarification about the choice of the model architecture and limitations of our data structure. Based on the reviewers’ comments, we will include a brief discussion about possible future ways of improving the model. 2) We will provide additional figures and updated figure panels to reflect the new data analyses. Ultimately, we agree that the major strength of our manuscript lies within the many mouse models tested and validation of the classification in different genetic, pharmacological, and optogenetic mouse models, a point raised by all three reviewers. We are confident that the revised images will reflect these strengths. 3) In addition to improving our classifier model, we are planning on making textual changes to clarify several parts of the text and propose a new title that better reflects the data put forth in our manuscript. 4) There are several minor but important comments that were raised by all three reviewers. We will also incorporate these changes as suggested.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Recommendations For The Authors):

      1. Fig. 2B: In their previous comment #6, I assume that Reviewer #2 was asking about peaks that were called as statistically significant above background, not just "higher" as assessed by eye. The authors have now marked peaks that are "higher" but still do not indicate that they were called as statistically significant by any software. I agree that they need to indicate in the figure which peaks were discovered by formal analysis.

      Response: Thank you for the professional suggestions. We used the Piranha (version 1.2.1) software to call peaks from the CLIP-seq data, with the P-value threshold for peaks (i.e., the -p parameter) set to 0.05. Any region above the IgG peak could therefore be a binding region, and the higher the peak, the more pre-mRNA SRSF1 binds in that region.

      1. Similar to the above comment, in Fig. 7G "visual analysis" of IGV tracks is not an assay. It is fine to show the tracks as an example of the differential expression called using DESeq2, but this should be described for what it is.

      Response: We thank the reviewer for the professional comments. Following this advice, we have corrected the text in this revised version (Page 11, Line 233).

      1. Fig 5C: TUNEL results are supported by a single image of only a few cells. It is important to include quantitation as has been done for other microscopy data.

      Response: Thank you for the professional suggestions. Following this advice, we have added the quantitative data in Figure 5C. Also, we have added specific quantification methods to the text (Page 23, Line 484-485).

      1. Legend to Fig 6C-E: I assume n=4 refers to the number of animals. It would be best to also know how many cells/tubules were counted for each animal.

      Response: Thank you for the helpful comments. Following this advice, we have revised the legend for Figure 6D, E (Page 12, Line 246-249).

      1. There appears to be a mistake in line 285-287, which reads: "the overall analysis of aberrant AS events showed that SRSF1 effectively promotes the occurrence of SE and MXE events and inhibits the occurrence of RI events." The data in Fig 8C appears to show the opposite, with more SE and MXE, and fewer RI events, in the SRSF1 KO. This would imply that SRSF1 normally inhibits SE/MXE and promotes RI.

      Response: Thank you very much for the professional comments. Following this advice, we have corrected the text in this revised version (Page 14, Line 286-288).

      1. In Fig. 8E, an upper band is depleted in SRSF1 KO, but in Figure 8J, a much lower band is depleted. How is this explained?

      Response: Thank you for the professional suggestions. Since exon 7 of Tial1 is in the non-coding region, the lower band in Figure 8E does not correspond to the lower band in Figure 8J. For better understanding, we show the detailed information of Tial1 in the attached Figure S3.

      1. Line 81: As a very minor point, "AS" is defined as alternative splicing in the abstract, but should be re-defined again in the main text when first mentioned.

      Response: Thank you for the helpful comments. Following this advice, we have corrected the text in this revised version (Page 3, Line 81).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor and the reviewers for their valuable and constructive feedback. In the revised manuscript, we have incorporated and addressed the suggestions provided by the reviewers.

      Reviewer #1 (Recommendations For The Authors):

      The primary recommendation is to provide additional language explaining how KinCytE will be updated.

      Response: We appreciate the reviewer’s insightful feedback regarding the KinCytE update. In response, we have included additional details in the “Development and use of KinCyte” section as follows: “We welcome researchers to actively participate in advancing the development of KinCytE by sharing external screening data, especially data on new secreted factors and cell types that extend beyond macrophages. This collaborative effort promises to enhance our understanding of kinase-focused networks, opening new avenues for cutting-edge therapeutic approaches”. In addition, we explicitly state in the "Data, Software, and Availability" section, "To contribute data, kindly email the corresponding author and refer to Table S2 for guidance on the preferred file format."

      Reviewer #2 (Recommendations For The Authors):

      Would have been nice to see a validation of the regression models from outside of the training data. I would also consider removing statements like "We anticipate that KinCytE will be highly sought after by biologists... " , it reads like a grant application (and this is not)! Could tone the language down a bit. In the future, you might consider displaying your graphs as "biofabrics", they're much cleaner than "hairballs" (PMID: 23102059). Or potentially, show a hierarchical view where the selected cytokine (or other) is at the root, and you can immediately see what's connected. Anyway, the network display can be expanded. Consider maybe adding the nearest neighbors to the table on the right after selecting the node. Generally, though, I like how it works.

      There needs to be a button to download the graph as a .csv file. Maybe the subgraph after selecting a node (or set of nodes). Also, once you're at a graph view, it's hard to guess how to get back to the starting page. Maybe just one button with a "home" on it would fix that. On the Kinases Discovery, why are the gene symbols all lower case? Very cool!

      Response: We greatly value the reviewer's constructive suggestions. To incorporate these, we have made the following changes:

      (1) "We anticipate that KinCytE will be highly sought after by biologists... " This sentence is removed.

      (2) A ‘SAVE CSV’ button is added to the bottom right of the Cytokine Explorer page, which allows the users to download the graph as a csv file.

      (3) A redesigned KinCyte logo now functions as the 'HOME' button, located at the top left of the webpage, ensuring that users can easily return to the homepage at any time.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript describes the synergy among PI3Kbeta activators, providing compelling results concerning the mechanism of their activation. The particular strengths of the work arise to a great extent from the reconstitution system better mimicking the natural environment of the plasma membrane than previous setups have. The study will be a landmark contribution to the signaling field.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript aims to provide mechanistic insight into the activation of PI3Kbeta by its known regulators tyrosine phosphorylated peptides, GTP-loaded Rac1 and G-protein beta-gamma subunits. To achieve this the authors have used supported lipid bilayers, engineered recombinant peptides and proteins (often tagged with fluorophores) and TIRF microscopy to enable bulk (averages of many molecules) and single molecule quantitation. The great strength of this approach is the precision and clarity of mechanistic insight. Although the study does not use "in transfecto" or in vivo models the experiments are performed using "physiologically-based" conditions and provide a powerful insight into core regulatory principles that will be relevant in vivo.

      The results are beautiful, high quality, well controlled and internally consistent (and with other published work that overlaps on some points) and as a result are compelling. The primary conclusion is that the primary regulator of PI3Kbeta are tyrosine phosphorylated peptides (and by inference tyrosine phosphorylated receptors/adaptors) and that the other activators can synergise with that input but have relatively weak impacts on their own.

      Although the methodology is not easily imported, for reasons of both cost and the experience needed to execute them well, the results have broad importance for the field and reverse an impression that had built in large parts of the broader signalling and PI3K communities that all of the inputs to PI3Kbeta were relatively equivalent, however, these conclusions were based on "in cell" or in vivo studies that were very difficult to interpret clearly.

      Reviewer #2 (Public Review):

      The manuscript of Duewell et al has made critical observations that help to understand the mechanisms of activation of the class IA PI3Ks. By using single-molecule kinetic measurements, the authors have made outstanding progress toward understanding how PI3Kbeta is uniquely activated by phosphorylated tyrosine kinase receptors, Gbeta/gamma heterodimers and the small G protein Rac1. While previous studies have defined these as activators of PI3Kbeta, the current manuscript makes clear the quantitative limitations of these previous observations. Most previous quantitative in vitro studies of PI3Kbeta activation have used soluble peptides derived from bis-phosphorylated receptors to stimulate the enzyme. These soluble peptides stimulate the enzyme, and even stimulate membrane interaction. Although these previous studies showed that the release of p85-mediated autoinhibition unmasks an intrinsic affinity of the enzyme for lipid membranes, they ignored what would be the consequence of these peptide sequences being present in the context of intrinsic membrane proteins. The current manuscript shows that the effect of membrane-conjugated peptides on the enzyme activity is profound, in terms of recruiting the enzyme to membranes. In this context, the authors show that G proteins associated with the membranes have an important contribution to membrane recruitment, but they also have a profound allosteric effect on the activity on the membrane. These are observations that would not have been possible with bulk measurements, and they do not simply recapitulate observations that were made for other class IA PI3Ks.

      An important observation that the authors have made is that Gbeta/gamma heterodimers and Rac1 alone have almost no ability to recruit PI3Kbeta to the membranes that they are using, and this is central to one of the most profoundly novel activation mechanisms offered by the manuscript. The authors propose that the nSH2- and Gbeta/gamma binding sites partially overlap, so that Gbeta/gamma can only bind once the nSH2 domain releases the p110beta subunit. This mechanism would mean that once the nSH2 is engaged by membrane-conjugated pY, the Gbg heterodimer can bind and increase the association of the enzyme with membranes. Indeed, this increased membrane association is observed by the authors. However, the authors also show that this increased recruitment to membranes accounts for relatively little increase in activity, and that the far greater component of activation is due to an allosteric effect of the membrane association on the activity of the enzyme. The proposal for competition between Gbg binding and the nSH2 is consistent with the behavior of an nSH2 mutant that cannot bind to pY and which, consequently, does not vacate the Gbg-binding site. In addition to the outstanding contribution to understanding the kinetics of activation of PI3Kbeta, the authors have offered the first structural interpretation for the kinetics of Gbg activation in synergy with pY activation. The proposal for an overlapping nSH2/Gbg binding site is supported by predictions made by John Burke, using alphafold multimer. Although there is no experimental structure to support this structural model, it is consistent with HDX-MS analyses that were published previously.

      Reviewer #1 (Recommendations For The Authors):

      1. The approx relative concentrations (surface densities) of Rac1-GTP, GBetagammas and PY-peptides used in experiments in Fig 1 are not easy to understand and would be useful to give an intuitive feel for the relative sensitivity of the PI3Kbeta reporter to those inputs.

      In our revised manuscript, we provide densities of the individual signaling inputs used to reconstitute Dy647-PI3Kβ membrane recruitment (see Figure legend 1). We provide a more detailed explanation about our quantification method in subsequent figures where the membrane surface density of signaling inputs is varied to modulate the strength of PI3Kβ membrane localization and activity.

      Building off the quantification of Rac1-GTP and pY membrane density measurements presented in our initial manuscript submission, we now include an estimate of the GβGγ membrane density. For these new measurements, we recombinantly expressed and purified additional SNAP-GβGγ protein, which we fluorescently labeled with AlexaFluor 555. The membrane surface density of GβGγ was quantified at equilibrium using a combination of AF488-SNAP-GβGγ (bulk signal) and dilute AF555-SNAP-GβGγ (0.0025%), which allowed us to resolve and count the single molecule density (Figure 3A). We calculate the total surface density of GβGγ based on the AF555-SNAP-GβGγ dilution factor. In the methods section titled, “surface density calibration,” we describe our protocol.
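      The dilution-based calculation described above can be sketched as follows. This is a minimal illustration, assuming that the labeled molecules are counted in a known membrane area and that the labeled fraction equals the stated dilution (0.0025%). The function name and the example count and area are hypothetical and are not taken from the manuscript.

```python
def total_surface_density(labeled_count, area_um2, labeled_percent):
    """Estimate total molecules per square micron from a dilute labeled subpopulation.

    labeled_count   -- number of single labeled molecules counted in the field
    area_um2        -- imaged membrane area in square microns
    labeled_percent -- percentage of molecules carrying the single-molecule label
    """
    labeled_density = labeled_count / area_um2      # labeled molecules per um^2
    labeled_fraction = labeled_percent / 100.0      # convert percent to a fraction
    return labeled_density / labeled_fraction       # scale up by the dilution factor


# Hypothetical example: 25 labeled molecules in a 100 um^2 field, 0.0025% labeled
example_density = total_surface_density(labeled_count=25, area_um2=100.0,
                                         labeled_percent=0.0025)
print(f"{example_density:.0f} molecules per square micron")
```

      With these hypothetical inputs the estimate is on the order of 10,000 molecules per square micron; the real values depend on the measured counts and imaging area.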

      1. The estimates of the PIP3 concentrations/densities measured using the BTK reporter seem good but it's unclear (to me) how they were derived.

      The density of PI(3,4,5)P3 lipids in our supported lipid bilayers was calculated based on the incorporation of a defined molar ratio of PI(3,4,5)P3 in our small unilamellar vesicles. Based on the average footprint of 0.72 nm² for a single lipid, we calculated the density of lipids per µm². In the methods section titled, “kinetic measurements of PI(3,4,5)P3 lipid production,” we include the following description:

      “Assuming an average footprint of 0.72 nm² for phosphatidylcholine (Carnie et al., 1979; Hansen et al., 2019), we calculated a density of 2.8 × 10⁴ PI(3,4,5)P3 lipids/μm² for supported membranes that contain an initial concentration of 2% PI(4,5)P2. We assume that the plateau fluorescence intensity of the AF488-SNAP-Btk sensor following reaction completion in the presence of PI3Kβ represents the production of 2% PI(3,4,5)P3. The bulk membrane intensity of AF488-SNAP-Btk was normalized from 0 to 1, and then multiplied by the total density of PI(3,4,5)P3 lipids to generate kinetic traces that report the kinetics of PI(3,4,5)P3 production.”
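      The arithmetic in the quoted passage can be reproduced with a short sketch. It assumes a single leaflet tiled by lipids of 0.72 nm² each, with 2% of them being the lipid of interest, and it uses a simple min-max normalization for the sensor trace. The function names and the example intensity values are illustrative and are not taken from the manuscript.

```python
NM2_PER_UM2 = 1_000_000  # 1 um^2 = 1e6 nm^2


def lipid_density_per_um2(footprint_nm2, mole_fraction):
    """Lipids of one species per um^2, given the per-lipid footprint and mole fraction."""
    total_lipids_per_um2 = NM2_PER_UM2 / footprint_nm2
    return total_lipids_per_um2 * mole_fraction


def scale_trace(intensities, plateau_density):
    """Normalize a sensor trace from 0 to 1, then scale it to a lipid density."""
    lowest, highest = min(intensities), max(intensities)
    return [(value - lowest) / (highest - lowest) * plateau_density
            for value in intensities]


# 0.72 nm^2 footprint and 2% mole fraction, as stated in the quoted methods text
pip3_density = lipid_density_per_um2(footprint_nm2=0.72, mole_fraction=0.02)
print(f"{pip3_density:.3g} lipids per um^2")

# Hypothetical raw intensities (arbitrary units) converted to lipids per um^2
example_trace = scale_trace([10.0, 20.0, 30.0], pip3_density)
```

      The computed density is approximately 2.8 × 10⁴ lipids/μm², in agreement with the value quoted above.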

      Minor points

      l164; Rac1(GTP) AND GBeta gammas. In this context it should be OR. Or have I misunderstood?

      l1093; kineticS measurementS.

      Thank you for pointing out these typos. We made the appropriate edits.

      The paper of Suire et al. (Suire, S., Lécureuil, C., Anderson, K. E., Damoulakis, G., Niewczas, I., Davidson, K., Guillou, H., Pan, D., Clark, J., Hawkins, P. T., & Stephens, L. (2012). GPCR activation of Ras and PI3Kγ in neutrophils depends on PLCβ2/β3 and the RasGEF RasGRP4. The EMBO Journal, 31(14), 3118-3129. https://doi.org/10.1038/emboj.2012.167) makes the point that in vivo, although Ras activation is required for full activation of PI3Kgamma (and can activate PI3Kgamma in vitro directly), if you use tools to activate Ras in the absence of receptor and Gbetagamma signalling, it has no effect on PIP3. This directly supports the authors' conclusions.

      Thank you for sharing this citation. We incorporated the reviewer’s insight into our discussion section to broaden the significance of our work.

      Reviewer #2 (Recommendations For The Authors):

      There are only a few relatively minor points that could be addressed to improve the paper:

      1. Why is the density still going up after 10 minutes in Figure 1 Figure supplement 2? Doesn't this seem like a very long time? Are we seeing fast on/off combined with fast on/slow off? Are the particles eventually becoming stuck in odd places or are they slowly denaturing?

      Our movies do not indicate a slow accumulation of immobilized or stuck Dy647-PI3Kβ particles on the membrane surface. On the long timescale, we believe that a small fraction of Dy647-PI3Kβ molecules do exhibit longer dwell times on membranes containing a high density of pY (>6,000 molecules/µm²). This is likely due to membrane hopping of Dy647-PI3Kβ. In other words, rather than Dy647-PI3Kβ dissociating from the membrane surface directly into the solution, the Dy647-PI3Kβ molecule immediately rebinds to another membrane-conjugated pY peptide. This type of behavior of a peripheral membrane binding protein is generally correlated with there being a higher surface density of the binding partner (Yasui et al., 2014). Characterization of potential Dy647-PI3Kβ membrane hopping will require additional experimentation (e.g. PI3Kβ mutants) and quantitative analysis that goes beyond the scope of this study.

      1. Lines 188-189. "By quantifying the average number of Alexa488-pY particles per unit area of supported membrane we calculated the absolute density of pY per μm² (Figure 2D)." I think this should be Figure 2C, right hand y-axis.

      Thank you for identifying our typo. We’ve corrected the text for clarity.

      1. Lines 102-193. "When Dy647-PI3Kβ was flowed over a membrane containing a low density of ≤ 500 pY/μm², we observed rapid equilibration kinetics consistent with a 1:1 binding stoichiometry (Figure 2E)." There is no density shown in Fig. 2E. There is only "membrane intensity." Perhaps it was their intent to include a right-hand axis with density (number of particles/area), as they did in Figure 2C. However, they did not, so Figure 2E does not support the text. The value of Intensity/#py/um**2 does not appear to be the same for Figure 2C as for Figure 2E, assuming that the statement in the text is correct. The authors should include the density as a right-hand axis in 2E.

      We have reworded this portion of the results section for clarity. In reading the reviewer's comment, we recognize that a more convincing way to support our claim of a 1:1 binding stoichiometry would be to show that there are ~500 Dy647-PI3Kβ/μm² membrane-bound complexes when the pY surface density equals ~500 pY/μm². For us to make this connection, we would need to perform experiments using a Dy647-PI3Kβ concentration that fully saturates all the pY binding sites. However, at this elevated Dy647-PI3Kβ solution concentration, individual Dy647-PI3Kβ complexes can start to bind to a single phosphotyrosine of the dually phosphorylated peptide due to competition for pY binding sites. As an alternative to performing the experiment described above, we can infer binding stoichiometry from the shape of the membrane absorption kinetic traces. For example, a simple bimolecular interaction exhibits rapid equilibration kinetics with a hyperbolic-shaped kinetic trace. Systems that have more complex binding equilibria, however, generally take longer to equilibrate (due to the change in KOFF) and can often be broken down into 2 or 3 distinct dissociation constants (KD). This type of kinetic analysis has previously been used to describe multivalent membrane binding interactions for the Btk-PI(3,4,5)P3 (Chung et al., 2019) and PI3Kγ-GβGγ (Rathinaswamy et al., 2021) complexes. Considering that there are multiple interpretations of the Dy647-PI3Kβ membrane absorption traces shown in Figure 2E, we refrain from saying that our results explicitly reveal a 1:1 binding stoichiometry. Instead, we provide several possible explanations for the results. Ultimately, additional experiments and kinetic modeling of wild type and mutant PI3Kβ are necessary to define the binding stoichiometry under different conditions.

      1. Table 1. The authors have analysed the data to extract two dwell times and two diffusion coefficients. The legend should make this clear, referring to D1 as the slow diffusion component and D2 as fast diffusion; similarly, there are short and long dwell times. This should be stated in the legend. There are two columns labelled "alpha". This presumably should be alpha1 and alpha2, the fractions of particles with short and long dwell times. The table legend should clarify this.

      In our revision, additional text has been added to the figure legends and Table 1.

      Text from Table 1: “Alpha (α) equals the fraction of molecules with the characteristic dwell time, τ1 (DT = dwell time). The fraction of molecules with the characteristic dwell time, τ2, equals 1-α. Alpha (αD) equals the fraction of molecules with the characteristic diffusion coefficient, D1. The fraction of molecules with diffusion coefficient, D2, equals 1-αD.”
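      To illustrate how α, τ1, and τ2 relate to one another, the sketch below evaluates a two-component dwell time distribution. It assumes the common form in which the fraction of molecules still bound at time t is a weighted sum of two exponential decays; the exact fitting equation used in the manuscript is not reproduced in this response, so this form, the function name, and the example parameter values are assumptions.

```python
import math


def fraction_still_bound(t, alpha, tau1, tau2):
    """Fraction of molecules remaining bound at time t for two dwell-time populations.

    alpha -- fraction of molecules with characteristic dwell time tau1
    tau1  -- first characteristic dwell time (same time units as t)
    tau2  -- second characteristic dwell time; its fraction is 1 - alpha
    """
    first_population = alpha * math.exp(-t / tau1)
    second_population = (1.0 - alpha) * math.exp(-t / tau2)
    return first_population + second_population


# Hypothetical parameters: 30% of molecules with tau1 = 0.5 s, 70% with tau2 = 2.0 s
for time_s in (0.0, 0.5, 2.0, 5.0):
    print(time_s, round(fraction_still_bound(time_s, 0.3, 0.5, 2.0), 3))
```

      The same weighting logic applies to the diffusion coefficients, where αD is the fraction of molecules with D1 and 1-αD is the fraction with D2.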

      1. In the legend for Figure 5 figure supplement 1, for part D, the "Cumulative membrane of binding events..." The "of" should be deleted.

      Thank you for identifying this typo.

      1. Lines 423-426: "We found that PI3Kβ kinase activity is also relatively insensitive to either Rac1(GTP) or GβGγ alone. This is in contrast to previous reports that showed Rho-GTPases (Fritsch et al. 2013) and GβGγ (Katada et al. 1999; Hashem A. Dbouk et al. 2012; Maier, Babich, and Nürnberg 1999) can activate PI3Kβ, albeit modest, compared to synergistic activation with pY peptides plus Rac1(GTP) or GβGγ." It is not clear what this statement means. On the surface, it might be interpreted as saying that these previous studies had some flaw that led the authors to conclude that there is some activation caused by Rac1 or Gbeta/gamma on their own. The current manuscript is an important contribution to understanding the mechanism of synergistic activation, but it is also true that Hansen and his colleagues have not used the same membranes as were used previously. The authors state that they have used a wide range of membrane compositions, but the only ones that have appeared in the manuscript are nearly pure PC (with 2% PIP2) or PC with 20% PS. Extensive studies with varying membrane compositions are beyond the scope of the current study, since the current manuscript concisely makes important observations regarding mechanism. However, it would be helpful for readers if the authors at least mention the differences in membrane compositions among the studies.

      The reviewer raises an important point concerning our interpretation of PI3Kβ activation data in relationship to existing literature. In our original submission, we made conclusions concerning how individual signaling inputs modulate PI3Kβ activity, without showing all our data or providing sufficient explanation. In our revised manuscript, we include PI3Kβ kinase activity measurements performed in the presence of either pY, Rac1(GTP), or GβGγ alone (Figure 5B-5C). These experiments were reconstituted on supported membranes in the absence or presence of 20% PS lipids. We found that increasing the density of anionic lipids increased the overall activity of PI3Kβ in the presence of pY or GβGγ alone. This is consistent with a subtle increase in PI3Kβ membrane affinity due to the negatively charged PS lipids. Mutations that disrupt the direct interaction between PI3Kβ and GβGγ eliminated the observed lipid kinase activity. We were unable to detect PI3Kβ activity in the presence of Rac1(GTP) alone. In conclusion, we’re able to detect some PI3Kβ activity in the presence of GβGγ alone, which is consistent with previous reports (Dbouk et al., 2010; Katada et al., 1999; Maier et al., 2000). In the future, a more comprehensive analysis will be required to map the relationship between PI3Kβ activity, membrane localization, and lipid composition. For example, previous reconstitutions have revealed differential activation of PI3Kα that depends on the most abundant lipid being phosphatidylethanolamine (PE) rather than phosphatidylcholine (PC) (Hon et al., 2012; Ziemba et al., 2016). PE lipids comprise 25-30% of the cellular plasma membrane (Yang et al., 2018) and have been used in previous studies to measure PI3K lipid kinase activity on small unilamellar vesicles (Dbouk et al., 2010; Hon et al., 2012).

      In this study, we elected to use a simplified membrane composition that minimized non-specific membrane localization of fluorescently labeled PI3Kβ. This allowed us to more clearly define the strength of individual and combinations of protein-protein interactions that regulate PI3Kβ localization and kinase activity. When reconstituting amphiphilic molecules (i.e. lipids) in aqueous solution, a variety of structures, including micelles, inverted micelles, and planar bilayers, can form based on the lipid composition (Kulkarni, 2019). The organization of these membrane structures is related to the molecular packing parameter of the individual phospholipids (Israelachvili et al., 1976). The packing parameter, P = v / (a · l_c), depends on the volume of the hydrocarbon (v), the area of the lipid head group (a), and the lipid tail length (l_c). When generating supported lipid bilayers on a flat two-dimensional glass surface, we aim to create a fluid lamellar membrane. We find that phosphatidylcholine (PC) lipids are ideal for making supported lipid bilayers because they have a packing parameter of ~1 (Costigan et al., 2000). In other words, PC lipids are cylindrical like a paper towel roll. In contrast, cholesterol and phosphatidylethanolamine (PE) lipids have packing parameters of 1.22 and 1.11, respectively (Angelov et al., 1999; Carnie et al., 1979). This gives cholesterol and PE lipids an inverted truncated cone shape, which prefers to adopt a non-lamellar phase structure. Due to the intrinsic negative curvature of PE lipids, they can spontaneously form inverted micelles (i.e. hexagonal II phase) in aqueous solution when they are the predominant lipid species (Israelachvili et al., 1980; Kobierski et al., 2022; Wnętrzak et al., 2013).
      In the methods section of our manuscript, we note that, in our experience, incorporation of PE lipids dramatically reduced the protein-maleimide coupling efficiency, produced more membrane defects, and resulted in a larger fraction of surface-immobilized Dy647-PI3Kβ. This could be related to the intrinsic negative curvature of PE membranes. However, further investigation is needed to decipher these issues.
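      The relationship between the packing parameter and the preferred membrane structure can be sketched as follows. This is a minimal illustration: the phase boundaries at 1/3, 1/2, and 1 are the commonly cited textbook ranges from the self-assembly theory referenced above and are treated here as approximate, and the function names are illustrative. The packing parameter values passed in are the ones quoted in the response.

```python
def packing_parameter(volume_nm3, headgroup_area_nm2, tail_length_nm):
    """P = v / (a * l_c) for a single amphiphile."""
    return volume_nm3 / (headgroup_area_nm2 * tail_length_nm)


def preferred_structure(p):
    """Map a packing parameter to the structure it tends to favor (approximate ranges)."""
    if p < 1 / 3:
        return "spherical micelle"
    if p < 1 / 2:
        return "cylindrical micelle"
    if p <= 1:
        return "lamellar bilayer"
    return "inverted (non-lamellar) phase"


# Packing parameters quoted in the text above
quoted_values = {"PC": 1.0, "PE": 1.11, "cholesterol": 1.22}

for lipid, p in quoted_values.items():
    print(f"{lipid}: P = {p} -> {preferred_structure(p)}")
```

      Under these approximate ranges, PC falls in the lamellar regime, whereas PE and cholesterol fall in the inverted regime, which matches the qualitative argument made above for choosing PC-based supported membranes.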

      Angelov B, Ollivon M, Angelova A. 1999. X-ray Diffraction Study of the Effect of the Detergent Octyl Glucoside on the Structure of Lamellar and Nonlamellar Lipid/Water Phases of Use for Membrane Protein Reconstitution. Langmuir 15:8225–8234. doi:10.1021/la9902338

      Carnie S, Israelachvili JN, Pailthorpe BA. 1979. Lipid packing and transbilayer asymmetries of mixed lipid vesicles. Biochim Biophys Acta 554:340–357. doi:10.1016/0005-2736(79)90375-4

      Chung JK, Nocka LM, Decker A, Wang Q, Kadlecek TA, Weiss A, Kuriyan J, Groves JT. 2019. Switch-like activation of Bruton’s tyrosine kinase by membrane-mediated dimerization. Proc Natl Acad Sci 116:10798–10803. doi:10.1073/pnas.1819309116

      Costigan SC, Booth PJ, Templer RH. 2000. Estimations of lipid bilayer geometry in fluid lamellar phases. Biochim Biophys Acta 1468:41–54. doi:10.1016/s0005-2736(00)00220-0

      Dbouk HA, Pang H, Fiser A, Backer JM. 2010. A biochemical mechanism for the oncogenic potential of the p110 catalytic subunit of phosphoinositide 3-kinase. Proc Natl Acad Sci 107:19897–19902. doi:10.1073/pnas.1008739107

      Hansen SD, Huang WYC, Lee YK, Bieling P, Christensen SM, Groves JT. 2019. Stochastic geometry sensing and polarization in a lipid kinase–phosphatase competitive reaction. Proc Natl Acad Sci 116:15013–15022. doi:10.1073/pnas.1901744116

      Hon W-C, Berndt A, Williams RL. 2012. Regulation of lipid binding underlies the activation mechanism of class IA PI3-kinases. Oncogene 31:3655–3666. doi:10.1038/onc.2011.532

      Israelachvili JN, Marcelja S, Horn RG. 1980. Physical principles of membrane organization. Q Rev Biophys 13:121–200. doi:10.1017/s0033583500001645

      Israelachvili JN, Mitchell DJ, Ninham BW. 1976. Theory of self-assembly of hydrocarbon amphiphiles into micelles and bilayers. J Chem Soc Faraday Trans 2 Mol Chem Phys 72:1525–1568. doi:10.1039/F29767201525

      Katada T, Kurosu H, Okada T, Suzuki T, Tsujimoto N, Takasuga S, Kontani K, Hazeki O, Ui M. 1999. Synergistic activation of a family of phosphoinositide 3-kinase via G-protein coupled and tyrosine kinase-related receptors. Chem Phys Lipids 98:79–86. doi:10.1016/S0009-3084(99)00020-1

      Kobierski J, Wnętrzak A, Chachaj-Brekiesz A, Dynarowicz-Latka P. 2022. Predicting the packing parameter for lipids in monolayers with the use of molecular dynamics. Colloids Surf B Biointerfaces 211:112298. doi:10.1016/j.colsurfb.2021.112298

      Kulkarni CV. 2019. Calculating the “chain splay” of amphiphilic molecules: Towards quantifying the molecular shapes. Chem Phys Lipids 218:16–21. doi:10.1016/j.chemphyslip.2018.11.004

      Maier U, Babich A, Macrez N, Leopoldt D, Gierschik P, Illenberger D, Nürnberg B. 2000. Gβ 5 γ 2 Is a Highly Selective Activator of Phospholipid-dependent Enzymes. J Biol Chem 275:13746–13754. doi:10.1074/jbc.275.18.13746

      Rathinaswamy MK, Dalwadi U, Fleming KD, Adams C, Stariha JTB, Pardon E, Baek M, Vadas O, DiMaio F, Steyaert J, Hansen SD, Yip CK, Burke JE. 2021. Structure of the phosphoinositide 3-kinase (PI3K) p110γ-p101 complex reveals molecular mechanism of GPCR activation. Sci Adv 7:eabj4282. doi:10.1126/sciadv.abj4282

      Wnętrzak A, Lątka K, Dynarowicz-Łątka P. 2013. Interactions of alkylphosphocholines with model membranes-the Langmuir monolayer study. J Membr Biol 246:453–466. doi:10.1007/s00232-013-9557-4

      Yang Y, Lee M, Fairn GD. 2018. Phospholipid subcellular localization and dynamics. J Biol Chem 293:6230–6240. doi:10.1074/jbc.R117.000582

      Yasui M, Matsuoka S, Ueda M. 2014. PTEN Hopping on the Cell Membrane Is Regulated via a Positively-Charged C2 Domain. PLoS Comput Biol 10:e1003817. doi:10.1371/journal.pcbi.1003817

      Ziemba BP, Burke JE, Masson G, Williams RL, Falke JJ. 2016. Regulation of PI3K by PKC and MARCKS: Single-Molecule Analysis of a Reconstituted Signaling Pathway. Biophys J 110:1811–1825. doi:10.1016/j.bpj.2016.03.001

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the referee for the positive review.

      Reviewer #2 (Public review):

      We thank the referee for his/her constructive comments.

      1. The weakness of this work is the lack of clarification on the function of eIF2A in general. The novelty of this study was limited.

      We believe our study is valuable in providing strong evidence that eIF2A does not functionally substitute for eIF2 in tRNAi recruitment even when eIF2 function is impaired, and in showing that it does not contribute to translational control by uORFs or IRESs, thus ruling out the most likely possibilities for its function in yeast based on studies of the mammalian factor. We agree that the function of yeast eIF2A remains to be identified; however, we think this should be regarded as a limitation rather than a weakness in experimental design or data obtained in the current study.

      1. Related to this, it would be worth investigating common features in mRNAs selectively regulated (surveyed in Figure 3A).

      We did not embark on this because only 17 of the 32 transcripts showing TE reductions in Fig. 3A showed a pattern of TE changes consistent with a conditional requirement for eIF2A under conditions of reduced eIF2 function, exhibiting greater TE decreases when both eIF2 function was impaired by phosphorylation and eIF2A was eliminated from cells. Moreover, we could validate this conditional eIF2A dependence by LUC reporter for only a single mRNA, HKR1.

      Also, it would be worth analyzing the effect of eIF2A deletion on elongation (ribosome occupancy on each codon and/or global ribosome footprint distribution along CDS) and termination/recycling (footprint reads on stop codon and on 3′ UTR).

      We have analyzed the effects of deleting eIF2A on ribosome pausing at individual codons by calculating tri-peptide pause scores from our ribosome profiling data. The results shown in new Fig. 7 reveal that eIF2A plays no discernible role in stimulating the rate of decoding of any three-codon combinations.

      1. Regarding Figure 3D, the reporters were designed to include promoter and 5′ UTR of the target genes. Thus, it should be worth noting that reporter design was based on the assumption that eIF2A-dependency in translation regulation was not dependent on 3′ UTR or CDS region. The reason why the effects on ribosome profiling-supported mRNAs could not be recapitulated in reporter assay may originate from this design. This should be also discussed.

      We agree and included this stipulation in the DISCUSSION, while at the same time noting that the native mRNAs were examined in the orthogonal assay of polysome distributions.

      1. Related to the point above, the authors claimed that eIF2A affects "possibly only one" (HKR1) mRNA. However, this was due to the reporter assay which is technically variable and could not allow some of the constructs to pass the authors' threshold. Alternative wording for this point should be considered.

      We agree and revised text in the DISCUSSION to read: “A possible limitation of our LUC reporter analysis in Fig. 3D was the lack of 3’UTR sequences of the cognate transcripts, which might be required to observe eIF2A dependence. Given that native mRNAs were examined in the orthogonal assay of polysome profiling in Fig. 3E, the positive results obtained there for SAG1 and SVL3 in addition to HKR1 should be given greater weight. Nevertheless, our findings indicate a very limited role of yeast eIF2A in providing a back-up mechanism for Met-tRNAi recruitment when eIF2 function is diminished by phosphorylation of its α-subunit.”

      1. For Figure 3D, it would be worth considering testing the #-marked genes (in Figure 3C) in this set up.

      Actually, we did test 10 of the 17 mRNAs marked with “#” in Fig. 3C in the reporter assays of Fig. 3D, as had been noted in the Fig. 3C legend.

      1. In box plots, the authors should provide the statistical tests, at least where the authors explained in the main text.

      At the first occurrence of a notched box plot (Fig. 2D), we explained in the main text that in all such plots, when the notches of different boxes do not overlap, their median values differ significantly with a 95% confidence level. In cases where overlap between notches is difficult to assess by eye, we added the results of Mann-Whitney U tests with the p values indicated by asterisks, as explained in the legends. We added results of additional Mann-Whitney U tests to such box plots in Figs. 3B, 6A-C, and 6-supp. 1E & G and mentioned this in the corresponding legends.
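
      The two criteria described above (non-overlapping notches and a Mann-Whitney U test) can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: the notch half-width of 1.57 × IQR / √n is a common plotting convention and is assumed here, the p value uses a normal approximation without tie or continuity correction, and all function names are hypothetical.

```python
import math
import statistics


def notch_interval(values):
    """Approximate 95% interval around the median: median +/- 1.57 * IQR / sqrt(n)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    med = statistics.median(values)
    return med - half_width, med + half_width


def notches_overlap(a, b):
    """True if the notch intervals of the two samples share any value."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


def mann_whitney_u(a, b):
    """Return (U statistic for sample a, two-sided p value).

    Uses the normal approximation with no tie or continuity correction,
    so it is only a rough guide for small or heavily tied samples.
    """
    # Count pairs in which a value from `a` exceeds a value from `b`; ties count 0.5.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p
```

      In practice one would first check `notches_overlap(group_a, group_b)` and fall back on `mann_whitney_u(group_a, group_b)` when the visual comparison is ambiguous, which mirrors the order described in the response.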

      Reviewer #2 (Recommendations For The Authors):

      The first section of "Yeast eIF2A does not play a prominent role as a functional substitute for eIF2 in the presence or absence of amino acid starvation" can be subdivided into a couple of sections for better readability.

      Done.

      Although the authors have used SM to induce ISR in yeasts previously, the validation of eIF2alpha phosphorylation in Western blot would be helpful for readers. Also, it should be worth testing whether eIF2alpha phosphorylation was properly induced in eIF2A KO cells.

      The translational induction of GCN4 mRNA, which we have documented in WT and eIF2A∆ cells, provides a quantitative read-out of eIF2 functional attenuation superior to determining the proportion of eIF2α that is phosphorylated.

      For Figure 2B, the Venn diagram that shows the overlap between TE-changes genes in WT_SM/WT and those in eIF2A∆_SM/eIF2A∆ would be helpful (although a list was provided by the source data).

      The Venn diagram has been provided in a new figure, Figure 2-figure supplement 1B.

      For Figures 1C and 5A-B, the depiction of the positions of uORFs within the orange gene region would be helpful for readers.

      Done.

      For Figure 4A-C, the depiction of the IRES regions (if known) within the orange gene region would be helpful for readers.

      Done for the URE2 IRES, whose location is known.

      For Figures 1C, 4A-C, and 5A-B, the y-axis should have a label/scale.

      Added.

      For Figure 3C, the definition of #-marked genes should be concretely described (e.g., value range) in the legend.

      Added.

      For Figure 3D-E, the statistical test has been only shown in a couple of data. A full depiction of the statistical results for all the data sets may be helpful for readers.

      We explained that when notches in box plots do not overlap, their medians differ with 95% confidence. In cases where overlaps were difficult to discern, we added p values from Mann-Whitney U tests to the relevant box plots.

      For Figure 3E, it would be helpful if the authors could show the UV spectrum of the sucrose density gradient to show the regions isolated for the experiments.

      Added for a representative replicate gradient in the new figure, Figure 3-figure supplement 1.

      Reviewer #3 (Public Review):

      We thank the referee for his/her positive assessment of our study.

      Weaknesses:

      While no role of eIF2A in translation initiation is apparent, the authors do not determine what function eIF2A does play in yeast. Whether it plays a role in regulating translation in a different stress response is not determined.

      We agree that there are many additional possibilities to consider for functions of eIF2A in translation initiation, including different stress situations or mutant backgrounds; however, we regard this as a limitation rather than a weakness in the experimental design and data obtained in the current study in which we examined the most likely possibilities for eIF2A function in yeast based on studies of the mammalian factor.

      Reviewer #3 (Recommendations For The Authors):

      Curiously, the authors indicate that they could not replicate published results for eIF2A's repressor function for URE2, PAB1, or GIC1 translation. This is a little concerning and one wonders if the yeast strain used in the previous study is different in some way from the authors' strain. Did the authors obtain that strain to test it in their assays?

      The same WT and eIF2A∆ strains have been analyzed here and in the two cited studies on yeast IRESs.

      The authors do discuss the fact that eIF2A may function to regulate translation in response to different stresses. It would have been a strength to test an alternative stress in the current study. However, I also appreciate that this could be the subject of a future study.

      Agreed.

      One minor question I have is whether the yeast strains used possess L-A dsRNA virus? While it may not be that this virus would necessarily mask a role of eIF2A-dependent translation, do the authors have any specific thoughts on this? Would different results be obtained if cured strains were used?

      According to Ravoityte et al. (doi: 10.3390/jof8040381), the S. cerevisiae strain we employed, BY4741, harbors L-A-1 dsRNA; however, we have not explored whether curing the virus would alter the consequences of eliminating eIF2A.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We thank the two reviewers for their constructive criticism, which helped to significantly improve our manuscript.

      During the revision process, we realized that the localization pattern reported for H. neptunium LmdCN-mCherry was an artifact caused by bleed-through of the BacA-YFP signal in the mCherry channel. More detailed studies showed that the fusion protein was detectable by Western blot analysis but, for unknown reasons, did not produce any fluorescence signal. Therefore, we have now removed the localization data shown in previous Figure 8B,C and Figure 8—figure supplement 1.

      To provide more evidence for a functional interaction between BacA and LmdC in H. neptunium, we have now established an inducible CRISPR interference system for this species and used it successfully to deplete LmdC (new Figure 9A-F). The loss of LmdC causes morphological defects very similar to those observed for the ΔbacA(D) mutant. In line with the physical interaction of BacA with the cytoplasmic region of LmdC observed in vitro, these findings support the hypothesis that the two proteins act in the same pathway. Consistent with the results obtained in H. neptunium, the absence of BacA leads to the delocalization of LmdC in R. rubrum. Moreover, we now provide in vivo evidence for a critical role of the cytoplasmic region of LmdC in the interaction of this protein with BacA in R. rubrum cells (new Figure 11). Together, these new findings strongly support the model that BacA and LmdC form a conserved morphogenetic module involved in the establishment of complex cell shapes in bacteria.

      Please see below for a more detailed explanation of our new results and for our response to the issues raised in the first round of review.

      Reviewer #1 (Public Review)

      In their study, Osorio-Valeriano and colleagues seek to understand how bacterial-specific polymerizing proteins called bactofilins contribute to morphogenesis. They do this primarily in the stalked budding bacterium Hyphomonas neptunium, with supporting work in a spiral-shaped bacterium, Rhodospirillum rubrum. Overall the study incorporates bacterial genetics and physiology, imaging, and biochemistry to explore the function of bactofilins and cell wall hydrolases that are frequently encoded together within an operon. They demonstrate an important, but not essential, function for BacA in morphogenesis of H. neptunium. Using biochemistry and imaging, they show that BacA can polymerize and that its localization in cells is dynamic and cell-cycle regulated. The authors then focus on lmdC, which encodes a putative M23 endopeptidase upstream of bacA in H. neptunium, and find that it is essential for viability. The purified LmdC C-terminal domain could cleave E. coli peptidoglycan in vitro suggesting that it is a DD-endopeptidase. LmdC interacts directly with BacA in vitro and co-localizes with BacA in cells. To expand their observations, the authors then explore a related endopeptidase/ bactofilin pair in R. rubrum; those observations support a function for LmdC and BacA in R. rubrum morphogenesis as well.

      An overall strength of this study is the breadth and completeness of approaches used to assess bactofilin and endopeptidase function in cells and in vitro. The authors establish a clear function for BacA in morphogenesis in two bacterial systems, and demonstrate a physical relationship between BacA and the cell wall hydrolase LmdC that may be broadly conserved. The eventual model the authors favor for BacA regulation of morphogenesis in H. neptunium is that it serves as a diffusion barrier and limits movement of morphogenetic machinery like the elongasome into the elongating stalk and/or bud. However, there is no data presented here to address that model and the role of LmdC in H. neptunium morphogenesis remains unclear.

      We hypothesize that BacA establishes a barrier that prevents the movement of elongasome complexes into the stalk, either directly by steric hindrance and/or indirectly by promoting the formation of an annular region of high positive inner cell curvature that cannot be passed by the elongasome. To test this model, we have now analyzed the localization dynamics of RodZ, a core structural component of the elongasome complex, in wild-type and ΔbacAD cells. We found that wild-type cells show dynamic YFP-RodZ foci whose movement is limited to the mother cell and the nascent bud, with no signal observed in the stalk. In ΔbacAD cells, by contrast, the fusion protein is consistently detected in all regions of the cell, including nascent stalks (new Figure 5). These results support the idea that BacA is required to confine the elongasome to the mother cell and bud regions and, thus, set the limits of the different growth zones in H. neptunium. We also attempted to follow the localization dynamics of other elongasome components, such as PBP2, MreC and MreD, but none of the corresponding fluorescent protein fusions was functional.

      In the past, we tried intensively to generate conditional mutants of lmdC, but all attempts to place the expression of this gene under the control of the copper- or zinc-inducible promoters available for H. neptunium were unsuccessful. To clarify the role of LmdC in H. neptunium morphogenesis, we have now established an inducible CRISPR interference system for this species and managed to block the expression of lmdC using an sgRNA directed against the 5' region of its non-coding strand. We observed that cells lacking LmdC show a phenotype very similar to that of the ΔbacA mutant. Together with the finding that the N-terminal cytoplasmic region of LmdC physically interacts with BacA, this result strongly supports the hypothesis that BacA and LmdC act in the same pathway, forming a complex that ensures proper morphogenesis in H. neptunium (new Figure 9).

      The data presented illuminate aspects of bacterial morphogenesis and the physical and functional relationship between polymerizing proteins and cell wall enzymes in bacteria, a recurring theme in bacterial cell biology with a variety of underlying mechanisms. Bactofilins in particular are relatively recently discovered and any new insights into their functions and mechanisms of action are valuable. The findings presented here are likely to interest those studying bacterial morphogenesis, peptidoglycan, and cytoskeletal function.

      Reviewer #2 (Public Review):

      This is an excellent study. It starts with the identification of two bactofilins in H. neptunium, a demonstration of their important role for the determination of cell shape and discovery of an associated endopeptidase to provide a convincing model for how these two classes of proteins interact to control cell shape. This model is backed up by a quantitative characterisation of their properties using high-resolution imaging and image analysis methods.

      Overall, all evidence is very convincing and I do not have many recommendations on how to improve the manuscript.

      In my opinion, there are only two issues that I have with the paper:

      1. The single particle dynamics of BacA is presented as analysed and I would like to give some suggestions how to maybe extract even more information from the already acquired data:

      1.1. Presentation: Figure 5A is only showing projections of single particle time-lapse movies. To convince the reader that it was indeed possible to detect single molecules it would be helpful if the authors present individual snapshots and intensity traces. In the case of single molecules, these will show stepwise bleaching.

      We have now added a supplementary video that shows both time series and intensity traces of individual BacA-YFP molecules (Figure 6—Video 1). It verifies the step-wise bleaching of the particles observed and thus shows that we observe the mobility of single molecules. Moreover, we have now included a supplementary figure that shows all trajectories identified within representative cells. This visualization provides a more comprehensive view of our data and further supports the notion that our analysis is based on the detection of single molecules.

      1.2. Analysis: Figure 5B and Supplement Figure 1 are showing the single particle tracking results, revealing that there are two populations of BacA-YFP in the cell. However, this data does not show if individual BacA particles transition between these two populations or not. A more detailed analysis of the existing data, where one can try to identify confinement events in single particle trajectories could be very revealing and help to understand the behaviour of BacA in more detail.

      We agree that an analysis of the single-molecule traces for transitions between the mobile and static states would help to achieve a more detailed understanding of the polymerization behavior of BacA. We believe that the dynamic formation, reorganization and disappearance of BacA-YFP foci observed by time-lapse analysis (Figure 4) indicate that BacA undergoes reversible polymerization in vivo. A deeper investigation of this aspect is beyond the scope of the present study and will be performed at a later point.

      2. The title of Fig. 3 says that BacA and BacD copolymerise, however, the data presented to confirm this conclusion is actually rather weak. First, the Alphafold prediction does not show the co-polymer, and second, the in vitro polymerisation experiments were only done with BacA in the absence of BacD. Accordingly, the only evidence that supports this is their colocalization in fluorescence microscopy. I suggest either weakening the statement and changing the title or adding more evidence.

      To support the idea that BacA and BacD interact with each other, we have now added images of cells producing BacA-YFP or BacD-CFP individually (new Figure 3—figure supplement 1B,C). The results obtained show that BacA-YFP alone still forms filamentous structures, whereas BacD-CFP condenses into tight foci in the absence of its paralog. When produced together with BacA-YFP, however, the two proteins colocalize into filamentous structures, supporting the notion that they interact with each other. Nevertheless, we agree that it is unclear whether BacA and BacD copolymerize into mixed protofilaments or whether they form distinct protofilaments that then interact laterally to form larger bundles. We have therefore replaced the term “co-polymerize” with “assemble” in the heading of this section.

      Finally, did the authors think about biochemical experiments to study the interaction between the cytoplasmic part of LmdC and the bactofilins? These could further support their model.

      We show the interaction between the cytoplasmic region of H. neptunium LmdC and BacA in Figure 9G,H (previously Figure 8D,E). For technical reasons, it was not possible to synthesize a peptide comprising the corresponding region of R. rubrum LmdC, so that our in vitro analysis is limited to the H. neptunium proteins.

      To further support the notion that BacA interacts with the cytoplasmic region of LmdC, we have now analyzed the localization behavior of two LmdC variants with amino acid exchanges in the conserved cytoplasmic β-hairpin motif (new Figure 11). Both variants no longer colocalize with BacA and are no longer enriched at the inner cell curve. Interestingly, these exchanges also affect the enrichment of BacA at the inner cell curvature, suggesting that BacA needs to interact with LmdC for proper localization. It is tempting to speculate that BacA polymers have a preferred intrinsic curvature and that the activity of the BacA-LmdC complexes adjusts cell curvature in a manner that facilitates their association with the inner curve.

      Reviewer #1 (Recommendations for The Authors):

      We have the following specific recommendations for the improvement of the manuscript:

      1. Several places would benefit from additional quantitation of data:

      a. Figure 1 and supplements: can cell shape be quantified in a more specific way? (e.g. principal component analysis of shape as in https://onlinelibrary.wiley.com/doi/10.1111/mmi.13218). It looks as if BacD production may partially rescue the bacA shape phenotype?

      We have made considerable efforts to establish methods to quantify morphological changes and protein localization patterns in Hyphomonas neptunium. Since standard software packages, such as Oufti or MicrobeJ, are not able to reliably detect stalks and, thus, typically identify buds as separate cells, we have developed our own analysis software (BacStalk; Hartmann et al, 2020, Mol Microbiol), which is optimized for the detection of thin cellular extensions. However, while this software works very well with wild-type cells, it also fails to recognize amorphous cells with multiple, ill-defined extensions. Given these problems in cell segmentation, it is currently not possible to use principal component analysis to obtain a robust measure of the morphological defects of bactofilin mutants in H. neptunium.

      b. Figures 2-S2b, 7D and 9-S1b - can the area under the peaks be quantified and compared across strains? Visual examination of the spectra makes it difficult to discern differences.

      A direct comparison of the peak areas between strains is not possible, because the absolute values depend on the amount of peptidoglycan used in the muropeptide analyses. It is very difficult to precisely quantify peptidoglycan, which makes it challenging to use equal amounts of material from different strains in the reactions. However, the relative proportion of different muropeptide species, as provided in Figure 2—Dataset 1, faithfully reflects the composition of peptidoglycan and can easily be compared between strains.
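
      The reasoning above, that relative proportions are insensitive to the amount of material loaded whereas absolute peak areas are not, can be made concrete with a minimal sketch. The function name, the muropeptide labels, and the peak areas below are hypothetical and serve only to illustrate the normalization; they are not data from the study.

```python
def relative_abundance(peak_areas):
    """Convert integrated peak areas into percentages of the total area.

    peak_areas: dict mapping a muropeptide species label to its integrated area.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {species: 100 * area / total for species, area in peak_areas.items()}


# Hypothetical runs: the second run contains twice as much material as the first.
run_1 = {"monomers": 50.0, "dimers": 40.0, "trimers": 10.0}
run_2 = {species: 2 * area for species, area in run_1.items()}

# Absolute areas differ two-fold, but the relative profiles are identical.
print(relative_abundance(run_1))
print(relative_abundance(run_2))
```

      Because every area in a run scales with the amount of peptidoglycan analyzed, the scaling factor cancels in the ratio, which is why the relative profile is the appropriate quantity to compare between strains.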

      c. Figure 9E,F, 9-S4d - BacA and LmdC localization in R. rubrum is very difficult to assess. It does not look linear/filamentous in most cells and is difficult to tell if it is associated with the inner curvature. Can you quantify the position of the signal along the short axis of the cell to better demonstrate that?

      We agree that a better quantification of the distribution of protein along the cell envelope of R. rubrum is required to support the conclusions drawn. To address this issue, we have now used line scans to measure the fluorescence intensities along the inner and outer curve of cells (n=200 per strain) and visualized the data in the form of demographs. The results clearly show an enrichment of BacA and LmdC at the inner curve in wild-type cells and a disruption of this pattern in various mutant backgrounds (new Figures 10F,G,J and 11D,E).

      1. Figure 2-S2A. Does ∆bacD grow better than wild-type? It would also be useful to add growth curves of the bacA complemented strains.

      In the case of H. neptunium, growth curves are often misleading, because cells start to aggregate at the late exponential phase due to abundant EPS formation. The degree of cell aggregation also depends on the morphology of cells, because EPS production is limited to the mother cell body, which makes it challenging to compare morphologically distinct mutant strains. We have now performed growth assays for all H. neptunium deletion and complementation strains used in the study and limited the analysis of doubling times to the early and mid-exponential phase, in which cells do not yet form visible aggregates. The results obtained are now included in the new Figure 1F and Figure 1—figure supplement 2D. They show that the doubling times of the different bactofilin mutants are close to that of the wild-type strain.
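
      Restricting a doubling-time estimate to the early and mid-exponential phase, as described above, could look like the following sketch. It is an illustration under stated assumptions rather than the authors' procedure: it assumes optical-density readings as the growth measure, uses a hypothetical cut-off (`max_od`) to exclude later time points, and fits a straight line to ln(OD) versus time by ordinary least squares.

```python
import math


def doubling_time(times, ods, max_od):
    """Estimate the doubling time from readings with OD <= max_od.

    times  : sampling times (any consistent unit, e.g. hours)
    ods    : optical-density readings at those times
    max_od : assumed cut-off above which readings are ignored
             (e.g. because cells start to aggregate)
    """
    points = [(t, math.log(od)) for t, od in zip(times, ods) if 0 < od <= max_od]
    if len(points) < 2:
        raise ValueError("need at least two readings in the selected window")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    # Least-squares slope of ln(OD) versus time = specific growth rate.
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in points)
    denominator = sum((t - mean_t) ** 2 for t, _ in points)
    if denominator == 0:
        raise ValueError("selected readings must span more than one time point")
    growth_rate = numerator / denominator
    return math.log(2) / growth_rate
```

      With made-up readings that double every hour up to the cut-off and flatten afterwards, for example `doubling_time([0, 1, 2, 3, 4, 5], [0.05, 0.1, 0.2, 0.4, 0.5, 0.52], max_od=0.4)`, only the first four points enter the fit, so the late, aggregation-prone readings do not distort the estimate.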

      1. Figure 4BC: From the demographs provided, BacA and BacD appear to have different localization dynamics. BacD seems to stay at the base of the stalk, nearest the mother cell, whereas BacA migrates towards to bud? Also, "length" is misspelt in the panels.

      During the transition to bud formation, we indeed observe that the localization patterns of BacA and BacD are in many cases not fully superimposable, with BacD lagging behind BacA and forming transient additional clusters in the vicinity of the stalk base. Examples are now shown in Figure 4—figure supplement 4. This effect explains the distinct patterns in the demographs. We have now modified the text accordingly. We have also corrected the spelling of “length” in the figure.

      1. Can BacD polymerize on its own? It colocalizes with BacA in E. coli but that does not necessarily mean it co-polymerizes.

      Please see our response to a similar issue (point 2) raised by Reviewer #2.

      1. Lines 263-266. You use E. coli PG as a substrate for LmdC in vitro because "peptidoglycan from H. neptunium shows only a low degree of cross-linkage and hardly any pentapeptides." Does this not have relevance to the physiological significance of the observed activity? Or do you presume that LmdC activity (and/or that of other endopeptidases) is very high in H. neptunium so it is difficult to detect additional activity using HnPG as a substrate? It would be useful to clarify this logic in the text.

      DD-crosslinks are formed by all major peptidoglycan biosynthetic complexes, including the elongasome and the divisome, so that their general relevance to cell growth in H. neptunium is beyond doubt. The low degree of crosslinkage observed suggests that H. neptunium contains high endopeptidase activity, which cleaves crosslinks after their formation by DD-transpeptidases. We have now added the explanation “likely due to a high level of autolytic activity” to make this point clearer. Whether LmdC makes a major contribution to the low level of crosslinkage remains to be determined. However, our data suggest that it mostly acts in complex with BacA, so that it may only cleave peptidoglycan locally and not have a global effect on cell wall composition. It would not be possible to detect the DD-endopeptidase activity of LmdC using H. neptunium peptidoglycan as a substrate, because it has a low content of DD-linked peptide chains. To facilitate the in vitro activity assay, we therefore used highly crosslinked peptidoglycan from a mutant E. coli strain.

      1. Lines 268-269: Is there some explanation for why monomers do not increase on LmdC treatment? Here quantitation of peaks before and after treatment would allow the reader to more precisely interpret these data.

      The absolute peak sizes are not comparable, because there is some variation in the amount of peptidoglycan included in the assays (see also our comments on point 1b raised by Reviewer #1) and the integrated peak areas (which correspond to the amounts of muropeptide species produced) depend on both the height and the width of the peaks, which vary to some degree in different HPLC runs. The relevant measure to compare the muropeptide profiles is therefore the relative content of different muropeptide species in the different conditions. For clarification, we have now added the following sentence to the legend of Figure 8D: “A quantification of the relative abundance of different muropeptide species in each condition, based on a comparison of the relative integrated peak areas, is provided in Figure 8—Dataset 1.” The control reaction lacking LmdC only contains peptidoglycan diluted in buffer and thus provides insight into muropeptide composition of untreated peptidoglycan.

      1. Lines 280-283: It would be interesting to know if the transmembrane domain of LmdC is required for its localization since it is dispensable for binding BacA and since LmdC still localizes to foci without BacA.

      Given that it is currently not possible to localize LmdC in H. neptunium, we were not able to perform this analysis.

      1. Line 296: it is also possible that LmdC localizes with another protein and does not independently assemble into larger complexes.

      Since the localization pattern reported for LmdC in the ΔbacAD background is no longer valid, we have not discussed this aspect in the revised version of our manuscript. However, in general, we do not exclude the possibility that LmdC could interact with other peptidoglycan biosynthetic proteins.

      1. Line 304-306 and Fig 9: Is the domain organization of RrLmdC the same as for HnLmdC? It would be useful to include its domain organization as well. Also, please add amino acid numbering to Figure 9B.

      We have now added a schematic showing the domain organization of LmdC from R. rubrum (new Figure 10B). The protein is highly similar to its homolog from H. neptunium.

      1. Line 340-341: "In both cases, they functionally interact with LmdC-type DD-endopeptidases to promote local changes in the pattern of peptidoglycan biosynthesis." This conclusion is not experimentally supported. Since LmdC is essential and you could not make a depletion strain in H. neptunium, it was not shown that the interaction with LmdC is how BacA promotes changes in PG patterning. HADA/FDAA labeling was not performed in R. rubrum, and no global changes in PG chemistry were observed in bacA or lmdC mutants, so you cannot claim BacA or LmdC influences PG patterning there, either. Either soften this statement to a hypothesis or otherwise rephrase.

      To further corroborate a functional interaction between BacA and LmdC, we have now established an inducible CRISPRi system to deplete LmdC from H. neptunium cells (see also our comments on the public review of Reviewer #1). We observe that the loss of LmdC leads to a phenotype very similar to that observed for the ΔbacA(D) mutant, supporting the idea that BacA and LmdC act in the same pathway. We have now also performed localization studies of the elongasome component RodZ in H. neptunium, which demonstrate that the spatial distribution of elongasome complexes is affected in the absence of the bactofilin cytoskeleton in H. neptunium. Combined with the observation that LmdC is a catalytically active DD-endopeptidase and its absence leads to morphological defects, these results indicate that BacA, together with LmdC, induces local changes in the pattern of peptidoglycan biosynthesis, both by affecting elongasome movement and, likely, by reducing peptidoglycan crosslinking in the cell envelope regions it occupies.

      1. Figure 9-S4: there is no panel C (change D to C).

      Corrected.

      1. Lines 344-355: No data is presented here to support the barrier model of bactofilin function. In addition, it is unclear why cells would take on amorphous shapes instead of extended rod shapes/filaments if elongasome function was not constrained on the longitudinal axis. It would be helpful to have more discussion of the potential mechanisms of LmdC function in H. neptunium in this section of the discussion since that is the emphasis of the results section.

      To support the barrier model, we have now compared the localization dynamics of the elongasome component RodZ in wild-type and ΔbacAD cells. The results show that RodZ is excluded from the stalk in the wild-type background, whereas it readily enters the stalk in the mutant cells, leading to the expansion of stalks into large, amorphous extensions. Consistent with these findings, HADA labeling is not observed within the stalks in wild-type cells, whereas it is readily observed in the enlarged stalk structures (pseudohyphae) formed in the mutant cells.

      The current model of MreB movement suggests that MreB filaments have an intrinsic curvature and thus preferentially align along regions of similar curvature, which is along the circumference of the cell in rod-shaped geometries. However, previous work has shown that MreB starts to move along randomly oriented trajectories as soon as cells lose their rod-shaped morphology and adopt more spherical shapes (Hussain et al, 2018, eLife). In line with these findings, our current and our previous work (Cserti et al, 2017, Mol Microbiol) indicate that the expansion of the ovoid H. neptunium mother cell prior to the onset of stalk biosynthesis as well as bud formation are mediated by the elongasome complex. Thus, the elongasome can clearly also give rise to shapes other than rods. Interestingly, however, the H. neptunium elongasome also appears to drive the formation of the rod-shaped stalk, possibly by moving around the circumference of the stalk base. Thus, species- or growth phase-dependent regulatory mechanisms or, potentially, differences in the spatial arrangement of the glycan strands within the peptidoglycan layer may result in different modes of elongasome movement and, thus, modulate the morphogenetic activity of elongasome complexes.

      1. Lines 395-397: It is also possible that LmdC positioning is dependent on cell morphology, rather than directly on BacA, since morphology is so distorted in bacA mutant cells.

      We provide several lines of evidence showing that LmdC and BacA functionally and physically interact (see above), making it highly unlikely that the two proteins are not associated with each other. However, our previous (Figure 10I,J) and new (Figure 11) results suggest that the physical interaction with LmdC and/or the cell shape-modulating activity of the complex are required for the proper localization of BacA at the inner curve of the cell. This finding may indicate the existence of a self-reinforcing cycle, in which the morphological changes induced by BacA-LmdC assemblies stimulate the recruitment of additional assemblies to their site of action.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study presents useful findings regarding the impact of forest cover and fragmentation on the prevalence of malaria in non-human primates. The evidence supporting the claims of the authors is, however, incomplete, as the sampling design cannot adequately address the geospatial issues that this study focuses on.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study as a concept is well designed, although there is still one issue I see in the methodology.

      I still have concerns with their attempts to combine the different scales of data. While the use of point data is great, it limits the sample size, and they have included the district to country level data to try and increase the sample size. The problem is that they try to get an overall estimate at the district/state/country level by taking 10 random sample points. This would be a suitable method if the primates were evenly distributed across the district/state/country. The reality is that the primates are not evenly distributed across the district/state/country, therefore the random point sampling is not a reasonable method to get an estimate of the environmental variables in relation to the macaques. For example, if you had a mountainous country and you took 10 random points to estimate altitude, you would end up with a large number, but if all the animals of interest lived on the coast, your average altitude is meaningless in relation to the animals of interest as they are all living at low altitude. The fact that the model relies less on highly variable components and places more reliance on less variable components is really not relevant, as the district/state/country measurements have no real meaning in relation to the distribution of macaques.

      A simple possible way forward could be to run the model without the district/state/country samples and see what the outcome is. If the outcome is similar then the random point method may be viable (but if it gives the same outcome as ignoring those samples then you don't need the district/state/country samples). If you get a totally different outcome then it should raise concerns about using the district/state/country samples.

      This paper is a really nice piece of work and is a valuable contribution but the district/state/country sample issue really needs to be addressed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A simple possible way forward could be to run the model without the district/state/country samples and see what the outcome is. If the outcome is similar then the random point method may be viable (but if it gives the same outcome as ignoring those samples then you don't need the district/state/country samples). If you get a totally different outcome then it should raise concerns about using the district/state/country samples.

      Thank you for your comments, and for the suggestions to address the issues identified in your main commentary by running an analysis on exclusively GPS geolocated data points. This was the original plan for analysis, but the available data identified in the literature review includes only 14 data points (macaque P. knowlesi prevalence surveys) with associated GPS coordinates. This was found to be too limited to obtain meaningful results from a regression analysis, and hence we then explored methods for utilising all available data to identify trends whilst accounting for spatial uncertainty in the analysis. As the point location only represents the location of capture and not the extent of the home range of the NHPs, we additionally feel there is value in exploring methods to encompass the wider surrounding habitat.

      We do appreciate the concerns you raise with the random point method being used to represent macaque survey sites when species of interest are not necessarily evenly distributed across an area. To investigate this, we ran sensitivity analysis on a subset of the dataset according to whether the points fall in areas of >50%, >75% or >90% predicted probability of macaque occurrence, with maps derived from published models of macaque suitability in Southeast Asia. For each of these thresholds, points that fall outside these areas were removed – such that, if a random point is located on a mountain range where there is 0 likelihood of macaque occurrence, it is excluded from the analysis. We found that restricting analysis to areas with highly probable macaque habitat still shows a robust effect of forest cover on NHP prevalence, and additionally that for the most conservative (>90%) habitat threshold there remains an effect of forest fragmentation on prevalence (SI Table S17c, Figure S15c). Given that using the full data set increases the uncertainty, as there is more variation in covariates between the replicates, this can be considered a more conservative approach to detecting an effect of environment as reported in the main findings.
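      The threshold-based subsetting described above can be sketched as follows. This is an illustrative outline only, not the authors' analysis code: the point records, the field names `p_occurrence` and `forest_cover`, and all values are hypothetical, and the regression step that would follow each subset is omitted.

```python
# Habitat-suitability cut-offs named in the response (>50%, >75%, >90%).
THRESHOLDS = (0.50, 0.75, 0.90)


def filter_points(points, threshold):
    """Keep only random points whose predicted occurrence exceeds the threshold."""
    return [p for p in points if p["p_occurrence"] > threshold]


# Hypothetical random sample points with a predicted probability of macaque
# occurrence and one environmental covariate; values are made up.
points = [
    {"id": 1, "p_occurrence": 0.00, "forest_cover": 0.10},  # e.g. mountain range
    {"id": 2, "p_occurrence": 0.55, "forest_cover": 0.40},
    {"id": 3, "p_occurrence": 0.60, "forest_cover": 0.45},
    {"id": 4, "p_occurrence": 0.80, "forest_cover": 0.62},
    {"id": 5, "p_occurrence": 0.92, "forest_cover": 0.71},
    {"id": 6, "p_occurrence": 0.95, "forest_cover": 0.78},
]

# One subset per threshold; each subset would then be passed to the same model.
subsets = {t: filter_points(points, t) for t in THRESHOLDS}

for t, kept in subsets.items():
    print(f">{t:.0%}: kept {len(kept)} of {len(points)} points")
```

      Comparing the fitted effects across the three subsets and the full data set is what makes this a sensitivity analysis: a result that holds under every threshold is less likely to be driven by points placed in unsuitable habitat.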

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1. A more thorough analysis of transition boundaries between different types of patterns would further strengthen the conclusions.

      We agree that the transition between different patterning regimes should be discussed more quantitatively in the manuscript. Specifically, we identified a highly sensitive parameter range where the disorder in the patterns rapidly increases as a function of the VEGF stimulus. We have improved our discussion of the transition between ‘ordered-like’ patterns and ‘disordered-like’ patterns in the main text as follows: “At relatively low VEGF levels, the patterns were mostly ordered, with small deviations from the expected ‘salt-and-pepper’ geometry with a 25%-75% ratio of Tip-Stalk (Fig. 2D). However, as the VEGF input increased, the fraction of Tips grew and the patterns became sharply more disordered over a relatively narrow range of magnitude of the VEGF input, which could be identified as a highly sensitive area separating more ‘ordered-like’ and ‘disordered-like’ patterns. Finally, increasing VEGF stimuli beyond the highly sensitive area further increased the disorder of the patterns, but with a lower VEGF sensitivity, over several more orders of magnitude of VEGF inputs”.

      Reviewer #2 (Recommendations For The Authors):

      Please refer to the Public Comments above for a broad review. Below, I provide specific concerns that could be addressed.

      Main comments

      1. Is the salt-and-pepper model observed for the case when there is no VEGF in the experiments? It would be good to confirm the same. If not, the analysis presented in Fig. 3 could be performed for this case and used as a baseline while referring to the data in Fig. 3.

      We thank the referee for the interesting suggestion. The pattern predicted by the model is not strictly salt-and-pepper in the absence of VEGF, but the disorder quantified in terms of “incorrect” contacts between Tip cells is considerably lower (see for example the disorder quantification in supplementary figure 1C). We have included the Tip-Tip contact statistics for a case of VEGF=1 ng/ml (100-fold lower than the level used in Fig. 3 to compare model and experiment). In this case, there is clearly more spacing between Tip cells, thus demonstrating how high VEGF stimuli increase the probability of contacts between Tip cells. In the main text, we commented: “As a baseline comparison, the mathematical model with a 100-fold reduction of VEGF stimulus (1 ng/ml) exhibited Tip-Tip distance statistics more closely comparable with the ‘salt-and-pepper’ model”.

      1. The authors mention in the Discussion (end of pg. 7) that ...a low level of exogeneous VEGF is essential to induce Delta-NOTCH signalling.. However, in the standard NOTCH signalling (Boareto et al.), we can get the salt-and-pepper pattern without any VEGF. Am I missing something? The authors may want to take a re-look.

      We appreciate the referee’s understanding of the mathematical model. The model used here still exhibits a bistable behavior between the low-Delta and high-Delta cell states even in the absence of VEGF input, as seen for example in the cell state distribution of Fig. 2B, and in agreement with the original model by Boareto et al. This behavior is reflective of the more general applicability of the model, as it describes Delta-NOTCH interactions in various systems. For endothelial cells, VEGF is indeed required to trigger this interaction, but this was not the primary focus of the paper, hence the original model was used. In the text referred to by the reviewer, we are discussing the role of VEGF based on its known biological effects as well as modeling results. We anticipate that the future further adaptation of the model to endothelial cells will refine its description of cell interactions in the absence of VEGF.

      1. The size of cells (or spacing between cell nuclei) is highly variable (Fig. 3). Since it is known that the size of cell-cell junctions influences signalling, it would be good to at least comment on the same, considering that the model in the paper consists of regular static hexagons. Similarly, it seems desirable to comment on expressing the distance between Tip cells (Fig. 3) in cell length units, when the cell lengths are so variable.

      We concur with the suggestion that our consideration of the cell-cell contact size in NOTCH signaling should be clarified in the manuscript.

      Sprinzak et al. reported in their 2017 article published in Developmental Cell that the cell-cell contact area does influence NOTCH Signaling. In this article, they found that NOTCH trans-endocytosis (TEC) for pairs with a larger contact width (25µm) is up to five times higher than for pairs with a smaller contact (2.5µm), as observed through the two-cell TEC assay. While TEC correlates with contact width across a range from 1 to 40µm, the values fluctuate significantly in the middle range, particularly when excluding extremely low cell-cell contact areas.

      In our experiments, we observed that the cell-cell contact area ranges from essentially infinitesimal corner-to-corner contact to roughly 50µm. We excluded the corner contacts, which might correspond to extremely low cell-cell contact areas, from the Tip-Tip distance measurements as depicted in Fig. 3B. We also made the assumption that variations in cell-cell contact size within tens of microns correlate weakly with the strength of NOTCH signaling. This assumption did not impede our effort to compare the overall trends with results from modeling using hexagonal cells, as shown in Figs 6 D&E. We have included this comment and the corresponding reference to elucidate our assumption in the results as follows: In our experiments, the observed cell-cell contact area varied, spanning from very low (cell corner-to-corner contact) up to approximately 50µm. Previous studies(14, 15) have clearly demonstrated the influence of the cell-cell contact area on NOTCH Signaling, but the values get noisy in the middle range, particularly when excluding extremely low cell-cell contact areas. Reflecting these findings, we excluded the corner contacts, which might correspond to extremely low cell-cell contact areas, from the Tip-Tip distance measurements as depicted in Fig. 3B. We also made an assumption that variations in cell-cell contact size within tens of microns correlate weakly with the strength of NOTCH signaling. This assumption did not impede our effort to compare the overall trends with results from modeling using hexagonal cells, as shown in Figs 3 D&E.

      1. The results presented in Fig. 6J are quite striking. However, the number of samples N = 10 and N = 11 seem somewhat low. How does one justify that the findings are not influenced by low number fluctuations?

      We acknowledge the reviewer's concerns regarding potential biases stemming from a limited number of samples. The analysis presented in Fig. 6J was specifically designed to complement and support the findings in Fig. 6H. In this context, the counts of sprout and mini-sprout dots correspond to the number of instances "including a sprout" and "including a mini-sprout."

      While the counts of sprouts and mini-sprouts in Fig. 6H might seem limited as highlighted by the reviewer, the statistical difference between the two groups was found to be significant. Nevertheless, we expanded our regions of interest to encompass neighboring cells, based on the rationale that the local environment might have closely interacting and similar features. The sample sizes in Figure 6J, represented as N=10 and N=11, equate to an examination of 70 cells and 77 cells, respectively. For instance, in the category "including a sprout," five out of ten groups indicated that all seven neighboring cells in a group exhibited fibronectin levels exceeding a given threshold, translating to 35 cells with fibronectin levels above this threshold. Given that the observed trends in distribution were consistently reasonable across the examinations of both 70 and 77 cells, we would like to state that we are confident in our results.

      1. It is written towards the end on pg. 5 that ... although all sprouts indeed formed from mini-sprouts, not all .... However, as can be seen from Fig. 4O, Sprouts can also be generated from Stalk cells. This should be corrected.

      Thank you for highlighting the discrepancy between our statement on page 5 and the observations in Fig. 4O. While all sprouts undergo a mini-sprout phase, the transition from Stalk to mini-sprout is not always observed due to the limitations of our observational timeframe. We acknowledge this oversight and have adjusted our statement to clarify that sprouts appearing to form directly from Stalks likely passed through an unobserved intermediate mini-sprout stage as follows: We found that all sprouts formed either directly from Stalks or from mini-sprouts, suggesting a non-observed transition from Stalk to mini-sprout due to observational timeframe limitations. Strikingly, however, not all mini-sprouts persisted and initiated sprout formation.

      1. No solid blue bars are shown in Fig. S2A as mentioned in the caption. Kindly correct.

      We apologize for the mistake. We have corrected the figure to show the blue bars depicting the experimental measurements for sprout distance probability.

      1. How are the high-Delta cells or high-NOTCH cells decided in experiments or simulations? Does it happen that Delta and NOTCH levels are comparable? In that case, what is done? This point could be clarified in the main manuscript or Materials and Methods.

      We agree with the reviewer that Tip cell definition should be clarified. In the model, we define a threshold level for cellular Delta to distinguish Tip and Stalk cells, which is now explained in the Methods section “Definition of Tip cells in the model”. As elaborated in the new section, Delta and NOTCH levels are never comparable due to the circuit’s bistable behavior. In experiments, Tip cells are identified based on their key phenotypic characteristic, invasive migration into the surrounding collagen matrix, rather than on Delta or NOTCH levels. The details can be found in the “Precise quantification of Tip cell spatial arrangement suggests disordered patterning in the engineered angiogenesis model” section and Figure 3A.
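      The threshold-based definition of Tip cells in the model can be sketched as below. This is a hedged illustration, not the code used in the study: the threshold value, the Delta levels, and the function names are all hypothetical, chosen only to mimic a bistable distribution in which cells cluster well below or well above the cut-off.

```python
def classify_cells(delta_levels, threshold):
    """Label each cell 'Tip' if its Delta level exceeds the threshold, else 'Stalk'."""
    return ["Tip" if level > threshold else "Stalk" for level in delta_levels]


def tip_fraction(labels):
    """Fraction of cells labelled as Tip."""
    return labels.count("Tip") / len(labels)


# Hypothetical Delta levels (molecules per cell) from a bistable circuit:
# values cluster either far below or far above the assumed threshold.
DELTA_THRESHOLD = 1000.0  # assumed value, for illustration only
delta_levels = [120.0, 1500.0, 90.0, 1800.0, 200.0, 110.0, 95.0, 1600.0]

labels = classify_cells(delta_levels, DELTA_THRESHOLD)
print(labels)
print(f"Tip fraction: {tip_fraction(labels):.3f}")
```

      Because bistability keeps cells away from intermediate Delta levels, the classification is insensitive to the exact threshold within the gap between the two states, which is consistent with the statement that Delta and NOTCH levels are never comparable in the model.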

      Minor comments

      There are a good number of typos in the paper. The manuscript should be carefully checked and corrected for the same. Below, I provide a few instances.

      1. In the abstract towards the end, it should be "understanding" instead of "understating"

      2. On pg. 5, just before the beginning of the last paragraph, there is a typo "parodied" which should most likely be "provided"

      3. First paragraph on pg. 6 typo "spouts" instead of "Sprouts"

      4. Second paragraph on pg. 6, correctly write "testS"

      5. Near the beginning of pg. 8, should be "C. elegans" instead of "C. elegance"

      6. Figure 1 caption, towards the end, should be "Stalk" instead of "Salk"

      We sincerely appreciate your keen attention to detail. We have thoroughly reviewed the manuscript and made the necessary corrections, including those that you have highlighted.

      Reviewer #3 (Recommendations For The Authors):

      Major concern:

      The authors should discuss in more detail how their work can be used for a better understanding of the angiogenesis process in physiological conditions and in pathological conditions such as post-ischemic revascularization or tumor vascularization.

      We have included comments and the corresponding references to clarify the aspect the reviewer suggested: The results in this study can further inform our understanding of angiogenesis in physiological and pathophysiological conditions. In particular, in many circumstances, the level of VEGF is determined by the degree of hypoxia, which can be highly elevated following oxygen supply interruption, e.g., in wound healing or ischemia, or due to progression of neoplastic growth. Our results suggest that in these cases, formation of sprouts can be dysregulated due to higher incidences of co-localizations of prospective Tip cells. In addition, since these conditions are frequently accompanied by altered synthesis of ECM, the sprout density can increase, which may lead to formation of denser and less developed vascular beds frequently observed as a result of tumor angiogenesis(42, 43). Our results thus suggest that the disorder and higher plasticity of the endothelial cell fate speciation at higher VEGF inputs can be a key contributor to some pathological states associated with persistently hypoxic conditions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      Ngoune et al. present compelling evidence that Slender cells are challenged to infect tsetse flies. They explore the experimental context of a recent important paper in the field, Schuster et al., that presents evidence suggesting the proliferative Slender bloodstream T. brucei can infect juvenile tsetse flies. Schuster et al. were disruptive to the widely accepted paradigm that the Stumpy bloodstream-form is solely responsible for tsetse infection and T. brucei transmission potential. Evidence presented here shows that in all cases, Stumpy form parasites are exponentially more capable of infecting tsetse flies. They further show that Slender cells do not infect mature flies.

      However, they raise questions of immature tsetse immunological potential and field transmission potential that their experiments do not address. Specifically, they do not show that teneral tsetse flies are immunocompromised, that tsetse flies must be immunocompromised for Slender infection nor that younger teneral tsetse infection is not pertinent to field transmission.

      Strengths:

      Experimental Design is precise and elegant, outcomes are convincing. Discussion is compelling and important to the field. This is a timely piece that adds important data to a critical discussion of host: parasite interactions, of relevance to all parasite transmission.

      Thank you

      Weaknesses:

      As above, the authors dispute the biological relevance of teneral tsetse infection in the wild, without offering evidence to the contrary. Statements need to be softened for claims regarding immunological competence or relevance to field transmission.

      We have modified the revised version to soften these claims (l.156 and l.159). Please note that the limited immunocompetence of teneral flies has been extensively studied by the labs of S. Aksoy at Yale and M. Lehane at Liverpool. In the discussion, we provide key references from these two labs (refs. 18-21). Our comment on the relevance to field transmission is simply based on field observations of the fly biology.

      Reviewer #2:

      Summary:

      Contrary to findings recently reported by Schuster S et al., this short paper shows evidence that the stumpy form of T. brucei is probably the most pre-adapted form to progress with the life cycle of this parasite in the tsetse vector.

      Strengths:

      One of the most important pieces of experimental evidence is that they conduct all fly infection experiments in the absence of metabolites like GlcNAc or S-glutathione; by doing so, the infection rates in flies infected with slender trypanosomes seem very low or non-existent. This, on its own, is a piece of important experimental evidence that the Schuster S et al findings may need to be revisited.

      Thank you

      Weaknesses:

      I consider that the authors should have included their own experiments demonstrating that the addition of these chemicals enhances the infection rates in flies receiving bloodmeals containing slender trypanosomes.

      The main purpose of this study is to assess the intrinsic infectivity of SL vs. ST in teneral vs. adult flies, not to reproduce the results obtained by Schuster et al. We think that the suggested experiment is not necessary, as L-glutathione is well known to enhance infection rates by reducing the efficiency of the fly immune response (Ref 24). Most of the experimental infections with procyclic or ST forms (even at low densities) published by our lab and others, especially for studying parasite stages in the salivary glands, were actually performed by complementing the infective meal with L-glutathione for this reason.

      Reviewer #3:

      The dogma in the Trypanosome field is that transmission by Tsetse flies is ensured by stumpy forms. This has been recently challenged by the Engstler lab (Schuster et al.), which showed that slender forms can also be transmitted by teneral flies. In this work, the authors aimed to test whether transmission by slender forms is possible and frequent.

      For this, the authors repeated Tsetse transmission experiments but with some key critical differences relative to Schuster et al. First, they infected teneral and adult flies. Second, their infective meals lacked two components (N-acetylglucosamine and glutathione), which could have boosted the infection rates in the Schuster et al. work. In these conditions, the authors observed that most stumpy form infections with teneral and adult flies were successful while only 1 out of 24 slender-form infections was successful. Adult flies showed a lower infection rate, which is probably because their immune system is more developed.

      Given that in Tsetse-infested areas most transmission is likely ensured by adult flies, the authors conclude that the parasite stage that will have a significant epidemiologic impact on transmission is the stumpy form.

      Strengths:

      • This work tackles an important question in the field.

      • The Rotureau laboratory has well-known expertise in Tsetse fly transmission experiments.

      • Experimental setup is robust and data is solid.

      • The paper is concise and clearly written.

      Thank you

      Weaknesses:

      • The reason(s) for why this work has lower infection rates with slender forms than Schuster et al. remain unknown. The authors suggested it could be because of the absence of N-acetylglucosamine and/or glutathione, but this was not formally tested. Could another source of variation be the clone of EATRO1125 AnTat1.1 (Paris versus Munich origin)? To reduce the workload, such additional experiments could be done with just one dose of parasites.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could indeed explain the differences in infection rates observed in the two studies. However, the main purpose of this study is to assess the intrinsic infectivity of SL vs. ST in teneral vs. adult flies. Our study was designed to stand alone in providing a clear answer to this question, not to reproduce the results obtained by Schuster et al. Hence, we don’t think that any additional experiments are required here.

      • The characterization of what is slender and stumpy is critical. The authors used PAD1 protein expression as the sole reporter. While this is a robust assay to confirm stumpy, an analysis of the cell cycle would have been helpful to confirm that slender forms have not initiated differentiation (Larcombe S et al. 2023, preprint).

      In this study, ST are indeed defined by their general morphology and by the expression of PAD1 proteins at the cell membrane as assessed by IFA. This is the simplest and most accurate ST proxy accessible by IFA. We do not think that monitoring the cell cycle in more detail would provide key information here. If some SL forms had initiated differentiation in our experiments, then the low infection rates observed with SL would have reinforced the fact that mostly mature PAD1+ ST are infectious for flies.

      • Statistical analysis is missing. Is the difference between adult and teneral infections statistically significant?

      An ANOVA statistical analysis was performed and a dedicated section was added to the revised version.

      For all conditions, MG infection rate comparisons between adult and teneral flies were statistically significant.

      Recommendations for the authors:

      Reviewer #1:

      While some perceived outcomes pertaining to immunological competence and transmission relevance of teneral flies are overstated, the overall tone of the paper is inappropriately apologetic. The authors obviously don't want to offend their colleagues but the current writing style obscures meaning, making the paper a bit 'flowery' and difficult to read.

      Ngoune et al. have important outcomes that need to be stated more directly.

      Words such as 'unequivocally' are not appropriate to Schuster et al's outcomes. As your study shows, their findings are experimentally based, with inherent caveats, and are therefore suggestive, not demonstrated or proven.

      The word 'unequivocally' has been removed from the revision.

      Reviewer #3:

      The Engstler lab cultivates AntTaT1.1 in methylcellulose (Munich clone, if I am not mistaken). The Rotureau lab uses the Paris AntTaT1.1 clone and uses no methylcellulose. Given that methylcellulose helps stumpy formation, it seems important to show that the results of this paper are reproducible with the Munich clone grown in the presence of methylcellulose.

      Differences between the strain clones and culture conditions could indeed explain the differences in infection rates observed in the two studies. However, the main purpose of this study is to assess the intrinsic infectivity of SL vs. ST in teneral vs. adult flies. Our study was designed to stand alone in providing a clear answer to this question, not to reproduce the results obtained by Schuster et al. Hence, we don’t think that any additional experiments are required here.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary of the reviewers’ discussion:

      • The development of MSI-1 as a post-transcriptional regulator of gene expression in Escherichia coli represents a valuable addition to the synthetic biology toolkit. MSI-1 has advantages over transcriptional regulators because it has the potential to target single genes in operons. Allosteric control of MSI-1 by oleic acid increases its versatility.

      Authors’ response: We thank the reviewers and editor for this evaluation.

      • We recommend that authors add experiments to test the mechanism of regulation by MSI-1 or soften their claims about translational regulation. We also recommend that the authors expand their discussion of other natural and synthetic regulatory systems that target translation.

      Authors’ response: In this revision, we have added new experimental results from RT-qPCR, bulk fluorometry, and flow cytometry assays to further support our conclusions. We have also enlarged the Introduction and Discussion.

      • Adding an experiment to quantify the effect of oleic acid with the most strongly regulated reporter construct (i.e., flow cytometry with redesign-3) would substantially increase the impact of the work.

      Authors’ response: We have done this experimental quantification (see the new Fig. 5d).

      Reviewer #1 (Public Review):

      The authors develop reporter constructs in E. coli where gene expression, presumably translation, is repressed by MSI-1. This is a potentially useful tool for synthetic biologists, with the advantage over transcriptional regulation that one gene in an operon could be targeted. That being said, an important caveat of translational regulation that is not addressed in the manuscript is the potential for downstream effects on RNA stability and/or transcription termination. The authors' MSI-1-regulated reporter constructs could also be useful for mechanistic studies of MSI-1.

      Authors’ response: We thank the reviewer for such appreciation of our work. Regarding the potential effects on RNA stability or transcription termination, we would like to highlight our results with the sfGFP-mScarlet bicistron (Fig. 6c), showing the specific regulation of sfGFP by MSI-1* and not of mScarlet. Anyway, for this revision we have conducted an RT-qPCR experiment to quantify the mRNA level of sfGFP to further support our conclusions (see the new Fig. S2).

      The authors' initial construct design led to only weak regulation by MSI-1, presumably because the MSI-1 binding sites were not suitably positioned to repress translation initiation. A more rationally designed construct led to considerably greater repression. One weakness of the paper is that the authors did not use their redesigned construct that is more strongly repressed to demonstrate allosteric regulation by oleic acid using a comparable assay (e.g., flow cytometry) to that used in other experiments. The potential for allosteric regulation is a major strength of the MSI-1 system, so this is a significant gap. Similarly, the authors use the weakly regulated constructs to assess the effect of MSI-1 binding site mutations and for their mathematical modeling; these experiments would be better suited to the more strongly regulated construct.

      Authors’ response: For this revision, we have performed the flow cytometric quantification of the allosteric regulation by oleic acid in the redesigned-3 system (see the new Fig. 5d). Regarding the kinetic study, we focused on the reporter system with just one recognition motif for simplicity. A reporter system with two recognition motifs, thereby recruiting two different proteins, increases the complexity to distill the effect of point mutations.

      Reviewer #1 (Recommendations For The Authors):

      1. Figure 5. Panels c-f look at colonies on plates, with numbers from these data being difficult to compare with either the bulk fluorescence or single-cell fluorescence values shown in other figures. Supplementary Figure 8 shows data for single cells; these data would be more appropriate in Figure 5, with the plate-based data moving to the supplement. Moreover, measuring the effect of oleic acid on the redesign-3 reporter using flow cytometry would assess the impact of oleic acid on the most strongly regulated reporter; this would be the most impactful analysis.

      Authors’ response: We have redone Fig. 5 to include flow cytometry data (also for the system implemented with the redesign-3 reporter).

      1. Paragraph starting line 438. The authors should briefly discuss the potential for translational repression leading to reduced RNA stability, and in the case of rapid repression that impacts transcription-coupled translation, its impact on Rho-dependent transcription termination. These factors could alter the expression of neighboring genes.

      Authors’ response: As we have shown with the RT-qPCR experiment, the mRNA level of the target gene does not change in response to protein binding. We agree that mRNA stability could potentially be changed by using other RNA-targeting proteins. But in our view, a reduction of RNA stability is not a regulation of translation. We have added the following sentence in the Discussion: “The additional use of RNA-binding proteins able to alter mRNA stability might lead to the implementation of more complex circuits at the posttranscriptional level.”

      1. Figure 1. It would be informative to include a control where cells have an empty plasmid rather than a plasmid expressing MSI-1, to address leakiness of MSI-1 expression.

      Authors’ response: We have constructed a void plasmid as suggested and performed new bulk fluorometry assays. The new Fig. S8 shows the tight control of MSI-1* expression with the PLlac promoter. No apparent leakage is observed.

      1. Line 132. Where were the two sequences positioned with respect to each other and the start codon? It would be helpful to show the sequence in Figure 1.

      Authors’ response: The precise sequence is shown in the inset of Fig. 1b. The motif is placed just after the start codon.

      1. Line 135. The authors' envisioned repression mechanism isn't clear from the text, specifically the meaning of "block the progression" and "initial phase". As far as I know, there is no precedent for RNA-binding proteins repressing translation in bacteria by preventing translation elongation. Presumably, repression in the context described here would be due to MSI-1 binding over the ribosome-binding site, although the predicted hairpin may also occlude binding of initiating 30S ribosomes in the absence of MSI-1 binding.

      Authors’ response: It is difficult to know the exact mode of action. On page 7, we have rewritten a sentence to read: “In this way, MSI-1* can repress translation by blocking the binding of the ribosome, presumably by imposing a steric hindrance for the 30S ribosomal subunit.”

      1. Figure 1e is overly complicated and hence is difficult to interpret. The key result is that mScarlet expression is unchanged as a function of lactose concentration. It is sufficient to show the inset graph as a supplementary figure panel and to conclude that regulation of sfGFP is at a post-transcriptional level. Similarly, the inset in Figure 4b is unnecessary.

      Authors’ response: The inset of Fig. 1e shows that the growth rate of the cells is almost constant when lactose varies. A change in growth rate will affect protein expression. The use of a two-reporter system, one regulated translationally and the other not, is instrumental to extract from fluorescence data estimates of transcription and translation rates. Of course, showing that mScarlet expression is almost constant when lactose varies would be sufficient, but we believe that performing a fine treatment of the data helps to better understand the regulatory system from a mathematical and mechanistic point of view. Therefore, despite increasing the complexity of the figure, we prefer to keep the representation of the Crick spaces (following Alon’s terminology, see our ref. 32). We have tried to carefully explain Fig. 1e in the text.

      1. Figure 1f and Figure 4c would be easier to interpret as two-dimensional plots.

      Authors’ response: We decided to use 3D plots to have more compact representations of the data in the main figures. The accompanying insets show the percentage of cells above the threshold, which helps to understand the regulatory effects. In any case, we have provided the corresponding 2D plots in Fig. S10.

      1. I don't think Figure 2e is relevant. The key result is shown in Figure 2f, i.e., the effect of mutations on regulation by MSI-1.

      Authors’ response: We agree with the reviewer that the key result is shown in panel f. However, we prefer to keep panel e in Fig. 2 because, even if negative, this result may incite further research. In addition, we avoid the rearrangement of the whole figure.

      1. Lines 311-313. Without additional evidence that the mutants are toxic, I suggest removing this text.

      Authors’ response: As suggested, we have removed that claim.

      Reviewer #2 (Public Review):

      Summary:

      Dolcemascolo and colleagues describe the use of the mammalian RNA-binding protein Musashi-1 (MSI-1) to implement translational regulation systems in E. coli. They perform detailed in vitro studies of MSI-1 and its binding to different RNA sequences. They provide compelling evidence of the effectiveness of the regulatory system in multiple circuits using different mRNA sequence motifs. They harness allosteric inhibition of MSI-1 by omega-9 monounsaturated fatty acids to demonstrate a fatty-acid-responsive circuit in E. coli.

      Strengths:

      The experimental results are compelling and the characterization of the binding between MSI-1 and different RNA sequences is thorough and performed via multiple complementary techniques. Several new useful circuit components are demonstrated.

      Authors’ response: We thank the reviewer for such appreciation of our work.

      Weaknesses:

      MSI-1 provides 8.6-fold downregulation of sfGFP with an optimized mRNA sequence. In some applications, a larger degree of repression may be required.

      Authors’ response: We agree with the reviewer on this point. We expect to conduct further research in the future to optimize the dynamic range of the system. We have added the following sentence in the Discussion: “Further work should be conducted to enhance the fold change of the regulatory module and engineer complex circuits with it.”

      Reviewer #2 (Recommendations For The Authors):

      Overall, I think this paper is very well done and quite thorough. I only have minor suggestions:

      • For Figures 1f and 4c, it is quite hard to interpret the fraction of cells above the threshold with the 3d perspective. It would be clearer to use a more standard 2d plot where the histograms are offset along the y-axis and the threshold is indicated by a vertical line.

      Authors’ response: We decided to use 3D plots to have more compact representations of the data in the main figures. The accompanying insets show the percentage of cells above the threshold, which helps to understand the regulatory effects. In any case, we have provided the corresponding 2D plots in Fig. S10.

      • For Figure 4b, the highlighting of different sequence regions in red3 appears to be offset by one base (e.g. AAU is highlighted rather than AUG).

      Authors’ response: This has been corrected.

      • For line 504, it seems that MSI-1 is used for two different proteins. A different name should be assigned to this 200-residue protein to avoid confusion with the other MSI-1.

      Authors’ response: We now use the term MSI-1h* for the human version of the protein.

      • The note (Page S12) that A_0 + A_R = alpha/delta only applies in steady-state conditions, which should be stated.

      Authors’ response: We have specified that.

      • It seems that some authors work for the companies that sell some of the instruments/consumables used for the assays, specifically switchSENSE and LigandTracer. This may be something that should be declared under Competing Interests for the paper.

      Authors’ response: We are sorry for having missed this point. We have included a Competing Interests section to state that “RAHR and WFV work for Dynamic Biosensors. GPR and JB work for Ridgeview Instruments”.

      Reviewer #3 (Public Review):

      Summary:

      In this work, the authors co-opt the RRM-binding protein Musashi-1 to act as a translational repressor. The novelty of the work is in the adoption of the allosteric RRM protein Musashi-1 into a translational reporter and the demonstration that RRM proteins, which are ubiquitous in eukaryotic systems, but rare in prokaryotic ones, may act effectively as post-transcriptional regulators in E. coli. The extent of repression achieved by the best design presented in this work is not substantially improved compared to other synthetic regulatory schemes developed for E. coli, even those that similarly regulate translation (e.g., native PP7 repression is approximately 10-fold, Lim et al. J. Biol. Chem. 2001 276:22507-22513). Furthermore, the mechanism of regulation is not established due to missing key experiments. The work would be of broader interest if the allosteric properties of Musashi-1 were more effective in the context of regulation. Unfortunately, the authors do not demonstrate that fatty acids can completely de-repress expression in the experimental system used for most of their assays, nor do they use this ability in their provided application (NIMPLY gate).

      Authors’ response: For this revision, we have performed the flow cytometric quantification of the allosteric regulation by oleic acid in the redesigned-3 system, showing substantial de-repression of the system with the biochemical compound. We have redone Fig. 5 and modified the Results section accordingly. Aligned with the reviewers and editor, we believe that this new result helps to improve our manuscript.

      Strengths:

      The first major achievement of this work is the demonstration that a eukaryotic RRM protein may be used to posttranscriptionally regulate expression in bacteria. In my limited literature search, this appears to be the first engineering attempt to design an RBP to directly regulate translation in E. coli, although engineered control of translation via other approaches including alterations to RNA structure or via trans-acting sRNAs have been previously described (for review see Vigar and Wieden Biochim Biophys. Acta Gen. Subj. 2017, 1861:3060-3069). Additionally, several viral systems (e.g. MS2 and PP7) have been directly co-opted to work in a similar fashion in the past (utilized recently in Nguyen et al. ACS Synthetic Biol 2022, 11:1710-1718).

      Authors’ response: We thank the reviewer for such appreciation of our work.

      The second achievement of this work is the demonstration that the allosteric regulation of Musashi-1 binding can be utilized to modulate the regulatory activity. However, the liquid culture demonstration (Suppl. Fig 8) shows that this is not a very effective switch, with de-repressed reporter activity showing substantial change but not approaching un-repressed activity. This effect is stronger when colonies are grown on a solid medium (Fig. 5).

      Authors’ response: As we have previously indicated, the flow cytometric quantification of the allosteric regulation by oleic acid in the redesigned-3 system in liquid culture showed substantial de-repression with the biochemical compound. The text now states the following: “Nevertheless, the system implemented with the redesign-3 reporter displayed a better dynamic behavior in response to lactose and oleic acid. In particular, the percentage of cells in the ON state increased from 0 (with 1 mM lactose) to 71% upon addition of 20 mM oleic acid (Fig. 5d).” This new result helps to improve our manuscript.
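
      The "percentage of cells in the ON state" quoted above is a thresholded summary of single-cell fluorescence. A minimal sketch of that calculation follows; the function name `percent_on`, the threshold, and the toy fluorescence values are illustrative assumptions and are not taken from the manuscript or its gating strategy.

```python
from typing import Sequence


def percent_on(fluorescence: Sequence[float], threshold: float) -> float:
    """Percentage of cells whose fluorescence is strictly above `threshold`."""
    if len(fluorescence) == 0:
        raise ValueError("need at least one cell")
    n_on = sum(1 for value in fluorescence if value > threshold)
    return 100.0 * n_on / len(fluorescence)


# Hypothetical single-cell readings (arbitrary units), not real data.
repressed_sample = [12.0, 15.0, 9.0, 14.0]
derepressed_sample = [12.0, 180.0, 220.0, 150.0]
hypothetical_threshold = 100.0

print(percent_on(repressed_sample, hypothetical_threshold))
print(percent_on(derepressed_sample, hypothetical_threshold))
```

      In practice the threshold would be set from a control population, so the reported percentage depends on that choice as much as on the regulator.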

      Weaknesses:

      In this work, the authors codon optimize the mouse Musashi-1 coding sequence for expression in E. coli and demonstrate using an sfGFP reporter that an engineered Musashi-1 binding site near the translational start site is sufficient to enable a modest reduction in reporter gene expression. The authors postulate that the reduction in expression is due to inhibition of ribosome translocation along the transcript (lines 134/135), as an expression of a control transcript (mScarlet) driven by the same promoter (Plac) but without the Musashi-1 recognition site does not demonstrate the same repression. However, the situation could be more complex. Other possibilities include inhibition of translation initiation rather than elongation, as well as accelerated mRNA decay of transcripts that are not actively translated. The authors do not present any measurements of sfGFP mRNA levels.

      Authors’ response: On page 7, we have rewritten a sentence to read: “In this way, MSI-1* can repress translation by blocking the binding of the ribosome, presumably by imposing a steric hindrance for the 30S ribosomal subunit.” In addition, for this revision we have conducted an RT-qPCR experiment to quantify the mRNA level of sfGFP to further support our conclusions (see the new Fig. S2). As shown, there is no change in the mRNA level upon inducing the system with lactose.

      In subsequent sections of the work, the authors create a series of point mutations to assess RNA-protein binding and assess these via both a sfGFP reporter and in vitro binding assays (switchSENSE). Ultimately, it is difficult to fully rationalize and interpret the behavior of these mutants in the context provided. The authors do identify a relationship between equilibrium constant (1/KD) and fold-repression. However, it is not clear from the narrative why this relationship should exist. Fold-repression is one measure of regulator efficacy, but it is an indirect measure determined from unrepressed and repressed expression. It is not clear why unrepressed expression (in the absence of the protein) is expected to be a function of the equilibrium constant.

      Authors’ response: A mathematical derivation from mass action kinetics on why the fold change scales with 1/KD is provided in Note S2. It is the ratio between the unrepressed and repressed expression (i.e., the fold change) that scales with 1/KD, not the expression of a particular state. This kind of relationship has been previously established in the case of transcription regulation [see e.g. Garcia & Phillips, PNAS (2011), our ref. 39]. Our mathematical modeling results expand previous work by providing a single picture from which to analyze transcription and translation regulation.
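
      The scaling argument can be illustrated with a textbook single-site equilibrium model. This is only a sketch under stated assumptions, not the derivation of Note S2 (which is not reproduced here): one binding site, a fixed free regulator concentration, and a hypothetical `leakage` parameter for residual expression from a bound mRNA. The function name and all numerical values are illustrative.

```python
def fold_change(kd: float, protein: float, leakage: float = 0.0) -> float:
    """Fold change (unrepressed / repressed expression) for one binding site.

    Assumes equilibrium binding with a fixed free regulator concentration
    `protein` (same units as `kd`). `leakage` is the hypothetical relative
    expression of a bound mRNA (0 = fully blocked, 1 = no repression).
    """
    if kd <= 0 or protein < 0 or not 0.0 <= leakage <= 1.0:
        raise ValueError("need kd > 0, protein >= 0 and 0 <= leakage <= 1")
    free = kd / (kd + protein)                 # fraction of mRNA without regulator
    repressed = free + leakage * (1.0 - free)  # expression relative to unrepressed
    return 1.0 / repressed


# With no leakage, fold = 1 + protein / kd, i.e. linear in 1/kd.
for kd in (0.5, 1.0, 2.0, 4.0):  # illustrative values, arbitrary units
    print(f"1/KD = {1 / kd:.2f}  fold = {fold_change(kd, protein=2.0):.2f}")
```

      In this simplified picture the unrepressed expression cancels out of the ratio, which is why only the fold change, and not the expression of either state, depends on 1/KD. A non-zero leakage caps the fold change at 1/leakage for very tight binding.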

      Subsequent rational redesign of the Musashi-1 binding sequence to produce three alternative designs shows that fold-repression may be improved to approximately 8.6-fold. However, the rationalization of why the best design (red3) achieves this increase based on either the extensive modelling or in vitro measured binding constants is not well articulated. Furthermore, this extent of regulation is approximately that which can be achieved from the PP7 system with its native components (Lim et al. J. Biol. Chem. 2001 276:22507-22513).

      Authors’ response: In the case of translation control, the regulation is more challenging because the target is quickly degraded, especially in bacteria (in contrast to transcription control, where the target is stable). This is acknowledged in the manuscript. Even so, it is possible to engineer synthetic circuits with sRNAs or RNA-binding proteins with sufficient dynamic range. We expect to conduct further research in the future to optimize the dynamic range of the system. We have added the following sentence in the Discussion: “Further work should be conducted to enhance the fold change of the regulatory module and engineer complex circuits with it.” Regarding the articulation of the results for the mutants and mathematical model, see our responses to the following questions.

      The application provided for this regulator (NIMPLY gate), is not an inherently novel regulatory paradigm, and it does not capitalize on the allosteric properties of Musashi-1, but rather treats Musashi-1 as a non-allosteric component of a regulatory circuit.

      Authors’ response: The NIMPLY gate refers to lactose and aTC as inputs. Considering oleic acid as an additional input will lead to a more complex logic. In the last Results section, we wanted to show that the post-transcriptional mechanism engineered with Musashi-1 can be useful to specifically regulate a gene within an operon, to implement combinatorial regulation (i.e., coupling transcription and translation control), and to reduce protein expression noise. To these ends, the allosteric ability of Musashi-1 was not so decisive. In this regard, it is true that such fine regulatory effects might be achieved as well with non-allosteric RNA-binding proteins, such as MS2CP or PP7CP.

      Reviewer #3 (Recommendations For The Authors):

      1. In the introduction the authors should adequately address the native bacterial mechanisms that allow posttranscriptional regulation in bacteria as well as better discuss previous examples of translational repressors.

      Authors’ response: We have added the following paragraph in the Introduction: “Even though bacteria do not appear to exploit proteins to regulate translation in a gene-specific manner, it is worth noting that some bacteriophages do follow this mechanism to modulate their infection cycle. These are the cases, e.g., of the coat proteins of the phages MS2 (infecting Escherichia coli) or PP7 (infecting Pseudomonas aeruginosa), which regulate the expression of the cognate phage replicases through protein-RNA interactions [18]. However, one limitation for synthetic biology developments is that such phage proteins are not allosteric. At the post-transcriptional level, bacteria mostly rely on a large palette of cis- and trans-acting non-coding RNAs to either activate or repress protein expression, resulting in the regulation of translation initiation, mRNA stability, or transcription termination, and even allowing sensing small molecules [1,15]. Thus, there should be efforts to replicate this functional versatility with proteins in bacteria.”

      1. Given the location of the Musashi-1 binding site in the sfGFP reporter, it may be blocking translation initiation, rather than blocking the progression of the ribosome once attached (line 134/135). The schematic in Fig 1a. is also not overly clear in describing the differences in mechanisms between eukaryotic and prokaryotic systems described in the text.

      Authors’ response: On page 7, we have rewritten a sentence to read: “In this way, MSI-1 can repress translation by blocking the binding of the ribosome, presumably by imposing a steric hindrance for the 30S ribosomal subunit.” On page 14, we have added the following sentence: “In this way, MSI-1 can also block the RNA component of the 30S ribosomal subunit.”

      1. The authors did not directly examine mRNA levels of their reporter to establish translational regulation. In many cases, inhibition of translation is accompanied by an increased degradation rate in bacterial systems. The authors do not seem to recognize this as a possible amplifier in their system, relying exclusively on normalization via another transcript produced from the same promoter (mScarlet).

      Authors’ response: For this revision we have conducted an RT-qPCR experiment to quantify the mRNA level of sfGFP to further support our conclusions (see the new Fig. S2). As shown, there is no change in the mRNA level upon inducing the system with lactose.

      1. The results presented for mutations 1-5 are not consistent with the authors' models for what is occurring. In particular, mutant 1 displays a reduction in reporter production in the absence of Musashi-1, but the production in the presence does not change from the unaltered sequence. The claim that mutation 1 (in the UAG binding site) results in less binding and ultimately in less regulation is not substantiated since this loss of regulation is due to a reduction in unrepressed expression rather than an increase in expression when Musashi-1 is present.

      Authors’ response: We respectfully disagree with this assessment. In the case of mutant 1, if the Musashi protein recognized the target mRNA with the same affinity as in the original scenario, the red bar would be much lower. Because the Musashi protein hardly recognizes the mutant-1 mRNA, the blue and red bars are quite similar. To clarify this point, we have added the following text in the manuscript: “Despite that mutation substantially reduced sfGFP expression in absence of MSI-1*, the presumed repressed state upon addition of lactose did not change much, suggesting the difficulty of the protein for targeting the mutated mRNA.”

      1. Given point 5 above, it is not clear to me why one would expect the 1/KD to be predictive of fold-repression in the presence and absence of the repressor. I would rather see the relationship described as predictive in Fig. 2f (fold change vs. 1/KD) rather than the non-linear relationship. It is difficult to qualitatively evaluate the fit quality with the way the data are currently presented.

      Authors’ response: Note S2 provides a mathematical derivation from mass action kinetics on why the fold change scales with 1/KD. The R2 value that we provide for the fitting corresponds to the linear regression between fold and 1/KD, as specified in the figure legend. However, we think that the representation of fold vs. KD in log scale is more illustrative in this case.

      1. It is not clear what conclusion is determined from the computational modeling, or how this work contributes to the narrative presented. It does not seem like what is learned from these experiments is utilized for novel designs. Furthermore, several of the assumptions within the model may be problematic including the high rate of "elongation leakage" described and the lack of justification for RNA degradation rates utilized.

      Authors’ response: The mathematical modeling was performed to rationalize our experimental data. Our idea was more to recapitulate the observed dynamics than to guide the design of new systems. Our model might be exploited to this end in further research, as the reviewer suggests. Besides, elongation leakage is a concept that applies to both transcription and translation regulation systems, and it is no more than the ability of the RNA polymerase or ribosome to elongate even if there is a protein bound to the nucleic acid. This parameter can be set to 0 in the model if appropriate. Moreover, we cite the paper by Bernstein et al., PNAS (2002), our ref. 38, to justify that in E. coli the average mRNA half-life is about 5 min (i.e., a degradation rate of 0.14 min⁻¹).
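
      The conversion from a half-life to a degradation rate assumes first-order decay, for which the rate is ln(2) divided by the half-life. A minimal check of the arithmetic, with a hypothetical helper name, is:

```python
import math


def degradation_rate_from_half_life(half_life_min: float) -> float:
    """First-order decay: N(t) = N0 * exp(-delta * t), so delta = ln(2) / t_half."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_min


# A 5 min half-life corresponds to ln(2) / 5 per minute.
delta = degradation_rate_from_half_life(5.0)
print(f"{delta:.2f} per min")  # prints "0.14 per min"
```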

      1. The data presented in Figure 4 are not presented in a consistent way. While it would be somewhat redundant, including the 0 and 1 mM lactose data for red3 in Figure 4a would be helpful for comparison purposes.

      Authors’ response: We have added the requested bar plot in Fig. 4a.

      1. The presence of additional Musashi-1 sites upstream of the start codon in red3, and their impact on the fold-repression, may support an inhibition of the translation initiation model rather than an inhibition of elongation.

      Authors’ response: On page 7, we have rewritten a sentence to read: “In this way, MSI-1 can repress translation by blocking the binding of the ribosome, presumably by imposing a steric hindrance for the 30S ribosomal subunit.” On page 14, we have added the following sentence: “In this way, MSI-1 can also block the RNA component of the 30S ribosomal subunit.”

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to address a critical challenge in the field of bioinformatics: the accurate and efficient identification of protein binding sites from sequences. Their work seeks to overcome the limitations of current methods, which largely depend on multiple sequence alignments or experimental protein structures, by introducing GPSite, a multi-task network designed to predict binding residues of various molecules on proteins using ESMFold.

      Strengths:

      1. Benchmarking. The authors provide a comprehensive benchmark against multiple methods, showcasing the performances of a large number of methods in various scenarios.

      2. Accessibility and Ease of Use. GPSite is highlighted as a freely accessible tool with user-friendly features on its website, enhancing its potential for widespread adoption in the research community.

      We thank the reviewer for acknowledging the contributions and strengths of our work!

      Weaknesses:

      1. Lack of Novelty. The method primarily combines existing approaches and lacks significant technical innovation. This raises concerns about the original contribution of the work in terms of methodological development. Moreover, the paper reproduces results and analyses already presented in previous literature, without providing novel analysis or interpretation. This further diminishes the contribution of this paper to advancing knowledge in the field.

      The novelty of this work is primarily manifested in four key aspects. Firstly, although we agree with the reviewer that we did employ several existing tools such as ProtTrans and ESMFold to extract sequence features and predict protein conformations, these techniques were hardly explored in the field of binding site prediction. We have successfully demonstrated the feasibility of substituting multiple sequence alignments with language model embeddings and training with “less accurate” predicted structures, providing a new solution to overcome the limitations of current methods for genome-wide applications. Secondly, though a few methods tend to capture geometric information based on protein surfaces or atom graphs, surface calculation and property mapping are usually time-consuming, while message passing on full atom graphs is memory-consuming and thus makes long sequences challenging to process. Besides, these methods are sensitive to details and errors in the predicted structures. To facilitate large-scale annotations, we have innovatively applied geometric deep learning to protein residue graphs for comprehensively capturing backbone and sidechain geometric contexts in an efficient and effective manner (Figure 1). Thirdly, we have not only exploited multi-task learning to integrate diverse ligands and enhance performance, but also shown its capability to easily extend to the binding site prediction of other unseen ligands (Figure 4 D-E). Last but not least, as a Tools and Resources article, we have provided a fast, accurate and user-friendly webserver, as well as constructed a large annotation database for the sequences in Swiss-Prot. Leveraging this database, we have conducted extensive analyses on the associations between binding sites and molecular functions, biological processes, and disease-causing mutations (Figure 5), indicating the potential of our tool to unveil unexplored biology underlying genomic data.

      1. Benchmark Discrepancies. The benchmark results vary, especially between the initial comparisons and those with PeSTo. GPSite achieves a PR AUC of 0.484 on the global benchmark but a PR AUC of 0.61 on the benchmark against PeSTo. For consistency, PeSTo should be included in the benchmark against all other methods. This discrepancy suggests potential issues with the benchmark set or the stability of the method, and it needs to be addressed to validate the reliability of the results.

      We thank the reviewer for the constructive comments. Since our performance comparison experiments involved numerous competitive methods whose training sets were disparate, it was difficult to compare or rank all these methods fairly using a single test set. As described in the “GPSite outperforms state-of-the-art methods” section, 358 out of 375 proteins in our protein-protein binding site test set share >30% sequence identity with the training sequences of PeSTo. To address this, we meticulously re-split our entire protein-protein binding site dataset to generate a new test set that avoids any overlap with the training sets of both GPSite and PeSTo and performed a separate evaluation. This is quite common in this field. For instance, in the study of PeSTo [Nat Commun 2023], the comparisons of PeSTo with MaSIF-site, SPPIDER, and PSIVER were conducted using one test set, while the comparison with ScanNet was performed on a separate test set. Based on the reviewer’s suggestion, in the revised version of the manuscript, we intend to include other comparative methods alongside PeSTo on the new test set or retrain our model directly on PeSTo's training set for comparison, which should enhance the completeness of our results.

      1. Interface Definition Ambiguity. There is a lack of clarity in defining the interface for the binding site predictions. Different methods are trained using varying criteria (surfaces in MaSIF-site, distance thresholds in ScanNet). The authors do not adequately address how GPSite's definition aligns with or differs from these standards and how this issue was addressed. It could indicate that the comparison of those methods is unreliable and unfair.

      We thank the reviewer for the comments. The precise definition of ligand-binding sites is given in the “Benchmark datasets” section. Specifically, the datasets of DNA, RNA, peptide, ATP, HEM and metal ions used to train GPSite were collected from the widely acknowledged BioLiP database [PMID: 23087378]. In BioLiP, a residue is defined as binding if the smallest atomic distance between the target residue and the ligand is less than 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms. Meanwhile, most comparative methods for these ligands were also trained on data from BioLiP, thereby ensuring fair comparisons.

      However, since BioLiP does not include data on protein-protein binding sites, studies for protein-protein binding site prediction may adopt slightly distinct label definitions, as the reviewer suggested. Here, we employed protein-protein binding site data from our previous study [PMID: 34498061], where a protein-binding residue was defined as a surface residue (relative solvent accessibility > 5%) that lost more than 1 Å² absolute solvent accessibility after protein-protein complex formation. This definition was initially introduced in PSIVER [PMID: 20529890] and widely applied in various studies (e.g., PMID: 31593229, PMID: 32840562). SPPIDER [PMID: 17152079] and MaSIF-site [PMID: 31819266] have also adopted similar surface-based definitions as PSIVER. On the other hand, ScanNet [PMID: 35637310] employed an atom distance threshold of 4 Å to define contacts while PeSTo [PMID: 37072397] used a threshold of 5 Å. However, it is noteworthy that current methods in this field including ScanNet [Nat Methods 2022] and PeSTo [Nat Commun 2023] directly compared methods using different label definitions without any alignment in their benchmark studies, likely due to the subtle distinctions among these definitions. For instance, the study of PeSTo directly performed comparisons with ScanNet, MaSIF-site, SPPIDER, and PSIVER. Therefore, we followed these previous works, directly comparing GPSite with other protein-protein binding site predictors. In our revised manuscript, we will provide more details for the binding site definitions to avoid any potential ambiguity.
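To make the two label definitions quoted in this response concrete, the following is a minimal Python sketch rather than code from GPSite, BioLiP, or PSIVER. The function names, the input layout, and the van der Waals radii table (Bondi-style values chosen here for illustration) are all assumptions; only the 0.5 Å margin, the 5% relative solvent accessibility cutoff, and the 1 Å² accessibility-loss cutoff come from the text above.

```python
import math

# Assumed, illustrative van der Waals radii in Å (Bondi-style values);
# the radii actually used by BioLiP are not given in the response.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def is_ligand_binding_residue(residue_atoms, ligand_atoms, margin=0.5):
    """Distance-based rule as described for BioLiP.

    Each atom is an (element, (x, y, z)) tuple. The nearest residue-ligand
    atom pair is located first; the residue counts as binding if that
    distance is below `margin` plus the sum of the two atoms' radii.
    """
    dist, elem_r, elem_l = min(
        (
            (math.dist(xyz_r, xyz_l), elem_r, elem_l)
            for elem_r, xyz_r in residue_atoms
            for elem_l, xyz_l in ligand_atoms
        ),
        key=lambda pair: pair[0],
    )
    return dist < margin + VDW_RADII[elem_r] + VDW_RADII[elem_l]


def is_protein_binding_residue(rsa_unbound, asa_unbound, asa_bound,
                               rsa_cutoff=0.05, asa_loss_cutoff=1.0):
    """Surface-based rule as described for the protein-protein dataset.

    rsa_unbound: relative solvent accessibility (0..1) in the free chain.
    asa_unbound / asa_bound: absolute accessibility in Å² before and
    after complex formation.
    """
    is_surface = rsa_unbound > rsa_cutoff
    asa_lost = asa_unbound - asa_bound
    return is_surface and asa_lost > asa_loss_cutoff
```

The two rules can disagree on the same residue, which is the source of the ambiguity the reviewer raises: a residue may sit within a distance cutoff of a partner chain while losing little solvent accessibility, or the reverse.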

      While GPSite demonstrates the potential to surpass state-of-the-art methods in protein binding site prediction, the evidence supporting these claims seems incomplete. The lack of methodological novelty and the unresolved questions in benchmark consistency and interface definition somewhat undermine the confidence in the results. Therefore, it's not entirely clear if the authors have fully achieved their aims as outlined.

      The work is useful for the field, especially in disease mechanism elucidation and novel drug design. The availability of genome-scale binding residue annotations GPSite offers is a significant advancement. However, the utility of this tool could be hampered by the aforementioned weaknesses unless they are adequately addressed.

      We thank the reviewer for acknowledging the advancement and value of our work, as well as pointing out areas where improvements can be made. As discussed above, we will carry out the corresponding revisions in the next version of the manuscript to enhance the completeness and clearness of our work.

      Reviewer #2 (Public Review):

      Summary:

      This work provides a new framework, "GPSite", to predict DNA, RNA, peptide, protein, ATP, HEM, and metal ion binding sites on proteins. This framework comes with a webserver and a database of annotations. The core of the model is a Geometric featurizer neural network that predicts the binding sites of a protein. One major contribution of the authors is the fact that they feed this neural network with predicted structures from ESMFold for training and prediction (instead of native structures as in similar works) and a high-quality protein Language Model representation. The other major contribution is that it provides the public with a new lightweight framework to predict protein-ligand interactions for a broad range of ligands.

      The authors have demonstrated the interest of their framework with mostly two techniques: ablation and benchmark.

      Strengths:

      The performance of this framework as well as the provided dataset and web server make it useful to conduct studies.

      The ablations of some core elements of the method, such as the protein Language Model part, or the input structure are very insightful and can help convince the reader that every part of the framework is necessary. This could also guide further developments in the field. As such, the presentation of this part of the work can hold a more critical place in this work.

      We thank the reviewer for recognizing the contributions of our work and for noting that our experiments are thorough.

      Weaknesses:

      Overall, we can acknowledge the important effort of the authors to compare their work to other similar frameworks. Yet, the lack of homogeneity of training methods and data from one work to the other makes the comparison slightly unconvincing, as the authors pointed out. Overall, the paper puts significant effort into convincing the reader that the method is beating the state of the art. Maybe, there are other aspects that could be more interesting to insist on (usability, interest in protein engineering, and theoretical works).

      We sincerely appreciate the reviewer for the constructive and insightful comments. As to the concern of training data heterogeneity raised by the reviewer, it is noteworthy that current studies in this field, such as ScanNet [Nat Methods 2022] and PeSTo [Nat Commun 2023], tend to directly compare methods trained on different datasets in their benchmark experiments. Therefore, we have adhered to the paradigm in these previous works. According to the detailed recommendations by the reviewer, we will improve our manuscript by incorporating additional ablation studies regarding the effects of predicted structures and language model representations. Besides, we will refine the Discussion section to focus more on the achievements of this work and its potential applications including protein engineering. A comprehensive point-by-point response to the reviewer’s recommendations will be provided alongside the revised manuscript. This will ensure that all concerns and suggestions are adequately addressed.

      Reviewer #3 (Public Review):

      Summary

      The authors of this work aim to address the challenge of accurately and efficiently identifying protein binding sites from sequences. They recognize the limitations of current methods, including reliance on multiple sequence alignments or experimental protein structures and the under-explored geometry of the structure, which restrict performance and genome-scale applications. The authors have developed a multi-task network called GPSite that predicts binding residues for a range of biologically relevant molecules, including DNA, RNA, peptides, proteins, ATP, HEM, and metal ions, using a combination of sequence embeddings from protein language models and ESMFold-predicted structures. Their approach attempts to extract residual and relational geometric contexts in an end-to-end manner, surpassing current sequence-based and structure-based methods.

      Strengths

      1. The GPSite model's ability to predict binding sites for a wide variety of molecules, including DNA, RNA, peptides, and various metal ions.

      2. Based on the presented results, GPSite outperforms state-of-the-art methods in several benchmark datasets.

      3. GPSite adopts predicted structures instead of native structures as input, enabling the model to be applied to a wider range of scenarios where native structures are rare.

      4. The authors emphasize the low computational cost of GPSite, which enables rapid genome-scale binding residue annotations, indicating the model's potential for large-scale applications.

      We thank the reviewer for recognizing the significance and value of our work!

      Weaknesses

      1. One major advantage of GPSite, as claimed by the authors, is its efficiency. Although the manuscript mentioned that the inference takes about 5 hours for all datasets, it remains unclear how much improvement GPSite can offer compared with existing methods. A more detailed benchmark comparison of running time against other methods is recommended (including the running time of different components, since some methods like GPSite use predicted structures while some use native structures).

      We thank the reviewer for the valuable suggestion. Empirically, it takes about 30 min for existing MSA-based methods to make predictions for a protein with 500 residues, while it only takes less than 1 min for GPSite (including structure prediction). However, it is worth noting that some predictors in our benchmark study are solely available as webservers, and it is challenging to compare the runtime between a standalone program and a webserver due to the disparity in hardware configurations. Therefore, we will include comprehensive runtime comparisons between the GPSite webserver and other existing servers in the revision to illustrate the practicality and efficiency of our method.

      1. Since the model uses predicted protein structure, the authors have conducted some studies on the effect of the predicted structure's quality. However, only the 0.7 threshold was used. A more comprehensive analysis with several different thresholds is recommended.

      We thank the reviewer for the comment. We assessed the effect of the predicted structure's quality by evaluating GPSite’s performance on high-quality (TM-score > 0.7) and low-quality (TM-score ≤ 0.7) predicted structures. We did not employ multiple thresholds (e.g., 0.3, 0.5, and 0.7), as the majority of proteins in the test sets were accurately predicted by ESMFold. Specifically, as shown in Figure 3B, Appendix 3-figure 2 and Appendix 2-table 5, the numbers of proteins with TM-score ≤ 0.7 are small in most datasets. Consequently, there is insufficient data available for analysis with lower thresholds, except for the RNA test set. Notably, Figure 3C presents a detailed inspection of the proteins with TM-score < 0.5 in the RNA test set. Within this subset, GPSite consistently outperforms the state-of-the-art structure-based method GraphBind when both take predicted structures as input, regardless of the prediction quality of ESMFold. Only in cases where structures are predicted with extremely low quality (TM-score < 0.3) does GPSite fall behind GraphBind given native structures as input. This result further demonstrates the robustness of GPSite.

      1. To demonstrate the robustness of GPSite, the authors performed a case study on human GR containing two zinc fingers, where the predicted structure is not perfect. The analysis could benefit from a more detailed explanation of why the model can still infer the binding site correctly even though the input structural information is slightly off.

      We thank the reviewer for the comment. We have actually explained the potential reason for the robustness of GPSite in the second paragraph of the “GPSite is robust for low-quality predicted structures” section. In summary, although the whole structure of this protein is not perfectly predicted, the binding domains of peptide, DNA and Zn2+ are actually predicted accurately as evidenced by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can still make reliable predictions.

      1. To analyze the relatively low AUC value for protein-protein interactions, the authors claimed that it is "due to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete", which is unjustified. It is highly recommended to support this claim by showing at least one example where GPSite's prediction is a valid binding site that is not present in the current Swiss-Prot database or via other approaches.

      We thank the reviewer for the valuable recommendation. We will perform such analysis in the revised manuscript.

      1. The authors reported that many GPSite-predicted binding sites are associated with known biological functions. Notably, for RNA-binding sites, there is a significantly higher proportion of translation-related binding sites. The analysis could benefit from a further investigation into this observation, such as analyzing the percentage of such interactions in the training set. In addition, if there is sufficient data, it would also be interesting to see the cross-interaction-type performance of the proposed model, e.g., train the model on a dataset excluding specific binding sites and test its performance on that class of interactions.

      We thank the reviewer for the suggestion. We would like to clarify that the analysis in Figure 5C was conducted at “protein-level” instead of “residue-level”. As described in the second paragraph of the “Large-scale binding site annotation for Swiss-Prot” section, a protein-level ligand-binding score was assigned to a protein by averaging the top k residue-level predictive binding scores. This protein-level score indicates the overall binding propensity of the protein to a specific ligand. We gathered the top 20,000 proteins with the highest protein-level binding scores for each ligand and found that their biological process annotations from Swiss-Prot were consistent with existing knowledge.

      As for the cross-interaction-type performance raised by the reviewer, we will include such analysis in the revised manuscript.
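The protein-level score described in the response above (the mean of the top k residue-level binding scores, followed by ranking proteins per ligand) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names and the default `k=10` are hypothetical, since the value of k is not given in the response, and `top_n` stands in for the 20,000-protein cutoff mentioned there.

```python
def protein_level_score(residue_scores, k=10):
    """Average the k highest residue-level binding scores of one protein.

    k=10 is an assumed placeholder; the response only says "top k".
    Proteins shorter than k residues are averaged over all residues.
    """
    if not residue_scores:
        raise ValueError("residue_scores must not be empty")
    top = sorted(residue_scores, reverse=True)[:k]
    return sum(top) / len(top)


def top_proteins(scores_by_protein, k=10, top_n=20000):
    """Rank proteins by protein-level score and keep the best top_n IDs.

    scores_by_protein maps a protein ID to its list of residue-level
    scores for a single ligand type.
    """
    ranked = sorted(
        scores_by_protein,
        key=lambda pid: protein_level_score(scores_by_protein[pid], k),
        reverse=True,
    )
    return ranked[:top_n]
```

Averaging only the top k residues keeps the score from being diluted by protein length, so a long protein with a small, confident binding patch can still rank highly.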

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines genetically barcoded rabies viruses with spatial transcriptomics in vivo in the mouse brain to decode connectivity of neural circuits. The data generated by the combination of these approaches in this new way is mostly convincing as the authors provide validation and proof-of-concept that the approach can be successful. While this new combination of established techniques has promise for elucidating brain connectivity, there are still some nuances and caveats to the interpretations of the results that are lacking especially with regards to noting unexpected barcodes either due to unexpected/novel connections or unexpected rabies spread.

      In this revised manuscript, we added a new control experiment and additional analyses to address two main questions from the reviewers: (1) How the threshold of glycoprotein transcript counts used to identify source cells was determined, and (2) whether the limited long-range labeling was expected in the trans-synaptic experiment. The new experiments and analyses validated the distribution of source cells and presynaptic cells observed in the original barcoded transsynaptic tracing experiment and validated the choice of the threshold of glycoprotein transcripts. As the reviewers suggested, we also included additional discussion on how future experiments can improve upon this study, including strategies to improve source cell survival and minimizing viral infection caused by leaky expression of TVA. We also provided additional clarification on the analyses for both the retrograde labeling experiment and the trans-synaptic tracing experiment. We modified the Results and Discussion sections on the trans-synaptic tracing experiment to improve clarity to general readers. Detailed changes to address specific comments by reviewers are included below.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this preprint, Zhang et al. describe a new tool for mapping the connectivity of mouse neurons. Essentially, the tool leverages the known peculiar infection capabilities of Rabies virus: once injected into a specific site in the brain, this virus has the capability to "walk upstream" the neural circuits, both within cells and across cells: on one hand, the virus can enter from a nerve terminal and infect retrogradely the cell body of the same cell (retrograde transport). On the other hand, the virus can also spread to the presynaptic partners of the initial target cells, via retrograde viral transmission.

      Similarly to previously published approaches with other viruses, the authors engineer a complex library of viral variants, each carrying a unique sequence ('barcode'), so they can uniquely label and distinguish independent infection events and their specific presynaptic connections, and show that it is possible to read these barcodes in-situ, producing spatial connectivity maps. They also show that it is possible to read these barcodes together with endogenous mRNAs, and that this allows spatial mapping of cell types together with anatomical connectivity.

      The main novelty of this work lies in the combined use of rabies virus for retrograde labeling together with barcoding and in-situ readout. Previous studies had used rabies virus for retrograde labeling, albeit with low multiplexing capabilities, so only a handful of circuits could be traced at the same time. Other studies had instead used barcoded viral libraries for connectivity mapping, but mostly focused on the use of different viruses for labeling individual projections (anterograde tracing) and never used a retrograde-infective virus.

      The authors creatively merge these two bits of technology into a powerful genetic tool, and extensively and convincingly validate its performance against known anatomical knowledge. The authors also do a very good job at highlighting and discussing potential points of failure in the methods.

      We thank the reviewer for the enthusiastic comments.

      Unresolved questions, which also affect other viral-labeling methods more broadly, are for example how to deal with uneven tropism (i.e., if the virus is unable or inefficient in infecting some specific parts of the brain), or how to prevent the cytotoxicity induced by the high levels of viral replication and expression, which will tend to produce "no source networks", neural circuits whose initial cell can't be identified because it's dead. This last point is particularly relevant for in-situ based approaches: while high expression levels are desirable for the particular barcode detection chemistry the authors chose to use (gap-filling), they are also potentially detrimental for cell survival, and risk producing extensive cell death (which indeed the authors single out as a detectable pitfall in their analysis). This is likely to be one of the major optimisation challenges for future implementations of these types of barcoding approaches.

      As the reviewer suggested, we included additional discussion about tropism and cytotoxicity in the revised Discussion. Our sensitivity for barcode detection is sufficient, since we estimated (based on manual proofreading) that most barcoded neurons had more than ten counts of a barcode in the trans-synaptic tracing experiment. The high sensitivity may potentially allow us to adapt next-generation rabies virus with low replication, such as the third generation ΔL rabies virus (Jin et al, 2022, biorxiv) in future optimizations.

      Overall the paper is well balanced, the data are well presented and the conclusions are strongly supported by the data. Impact-wise, the method is definitely going to be useful for the neurobiology research community.

      We thank the reviewer for her/his enthusiasm.

      Reviewer #2 (Public Review):

      Although the trans-synaptic tracing method mediated by the rabies virus (RV) has been widely utilized to infer input connectivity across the brain to a genetically defined population in mice, the analysis of labeled pre-synaptic neurons in terms of cell-type has been primarily reliant on classical low-throughput histochemical techniques. In this study, the authors made a significant advance toward high-throughput transcriptomic (TC) cell typing by both dissociated single-cell RNAseq and the spatial TC method known as BARseq to decode a vast array of molecularly labeled ("barcoded") RV vector library. First, they demonstrated that a barcoded-RV vector can be employed as a simple retrograde tracer akin to AAVretro. Second, they provided a theoretical classification of neural networks at the single-cell resolution that can be attained through barcoded-RV and concluded that the identification of the vast majority (ideally 100%) of starter cells (the origin of RV-based trans-synaptic tracing) is essential for the inference of single-cell resolution neural connectivity. Taking this into consideration, the authors opted for the BARseq-based spatial TC that could, in principle, capture all the starter cells. Finally, they demonstrated the proof-of-concept in the somatosensory cortex, including inferred connectivity from 381 putative pre-synaptic partners to 31 uniquely barcoded-starter cells, as well as many insightful estimations of input convergence at the cell-type resolution in vivo. While the manuscript encompasses significant technical and theoretical advances, it may be challenging for the general readers of eLife to comprehend. The following comments are offered to enhance the manuscript's clarity and readability.

      We modified the Results and Discussion sections on the trans-synaptic tracing experiment to improve clarity to general readers. We separated out the theoretical discussion about barcode sharing networks as a separate subsection, explicitly stated the rationale of how different barcode sharing networks are distinguished in the in situ trans-synaptic tracing experiment, and added additional discussion on future optimizations. Detailed descriptions are provided below.

      Major points:

      1. I find it difficult to comprehend the rationale behind labeling inhibitory neurons in the VISp through long-distance retrograde labeling from the VISal or Thalamus (Fig. 2F, I and Fig. S3) since long-distance projectors in the cortex are nearly 100% excitatory neurons. It is also unclear why such a large number of inhibitory neurons was labeled at a long distance through RV vector injections into the RSP/SC or VISal (Fig. 3K). Furthermore, a significant number of inhibitory starter cells in the somatosensory cortex was generated based on their projection to the striatum (Fig. 5H), which is unexpected given our current understanding of the cortico-striatum projections.

      The labeling of inhibitory neurons can be explained by several factors in the three different experiments.

      (1) In the scRNAseq-based retrograde labeling experiment (Fig. 2 and Fig. S3), the injection site VISal is adjacent to VISp. Because we dissected VISp for single-cell RNAseq, we may find labeled inhibitory neurons at the VISp border that extend short axons into VISal. We explained this in the revised Results.

      (2) In the in situ sequencing-based retrograde labeling experiment (Fig. 3,4), the proximity between the two injection sites VISal and RSP/SC, and the sequenced areas (which included not only VISp but also RSP) could also contribute to labeling through local axons of inhibitory neurons. Furthermore, because we also sequenced midbrain regions, inhibitory neurons in the superior colliculus could pick up the barcodes through local axons. We included an explanation of this in the revised Results.

      (3) In the trans-synaptic tracing experiment, we speculate that low-level leaky expression from the TREtight promoter led to non-Cre-dependent expression in many neurons. To test this hypothesis, we first performed a control injection in which we saw that the fluorescent protein expression was indeed restricted to layer 5, as expected from corticostriatal labeling. Based on the labeling pattern, we estimated that about 12 copies of the glycoprotein transcript per cell would likely be needed to achieve fluorescent protein expression. Since many source cells in our experiment were below this threshold, these results support the hypothesis that the majority of source cells with low-level expression of the glycoprotein were likely Cre-independent. Because these cells could still contribute to barcode sharing networks, we could not exclude them as in a conventional bulk trans-synaptic tracing experiment. In future experiments, we can potentially reduce this population by improving the helper AAV viruses used to express TVA and the glycoprotein. We included this explanation in Results and more detailed analysis in Supplementary Note 2, and discussed potential future optimizations in the Discussion. This new analysis in Supplementary Note 2 is also related to the Reviewer’s question regarding the threshold used for determining source cells (see below).

      1. It is unclear as to why the authors did not perform an analysis of the barcodes in Fig. 2. Given that the primary objective of this manuscript is to evaluate the effectiveness of multiplexing barcoded technology in RV vectors, I would strongly recommend that the authors provide a detailed description of the barcode data here, including any technical difficulties or limitations encountered, which will be of great value in the future design of RV-barcode technologies. In case the barcode data are not included in Fig. 2, I would suggest that the authors consider excluding Fig. 2 and Fig. S1-S3 in their entirety from the manuscript to enhance its readability for general readers.

      In the single-cell RNAseq-based retrograde tracing, all barcodes recovered matched to known barcodes in the corresponding library. We included a short description of these results in the revised manuscript.

      1. Regarding the trans-synaptic tracing utilizing a barcoded RV vector in conjunction with BARseq decoding (Fig. 5), which is the core of this manuscript, I have a few specific questions/comments. First, the rationale behind defining cells with only two rolony counts of rabies glycoprotein (RG) as starter cells is unclear. Why did the authors not analyze the sample based on the colocalization of GFP (from the AAV) and mCherry (from the RV) proteins, which is a conventional method to define starter cells? If this approach is technically difficult, the authors could provide an independent histochemical assessment of the detection stringency of GFP positive cells based on two or more rolonies of RG.

      In situ sequencing does not preserve fluorescent protein signals, so we used transcript counts to determine which cells expressed the glycoprotein. We have added new analyses in the Results and in Supplementary Note 2 to determine the transcript counts that were equivalent to cells that had detectable BFP expression. We found that BFP expression is equivalent to ~12 counts of the glycoprotein transcript per cell, which is much higher than the threshold we used. However, we could not solely rely on this estimate to define the source cells, because cells that had lower expression of the glycoprotein (possibly from leaky Cre-independent expression) may still pass the barcodes to presynaptic cells. This can lead to an underestimation of double-labeled and connected-source networks and an overestimation of single-source networks and can obscure synaptic connectivity at the cellular resolution. We thus used a very conservative threshold of two transcripts in the analysis. This conservative threshold will likely overestimate the number of source cells that shared barcodes and underestimate the number of single-source networks. Since this is a first study of barcoded transsynaptic tracing in vivo, we chose to err on the conservative side to make sure that the subsequent analysis has single-cell resolution. Future characterization and optimization may lead to a better threshold to fully utilize data.
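The thresholding logic in this response can be illustrated with a short Python sketch. It is a hypothetical illustration rather than the authors' analysis code: the function name, the label strings, and the use of inclusive (`>=`) comparisons are assumptions, while the two-transcript source-cell threshold and the estimate of roughly 12 transcripts as the BFP-equivalent level come from the response.

```python
def classify_by_glycoprotein(counts, source_threshold=2, bfp_equivalent=12):
    """Label each cell by its rabies glycoprotein transcript count.

    counts maps a cell ID to the number of glycoprotein transcripts
    (rolonies) detected in that cell. Cells at or above the conservative
    source_threshold are treated as potential source cells; those also at
    or above bfp_equivalent would likely show detectable BFP.
    """
    labels = {}
    for cell_id, n in counts.items():
        if n >= bfp_equivalent:
            labels[cell_id] = "source, BFP-equivalent expression"
        elif n >= source_threshold:
            labels[cell_id] = "source, low expression (possibly Cre-independent)"
        else:
            labels[cell_id] = "not a source cell"
    return labels


# Example with made-up counts for three cells.
example = classify_by_glycoprotein({"cell_a": 0, "cell_b": 2, "cell_c": 15})
```

Lowering `source_threshold` admits more cells as potential sources, which is the conservative direction described above: it inflates the set of barcodes flagged as shared between source cells and shrinks the set of networks accepted as single-source.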

      Second, it is difficult to interpret the proportion of the 2,914 barcoded cells that were linked to barcoded starter cells (single-source, double-labeled, or connected-source) and those that remained orphan (no-source or lost-source). A simple table or bar graph representation would be helpful. The abundance of the no-source network (resulting from Cre-independent initial infection of the RV vector) can be estimated in independent negative control experiments that omit either Cre injection or AAV-RG injection. The latter, if combined with BARseq decoding, can provide an experimental prediction of the frequency of double-labeled events since connected-source networks are not labeled in the absence of RG.

      We have added Table 2, which breaks down the 2,914 barcoded cells based on whether they are presynaptic or source cells, and which type of network they belong to. We agree with the reviewer that additional Cre- or RG- control experiments run in parallel would allow an independent estimate of the double-labeled networks and the no-source networks. We have added a discussion of possible controls to further optimize the trans-synaptic tracing approach in future studies in the Discussion.

      Third, I would appreciate more quantitative data on the putative single-source network (Fig. 5I and S6) in terms of the distribution of pre- and post-synaptic TC cell types. The majority of labeling appeared to occur locally, with only two thalamic neurons observed in sample 25311842 (Fig. S6). How many instances of long-distance labeling (for example, > 500 microns away from the injection site) were observed in total? Is this low efficiency of long-distance labeling expected based on the utilized combinations of AAVs and RV vectors? A simple independent RV tracing solely detecting mCherry would be useful for evaluating the labeling efficiency of the method. I have experienced similar "less jump" RV tracing when RV particles were prepared in a single step, as this study did, rather than multiple rounds of amplification in traditional protocols, such as Osakada F et al Nat Protocol 2013.

      We imaged an animal that was injected in parallel to assess labeling (now included in Supplementary Note 2 and Supp. Fig. S5). The labeling pattern in the newly imaged animal was largely consistent with the results from the barcoded experiment: most labeled neurons were seen in the vicinity of the injection site, and sparser labeling was seen in other cortical areas and the thalamus. We further found that most neurons that were labeled in the thalamus were about 1 mm posterior to the center of the injection site, and thus would not have been sequenced in the in situ sequencing experiment (in which we sequenced about 640 µm of tissue spanning the injection site).

      In addition, we found that the bulk of the cells that expressed mCherry from the rabies virus only partially overlapped with the area that contained cells co-expressing BFP with the rabies glycoprotein. Moreover, very few cells co-expressed mCherry and BFP, which would be considered source cells in a conventional mono-synaptic tracing experiment. The small numbers of source cells likely also contributed to the sparseness of long-range labeling in the barcoded experiment.

      These interpretations and comparisons to the barcoded experiment are now included in Supplementary Note 2.

      Reviewer #3 (Public Review):

      The manuscript by Zhang and colleagues attempts to combine genetically barcoded rabies viruses with spatial transcriptomics in order to genetically identify connected pairs. The major shortcoming with the application of a barcoded rabies virus, as reported by 2 groups prior, is that with the high dropout rate inherent in single cell procedures, it is difficult to definitively identify connected pairs. By combining the two methods, they are able to establish a platform for doing that, and provide insight into connectivity, as well as pros and cons of their method, which is well thought out and balanced.

      Overall the manuscript is well-done, but I have a few minor considerations about tone and accuracy of statements, as well as some limitations in how experiments were done. First, the idea of using rabies to obtain broader tropism than AAVs isn't really accurate - each virus has its own set of tropisms, and it isn't clear that rabies is broader (or can be made to be broader).

      As the reviewer suggested, we toned down this claim and stated that rabies virus has different tropism to complement AAV.

      Second, rabies does not label all neurons that project to a target site - it labels some fraction of them.

      We meant to say that retrograde labeling is not restricted to labeling neurons from a certain brain region. We have clarified this in the text.

      Third, the high rate of rabies virus mutation should be considered - if it is, or is not a problem in detecting barcodes with high fidelity, this should be noted.

      Our analysis showed that sequencing 15 bases was sufficient to tolerate a small number of mismatches in the barcode sequences and could distinguish real barcodes from random sequences (Fig. 4A). Thus, we can tolerate mutations in the barcode sequence. We have clarified this in the text.
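
      The mismatch tolerance described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the two example barcodes, and the tolerance of one mismatch are assumptions chosen only to show how a 15-base read can still be assigned to a known barcode after a point mutation, while an unrelated sequence is rejected.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(read, library, max_mismatches=1):
    """Return the single library barcode within max_mismatches of the read.

    Returns None when no barcode, or more than one barcode, is close enough,
    so ambiguous reads are never assigned. max_mismatches=1 is an assumed
    value for illustration, not a parameter taken from the manuscript.
    """
    hits = [bc for bc in library if hamming(read, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 15-base barcodes standing in for a real library.
library = ["ACGTACGTACGTACG", "TTGCAATTGCAATTG"]

exact = match_barcode("ACGTACGTACGTACG", library)        # identical read
mutated = match_barcode("ACGTACGTACGTACC", library)      # one substitution
random_read = match_barcode("GGGGGGGGGGGGGGG", library)  # unrelated sequence
```

      In this sketch the mutated read is still assigned to its parent barcode, whereas the unrelated read is left unassigned; the longer the sequenced barcode, the larger the tolerance that can be used before distinct barcodes become confusable.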

      Fourth, there are a number of implicit assumptions in this manuscript, not all of which are equally backed up by data. For example, it is not clear that all rabies virus transmission is synaptic specific; in fact, quite a few studies argue that it is not (e.g., detection of rabies transcripts in glial cells). Thus, arguments about lost-source networks and the idea that if a cell is lost from the network, that will stop synaptic transmission, is not clear. There is also the very real propensity that, the sicker a starter cell gets, the more non-specific spread of virus (e.g., via necrosis) occurs.

      We agree with the reviewer that how strictly virus transmission is restricted to synapses remains a hotly debated question in the field, and this question is relevant not only to techniques based on barcoded rabies tracing, but to all trans-synaptic tracing experiments. A barcoding-based approach can generate single-cell data that enable direct comparison to other data modalities that measure synaptic connectivity, such as multi-patch and EM. These future experiments may provide additional insights into the questions that the reviewer raised. We have included additional discussion about how non-synaptic transmission of barcodes because of the necrosis of source cells may affect the analysis in the Discussion.

      Regarding the scenario in which the source cell dies, we agree with the reviewer and have clarified this in the revised manuscript.

      Fifth, in the experiments performed in Figure 5, the authors used a FLEx-TVA expressed via a retrograde Cre, and followed this by injection of their rabies virus library. The issue here is that there will be many (potentially thousands) of local infection events near the injection site that are TVA-mediated but Cre-independent (=off-target expression of TVA in the absence of Cre). This is a major confound in interpreting the labeling of these cells. They may express very low levels of TVA, but still have infection be mediated by TVA. The authors did not clearly explore how expression of TVA related to rabies virus infection of cells near the rabies injection site. A modified version of TVA, such as 66T, should have been used to mitigate this issue. Otherwise, it is impossible to determine connectivity locally. The authors do not go to great lengths to interpret the findings of these observations, so I am not sure this is a critical issue, but it should be pointed out by the authors as a caveat to their dataset.

      We agree with the reviewer that this type of infection could potentially be a major contributor to no-source networks, which were abundant in our experiment. Because small no-source networks were excluded from our analyses, and large no-source networks were only included for barcodes with low frequency (i.e., it would be nearly impossible statistically to generate such large no-source networks from independent infections), we believe that the effect of independent infections on our analyses was minimized. We have added a control experiment in Fig S5 and Supplementary Note 2, which further supported the hypothesis that there were many independent infections. We also included additional discussion about how this can be assessed and optimized in future studies in the Discussion.
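
      The statistical argument above can be sketched numerically. The following is an illustration under stated assumptions, not the analysis performed in the study: it treats each independent infection as drawing a barcode at random from the library, so the number of cells carrying a given barcode by chance follows a binomial distribution. The number of infection events, the barcode frequencies, and the network size are hypothetical values.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Log of the binomial probability of exactly i successes in n trials."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def prob_at_least(k, n_infections, barcode_freq):
    """P(X >= k) for X ~ Binomial(n_infections, barcode_freq), 0 < freq < 1.

    The upper tail is summed directly in log space so that very small
    probabilities are not lost to floating-point cancellation.
    """
    return sum(exp(log_binom_pmf(i, n_infections, barcode_freq))
               for i in range(k, n_infections + 1))


# Hypothetical numbers: 3,000 independent infections, a network of 5 cells.
rare = prob_at_least(5, 3000, 1e-5)    # barcode at frequency 1e-5 in the library
common = prob_at_least(5, 3000, 1e-2)  # barcode at frequency 1e-2 in the library
```

      Under these assumed values, five or more cells sharing a rare barcode by independent infection is extremely unlikely, whereas for a common barcode it is nearly certain, which is why restricting large no-source networks to low-frequency barcodes limits the contribution of independent infections.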

      Sixth, the authors are making estimates of rabies spread by comparison to a set of experiments that was performed quite differently. In the two studies cited (Liu et al., done the standard way, and Wertz et al., tracing from a single cell), the authors were likely infecting with a rabies virus using a high multiplicity of infection, which likely yields higher rates of viral expression in these starter cells and higher levels of input labeling. However, in these experiments, the authors need to infect with a low MOI, and explicitly exclude cells with >1 barcode. Having only a single virion trigger infection of starter cells will likely reduce the #s of inputs relative to starter neurons. Thus, the stringent criteria for excluding small networks may not be entirely warranted. If the authors wish to only explore larger networks, this caveat should be explicitly noted.

      In the trans-synaptic labeling experiment, we actually used a high rabies titer (200 nL, 7.6e10 iu/mL) that was comparable to conventional rabies tracing experiments. We did not exclude cells with multiple barcodes (as opposed to barcodes in multiple source cells), because we could resolve multiple barcodes in the same cell and indeed found many cells with multiple barcodes. We have clarified this in the text.

      Overall, if the caveats above are noted and more nuance is added to some of the interpretation and discussion of results, this would greatly help the manuscript, as readers will be looking to the authors as the authority on how to use this technology.

      In addition to addressing the specific concerns of the reviewer as described above, we modified the Results and Discussion sections on the trans-synaptic tracing experiment to improve clarity to general readers and expanded the discussion on future optimizations.

      Reviewer #1 (Recommendations For The Authors):

      The scientific problem is clearly stated and well laid out, the data are clearly presented, and the experiments well justified and nicely discussed. It was overall a very enjoyable read. The figures are generally nice and clear, however, I find the legends excessively concise. A bit too often, they just sort of introduce the title of the panel rather than a proper explanation of what it is depicted. A clear case is for example visible in Fig 2, where the description of the panels is minimal, but this is a general trend of the manuscript. This makes the figures a bit hard to follow as self-contained entities, without having to continuously go back to the main text. I think this could be improved with longer and more helpful descriptions.

      We have revised all figure legends to make them more descriptive.

      Other minor things:

      In the cDNA synthesis step for in-situ sequencing, I believe the authors might have forgotten one detail: the addition of aminoallyl dUTP to the RT reaction. If I recall correctly this is done in BARseq. The fact that the authors crosslink with BS-PEG on day 2, makes me suspect they spike in these nucleotides during the RT but this is not specified in the relevant step. Perhaps this is a mistake that needs correction.

      The RT primers we used have an amine group at 5’, which directly allows crosslinking. Thus, we did not need to spike in aminoallyl dUTP in the RT reaction. We have clarified this in the Methods.

      Reviewer #2 (Recommendations For The Authors):

      Throughout the manuscript, there are frequent references to the "Methods" section for important details. However, it can be challenging to determine which specific section of the Methods the authors are referring to, and in some cases, a thorough examination of the entire Methods section fails to locate the exact information needed to support the authors' claims. Below are a few specific examples of this issue. The authors are encouraged to be more precise in their references to the Methods section.

      In the revised manuscript, we numbered each subsection of Methods and updated pointers and associated hyperlinks in the main text to the subsection numbers.

      • On page 7, line 14, it is unclear how the authors compared the cell marker gene expression with the marker gene expression in the reference cell type.

      We have clarified this in the revised manuscript.

      • On page 7, line 33, the authors note that some barcodes may have been missed during the sequencing of the rabies virus libraries, but the Methods section lacked a convincing explanation on this issue (see my point 2 above).

      We included a separate subsection on the sequencing of rabies libraries and the analysis of the sequencing depth in the Methods. In this new subsection, we further clarified our reasoning for identifying the lack of sequencing depth as a reason for missing barcodes, especially in comparison to sequencing depth required for establishing exact molecule counts used in established MAPseq and BARseq techniques with Sindbis libraries.

      • On page 9, line 44, the authors state that they considered a barcode to be associated with a cell if they found at least six molecules of that barcode in a cell, as detailed in the Methods section. However, the rationale behind this level of stringency is not provided in the Methods.

      We initially chose this threshold based on visual inspection of the sequencing images of the barcoded cells. Because the labeled cell types were consistent with our expectations (Fig. 4E-G), we did not further optimize the threshold for detecting retrogradely labeled barcoded cells.
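
      The thresholding rule discussed above can be written down as a short sketch. Only the cutoff of six molecules comes from the text; the input layout (one `(cell_id, barcode)` pair per detected molecule) and the function name are assumptions made for illustration and do not describe the actual data structures used in the study.

```python
from collections import Counter

MIN_MOLECULES = 6  # cutoff stated in the text: at least six molecules per cell


def barcodes_per_cell(molecules, min_molecules=MIN_MOLECULES):
    """Assign barcodes to cells from per-molecule detections.

    molecules: iterable of (cell_id, barcode) pairs, one per detected molecule
    (a hypothetical input format). A cell may retain more than one barcode.
    """
    counts = Counter(molecules)
    assigned = {}
    for (cell, barcode), n in counts.items():
        if n >= min_molecules:
            assigned.setdefault(cell, set()).add(barcode)
    return assigned


# Toy data: cell c1 has 6 molecules of barcode A and 2 of B; c2 has 5 of A.
molecules = [("c1", "A")] * 6 + [("c1", "B")] * 2 + [("c2", "A")] * 5
assigned = barcodes_per_cell(molecules)
```

      In this toy example only barcode A in cell c1 passes the cutoff; the two molecules of B and the five molecules of A in c2 are treated as below threshold.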

      • I have noticed that some important explanations of figure panels are missing in the legends, making it challenging to understand the figures. Below are typical examples of this issue.

      In addition to the examples that the reviewer mentioned below, we also revised many other figure panels to make them clear to the readers.

      • In Fig. 2, "RV into SC" in panel C does not make sense, as RV was injected into the thalamus. There is no explanation of the images in this panel C.

      We have corrected the typo in the revision.

      • In Fig. 3, information on the endogenous gene panel for cell type classification (Table S3) could be mentioned in the legend or corresponding text.

      We now cite Table S3 both in Fig 3 legend and in the main text. We also included a list of the 104 cell type marker genes we used in Table S3.

      • In panel J, it is unclear why the total number of BC cells is 2,752, and not 4,130 as mentioned in the text.

      This is a typo. We have corrected this in the revision. The correct number (3,746) refers to the number of cells that did not belong to either of the two categories at the bottom of the panel, and not the total number of neurons. To make this clear, we now also include the total number of barcoded cells at the top of the panel.

      • In Fig. 4, the definitions of "+" and "−" symbols in panels K and L are unclear. Also, it seems that the second left column of panel K should read "T −."

      We corrected the typo in K, further clarified the “Area” labels, and changed the “S” label in 4K to “−”. This change does not change the original meaning of the figure: when considering the variance explained in L4/5 IT neurons, considering the subclass compositional profile is equivalent to not using the compositional profiles of cell types, because L4/5 IT neurons all belong to the same subclass (L4/5 IT subclass). Although operationally we simply considered subclass-level compositional profiles when calculating the variance explained, we think that changing this to “−” is clearer for the readers.

      • In Fig. 5, panel E is uninterpretable.

      We revised the main text and the figure to clarify how we manually proofread cells to determine the QC thresholds for barcoded cells. These plots showed a summary of the proofreading. We also revised the figures to indicate that they showed the fraction of barcoded cells that were considered real after proofreading. In the revised version, we moved these plots to Fig. S5.

      • In Fig. S1, I do not understand the identity of the six samples on the X-axis of panel A (given that only two animals were described in the main text) and what panel B shows, including the definition of map_cluster_conf and map_cluster_corr.

      In the revised Fig. S1, we made it more explicit that the six animals include both animals used for retrograde tracing (2 animals) and those used for trans-synaptic tracing (4 animals). We updated the y axis labels to be more readable and cited the relevant Methods section for definitions.

      • In Fig. S2, please provide the definitions of blue and red dots and values in panel A, as well as the color codes and size of the circles in panel B. My overall impression from panel B is that there is no significant difference between RV-infected and non-infected cells. The authors should provide more quantitative and statistical support for the claim that "RV-infected cells had higher expression of immune response-related genes."

      We toned down the statement to “Consistent with previous studies […], some immune response related genes were up-regulated in virus-infected cells compared to non-infected cells.” Because the main point of the single-cell RNAseq analysis was that rabies did not affect the ability to distinguish transcriptomic types, the change in immune response-related genes was not essential to the main conclusions. We clarified the red and blue dots in panel A and changed panel B to show the top up-regulated immune response-related genes in the revised manuscript.

      • In Fig. S3, the definitions of the color code and circle size are missing.

      We have added the legends in Fig. S3.

    1. Author Response

      We thank the reviewers for their detailed and constructive criticisms of our work. They raise many important questions (such as the issue of defining context) that we have also been thinking about extensively and they provide new and insightful avenues that have the potential to meaningfully improve the manuscript. We also appreciate that they commented on the novelty and importance of this work. Going forward, we will address the methodological concerns raised as best as we can and thereby hope to make the evidence for our conclusion more compelling.

    1. Author Response

      eLife assessment

      This study provides direct evidence showing that Kv1.8 channels underlie several potassium currents in the two types of sensory hair cells found in the mouse vestibular system. This is an important finding because the nature of the channels underpinning the unusual potassium conductance gK,L in type I hair cells has been under scrutiny for many years. Although most of the experimental evidence is compelling and the analysis is rigorous, the evidence supporting some of the claims related to Kv1.4 channels is incomplete. The study will be of interest to cell and molecular biologists and auditory neuroscientists.

      We are thankful to the editor and reviewers for their thorough assessment of our work and insightful feedback. Our responses to the comments and suggestions are below.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors provide a thorough demonstration of the role that one particular type of voltage-gated potassium channel, Kv1.8, plays in a low voltage-activated conductance found in type I vestibular hair cells. Along the way, they find that this same channel protein appears to function in type II vestibular hair cells as well, contributing to other macroscopic conductances. Overall, Kv1.8 may provide especially low input resistance and short time constants to facilitate encoding of more rapid head movements in animals that have necks. Combination with other channel proteins, in different ratios, may contribute to the diversified excitability of vestibular hair cells.

      Strengths:

      The experiments are comprehensive and clearly described, both in the text and in the figures. Statistical analyses are provided throughout.

      Weaknesses:

      None.

      Reviewer #2 (Public Review):

      The focus of this manuscript was to investigate whether Kv1.8 channels, which have previously been suggested to be expressed in type I hair cells of the mammalian vestibular system, are responsible for the potassium conductance gK,L. This is an important study because gK,L is known to be crucial for the function of type I hair cells, but the channel identity has been a matter of debate for the past 20 years. The authors have addressed this research topic by primarily investigating the electrophysiological properties of the vestibular hair cells from Kv1.8 knockout mice. Interestingly, gK,L was completely abolished in Kv1.8-deficient mice, in agreement with the hypothesis put forward by the authors based on the literature. The surprising observation was that in the absence of Kv1.8 potassium channels, the outward potassium current in type II hair cells was also largely reduced. Type II hair cells express the largely inactivating potassium conductance gK,A, but not gK,L. The authors concluded that heteromultimerization of non-inactivating Kv1.8 and the inactivating Kv1.4 subunits could be responsible for the inactivating gK,A. Overall, the manuscript is very well written and most of the conclusions are supported by the experimental work. The figures are well described, and the statistical analysis is robust.

      My only comment relates to the statement regarding the results providing "evidence" that Kv1.4 form heteromultimers with Kv1.8 channels (see Discussion). The only data I can see from the results is that Kv1.4 channels are expressed in the membrane of type II hair cells, which is not sufficient evidence for the above claim. Is the distribution of Kv1.8 and Kv1.4 overlapping in type II hair cells? Have the authors attempted to perform some pharmacological studies on Kv1.4? For example, would gK,A be completely blocked by a Kv1.4 antagonist? Addressing at least some of these questions would strengthen your argument.

      Author response: With respect to the “evidence” for heteromultimerization of Kv1.4 and Kv1.8: We agree that there is not conclusive evidence but have pulled together reasons to suggest that the fast inactivation of Kv1.8-dependent gA in type II hair cells reflects a contribution from Kv1.4 subunits. The reasons we note are mostly from other sources: 1) Kv1.4 subunits are the only Kv1 alpha subunits known to make channels with intrinsic rapid inactivation (Bertoli et al., 1994); 2) Kv1.4 is highly expressed in type II hair cells, but not type I hair cells, in mouse utricle (McInturff et al., Biol. Open., 2018; Jan et al., Cell Reports, 2021; Orvis et al., Nat. Methods, 2021); 3) previous work from M. Correia and colleagues suggested Kv1.4 as the likely source of A-current in pigeon vestibular hair cells; 4) some rat type II hair cells show comparatively strong Kv1.4-like immunoreactivity (our Fig. 5). While we consider heteromultimerization of Kv1.4 and Kv1.8 alpha subunits a plausible explanation consistent with available data from different sources, we agree that the question is not at all settled, and indeed raise the possibility that KV beta subunits, which are also differentially expressed by type I and II hair cells, play a role. Experiments to definitively advance or refute this hypothesis are beyond the scope of this paper.

      Reviewer #3 (Public Review):

      Summary:

      This paper by Martin et al. describes the contribution of a Kv channel subunit (Kv1.8, KCNA10) to voltage-dependent K+ conductances and membrane properties of type I and type II hair cells of the mouse utricle. Previous work has documented striking differences in K+ conductances between vestibular hair cell types. In particular, amniote type I hair cells are known to express a non-typical low-voltage-activated K+ conductance (GK,L) whose molecular identity has been elusive. K+ conductances in hair cells from 3 different mouse genotypes (wildtype, Kv1.8 homozygous knockouts, and heterozygotes) are examined here and whole-cell patch-clamp recordings indicate a prominent role for Kv1.8 subunits in generating GK,L. Results also interestingly support a role for Kv1.8 subunits in type II hair cell K+ conductances; inactivating conductances in null mice are reduced in type II hair cells from striola and extrastriola regions of the utricle. Kv1.8 is therefore proposed to contribute as a pore-forming subunit for 3 different K+ conductances in vestibular hair cells. The impact of these conductances on membrane responses to current steps is studied in the current clamp. Pharmacological experiments use XE991 to block some residual Kv7-mediated current in both hair cell types, but no other pharmacological blockers are used. In addition, immunostaining data are presented and raise some questions about Kv7 and Kv1.8 channel localization. Overall, the data present compelling evidence that the removal of Kv1.8 produces profound changes in hair cell membrane conductances and sensory capabilities. These changes at hair cell level suggest vestibular function would be compromised and further assessment in terms of balance behavior in the different mice would be interesting.

      Strengths:

      This study provides strong evidence that Kv1.8 subunits are major contributors to the unusual K+ conductance in type I hair cells of the utricle. It also indicates that Kv1.8 subunits are important for type II hair cell K+ conductances because Kv1.8-/- mice lacked an inactivating A conductance and had reduced delayed rectifier conductance compared to controls. A comprehensive and careful analysis of biophysical profiles is presented of expressed K+ conductances in 3 different mouse genotypes. Voltage-dependent K+ currents are rigorously characterized at a range of different ages and their impact on membrane voltage responses to current input is studied. Some pharmacological experiments are performed in addition to immunostaining to bolster the conclusions from the biophysical studies. The paper has a significant impact in showing the role of Kv1.8 in determining utricular hair cell electrophysiological phenotypes.

      Weaknesses:

      1. From previous work it is known that GK,L in type I hair cells has unusual ion permeation and pharmacological properties that differ greatly from type II hair cell conductances. Notably GK,L is highly permeable to Cs+ as well as K+ ions and is slightly permeable to Na+. It is blocked by 4-aminopyridine and divalent cations (Ba2+, Ca2+, Ni2+), enhanced by external K+, and modulated by cyclic GMP. The question arises, if Kv1.8 is a major player and pore-forming subunit in type I and type II cells (and cochlear inner hair cells as shown by Dierich et al. 2020) how are subunits modified to produce channels with very different properties? A role for Kv1.4 channels (gA) is proposed in type II hair cells based on previous findings in bird hair cells and immunostaining for Kv1.4 channels in rat utricle presented here in Fig. 6. However, hair cell-specific partner interactions with Kv1.8 that result in GK,L in type I hair cells and Cs+ impermeable, inactivating currents in type II hair cells remain for the most part unexplored.

      Author response: Our results raise the question of how Kv1.8/Kcna10 is regulated to produce gK,L in type I hair cells, which has different properties from the Kv1.8 conductance expressed heterologously (Lang et al., Am. J. Physiol. Renal Physiol., 2000; Ranjan et al., Front. Cell. Neurosci., 2019; Dierich et al., Cell Reports, 2020) and the Kv1.8 conductance inferred in inner hair cells (Dierich et al., 2020). We lay out several possibilities in the Discussion, but testing these suggestions is beyond the scope of the present paper.

      The relatively high Cs+ permeability of gK,L (0.31 pCs/pK, Rüsch & Eatock, J. Neurophysiol., 1996; Rennie & Correia, J. Membr. Biol., 2000) suggests there is something different about the selectivity filter and pore region of gK,L relative to most Kv1 family members, although the intrinsic Cs+ permeability of heterologously expressed Kv1.8 has not been reported. While we note that the pore region in Kv1.8 differs from other Kv1 subunits by a single amino acid (a glycine instead of alanine at position 411 – placed by AlphaFold in the pore helix of hKCNA10, Jumper et al., Nature, 2021), the effect of this difference is not known. A separate study is needed to determine why gK,L has a high Cs+ permeability relative to other Kv channels.

      For type II hair cells, the Cs+ permeability of Kv currents has not been fully characterized. Internal Cs+ does appear to reduce outward current more effectively in type II hair cells (Lang & Correia, J. Neurophysiol., 1989; Sokolowski et al., Dev. Biol., 1993) than in type I hair cells (Rüsch & Eatock, J. Neurophysiol., 1996; Rennie & Correia, J. Membr. Biol., 2000).

      With respect to cochlear inner hair cells, note that the assignment of Kv1.8 by Dierich et al. (2020) to a delayed rectifier in cochlear inner hair cells (IHCs) was based on inference – that is, existing inner ear expression databases show that Kv1.8 is expressed in IHCs, and heterologous Kv1.8 channels have a current resembling that observed in IHCs after block of multiple other K channels. We agree with Dierich et al. that Kv1.8 is an attractive candidate for the residual conductance in cochlear IHCs based on comparison with its properties in heterologous expression data. Together their study and our study suggest that Kv1.8 takes on quite different voltage dependence depending on the hair cell environment, and it will be an interesting challenge to sort out the reasons.

      2. Data from patch-clamp and immunocytochemistry experiments are not in close alignment. XE991 (Kv7 channel blocker) decreases remaining K+ conductance in type I and type II hair cells from null mice supporting the presence of Kv7 channels in hair cells (Fig. 7). Also, Holt et al. (2007) previously showed inhibition of GK,L in type I hair cells (but not delayed rectifier conductance in type II hair cells) using a dominant negative construct of Kv7.4 channels. However, immunolabelling indicates Kv7.4 channels on the inner face of calyx terminals adjacent to hair cells (Fig. 5). Some reconciliation of these findings is needed.

      Author response: Our pharmacology with XE991 suggests a small but significant population of Kv7 channels in type I and II hair cells (Fig 7). With the immunogold technique, Kharkovets et al. (PNAS, 2000) and Hurley et al. (J. Neurosci., 2006) counted significant Kv7.4 particles in type I hair cells, although the particles occurred at much greater density in the postsynaptic calyx membrane facing the hair cell. These results lead us to propose that the Kv7 channel we identified pharmacologically includes the Kv7.4 subunit, possibly in combination with other Kv7 subunits (Lysakowski et al., J. Neurosci., 2011). By this argument, the absence of clear hair cell staining in the confocal images of Fig. 5A is likely to reflect differences in methods, which include the use of different mouse strains, different sensitivities of immunogold vs. confocal imaging, and different antibodies.

      Holt et al. (J. Neurosci., 2007) indeed saw inhibition of gK,L in hair cells grown in organotypic cultures of the neonatal mouse utricle after viral expression of a dominant negative Kv7.4 construct. However, other studies show that Kv7 antagonists do not block gK,L (Hurley et al., J. Neurosci., 2006), and the Jentsch group, which first proposed Kv7.4 as a likely candidate for gK,L (Kharkovets et al., PNAS, 2000), ultimately showed that knocking out Kv7.4 and Kv7.5 expression failed to eliminate gK,L (Spitzmaul et al., J. Biol. Chem., 2013). Together, these results suggest that in Holt et al. (2007), the inhibition of gK,L by transfection with the dominant negative KCNQ4 construct may have occurred through unintended interactions with native gK,L channels. The young age of the neonatal cultured and transfected utricles raises the possibility of a developmental effect – that functional Kv7 channels are needed for the developmental transition to a Kv1.8 conductance. Consistent with this idea is the observation that Kv7 current is present in neonatal hair cells, where it is a relatively large proportion of Kv current in type I HCs before they acquire gK,L (Hurley et al., J. Neurosci., 2006). Alternatively, the overexpression of nonfunctional Kv7.4 channels in virally-transfected hair cells may have inhibited or delayed gK,L acquisition through a more general effect on membrane proteins.

      3. Strong immunosignal appears in the cuticle plates of hair cells in addition to signal in basal regions of hair cells and supporting cells. Please provide a possible explanation for this.

      Author response: There is significant non-specific staining of apical cell surfaces and supporting cell membranes in addition to specific staining of hair cell basolateral membranes. We infer non-specific staining when immunolabeling is present in the knockout tissue, as it is for the apical surfaces and supporting cell membranes—compare Fig. 5B.3 (control tissue) with Fig. 5B.4 (Kv1.8 null mutant). Non-specific immunostaining can occur with polyclonal antibodies (specific to several epitopes) if the antibodies are not affinity-purified, but we used an affinity-purified antibody. The apical surfaces are reputed to be “sticky” (susceptible to non-specific staining) but the non-specific labeling in the basal parts of supporting cells is more puzzling. One possibility is that the Kv1.8 antibody weakly recognized closely related Kv1.1 channels, which are more strongly expressed in supporting cells than hair cells (Scheffer et al., J. Neurosci., 2015).

      1. A previous paper reported that a vestibular evoked potential was abnormal in Kv1.8-/- mice (Lee et al. 2013) as briefly mentioned (lines 94-95). It would be very interesting to know if any vestibular-associated behaviors and/or hearing loss were observed in the mice populations. If responses are compromised at the sensory hair cell level across different zones, degradation of balance function would be anticipated and should be elucidated.

      Author response: We agree; some of these questions are the subject of another paper in preparation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for overseeing the assessment of our manuscript, “Comprehensive mutagenesis maps the effect of all single codon mutations in the AAV2 rep gene on AAV production”. We would also like to thank the reviewers for their feedback. We have carried out the suggested experiments that we feel are most central to our conclusions and summarized the revisions to the manuscript below.

      We appreciate the reviewers’ suggestion with regards to testing different rAAV genomes. We have measured the effect of Rep variants on the production of rAAV containing three additional genomes: a 4.4 kb single-stranded genome, a 3.9 kb single-stranded genome, and a 2.1 kb self-complementary genome (Figures 5C and 5D). The DNase-resistant particle titers, reported as a percent of wild-type Rep titers, are relatively consistent across these three constructs as well as the 5.0 kb single-stranded genome previously tested.

      We agree with the reviewers that measurement of the relative transduction efficiency of rAAV produced with different Rep variants is an important experiment to conduct. To address this, we transduced HEK293T cells with rAAVs, containing a luciferase genome, which were produced using two different Rep variants. When a constant volume of purified rAAV was used for transduction, we observed that the rAAV produced with the S110R Rep variant resulted in higher transduction than rAAV produced with wild-type Rep (as measured by luciferase signal). While we tested only a small number of variants, these results indicate that at least one of the Rep variants we identified can increase not only the viral genome titer but also the titer of transducing particles.

      To generate this transduction data, we produced additional rAAV preps using S110R and Q439T Rep variants. In the previous version of this manuscript, we used the Q439T variant to produce rAAV and noted a 10% increase in the ratio of viral genomes: capsids as determined by comparison of qPCR and capsid ELISA titers. However, a similar increase was not observed in the more recent experiment discussed above. We attribute this discrepancy to changes in the plasmid quantification methods used for transfection. Previously, we quantified plasmids using a fluorometric assay (Qubit); in our more recent experiments, we used qPCR to quantify plasmids for transfection. qPCR provides a more accurate measurement of plasmid concentration due to the specific nature of the primers and probes used, which may account for the subtle shift in quantification. While outside the scope of the current work, it will also be interesting to further investigate the proportion of full capsids using additional Rep variants and more direct methods, such as cryoEM or analytical ultracentrifugation.
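
      The genome-to-capsid comparison described above can be sketched as a short calculation. This is only an illustration, assuming the ratio is the qPCR genome titer divided by the capsid ELISA titer and that the reported increase is a relative change versus wild-type Rep. The function names and all titer values below are hypothetical and are not data from the manuscript.

```python
def genome_to_capsid_ratio(vg_per_ml: float, capsids_per_ml: float) -> float:
    """Viral genomes per capsid, from a qPCR titer and a capsid ELISA titer."""
    if capsids_per_ml <= 0:
        raise ValueError("capsid titer must be positive")
    return vg_per_ml / capsids_per_ml


def percent_change_vs_wild_type(variant_ratio: float, wt_ratio: float) -> float:
    """Relative change of a variant's ratio versus wild type, in percent."""
    if wt_ratio <= 0:
        raise ValueError("wild-type ratio must be positive")
    return 100.0 * (variant_ratio - wt_ratio) / wt_ratio


# Hypothetical titers (per mL), chosen only to illustrate the arithmetic.
wt_ratio = genome_to_capsid_ratio(vg_per_ml=2.0e11, capsids_per_ml=1.0e12)
variant_ratio = genome_to_capsid_ratio(vg_per_ml=2.2e11, capsids_per_ml=1.0e12)
change = percent_change_vs_wild_type(variant_ratio, wt_ratio)
```

      Because both titers enter the ratio, a systematic shift in how input plasmid is quantified can change the outcome of the preparation being compared, which is why small differences in this ratio are hard to interpret across experiments.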

      We agree with the reviewers’ observation that there are differences in the production fitness values for synonymous variants. However, the variation in production fitness values between synonymous variants is smaller than that between non-synonymous variants. We conducted the following analysis to clarify this point. We calculated two mean centered fitness values for each codon variant in the WT AAV2 library. The “positional mean centered fitness value” was determined using the production fitness values of all variants at a given amino acid position and describes how far a given fitness value diverges from the mean fitness value for that position. The “synonymous codon mean centered fitness value” was determined using the production fitness values of all synonymous variants at a given position and describes how far a given fitness value diverges from the mean fitness value for all its synonymous codon variants. We then plotted both mean centered fitness values versus amino acid position (Figure S8).
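
      The two mean centered values described above can be sketched as follows. This is a minimal illustration, assuming that mean centering means subtracting the relevant group mean from each fitness value. The record layout (`position`, `amino_acid`, `codon`, `fitness`) and the example numbers are hypothetical and do not come from the library data.

```python
from collections import defaultdict
from statistics import mean


def mean_centered_fitness(variants):
    """Add positional and synonymous-codon mean centered values to each variant.

    Each variant is a dict with keys: position, amino_acid, codon, fitness.
    """
    by_position = defaultdict(list)
    by_synonymous = defaultdict(list)
    for v in variants:
        by_position[v["position"]].append(v["fitness"])
        # Synonymous variants share both the position and the encoded amino acid.
        by_synonymous[(v["position"], v["amino_acid"])].append(v["fitness"])

    position_mean = {k: mean(vals) for k, vals in by_position.items()}
    synonymous_mean = {k: mean(vals) for k, vals in by_synonymous.items()}

    centered = []
    for v in variants:
        syn_key = (v["position"], v["amino_acid"])
        centered.append({
            **v,
            "positional_centered": v["fitness"] - position_mean[v["position"]],
            "synonymous_centered": v["fitness"] - synonymous_mean[syn_key],
        })
    return centered


# Hypothetical fitness values at a single position (two codons per amino acid).
EXAMPLE_VARIANTS = [
    {"position": 10, "amino_acid": "A", "codon": "GCT", "fitness": 1.0},
    {"position": 10, "amino_acid": "A", "codon": "GCC", "fitness": 1.2},
    {"position": 10, "amino_acid": "D", "codon": "GAT", "fitness": -0.8},
    {"position": 10, "amino_acid": "D", "codon": "GAC", "fitness": -1.0},
]
rows = mean_centered_fitness(EXAMPLE_VARIANTS)
```

      In this toy example the positional values spread widely because the two amino acids differ in fitness, while the synonymous-codon values stay close to zero. That is the pattern of a narrower distribution at the synonymous level.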

      The distribution of mean centered selection values is narrower when calculated at the synonymous codon level as opposed to the position level. This indicates that, in general, synonymous variants have more tightly distributed production fitness values than non-synonymous variants. This observation precludes us from conducting a more thorough analysis of the effects of synonymous codons on AAV production. (There is, however, at least one instance where clear differences between synonymous codons can be observed; see Figures S9C and S9D.) We agree with the reviewers that synonymous variants almost certainly influence aspects of AAV production, such as genome replication, transcriptional regulation, mRNA stability, and protein expression. However, our assay measures the aggregate effect of rep variants on all steps in the AAV production process and is likely unable to detect the effects of synonymous variants on specific steps in this process if those steps are not rate-limiting. We have updated the discussion section to include an explanation of the above.

      The X-axes in Figures 5B and 5D have been updated to plot s’ instead of percent WT titer. We have also added asterisks to indicate significance in Figures 5A and 5C. Thank you for these suggestions.

      We agree with Reviewer 3 that it would be interesting to sequence barcodes from the mRNA pool. The 20 bp barcodes are located upstream of the polyA site and should be present in mRNA transcripts. Something to consider is that AAV2 transcripts expressed from all three promoters (p5, p19, and p40) are polyadenylated at the same site (Stutika et al., 2016). As such, in our WT AAV2 library, barcode representation in the mRNA pool would indicate the aggregate effect of a rep variant on the levels of all AAV2 transcripts. In the pCMV-Rep78/68 library, only two AAV2 transcripts are generated - a spliced and unspliced version of the p5 product. Sequencing of barcodes present in the mRNA pool could be informative regarding the effect of rep variants on combined Rep78/68 expression levels. However, we feel that this experiment is outside the scope of the current work.

      We were also surprised at the number of novel functional Rep variants that were identified in our library. As the reviewer pointed out, optimal rAAV production likely does not equate to optimal fitness of naturally occurring AAV in the endogenous host. Naturally occurring AAV has both a latent and a lytic cycle and the Rep proteins play a role in both these processes (Pereira et al., 1997; Surosky et al., 1997). rAAV production, however, is primarily analogous to the lytic cycle of naturally occurring AAV. In their endogenous hosts, AAV must balance the effect of any mutations on fitness in both the lytic and latent contexts while we assay specifically for production fitness. We additionally attribute this finding to the relatively small number of AAV serotypes for which rep sequences are available. We have added a discussion of the above to the manuscript.

      Finally, in response to feedback from other researchers, we determined which amino acid substitutions resulted in production fitness values that were significantly different from that of wild-type (Figure S4). These results further emphasized the importance of the origin-binding domain; most statistically significant beneficial substitutions clustered here. Additionally, we noted that the majority of substitutions in the zinc-finger domain resulted in production fitness changes that were not significant. This lines up with previous work indicating that the zinc-finger domain is dispensable for rAAV production. We have added a discussion of these results to the main text.

      We again thank the reviewers for their suggestions; we feel that incorporation of their suggestions has strengthened support for our conclusions and enhanced the utility of this work for others in the field.

      References

      Pereira, D. J., McCarty, D. M., & Muzyczka, N. (1997). The adeno-associated virus (AAV) Rep protein acts as both a repressor and an activator to regulate AAV transcription during a productive infection. Journal of Virology, 71(2), 1079–1088. https://doi.org/10.1128/jvi.71.2.1079-1088.1997

      Stutika, C., Gogol-Döring, A., Botschen, L., Mietzsch, M., Weger, S., Feldkamp, M., Chen, W., & Heilbronn, R. (2016). A Comprehensive RNA Sequencing Analysis of the Adeno-Associated Virus (AAV) Type 2 Transcriptome Reveals Novel AAV Transcripts, Splice Variants, and Derived Proteins. Journal of Virology, 90(3), 1278–1289. https://doi.org/10.1128/JVI.02750-15

      Surosky, R. T., Urabe, M., Godwin, S. G., McQuiston, S. A., Kurtzman, G. J., Ozawa, K., & Natsoulis, G. (1997). Adeno-associated virus Rep proteins target DNA sequences to a unique locus in the human genome. Journal of Virology, 71(10), 7951–7959. https://doi.org/10.1128/jvi.71.10.7951-7959.1997

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigated whether viral IBs are involved in antagonizing IFN-I production during EBOV trVLPs infection. They found that IRF3 is hijacked and sequestered into EBOV IBs after viral infection, thereby spatially isolating IRF3 from TBK1 and IKKε. In this process, the activity of IRF3 is suppressed and downstream IFN-I induction is inhibited. The authors designed many experiments, such as the PLA that examined colocalization, to support their conclusions. However, necessary negative controls were missing in several assays, and additional key readouts need to be examined in several assays.

      The paper is well organized and most data in this paper could support the conclusions, while there are several issues that need to be further solved.

      1. In Figure 2-4, authors should examine the expression of downstream IFNs as well as the phosphorylation and nuclear localization of IRF3 to further prove the suppression of IRF3 activity by infecting with trVLPs.

      Response: The inhibitory effect of trVLPs infection on the phosphorylation of IRF3 S396 and SeV-induced IRF3 nuclear localization was determined by immunoprecipitation (Figure 3D) and immunofluorescence (Figure 4A and 4B), respectively. In addition, we demonstrated that IFN-β transcription was inhibited more potently by EBOV viral inclusion bodies compared with VP35 alone (Figure 7B and 7C).

      Moreover, EBOV viral inclusion bodies were demonstrated to inhibit the transcription of IFN downstream genes (e.g., CXCL10, ISG15 and ISG56) more potently than VP35 alone (new Figure 7D-F).

      1. In Figure 5, to better prove the conclusion that EBOV NP and VP35 play an important role in sequestering IRF3 in IBs, authors should add the "NP+VP35+VP30" and "NP+VP35+VP24" groups and repeat the assay.

      Response: According to the reviewer’s suggestion, VP24 or VP30 was added to the “VP35+NP” group, and the results showed that the “NP+VP35+VP24” and “NP+VP35+VP30” groups exhibited little, if any, effect on the distribution of IRF3 compared with the “NP+VP35” group (new Figure 5 - figure supplement 2A-B).

      1. In Figure 6f, the expression of STING should be examined by immunostaining to show the knockdown efficiency in trVLPs-infected cells.

      Response: As suggested by the reviewer, immunostaining was performed to visually detect the effect of STING knockdown on the IRF3 distribution during trVLPs infection (new Figure 6F).

      Reviewer #2 (Public Review):

      The manuscript by Zhu et al explored molecular mechanisms by which Ebola virus (EBOV) evades host innate immune response. EBOV has a number of means to shut down the type I interferon induction (by viral VP35 protein) and block type I interferon action (by viral VP24 protein). This study reported a new mechanism in which inclusion bodies (IBs) used for viral replication sequester IRF3, a key transcription factor involved in the interferon signaling, resulting in blockade of downstream type I interferon gene transcription. This finding is potentially interesting and may provide a new insight into EBOV's evasion of innate immunity. However, there are some flaws in the experiments and analyses that need to be addressed.

      1. Most of experiments were performed by transfection of trVLP plasmids, which is very different from virus infection. The conclusions should be examined and verified in the context of virus infection.

      Response: As suggested by the reviewer, the effects of IRF3 depletion on live Ebola virus replication were examined as described in the revised manuscript. Consistent with the results obtained after trVLPs infection, IRF3 depletion exerted little, if any, effect on viral replication (new Figure 7H), which supports the notion that, upon EBOV infection and the formation of inclusion bodies, IRF3 has little, if any, transcription activation activity after sequestration by inclusion bodies.

      1. Fig 1 - VP35 displayed a classical IB staining only in Panel A, while much less so in Panel C and not in panel B. It seemed that the VP35 staining images were chosen in a way that favors the authors' conclusions. The statistical analysis of co-localization of VP35 and IRF3, TBK1 or IKKe should be performed to draw the conclusion. Another concern is that IKKe is normally expressed at low levels under resting conditions and becomes induced only when the interferon signaling is activated. It seemed to be expressed at a high level even when the interferon signaling is blocked in Panel C. The authors should comment on this discrepancy.

      Response: Ebola virus inclusion bodies show variations in both shape and size. According to the reviewer’s suggestion, the colocalization of TBK1 or IKKε and VP35 is shown in new figures (new Figure 1C and 1E), and quantitatively analyzed by the fluorescence intensity using ImageJ software (new Figure 1B, 1D and 1F).

      1. Fig 2 - Was this experiment done by transfection or infection? The description of result is not consistent with the figure legend. The labeling was also not consistent between panel A and B. I would suggest performing Western blot to analyze the expression level of IRF3.

      Response: We apologize for the incorrect description of the data. Ebola virus trVLPs were initially produced based on transfection but also involved the viral infection process. The use of “transfection” in the figure and figure legends has been changed to “infection” in the revised manuscript. As suggested by the reviewer, Western blotting was performed to analyze the IRF3 expression levels at different time points after trVLPs infection (new Figure 2D).

      1. Fig 3 and 4 - As VP35 is well known for its highly efficient blockade of type I interferon activation, how would the authors differentiate the effect of VP35 alone from the sequestration of IRF3 in IBs in these experiments?

      Response: Previous studies have found that VP35, rather than NP, inhibits the expression of interferon, and the “VP35+NP” treatment, which induces IRF3 sequestration, inhibited IFN-β luciferase activity much more potently than VP35 expression alone (Figure 7B).

      1. Fig 3 - PolyIC can activate both RLR and TLR signaling pathways. Can the author comment on which pathway it activates in this experiment?

      Response: In this study, the effect of poly(I:C) was consistent with the results observed with SeV, which indicated that poly(I:C) may mainly activate the RLR signaling pathway. A discussion was added to the revised manuscript.

      1. The authors demonstrated that VP35 interacts with STING and recruits the latter to IBs. How would this affect the function of STING given that STING plays essential roles in cGAS/cGAMP pathway?

      Response: This study unexpectedly showed that VP35 can recruit IRF3 into viral inclusion bodies through STING, but whether it regulates the cGAS-STING pathway remains to be further investigated. Related discussion was added to the revised manuscript.

      1. It is difficult to follow the logic of Fig 7. The expression level of each viral protein should be determined. Ideally, a mutation in VP35 that disrupts its ability to antagonize the interferon signaling but still allows for the IB formation can be used to assess the relative contribution of IB sequestering IRF3.

      Response: As suggested by the reviewer, a series of VP35 mutants were constructed, but we failed to obtain a VP35 mutant that disrupts the ability of the protein to antagonize interferon signaling but still allows IB formation. Instead, coexpression of “NP+VP35+VP30+L”, which induces IB formation, inhibited IFN-I more potently than the expression of VP35 alone (Figure 7B). IRF3 knockout inhibited poly(I:C)-induced IFN-I production but had little, if any, effect on poly(I:C)-induced IFN-I production in the “NP+VP35+VP30+L” group (Figure 7C). IRF3 knockout in the cells did not significantly affect viral replication, but overexpression of activated IRF3 (IRF3/5D), instead of wild-type IRF3, inhibited viral replication (new Figure 7G-H). These results collectively suggested that almost all IRF3 in cells was hijacked and sequestered into IBs in the Ebola virus-infected cells.

    1. Author Response

      The following is the authors’ response to the original reviews.

      RESPONSE TO REVIEWERS:

      Reviewer #1 (Recommendations For The Authors):

      I think the manuscript of this excellent work can be improved, especially in writing (including a suggestion in the title) and presentation (Figure 6). Also, some additional specific experiments and analyses could be important, as I suggest below.

      1. For the title, perhaps a shorter "The acetylase activity of Cdu1 protects Chlamydia effectors from degradation" would be better to convey the major significance of this work. Of course, Cdu1 must regulate the function of InaC, IpaM and CTL0480. But perhaps it is speculative to think that egress is the major function of these effectors as their activity on other host cell processes during the cycle could eventually impact the extrusion process indirectly.

      Although we concur with the insights provided by reviewer 1, we wish to underscore that a significant breakthrough presented in our study revolves around the regulation of Chlamydia exit by Cdu1. Consequently, we believe that this noteworthy discovery should be incorporated into the title.

      1. For the writing:

      a. The description of ubiquitination and DUBs could be synthesized to the essential, so that space is gained to explain things that then come a bit out of the blue in the results (what are Incs, the specific functions of InaC, IpaM, and CTL0480 - at least place the citations in lines 110-112 next to the corresponding Incs -, Cdu2, etc - see specifics below)

      In lines 182-196 of the revised manuscript, we have incorporated additional contextual information concerning the roles of Incs, along with descriptions of the functions of InaC, IpaM, and CTL0480.

      b. In the Results, there is a lot of Chlamydia- and maybe lab-specific jargon that could be significantly simplified for the more general reader. I detail some suggestions below in the specific issues.

      We have improved the readability of our manuscript for a general audience by removing Chlamydia-specific terminology from the entire text and figures.

      1. For the figures:

      a. Figure 6, this figure could be reorganized: why two graphs in panel D? If detailed quantifications were done, perhaps in panel B just zoom on the examples of Golgi distributed/compacted? And again the labelling Rif-R L2, L2 pBOMB, M407 p2TK2, etc, simplify?

      Figure 6 has undergone restructuring. The representative images have been relocated to Supplemental Figures 5 and 6, while we have introduced sample images demonstrating F-actin assembly and Golgi repositioning. Furthermore, the quantification of Golgi dispersal has been streamlined into a single panel. Additionally, we have simplified the labeling of the strains utilized in the study.

      b. Figure 3, in the labelling, WT, inaC null, cdu1::GII wouldn't be enough? Leave the details to the legend and/or M&M.

      We have simplified the labeling of Ct strains in Figure 3.

      c. Figure 3C, these arrowheads should not be so symmetric (small arrows instead?) and it is unclear that the indicated cells do not show CTL0480.

      We have substituted arrowheads with small arrow symbols and have also revised the Figure to incorporate a new representative image that prominently illustrates the absence of CTL0480 at the inclusion membrane of some cdu1::GII inclusions within infected Hela cells at 36 hpi.

      1. Experiments:

      a. In Figure 7, at least extrusion should be analysed also with the Cdu1-deficient strain expressing Ac-deficient Cdu1 and the inaC and ipaM phenotypes should be complemented.

      We have conducted additional experiments to analyze extrusion production in Hela cells infected with a cdu1 null strain expressing the acetylase-deficient Cdu1 variant. We have incorporated the relevant data into revised Figure 7, where the impact of this strain on extrusion production and size is presented. Additionally, we updated Supplemental Figure 8 to include data illustrating the number of inclusions produced by this strain. We have also addressed these new results in the revised manuscript (lines 424-432). We are currently complementing inaC and ipaM mutant strains with various InaC and IpaM constructs that will be used in a follow up manuscript.

      b. Does overexpression of InaC, IpaM, or CTL0480 in a cdu1-null background prevent the degradation of these Incs and suppress the defects of cells infected by the cdu1 mutant (F-actin, Golgi, MYPT1)? This would show that the multiple phenotypes displayed by cells infected by the cdu1 null mutant are indeed related to the decreased levels of InaC, IpaM and CTL0480.

      We opted not to include data from the overexpression of these effectors in a cdu1-null background due to an unexpected decrease in shuttle plasmid load during overexpression. This development prompted concerns regarding the potential detrimental effects of overexpressing these effectors in the absence of Cdu1. Data supporting this observation are not included in this report.

      c. Figures 3A and 3B should be quantified (it says it is from 3 independent experiments). It would be important to have a relative perspective of how much Cdu1 protects these Incs over time (for InaC, it would also be nice to have the 36 and 48 hpi time-point). This is in contrast with the microscopy data in Figure 5, which illustrates very clear effects, and the quantification is a bit redundant.

      In Figure 3, we have incorporated a new Western Blot image showing endogenous InaC protein levels in Hela cells following infection with both WT Ct and cdu1::GII strains at 24, 36, and 48 hours post-infection (hpi). Additionally, we have quantified the Western Blot signals for both InaC and IpaM, and these results are also presented in Figure 3. The quantification of MYPT1 recruitment has been relocated to a supplementary figure. We have also included details regarding the methodology employed for the quantification of Western Blot signals in the Materials and Methods section.

      d. What is the subcellular localization of InaC, IpaM, CTL0480 and Cdu1 when analysed by transfection? Does Cdu1 bind to of InaC, IpaM, CTL0480 in infected cells? If this was attempted and unsuccessful it should be mentioned.

      In transfected HEK cells, InaC, IpaM, CTL0480, and Cdu1 all exhibit cytoplasmic localization with a diffuse pattern (data not shown). Despite our efforts, we encountered challenges in observing co-immunoprecipitation of Cdu1 with all three Incs in infected Hela cells at 24 hpi. We have duly acknowledged this limitation in our findings, as reflected in lines 221-226 of the revised manuscript.

      1. Specific issues:

      2. Line 87, "propagule" is really needed to describe the EB?

      The EB is the infectious form of Chlamydia species that spreads within the host to renew its life cycle; thus, "propagule" is a suitable term to characterize the EB.

      • Exocytosis implies fusion with the plasma membrane so "inclusion is exocytosed" (line 91) is not entirely correct.

      In line 91 of the revised manuscript, we referred to extrusion as the exit of an intact inclusion from the host cell and omitted the use of "exocytosed" to describe this process.

      • Line 126, "a Ct L2 (LGV L2 434 Bu) background". Maybe "a Ct cdu1-null strain" would be enough and leave the detail for Materials and Methods.

      In line 128 of the revised manuscript, we omitted "(LGV L2 434 Bu)" to avoid using jargon that may be unfamiliar to readers not well-versed in Chlamydia terminology.

      • Line 138, in the previous Pruneda et al, Nature Microbiol 2018, the title of figure 4 is "ChlaDUB deubiquitinase activity is required for C. trachomatis Golgi fragmentation", so why raise this hypothesis? And why in the end is it the acetylation activity of Cdu1 that promotes Golgi distribution? I think this is related to infection vs transfection experiments but it deserves to be briefly explained/discussed.

      In lines 140-142 of the revised manuscript, we provide clarification that the DUB activity of Cdu1 is required for Golgi fragmentation in transfected cells. This observation supports our initial hypothesis suggesting that the DUB activity of Cdu1 is also required for Golgi distribution in infected cells, and our rationale for identifying targets of its DUB activity.

      • Lines 147-155, what is the relevance of these non-ubiquitinated proteins that come along? Couldn't this be synthesized?

      We have included a discussion on non-ubiquitinated proteins, as they could potentially encompass proteins that interact with those protected by Cdu1. This perspective provides supplementary insights into the roles of proteins targeted for ubiquitination in the absence of Cdu1. The results of this analysis have been succinctly summarized in a single paragraph within the initial manuscript (lines 151-159 of the revised manuscript).

      • Line 170, I think it is the first time that "Type 3 secretion" is mentioned; perhaps explain in the introduction.

      Type 3 secretion systems have been extensively characterized and discussed in the literature, and we anticipate that the majority of our readers are well-acquainted with this secretory mechanism.

      • Line 184, I think it is the first time "microdomains" are mentioned; perhaps mention in the introduction.

      The definition of "microdomains" has been provided in line 191 of the revised manuscript.

      • Figure 2, as it stands the analysis with truncated Cdu1 proteins adds little to the work. Binding to the Incs seems to be affected when the TM domain is not present, but it still binds. And this is in a transfection context.

      The results depicted in Figure 2, involving truncated Cdu1 proteins, illustrate that Cdu1 is capable of interacting with InaC, IpaM, and CTL0480 even in the absence of infection. This finding serves as evidence suggesting that all three Incs could potentially serve as direct targets for Cdu1 activity. As a result, we prefer to keep these findings in the manuscript.

      • Line 219, "late stages of infection", this is shown (albeit not completely quantified) for IpaM and CTL0480, but not for InaC.

      In the revised Figure 3, we show InaC protein levels at 24, 36, and 48 hours post-infection, and we have incorporated quantitative data for both InaC and IpaM protein levels in the context of Hela cells infected with both WT L2 and cdu1::GII strains. This updated figure serves to emphasize the pivotal role of Cdu1 in safeguarding all three Incs during the late stages of infection.

      • Line 233, "pBOMB-MCI backbone" - is this needed in the Results section? And this refers to Figure 4 while pBOMB appear already in Fig. 3.

      We have removed “pBOMB-MCI backbone” in the revised manuscript.

      • Line 236, should be cdu1 endogenous promoter.

      In line 265 of the revised manuscript we have replaced Cdu1 with cdu1 (italicized).

      • Line 263, WT.

      In line 293 of the revised manuscript we replaced “wild type” with “WT”.

      • Line 277, IncA instead of "the Inc protein IncA".

      In the manuscript we wanted to emphasize that IncA is also an inclusion membrane protein, therefore we have included “the Inc protein IncA” in the revised manuscript to avoid any confusion.

      • How do the data in Figure 5 relate to the relatively few proteins ubiquitinated in cells infected with cdu1-mutant Ct? Does this Ub-labelling correspond to ubiquitinated InaC, IpaM and CTL0480?

      The findings presented in Figure 5 demonstrate that the acetylase activity of Cdu1 plays a crucial role in enabling Ct to block all ubiquitination events taking place on or in proximity to the periphery of the inclusion membrane. This encompasses Cdu1 targets that might not have been identified through our proteomic analysis.

      • Lines 299-301, "M923 inclusions", there is certainly a clear way to write this.

      In lines 326-327 and 332-332 of the revised manuscript, we have clarified that “M923” is an incA null strain.

      • Line 309, is "peripheries" correct?

      We have changed “peripheries” to “periphery” in the revised manuscript (line 360).

      • Line 312, "Rif-R L2" and "M407" - can this be simplified?

      In the revised manuscript, "Rif-R L2" was substituted with "WT L2" in lines 363 and 382, while "M407" was exchanged with "an inaC null strain" in lines 311, 367, and 368. These same replacements were applied to the Figures and their corresponding legends for consistency.

      • Lines 308-321, and 326-335, these % are all approximate figures and this should be made clear.

      In lines 364-395 of the revised manuscript we have stated that all percentages are approximate values.

      • Fig. S1, kb and not k.b; what's the "+ control"; and is it really not possible to have a PCR that works for the *? 3 kb is not that long.

      In the updated Figure S1, we have corrected "k.b" to "kb". In the legend of Figure S1, we have clarified that the + control corresponds to the cdu2 locus. Moreover, we could not cleanly amplify a 3 kb PCR product from bacteria in whole cell lysates of infected mammalian cells (Vero cells).

      • Fig. S2, kb and not k.b, bp and not b.p

      In the updated Figure S2, we have corrected “k.b” to “kb” and “b.p” to “bp”.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1 describes an affinity-based purification and mass spectrometric identification of differentially ubiquitinated proteins (host and chlamydial). Through different permutations of combinations of infection (mock, wild type, and Cdu1 mutant), three effectors, IpaM, InaC, and CTL0480, were identified as putative targets of Cdu1. The authors used a high-stringency cutoff, which could explain identification of only three targets. Having said this, the localization of Cdu1 to the inclusion membrane would be expected to also narrow down the number of targets. Interestingly, Cdu2, another deubiquitinase remained active in these experiments, which could have affected identification of Cdu1 targets. The authors addressed this issue by referring to previously reported structural studies. A somewhat glaring omission is the lack of reference to NF-kB as a substrate of ChlaDub1/Cdu1. In experiments by Le Negrate et al., ChlaDub1 ectopic overexpression in cells led to the deubiquitination of IkB-alpha, thus inhibiting the nuclear translocation of NF-kB. Based on the inclusion membrane localization of Cdu1 during infection, is the identification of IkB an artifact of overexpression of Cdu1, or is it still a bona fide Cdu1 target?

      We conducted experiments using our cdu1 null strain to investigate whether IκBα could be a target of Cdu1 activity. While our findings are intriguing and relevant, it is not feasible to determine, at this stage, whether our findings result from a direct or indirect consequence of Cdu1 localizing to the inclusion membrane. Consequently, these findings extend beyond the scope of the current manuscript. We plan to explore the implications of our observations more deeply in a subsequent manuscript, where we intend to provide a more comprehensive and mechanistic analysis based on these preliminary findings. Additionally, we have referenced the potential targeting of IκBα by Cdu1 in lines 100-101 and 166-171 of the revised manuscript.

      Figure 2 demonstrates the individual interaction of the identified effectors with Cdu1. Interaction at the inclusion membrane is inferred from colocalization studies, while protein-protein interaction is monitored using ectopic overexpression of tagged versions of Cdu1 and the individual effectors. This is somewhat of a weakness of the manuscript because the mechanism of action of Cdu1 towards its target hinges on protein-protein interaction.

      Despite our efforts, we encountered challenges in co-immunoprecipitating endogenous Cdu1 with all three Incs in infected HeLa cells at 24 hpi. There are multiple technical reasons why these interactions, which are predicted to be transient, would not be captured by bulk affinity approaches such as immunoprecipitation, especially when the starting materials are present in very low abundance. We have acknowledged these limitations in lines 221-226 of the revised manuscript.

      Figure 3 provides the first evidence in this paper of the importance of the inferred interaction of Cdu1 with the three effectors. The authors show that the loss of cdu1 has stability consequences on the three effectors. This figure would benefit from quantifying InaC- or IpaM-positive inclusions in the same manner done with CTL0480. The timepoint-dependent effect of Cdu1 loss of function is intriguing. Do InaC and IpaM retention at the inclusion show the same timepoint-dependent characteristic?

      In the revised Figure 3, we have incorporated InaC protein levels at 24, 36, and 48 hours post-infection. Additionally, we have included quantitative data representing both InaC and IpaM protein levels in HeLa cells infected with both WT L2 and cdu1::GII strains. The quantification of CTL0480 localization to cdu1::GII inclusions has been moved to a supplementary figure.

      This updated figure illustrates that the absence of Cdu1 has a time-dependent impact on both InaC and IpaM. However, it is noteworthy that the kinetics of degradation for these two proteins diverge significantly.

      For Figure 7, the authors should consider monitoring timing of inclusion extrusion to gain additional insight into the functional interactions between the effectors. For example, the loss of CTL0480 leads to increased extrusion, implying a role in delaying or suppressing extrusion. In a time-course experiment, a CTL0480 mutant could exhibit an earlier occurrence of inclusion extrusion.

      One of the principal discoveries of this study is that Cdu1, InaC, IpaM, and CTL0480 collaborate to facilitate optimal extrusion of Ct from host cells. These findings represent a significant contribution to our understanding of how Chlamydia controls its exit from infected cells. We are currently in the process of expanding on these results. A forthcoming follow-up manuscript will provide a more detailed and comprehensive exploration of these findings.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments.

      a. I have some concerns related to the time point chosen for mass spec analysis and potential caveats and alternative interpretations. This work was done relatively early (24 hours) compared to the most convincing Cdu1 functions that occur later, thus this may limit the authors global understanding of protein changes. For example, the known substrate of Cdu1, Mcl-1 was not identified but this is altered relatively late during infection. Thus, the surprise that minimal host proteins are altered in ubiquitination may be partially driven by the timing of the assay. This should be more clearly discussed as a caveat.

      In the revised manuscript (lines 166-171), we have acknowledged that there might be additional targets of Cdu1 that remain unidentified, primarily due to the specific time point we utilized in our study.

      b. Another caveat to these studies is that while the loss of Cdu1 alters the stability and function of different effectors and extrusion size, these changes do not modulate bacterial growth in cells. The authors speculate that regulating extrusion size may alter interactions with innate cells to drive dissemination. However, a previous study using a Cdu1 transposon mutant in an animal model found decreased bacterial load in the genital tract. It is also possible that redundancy of effectors may mask the importance of Cdu1 in growth, but the authors strongly argue against redundancy of Cdu1 and Cdu2, so this weakens the authors' argument here. These concepts and published data should be more directly discussed in the context of the authors' proposed extrusion model and the role in driving Chlamydia growth and pathogenesis.

      In our revised manuscript (lines 460-466) we propose that while we do not observe any growth impairments during Ct growth in the absence of Cdu1 in HeLa cells, the reduction in bacterial loads observed in murine models of infection with an independent cdu1 mutant strain (cdu1::Tn) may potentially be linked to defects in extrusion production or alterations in Cdu1-dependent regulation of extrusion size.

      c. Recent studies have found that IFNg activation can result in dramatic changes in ubiquitination of pathogen-containing vacuoles. While some of these are blocked by the newly found GarD, it seems possible that Cdu1 may also play a role (and perhaps use its deubiquitinating activity) to further protect the inclusion. In light of published results showing that Cdu1 mutants have lower IFU burst size only in IFNg-activated cells, this may be an important caveat in the current studies. This should be more directly addressed in the current manuscript.

      We have incorporated two experimental findings indicating that the presence of Cdu1 is not required for Ct to defend itself against IFN cellular immunity in human cells. These recent discoveries are now presented in the updated Figure 5 and detailed in lines 338-355 of the revised manuscript.

      d. On lines 433-434 the authors claim that Cdu1 is atypical since it is not encoded with the metaeffector/target pairs. However, this is an oversimplification of what is known about metaeffectors. For example, there are meta-effector/effector pairs that are not encoded together in Legionella (see table 1 DOI: https://doi.org/10.3390/pathogens10020108). Thus, the discussion should be adjusted. It seems Cdu1 is the first meta-effector found in Chlamydia, and maybe this should be highlighted more strongly rather than its uniqueness in this aspect of meta-effector/effector functions.

      In lines 488-489 of the revised manuscript, we have removed the assertion that Cdu1 functions as an atypical metaeffector and emphasized that it represents the initial discovery of a metaeffector within Ct.

    1. Author Response

      eLife assessment

      This important work describes the first high-resolution structure of HGSNAT, a lysosomal membrane protein required for the degradation of heparan sulfate (HS). Through careful structural analysis, this work proposes potential reasons why certain mutations in HGSNAT lead to lysosomal storage disorders and outlines the enzyme's catalytic mechanism. The experimental evidence presented provides incomplete support for the proposed molecular mechanism of the HS acetylation reaction and the impact of disease-causing mutations.

      We thank the editors and reviewers for taking the time to provide a critical assessment of our manuscript. We appreciate the input and suggestions to improve the analysis. Included here are only our provisional responses. We will address the concerns raised in more detail and incorporate them in the revised version of the manuscript.

      Reviewer #1 (Public Review):

      This article by Navratna et al. reports the first structure of human HGSNAT in an acetyl-CoA-bound state. Through careful structural analysis, the authors propose potential reasons why certain human mutations lead to lysosomal storage disorders and outline a catalytic mechanism. The structural data are of good quality, and the manuscript is clearly written. This study represents an important step toward understanding the mechanism of HGSNAT and is valuable to the field. I have the following suggestions:

      We thank the reviewer for their encouraging and positive overall assessment of our work.

      1. The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function.

      Thank you for highlighting this concern. The cryo-EM sample was prepared without the exogenous addition of ligand, as noted in the manuscript; the acetyl-CoA that we see in the structure was intrinsically bound to the protein, indicating the ability of the GFP-tagged HGSNAT protein to bind the ligand. We purified the protein at a pH optimal for acetyl-CoA binding, as suggested by Bame, K. J. and Rome, L. H. (1985) and Meikle, P. J. et al. (1995). Because we see acetyl-CoA in a structure obtained using a GFP fusion, we argue that GFP does not interfere with protein stability or with the ability to bind the co-substrate. As demonstrated by existing literature, the HGSNAT-catalyzed reaction is compartmentalized spatially and conditionally. The binding of acetyl-CoA happens towards the cytosol and is optimal at pH 7.0-8.0, while the transfer of the acetyl group to heparan sulfate occurs towards the luminal side and is optimal at pH 5.0-6.0. We are working on establishing a robust assay to study this complicated and compartmentalized acetyl transfer reaction.

      2. In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, like a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer?

      The acetyl-CoA bound structure presented in the paper does not conclusively support a potential for isomerization and conformational dynamics. We agree with the reviewer that the reaction schematic presented in Figure 5 is speculative. We acknowledge in the discussion that our structure represents only a single step of the reaction, and defining the precise mechanism of acetyl transfer needs additional work. However, we will reword the discussion and change Figure 5 to address this concern raised by multiple reviewers.

      Reviewer #2 (Public Review):

      Summary:

      This work describes the structure of Heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT), a lysosomal membrane protein that catalyzes the acetylation reaction of the terminal alpha-D-glucosamine group required for the degradation of heparan sulfate (HS). HS degradation takes place during the degradation of the extracellular matrix, a process required for restructuring tissue architecture, regulation of cellular function, and differentiation. During this process, HS is degraded into monosaccharides and free sulfate in lysosomes.

      HGSNAT catalyzes the transfer of the acetyl group from acetyl-CoA to the terminal non-reducing amino group of alpha-D-glucosamine. The molecular mechanism by which this process occurs has not been described so far. One of the main reasons to study the mechanism of HGSNAT is that multiple mutations spanning the entire sequence of the protein, such as nonsense mutations, splice-site variants, and missense mutations, lead to dysfunction that causes abnormal accumulation of HS within the lysosomes. This accumulation is a cause of mucopolysaccharidosis IIIC (MPS IIIC), an autosomal recessive neurodegenerative lysosomal storage disorder, for which there are no approved drugs or treatment strategies.

      This paper provides a 3.26 Å structure of HGSNAT, determined by single-particle cryo-EM. The structure reveals that HGSNAT is a dimer in detergent micelles and shows a density assigned to acetyl-CoA. The authors speculate about the molecular mechanism of the acetylation reaction, map the mutations known to cause MPS IIIC on the structure, and speculate about the nature of the HGSNAT dysfunction caused by such mutations.

      Strengths:

      The description of the architecture of HGSNAT is the highlight of the paper since this corresponds to the first description of the structure of a member of the transmembrane acyl transferase (TmAT) superfamily. The high-resolution structure of HGSNAT bound to acetyl-CoA is an important leap in our understanding of the HGSNAT mechanism. The density map is of high quality, except for the luminal domain. The location of the acetyl-CoA allows speculation about the mechanistic role of multiple residues surrounding this molecule. The authors thoroughly describe the architecture of HGSNAT and map the mutations leading to MPS IIIC. The description of the dimeric interface is a novel result, and future studies are left to confirm the importance of oligomerization for function.

      We thank the reviewer for their time and for highlighting both the quality and novelty of the structure presented in this work.

      Weaknesses:

      Apart from the cryo-EM structure, the article does not provide any other experimental evidence to support or explain a molecular mechanism. Due to the complete absence of functional assays, mutagenesis analysis, or other structures such as a ternary complex or an acetylated enzyme intermediate, the mechanistic model depicted in Figure 5 should be taken with caution.

      Thank you for pointing out this concern. The proposed mechanistic model in Figure 5 is a hypothesis based on previously reported biochemical characterization of HGSNAT by Rome & Crain (1981), Rome et al. (1983), Meikle et al. (1995), and Fan et al. (2011). However, we agree with the reviewer that this schematic is not experimentally proven and is speculative at best, especially because our structure presents only a single step of the reaction, which does not conclusively support either ping-pong or random-order bi-substrate reactions. We will rephrase this section of our discussion and edit Figure 5 to address this concern.

      The authors discuss that H269 is an essential residue that participates in the acetylation reaction, possibly becoming acetylated during the process. However, there is no solid experimental evidence, e.g. mutagenesis analysis or structural analysis, in this or previous articles, that demonstrates this to be the case.

      H269, as a crucial catalytic residue, was suggested by monitoring the effect of chemical modifications of amino acids on acetylation of HGSNAT membranes by Bame, K. J. and Rome, L. H. (1986). We agree that mutagenesis, catalysis, and structural evidence for the same are not currently available. We are pursuing a more thorough exploration of the role of both H269 (previous studies) and N258 (from this study) on the stability and function of HGSNAT.

      In the discussion part, the authors mention previous studies in which it was postulated that the catalytic reaction can be described by a random order mechanistic model or a Ping Pong Bi Bi model. However, the authors leave open the question of which of these mechanisms best describes the acetylation reaction. The structure presented here does not provide evidence that could support one mechanism or the other.

      We agree with the reviewer’s observation that the structure indeed does not support one reaction mechanism over the other. We are pursuing the structural and kinetic characterization of HGSNAT in the presence of other co-substrates and at multiple pHs, which is required to address this concern thoroughly.

      Although the authors map the mutations leading to MPS IIIC on the structure and use FoldX software to predict the impact of these mutations on folding and fold stability, there is no experimental evidence to support FoldX's predictions.

      We are working on assessing the impact of specific mutations on the stability of HGSNAT and will add them to the revised version of the manuscript. We thank the reviewer for this suggestion.

      Reviewer #3 (Public Review):

      Summary:

      Navratna et al. have solved the first structure of a transmembrane N-acetyltransferase (TNAT), resolving the architecture of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state using single particle cryo-electron microscopy (cryoEM). They show that the protein is a dimer and define the architecture of the alpha- and beta- GSNAT fragments, as well as convincingly characterizing the binding site of acetyl-CoA.

      Strengths:

      This is the first structure of any member of the transmembrane acyl transferase superfamily, and as such it provides important insights into the architecture and acetyl-CoA binding site of this class of enzymes.

      The structural data is of a high quality, with an isotropic cryoEM density map at 3.3Å facilitating the building of a high-confidence atomic model. Importantly, the density of the acetyl-CoA ligand is particularly well-defined, as are the contacting residues within the transmembrane domain.

      The open-to-lumen structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional characterization of the reaction cycle of this class of enzymes.

      We thank the reviewer for their positive assessment of the data presented in this work. We really appreciate and agree with the reviewer's comment that the “structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional” characterization.

      Weaknesses:

      While the structural data for the open-to-lumen state presented in this work is very convincing, and clearly defines the binding site of acetyl-CoA, to get a complete picture of the enzymatic mechanism of this family, additional structures of other states will be required.

      We agree with the reviewer’s assessment and are heavily invested in pursuing the structures of all the steps of acetyl transfer by HGSNAT.

      A potentially significant weakness of the study is the lack of functional validation. The enzymatic activity of the enzyme characterized was not measured, and the enzyme lacks native proteolytic processing, so it is a little unclear whether the structure represents an active enzyme.

      We thank the reviewer for this comment. While the proteolytic cleavage of the protein remains debated, we find no evidence of such an event in our purification (SDS-PAGE and SEC). Studies such as Durand et al. (2010) and Fan et al. (2011) suggest that even the ER-retained monomeric HGSNAT is active. Because we see acetyl-CoA (co-substrate) bound to the protein in our structure, we surmise that proteolysis is not necessary for function, at least not for substrate binding. However, we are working towards the structural and kinetic characterization of recombinant α- and β-HGSNAT constructs to explore the role of proteolysis in HGSNAT stability and function.

    1. Author Response

      We are delighted that eLife has assessed our study as a valuable contribution and has appreciated the importance of working on asymptomatic reservoirs of P. falciparum in high-transmission settings, where not just children but also adolescents and adults harbor multiclonal infections. The constructive public reviews will serve to improve our manuscript.

      Detailed responses to referees’ comments and a revised manuscript are forthcoming. Here we make a provisional response to three key areas addressed by the referees:

      (1) census population size

      Referee 1 raises important questions although we respectfully disagree on the terminology we have adopted (of “census”) and on the unclear utility of the proposed quantity.

      We consider the quantity a census in that it is a total enumeration or count of the infections in a given population sample and over a given time period. In this sense, it gives us a tangible notion of the size of the parasite population, in an ecological sense, distinct from the formal effective population size used in population genetics. Given the low overlap between var repertoires of parasites (as observed in monoclonal infections), the population size we have calculated translates to a diversity of strains or repertoires. But our focus here is in a measure of population size itself. The distinction between population size in terms of infection counts and effective population size from population genetics has been made before for pathogens (see for example Bedford et al. 2011 for the seasonal influenza virus and for the measles virus) and is a clear one in the ecological literature for non-pathogen populations (Palstra et al. 2012).

      Both referees 1 and 2 point out that census population size will be sensitive to sample size. We completely agree with the dependence of our quantity on sample size. We used it for comparisons across time of samples of the same depth, to describe the large population size characteristic of high transmission, and persistent across the IRS intervention. Of course, one would like to be able to use this notion across studies that differ in sampling depth.

      Here, referee 1 makes an insightful and useful suggestion. It is true that we can use mean MOI, and indeed there is a simple map between our population size and mean MOI (as we just need to divide or multiply by sample size). We can do even more, as with mean MOI we can presumably extrapolate to the full sample size of the host population, or the population size of another sample in another location. What is needed for this purpose is a stable mean MOI relative to sample size. We can show that indeed in our study mean MOI is stable in that way, by subsampling to different depths of our original sample. We will include in the revision discussion of this point and result, which allows an extrapolation of the census population size to the whole population of hosts in the local area. We’ll also clarify the time denominator, as given the typical duration of infections, we expect our population size to be representative of a per-generation measure.
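The relationship described above between census population size, mean MOI, and sample size can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' analysis: the MOI values are invented, the function names are hypothetical, and each list entry is assumed to be the estimated MOI of one sampled infected host.

```python
import random
from statistics import mean

def census_population_size(moi_per_host):
    """Total count of infections in the sample (sum of per-host MOI)."""
    return sum(moi_per_host)

def subsampled_mean_moi(moi_per_host, depth, n_reps=200, seed=0):
    """Average mean MOI over repeated random subsamples of a given depth.

    If mean MOI is stable with respect to sampling depth, this value
    should stay close to the full-sample mean for all depths.
    """
    rng = random.Random(seed)
    return mean(mean(rng.sample(moi_per_host, depth)) for _ in range(n_reps))

def extrapolate_census(moi_per_host, n_hosts_target):
    """Scale mean MOI to a host population of a different size (assumption:
    the target population has the same mean MOI as the sample)."""
    return mean(moi_per_host) * n_hosts_target

# Hypothetical per-host MOI estimates, for illustration only.
sample_moi = [1, 3, 2, 5, 1, 4, 2, 2, 6, 1, 3, 2]

census = census_population_size(sample_moi)          # sum of MOI
mean_full = mean(sample_moi)                          # census / sample size
mean_half = subsampled_mean_moi(sample_moi, depth=6)  # check stability
census_120 = extrapolate_census(sample_moi, 120)      # scale to 120 hosts
```

The census is simply the mean MOI multiplied by the number of sampled hosts, so the two quantities carry the same information at fixed sample size; the subsampling function is one simple way to check whether mean MOI is insensitive to sampling depth before extrapolating.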

      Referee 2 suggests we adopt the term “census count”, but as a census is, in our mind, a count, we prefer to use “census”.

      Referee 3 considers that the genetic data tracking parasite MOI and census changes give the same result as prevalence, which tracks infected hosts. Respectfully, we disagree and will provide an expanded response.

      (2) the importance of lineages (in response to referee 2)

      We do not think that lineages moving exclusively through a given type of host or “patch” is a requirement for enumerating the size of the total infections in such a subset. It is true that what we have is a single parasite population, but we are enumerating for the season the respective size in host classes (children and adults). This is akin to enumerating subsets of a population in ecological settings.

      We are also not clear on the concept of lineage for these highly recombinant parasites, as we struggle to find highly related repertoires. In fact, we see the use of the var fingerprinting methodology as a means to capture changes in strain or var repertoire dynamics as a result of changing transmission conditions.

      (3) var methodology

      Comments and queries were made by all three referees about aspects of var methodology, including the Bayesian approach. These will be addressed in our full response.

      Here we respond to a very good point made by referee 2: “Thinking about the applicability of this approach to other studies, I would be interested in a larger treatment of how overlapping DBLa repertoires would impact MOIvar estimates. Is there a definable upper bound above which the method is unreliable? Alternatively, can repertoire overlap be incorporated into the MOI estimator?”

      There is no predefined threshold one can present a priori. Intuitively, the approach to estimating MOI would appear to break down as overlap moves away from extremely low values, and therefore for locations with lower transmission intensity. Interestingly, we have observed that this is not the case in our paper by Labbé et al. 2023, where we used model simulations in a gradient of three transmission intensities, from high to low. The original varcoding method performed well across the gradient. This may arise from a nonlinear and fast transition from low overlap to high overlap that is accompanied by the MOI transitioning quickly from primarily multiclonal (MOI > 1) to monoclonal (MOI = 1). This issue needs to be investigated further, including ways to extend the estimation to explicitly include the distribution of DBLα repertoire overlap.

      References: Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol 11:220. https://doi.org/10.1186/1471-2148-11-220

      Labbé F, He Q, Zhan Q, Tiedje KE, Argyropoulos DC, Tan MH, Ghansah A, Day KP, Pascual M. 2023. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19:e1010816. doi:doi.org/10.1101/2022.06.27.49780

      Palstra FP, Fraser DJ. 2012. Effective/census population size ratio estimation: a compendium and appraisal. Ecol Evol. Sep;2(9):2357-65. doi:10.1002/ece3.329.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The study isolated extracellular vesicles (EV) from healthy controls (HCs) and Parkinson patients (PwP), using plasma from the venous blood of non-fasting people. Such EVs were characterized and validated by the presence of markers, their size, and their morphology. The main aim of the manuscript is to correlate the presence of synaptic proteins, namely SNAP-25, GAP-43, and SYNAPTOTAGMIN-1, normalized with HSP70, with the clinical progression of PwP. Changes in synaptic proteins have been documented in the CSF of Alzheimer's and Parkinson's patients. The demographics of participants are adequately presented.

      • One important limiting, as well as puzzling aspect, is the fact that authors did not find differences between groups at the beginning of the study nor after one year, after age and sex adjustment.

      Response: Thanks for your comments. We acknowledge your observation that the absence of a discernible difference in plasma EV synaptic protein levels between the PD and control subjects constitutes a significant limitation of our study. This outcome could be attributed to the fact that the controls were recruited from the neurology outpatient clinic, representing a group that could be considered "sub-healthy." Moreover, these individuals are not exempt from aging-related neurodegenerative processes. Considering that our PD subjects are in the early stages of the disease (with a mean disease duration of less than 3 years) and that synaptic dysfunction is a broader indicator rather than specific to PD, these factors could collectively contribute to the lack of distinction between the PD and control groups.

      However, our primary intention was also to explore the potential of plasma EV synaptic proteins as predictive markers for disease progression in PD. In this regard, we have identified their applicability within the current PD cohort. We are committed to conducting further follow-up with these study subjects over an extended duration to delve deeper into these findings.

      We revised the following statement in the discussion to address this issue: “Additionally, synaptic dysfunction is a frequently observed phenomenon in several neurological diseases, and it is not exclusive to PD. Consequently, the HC group in our current study may have included individuals with coexisting neurological conditions, potentially explaining the lack of a significant difference between the PD group and the HCs. However, this approach also illuminates the significance of synaptic dysfunction in the advancement of PD. This insight can be invaluable for monitoring disease progression, particularly in the context of clinical trials focused on disease modification.”

      • Tables in general are hard to follow. Specifically, Table 2 does not convey a clear message nor in the text of the Table itself, and the per 100% of change needs to be explained in the corresponding legend.

      Response: Thanks for your comment. In Table 2, our aim was to demonstrate the association between the change in plasma EV synaptic proteins and the change in clinical severity, with results presented as coefficient (p value). We apologize for any prior ambiguity in the main text's description of these results and have since made revisions to enhance clarity.

      Regarding the "per 100% change," this is due to the quantification of plasma EV synaptic proteins being based on a semi-quantitative Western blot method. Each measurement was normalized by the average baseline plasma synaptic protein levels of healthy controls (HCs). The term "per 100% change" denotes the increase or decrease in plasma EV synaptic protein abundance relative to the average baseline levels observed in healthy controls. We apologize for any confusion caused and have removed this term. In addition, we rephrased the statement in the Table legend of the revised manuscript to ensure better understanding and readability, as follows: “The association between the change of plasma EV synaptic proteins abundance (between baseline and follow-up) with the change of clinical severity in motor and cognitive domains (between baseline and follow-up) in people with Parkinson’s disease. A generalized linear model was employed and the data was presented as coefficient (p value).”
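As a rough illustration of the normalization described in this response, the following Python sketch expresses each measurement relative to the mean baseline level of healthy controls, so that a change of 1.0 corresponds to a "100% change". The numbers and function names are hypothetical and serve only to show the arithmetic; the input values are assumed to be band intensities that have already been normalized to the loading reference.

```python
def normalize_to_hc_baseline(values, hc_baseline_values):
    """Express each measurement as a multiple of the mean HC baseline level."""
    hc_mean = sum(hc_baseline_values) / len(hc_baseline_values)
    return [v / hc_mean for v in values]

def relative_change(baseline_rel, follow_up_rel):
    """Change in relative abundance; 1.0 equals 100% of the HC baseline mean."""
    return follow_up_rel - baseline_rel

# Hypothetical semi-quantitative band intensities (arbitrary units).
hc_baseline = [2.0, 4.0, 6.0]           # healthy controls at baseline, mean 4.0
patient_baseline, patient_follow_up = 6.0, 10.0

base_rel, follow_rel = normalize_to_hc_baseline(
    [patient_baseline, patient_follow_up], hc_baseline
)
change = relative_change(base_rel, follow_rel)
print(base_rel, follow_rel, change)  # 1.5 2.5 1.0
```

In this toy case the patient rises from 1.5 to 2.5 times the HC baseline mean, a change of 1.0, which is what a regression coefficient reported "per 100% change" would be scaled to.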

      • It is only when PwP were classified as a first quartile that a significantly greater deterioration was found. However, in the case of tremor, the top 25% had values going from 0.46-0.47 to 0.32-0.35, whereas the lower three quarters went from 0.33-0.34 to 0.27-0.28 depending on the protein analyzed. This needs to be clarified in the text.

      Response: Thanks for your comments. As per the Unified Parkinson's Disease Rating Scale (UPDRS), a higher score indicates greater severity of symptoms. Regarding tremor, we observed a general trend of improvement in both groups. PwP with elevated baseline plasma EV proteins showed a trend toward worse tremor scores at baseline, and their improvement was significantly greater than that of the remaining PwP. This improvement seems to contradict the progressive nature of PD, and one possible explanation could be the alleviation of symptoms due to medication usage. The assessment of motor symptoms took place within the hospital setting, where we refrained from requesting patients to withhold their anti-PD medications due to concerns about safety issues such as falls. Consequently, certain motor symptoms might have been effectively controlled by the anti-PD medication. Traditionally, symptoms like tremor and rigidity (as reflected by the akinetic rigidity score) respond well to medications, while postural instability and gait disturbance (PIGD) are less responsive. In our cohort, we noted an improvement in tremor scores and stability in akinetic rigidity (AR) scores. Conversely, PD patients with higher baseline plasma EV synaptic protein levels exhibited notable progression in PIGD scores. These findings have been documented in the results section and discussed comprehensively within the revised manuscript as follows: “On the other hand, the evaluation of motor symptoms occurred in a hospital setting where we did not ask patients to stop taking their anti-PD medications due to safety concerns like the risk of falls. As a result, specific motor symptoms, particularly tremor and AR, which are more sensitive to medication compared to PIGD, may have been effectively managed by the anti-PD medications. This could potentially explain the improvement in tremor observed between the baseline and one-year follow-up, especially among PwP with elevated baseline plasma EV synaptic proteins.”

      • Table 3 is hard to read and some of the values seem repetitive, especially for tremor, AR, and PIGD. It looks as if Figure 2 represents the same information as Table 3.

      Response: Thanks for your information. We have ensured the accuracy of the results presented in Table 2. While some of the entries may appear similar, they do indeed possess distinct differences.

      To enhance readability, we streamlined the information in Table 3 by removing the p-values from the intra-group comparisons between baseline and the 1-year follow-up within each domain. We retained the original p-values for trend related to the inter-group comparisons for changes. Detailed information has been relocated to the supplementary section of the revised manuscript. In Figure 2, we illustrated the relationship between baseline plasma extracellular vesicle (EV) synaptic protein levels and the clinical assessment parameters during follow-up in patients with Parkinson's disease (PwP). This portrayal is distinct from the information depicted in Table 3.

      If you had concerns about the resemblance between Table 3 and Figure 3, please note that the values in Table 3 represent raw scores, while the values in Figure 3, namely the estimated marginal means, are the "adjusted" scores for UPDRS-II and PIGD at baseline and follow-up. These adjustments encompass age, sex, and disease duration. We sincerely apologize for any lack of clarity in our previous description and have since revised it accordingly.
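
      The distinction between raw scores and estimated marginal means can be illustrated with a small worked example. The following is a minimal sketch, assuming an ordinary least-squares model with a binary group indicator and numeric covariates; the function names and the toy numbers are hypothetical and are not the authors' data or statistical software.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def estimated_marginal_means(groups, covariates, y):
    """Predicted score per group with every covariate held at its overall mean."""
    X = [[1.0, float(g)] + list(c) for g, c in zip(groups, covariates)]
    beta = ols(X, y)
    n = len(y)
    cov_means = [sum(c[j] for c in covariates) / n
                 for j in range(len(covariates[0]))]
    return {g: sum(b * v for b, v in zip(beta, [1.0, float(g)] + cov_means))
            for g in (0, 1)}


# Toy data (made up): group 1 is older, and the score rises with age.
groups = [0, 0, 0, 1, 1, 1]
ages = [[60.0], [62.0], [64.0], [70.0], [72.0], [74.0]]
scores = [10 + 5 * g + 0.5 * a[0] for g, a in zip(groups, ages)]

raw = {g: sum(s for s, gg in zip(scores, groups) if gg == g) / 3 for g in (0, 1)}
emm = estimated_marginal_means(groups, ages, scores)
# The raw group means differ by 10; the adjusted means differ by about 5,
# because the age imbalance between groups has been removed.
```

      In this toy case half of the raw difference between groups comes from age alone, which is why a table of raw scores and a figure of adjusted scores can look similar yet carry different values.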

      • The text and figure legends are not helpful in guiding the reader to understand the presented information.

      Response: Thanks for your comments, and we apologize for the unclear statements. We have revised the figure legends and the main text to make them easier for readers to follow.

      Reviewer #2 (Public Review):

      Hong and collaborators investigated variations in the amount of synaptic proteins in plasma extracellular vesicles (EV) in Parkinson's Disease (PD) patients on one-year follow-up. Their findings suggest that plasma EV synaptic proteins may be used as clinical biomarkers of PD progression.

      • It is a preliminary study using semi-quantitative analysis of synaptic proteins.

      Response: Thanks for your comments. The present study represents the initial phase of our investigation into the role of plasma EV synaptic proteins within our PD cohort. Our findings have revealed the potential predictive significance of these synaptic proteins in relation to PD progression. We are committed to conducting further follow-up with these study subjects over an extended period.

      Furthermore, it's important to acknowledge that the semi-quantitative approach employed to assess protein abundance was a limitation of this study. This limitation stems from the low concentration of plasma EV synaptic proteins, which restricts the feasibility of using ELISA or other quantitative methods for protein assessment. We have acknowledged this limitation in the present study as follows: “Semiquantitative assessment of plasma EV synaptic protein (SNAP-25, GAP-43, and synaptotagmin-1) levels was performed using western blot analysis. The lack of absolute values limits further clinical application.”

      Moving forward, we intend to adopt alternative EV isolation methods that enable the extraction of a larger abundance of plasma EV proteins, facilitating more accurate quantitative assessments. In addition, a longer longitudinal follow-up is warranted to clearly assess the prognostic efficacy of plasma EV synaptic proteins in PwP, which we had mentioned in the manuscript.

      • The authors have a cohort of PD patients with clinical examination and a know-how on EV purification. Regarding this latter part, they may improve their description of EV purification. EV may be broken into smaller size EV after freezing. Does it explain the relatively small size in their EV preparation? Do the authors refer to the MISEV guidelines for EV purity?

      Response: Thanks for your comments. In the previous manuscript, we provided a relatively detailed account of the procedures related to EV isolation and validation (https://doi.org/10.1096/fj.202100787R). In the revised manuscript, we added some information about the principle of the EV isolation kit and the validation antibodies, as follows: “Plasma EVs were isolated from 1 mL of plasma by exoEasy Maxi Kit (Qiagen, Valencia, CA, USA), which uses a membrane-based affinity binding step to isolate exosomes and other EVs without relying on a particular epitope, in accordance with the manufacturer’s instructions, and stored in a −80 °C freezer. The isolated plasma EVs were then eluted and stored. Usually, 400 μL of eluate is obtained per mL of plasma. The isolated plasma EVs were validated according to the International Society for Extracellular Vesicles guidelines, which include: (1) markers, including the presence of CD63 (ab59479, Abcam, Cambridge, UK), CD9 (ab92726, Abcam, Cambridge, UK), and tumor susceptibility gene 101 protein (GTX118736, GeneTex, CA, USA) and the absence of cytochrome c (ab110325; Abcam, Cambridge, UK); (2) physical characterization by nanoparticle tracking analysis, which demonstrated that most EVs were 50-100 nm in size; and (3) morphology by electron microscopy analysis. The validation has been described previously [29-31].”

      It's important to note that our primary focus was on exosomes, the smallest subtype of EVs. Through nanoparticle tracking analysis, we observed that the majority of isolated EVs fell within the diameter range of 50-150nm, exhibiting significant surface marker (i.e. CD63 and CD9) expression. Moreover, electron microscopy confirmed their vesicular morphology. These meticulously validated EVs were promptly analysed post-isolation.

      However, we acknowledge that the plasma obtained from study participants might have been frozen before EV isolation. Freezing can reduce the yield of EVs and cause some degree of fragmentation. We have included this issue as a limitation in our revised manuscript as follows: “The final technical issue in the present study was the relatively small size of the isolated EVs. Despite the primary focus on isolating exosomes, which are the smallest type of EVs, it's important to consider that the presence of small-sized EVs could potentially be attributed to EV fragmentation that occurs during the freezing and thawing processes.”

      • Regarding synaptic protein quantification, the choice of western blotting may not be the best one. ELISA and other multiplex arrays are available. How the authors do justify their choice?

      Response: Thanks for your comments. We appreciate your input regarding the semi-quantitative western blot analysis not being the most optimal approach. Owing to the limited quantity of isolated plasma EVs and the significant protein abundance of synaptic proteins within these EVs, we did explore the use of an ELISA assay. However, for a specific subset of the samples, the readout was below the lower limit of detection of the ELISA kit. In response, we have incorporated this point as a limitation in the discussion section of the revised manuscript as follows: “Semiquantitative assessment of plasma EV synaptic protein (SNAP-25, GAP-43, and synaptotagmin-1) levels was performed using western blot analysis. The lack of absolute values, i.e. from the results of enzyme-linked immunosorbent assay, limits further clinical application.”

      • Do the authors try to sort plasma EV by membrane-associated neuronal EV markers using either vesicle sorting or immunoprecipitation?

      Response: Thanks for your comments. The current study did not specifically isolate neuron-derived extracellular vesicles (EVs), potentially introducing some bias to the results. However, it's important to note that synaptic proteins, such as SNAP-25, exhibit a high degree of neuron-specific expression, with a predominant presence in the brain (as indicated by https://www.proteinatlas.org/ENSG00000132639-SNAP25/tissue). Given this context, the limitation of not analyzing neuron-derived EVs could be mitigated to some extent. In response, we have incorporated this point as a limitation in the discussion section of the revised manuscript as follows: “Furthermore, this study evaluated the overall plasma EVs rather than specifically focusing on neuron-derived exosomes, potentially introducing a bias towards somatic-origin EVs. Nonetheless, it is worth noting that synaptic proteins primarily originate from neurons. Even when considering neuron-derived exosomes, it's important to recognize that they are not exclusively derived from the brain, which can lead to contamination from the peripheral nervous system.”

      • Many technical aspects may be improved. Such technical questions weakened the authors' conclusions.

      Response: Thanks for your comments. We recognize that the aforementioned issues represent limitations of our current study. In response, we have incorporated these points as limitations, including the semi-quantitative assessments, the isolation of total but not neuron-derived exosomes in the plasma, and the short follow-up time within the discussion section of the revised manuscript.

      • The discussion is pretty long to justify the data. It may be shortened by adding some information in the introduction.

      Response: Thanks for your comments. We have repositioned a statement from the second paragraph of the discussion to the introduction. This adjustment serves to enrich the background understanding of the link between synaptic dysfunction and neurodegenerative diseases.

    1. Author Response

      Reviewer #1 (Public Review)

      The manuscript by Singh et al proposes a new theoretical model for the phenomenon of planar cell polarity (PCP). The new model simulates the emergence of the subcellular polarity of the Fat-Ds pathway, based on the interactions of the protocadherins Fat and Ds at the boundary between cells and in response to external gradients. Several mathematical models for PCP have been previously developed focusing on different aspects of PCP, including domineering non-autonomy (Amonlirdviman et al.), the effect of stochasticity on polarity (Burak et al.), gradient sensing (Mani et al), and formation of molecular bridges (Fisher et al.), to name a few. The current modeling approach suggests a new model, based on a relatively simple set of equations for membrane Fat and Ds and their interactions, both in 1D (line of cells) and in 2D (hexagonal array). The equations are relatively simple on one hand, allowing tractable computational analysis as well as analytical approximations, while on the other hand allowing tracking of membrane protein levels, which is what is measured experimentally. It has been previously shown that achieving polarity requires local feedback that amplifies complexes in one orientation at the expense of complexes in the opposite orientation (e.g. Mani et al.). Interestingly, the current manuscript shows that a simple assumption, that Fat-Ds complexes are stabilized when bound, is sufficient to induce PCP when concentrations are high enough. The authors use the model to show how it captures several experimental observations, as well as to analyze the sensitivity to noise, the response to gradients, and the response to local perturbations (mutant clones). The manuscript is clear and the analysis is mostly coherent and sensible (although some parts need to be clarified, see below). The main issue I have with the manuscript is that it mostly describes how it captures different features that were mostly explained in previous models. I do think the authors should do more with their model to explain features that were not explained by other models, and/or generate non-trivial predictions that can be tested experimentally.

      We thank the reviewer for the positive feedback and valuable comments. To address the concerns, we have comprehensively modified the manuscript by including new results and detailing the specific model predictions and their potential experimental tests.

      Reviewer #2 (Public Review):

      The setting of planar cell polarity in epithelial tissues involves a complex interplay of chemical interactions. While local interactions can spontaneously give rise to cell polarity, planar cell polarity also involves tissue-scale gradients whose effects are not clear. To understand their role, the authors built a minimal mechanistic model considering two atypical cadherins, Fat (Ft) and Dachsous (Ds), which can associate at cell-cell interfaces to form hetero-dimers in which monomers belong to adjacent cells. This association can be seen as a local interaction between cells and is also sensitive to overall concentration gradients. From their model, which appears to capture diverse experimental observations, the authors conclude that tissue-scale gradients provide planar cell polarity with a directional cue and some robustness to cellular stochasticity. While this model comes after similar works reaching similar predictions, the quality of this model is in its simplicity, its convenience for experimental testing, and the diversity of experimental observations it recapitulates.

      A strength of this work is to recapitulate many experimental observations made on planar cell polarity. It, for example, seems to capture the response of tissues to perturbations such as local downregulation of some important proteins, and the polarity patterns observed in the presence of noise in synthesis or cell-to-cell heterogeneity. It also gives a mechanistic description of planar cell polarity, making its experimental interpretation simple. Finally, the simplicity of the model facilitates its exploration and makes it easily testable because of the reduced amount of free model parameters.

      A weakness of this work is that it comes after several models with similar hypotheses and similar predictions.

      Another weakness is that some conclusions of this work rely on visual appreciation rather than quantification. This is particularly true for what concerns 2D patterns. An argument of the authors is for example that their model reproduces a variety of known spatial patterns, but the comparison with experiments is only visual and would be more convincing in being more quantitative.

      We are grateful to the reviewer for a critical evaluation of the manuscript and for giving important suggestions. We have incorporated all the comments and revised the manuscript accordingly by including quantitative analysis of all the results presented.

      Reviewer #3 (Public Review):

      Using theory, the authors study mechanisms for establishing planar cell polarity (PCP) through local and global modules. These modules refer to the interaction between neighbouring cells and tissue-wide gradients, respectively. Whereas local interactions alone can lead to tissue-wide alignment PCP, a global gradient can set the direction of PCP and maintain the pattern in presence of noise. In contrast, the authors argue that a global gradient can only generate PCP to an extent that is proportional to the gradient magnitude.

      The authors formulate a discrete model in one and two spatial dimensions that describe the assembly dynamics of PCP proteins on membranes. The number of proteins per cell remains constant. Additive noise is introduced to account for stochasticity in the attachment/detachment kinetics of proteins. Furthermore, ’quenched’ noise is introduced to account for variations of protein numbers between cells. The authors perform simulations of the stochastic discrete model in various situations. In addition, they derive a continuum description to perform some analytical computations.

      The strength of this analysis relies clearly on showing that simple dynamics can lead to tissue-wide PCP even in absence of a gradient in protein expression. A number of phenomena observed in tissues are qualitatively reproduced. In two spatial dimensions, they find swirling patterns that resemble patterns found in tissues when a global gradient is absent. The model also captures qualitative effects due to the down-regulation of one of the PCP proteins in a certain region of the tissue.

      The main weak point is that, from a physical point of view, the findings are not particularly surprising. Furthermore, some assumptions underlying the model need more justification. This holds notably for the question of why additive noise is appropriate to account for the effect of stochasticity in the attachment-detachment dynamics of the proteins. Finally, the authors consider a situation that they consider to be one of the most interesting features of PCP, namely, the formation of PCP in the presence of a region with a down-regulated PCP protein and in the presence of a gradient. Unfortunately, the effect is not very clear and the data provided remain limited.

      We thank the reviewer for the valuable comments and critique of the work. We have considered all the concerns and revised the manuscript comprehensively. In particular, we have elaborated the sections on model assumptions and added new figures/figure-panels to quantitatively present the model predictions. We have also revised the details of the one-dimensional continuum theory for PCP which, we feel, presents a detailed quantitative picture of PCP and its dependence on model parameters.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, Leiba et al. aim at establishing the developing zebrafish embryo as a suitable infection model to study Salmonella persistence in vivo. Under environmental stress (ex: macrophage phagosomes) a proportion of bacteria switch to a slow/arrested growth state conferring increased resistance to antibiotic treatments. Persisters are getting increasingly linked to infection relapses. Understanding how persistent infections emerge and bacteria survive in an organism for long time without replicating before switching back to a replicative state is essential. Zebrafish represents an alternative model to mice offering the possibility to image the whole organism and capture persistency with an amazing spatio-temporal resolution.

      In this paper, the authors demonstrate that persistent infections of Salmonella can be reproduced in the developing zebrafish. The kinetics of infection have been well characterized and shows a very nice heterogeneity between animals demonstrating the complex host-pathogen interactions (Fig 1). From the perspective of persistence, the presence of Salmonella survivors to host clearing is reported until 14dpi demonstrating the possibility to induce persistent infection in this model. Through the manuscript, the authors have used a variety of state-of-the-art technics illustrating the flexibility of this model including microscopy and imaging of specific immune populations, various transgenic animals and selective depletion of macrophages or neutrophils to assess their relative contributions. Overall, the conclusions of the authors are well supported by the presented data. This said, the authors should strengthen the conclusions of the paper by providing a better characterization of the infection.

      Major comments:

      1) Figure 1: What is the general life-span of the fish?

      The general life-span of the zebrafish is approximately 3 years on average. Persistent infection is determined by the existence of a fraction of bacteria that endure over an extended period (after 96 hpi). Further, we observed Salmonella persistence for 14 days. In figure 1, we don’t think that the information of the general life-span of the zebrafish is critical.

      2) Figure 2: It would be nice to clearly state what infection scenario we are looking at. Have the authors studied "high proliferation", "infected" or "cleared" zebrafish?

      In Figure 2 we have studied the "infected" group. Both "high proliferation" and "cleared" larvae were excluded from the analysis. This is now clearly stated in the legend of Figure 2.

      3) Figure 3 and 4: It would be very informative if the authors can tell us what proportion of Salmonella is associated with macrophages and neutrophils. From panel C and D (Figure 3) and Figure 4 C and D and Suppl Fig 1, it seems that a lot of bacteria are extracellular. Maybe an EM image of the tissue would help to understand if the bacteria are "all" intracellular or extracellular.

      We apologize for any misunderstanding regarding the presence of intra- and extracellular bacteria depicted in Figure 3 C and D, Figure 4 C and D and Figure 3 -Suppl Fig 1. These figures illustrate infection experiments conducted in single-reporter larvae, limiting our analysis to bacteria associated with a single cell type. In Figure 3G and Figure 4E-G, the panels depict infection experiments carried out in dual-reporter larvae, showing bacteria associated or not with macrophages and neutrophils. The present study aimed to establish the role of neutrophils and macrophages in the control of early and persistent Salmonella infection, but further studies will focus on the exact localization of Salmonella during the course of the infection. Despite being a challenging technique for zebrafish, electron microscopy could be of great interest, as it allows any type of cell to be visualized at high resolution (to determine whether all bacteria are intracellular).

      4) Figure 3 and 4: It would be very useful if the authors can tell us if the intracellular bacteria are mainly found individually (like in Figure 3C) or does host cells harbor many intracellular bacteria. Looking at figure 4G: it is not clear to me how many intracellular bacteria can be counted on this image.

      This is an interesting suggestion. At present, an accurate quantification of the intracellular bacteria on microscopy 3D-datasets is challenging because bacteria aggregate inside the cells. At 4 hpi, single bacteria can occasionally be observed outside leukocytes, while most infected macrophages harbored several intracellular bacteria (bacterial aggregates). To compare the levels of intracellular bacteria between acute and persistent stages, we measured the size of E2Crimson-positive (E2Crimson+) events. At 5 hpi, the median volume of E2Crimson+ events was lower than that at 4 dpi. The size distribution analysis of E2Crimson+ events indicated a higher representation of smaller volumes (0.5-1.5 µm³ and 1.5-10 µm³) at 5 hpi compared to 4 dpi, a stage during which very large E2Crimson+ events were observed (between 100-1000 µm³, with some exceeding 1000 µm³). This observation suggests an elevated presence of intracellular bacteria within the cells during persistent stages and that intracellular bacteria are predominantly observed in groups rather than as solitary entities. This analysis has been incorporated in new Figure 5.
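
      The size-distribution comparison described above amounts to binning event volumes and comparing medians between time points. A minimal sketch follows, assuming volumes are given in µm³; the 10-100 µm³ bin is an assumption added to bridge the ranges named in the response, the function name is hypothetical, and the example volumes are made up for illustration only.

```python
from bisect import bisect_right
from statistics import median

# Lower bin edges in cubic micrometres. The 10-100 bin is an assumption;
# the response names only 0.5-1.5, 1.5-10, 100-1000 and >1000.
BIN_EDGES = [0.5, 1.5, 10.0, 100.0, 1000.0]
BIN_LABELS = ["0.5-1.5", "1.5-10", "10-100", "100-1000", ">=1000"]


def size_distribution(volumes):
    """Return the fraction of events falling in each volume bin.

    Events smaller than the first edge are ignored (treated as noise).
    """
    kept = [v for v in volumes if v >= BIN_EDGES[0]]
    counts = {label: 0 for label in BIN_LABELS}
    for v in kept:
        # Index of the last edge that is <= v
        counts[BIN_LABELS[bisect_right(BIN_EDGES, v) - 1]] += 1
    total = len(kept)
    if total == 0:
        return {label: 0.0 for label in BIN_LABELS}
    return {label: counts[label] / total for label in BIN_LABELS}


# Hypothetical event volumes for two time points (not measured data)
early = [0.6, 1.0, 2.0, 5.0]
late = [3.0, 150.0, 800.0, 1200.0]

early_dist = size_distribution(early)
late_dist = size_distribution(late)
early_median, late_median = median(early), median(late)
```

      Reporting fractions rather than counts keeps the two time points comparable even when they contain different numbers of events.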

      5) Figure 3 and 4: The authors should also perform an experiment with a Salmonella strain harboring a growth reporter to quantify the amount of replicating and non-replicating bacteria. This experiment is not absolutely necessary for the story, but if possible, it would provide a very nice add-up to the story and impact to the paper.

      We welcome the reviewer’s suggestion, which we have indeed considered and plan to carry out in the future, along with experiments oriented more toward the bacterial side.

      6) Figure 6: The authors should provide in suppl. the flow cytometry scatter plots used to delineate the different subpopulations.

      We agree with the reviewer that the flow cytometry scatter plots used to delineate the different subpopulations were missing and are now incorporated in new Fig 7 - figure supplement 2.

      7) Figure 6: A specific characterization of macrophages harboring Salmonella persisters at 4dpi is missing. As shown by the authors in Figure 6, the tnfa- populations of macrophages at 4dpi are very similar for both infected and non-infected larvae. Persisters should indeed reside within tnfa- macrophages but they should also induce a specific signature through the actions of Salmonella effectors. Measuring this signature will allow a direct comparison with published data in mice and assess how accurately the zebrafish model recapitulates the manipulation of macrophages by Salmonella

      We agree with the reviewer that a specific characterization of macrophages harboring persistent Salmonella at 4 dpi is missing. However, due to a technical limitation inherent to the model (limited recovery of infected cells following FACS sorting), we were not able to specifically sort infected macrophages at 4 dpi.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper combines a number of cutting-edge approaches to explore the role of a specific mouse retinal ganglion cell type in visual function. The approaches used include calcium imaging to measure responses of RGC populations to a collection of visual stimuli and CNNs to predict the stimuli that maximally activate a given ganglion cell type. The predictions about feature selectivity are tested and used to generate a hypothesized role in visual function for the RGC type identified as interesting. The paper is impressive; my comments are all related to how the work is presented.

      We thank the reviewer for appreciating our study and for the interesting comments.

      Is the MEI approach needed to identify these cells?

      To briefly summarize the approach, the paper fits a CNN to the measured responses to a range of stimuli, extracts the stimulus (over time, space, and color) that is predicted to produce a maximal response for each RGC type, and then uses these MEIs to investigate coding. This reveals that G28 shows strong selectivity for its own MEI over those of other RGC types. The feature of the G28 responses that differentiate it appears to be its spatially-coextensive chromatic opponency. This distinguishing feature, however, should be relatively easy to discover using more standard approaches.

      The concern here is that the paper could be read as indicating that standard approaches to characterizing feature selectivity do not work and that the MEI/CNN approach is superior. There may be reasons why the latter is true that I missed or were not spelled out clearly. I do think the MEI/CNN approach as used in the paper provides a very nice way to compare feature selectivity across RGC types - and that it seems very well suited in this context. But it is less clear that it is needed for the initial identification of the distinguished response features of the different RGC types. What would be helpful for me, and I suspect for many readers, is a more nuanced and detailed description of where the challenges arise in standard feature identification approaches and where the MEI/CNN approaches help overcome those challenges.

      Thank you for the opportunity for clarification. In fact, the MEI (or an alternative nonlinear approach) is strictly necessary to discover this selectivity: as we show above (response #1 to editorial summary), the traditional linear filter approach does not reveal the color opponency. We realize that this fact was not made sufficiently clear in the initial submission. In the revised manuscript, we now include this analysis. Moreover, throughout the manuscript, we added explanations on the differences between MEIs and standard approaches and more intuitions about how to interpret MEIs. We also added a section to the discussion dedicated to explaining the advantages and limitations of the MEI approach.
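
      The idea of a most exciting input (MEI) summarized above can be illustrated with a toy example. The following is a minimal sketch, assuming a simple linear-nonlinear model neuron in place of the CNN and projected gradient ascent under a stimulus-norm constraint as the optimizer; the model, the function names and the parameter values are hypothetical and are not the authors' implementation.

```python
import math


def response(w, b, x):
    """Toy model neuron: softplus of a linear filter output."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.log1p(math.exp(z))


def grad_response(w, b, x):
    """Gradient of the response with respect to the stimulus x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    sigmoid = 1.0 / (1.0 + math.exp(-z))
    return [sigmoid * wi for wi in w]


def project(x, max_norm):
    """Rescale x so that its Euclidean norm does not exceed max_norm."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= max_norm:
        return x
    return [xi * max_norm / norm for xi in x]


def find_mei(w, b, max_norm=1.0, steps=200, lr=0.1):
    """Gradient ascent on the stimulus, constrained to a norm ball."""
    x = [0.0] * len(w)
    for _ in range(steps):
        g = grad_response(w, b, x)
        x = project([xi + lr * gi for xi, gi in zip(x, g)], max_norm)
    return x


# Hypothetical 3-pixel filter with one opponent (negative) weight
w = [0.5, -1.0, 2.0]
mei = find_mei(w, b=0.0)
```

      For this toy neuron the optimized stimulus simply aligns with the linear filter, so the MEI and the linear filter carry the same information. The two can differ only when the model is nonlinear in a richer way, which is the situation the authors argue applies to the chromatic opponency of G28.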

      Interpretation of MEI temporal structure

      Some aspects of the extracted MEIs look quite close to those that would be expected from more standard measurements of spatial and temporal filtering. Others - most notably some of the temporal filters - do not. In many of the cells, the temporal filters oscillate much more than linear filters estimated from the same cells. In some instances, this temporal structure appears to vary considerably across cells of the same type (Fig. S2). These issues - both the unusual temporal properties of the MEIs and the heterogeneity across RGCs of the same type - need to be discussed in more detail. Related to this point, it would be nice to understand how much of the difference in responses to MEIs in Figure 4d is from differences in space, time, or chromatic properties. Can you mix and match MEI components to get an estimate of that? This is particularly relevant since G28 responds quite well to the G24 MEI.

      One advantage of the MEI approach is that it allows us to distinguish between transient and sustained cells in a way that is not possible with the linear filter approach: because we seek to maximize activity over an extended period of time, transient cells need to be repetitively stimulated, whereas sustained cells will also respond in the absence of multiple contrast changes. In the revised manuscript, we add a section explaining this, together with Figure 3-supplement 2, which illustrates this point by showing that oscillations disappear when we optimize the MEI for a short time window. The benefit of a longer time window lies in the increased discriminability between transient and sustained cells, which is also shown in the new supplementary figure.

      Regarding the heterogeneity of MEIs, this is most likely due to heterogeneity within the RGC group: “The mixed non-direction-selective groups G17 and G31 probably contain more than one type, as supported by multiple distinct morphologies and genetic identities (for example, G31,32, Extended Data Fig. 5) or response properties (for example, G17, see below)” (Baden et al. Nature 2016). We added a paragraph in the Results section.

      Concerning the reviewer’s last point: We agree that it is important to know whether the defining feature - i.e., the selectivity for chromatic contrast - is robust against variations in other stimulus properties. New electrophysiological data included in the manuscript (Fig. 6e,f) offers some insights here. We probed G28/tSbC cells with full-field flashed stimuli that varied in chromatic contrast. Despite not matching the cell’s preferred spatial and temporal properties, this stimulus still recovered the cell’s preference for chromatic contrast. While we think it is an interesting direction to systematically quantify the relative importance of temporal, spatial and chromatic MEI properties for an RGC type’s responses, we think this is beyond the scope of this manuscript.

      Explanation of RDM analysis

      I really struggled with the analysis in Figure 5b-c. After reading the text several times, this is what I think is happening. Starting with a given RGC type (#20 in Figure 5b), you take the response of each cell in that group to the MEI of each RGC type, and plot those responses in a space where the axes correspond to responses of each RGC of this type. Then you measure euclidean distance between the responses to a pair of MEIs and collect those distances in the RDM matrix. Whether correct or not, this took some time to arrive at and meant filling in some missing pieces in the text. That section should be expanded considerably.

      We appreciate the reviewer’s efforts to understand this analysis and confirm that they interpreted it correctly. However, we decided to remove the analysis. The point we were trying to make with it is that the transformation implemented by G28/tSbC cells “warps” stimulus space and increases the discriminability of stimuli with characteristics similar to the cell’s MEI. We now make this point in what we think is a more accessible manner, through the new analysis of the nonlinearity of the G28/tSbC cells’ color opponency (see above).
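
      The computation that the reviewer describes, and that the authors confirm, can be written down compactly. A minimal sketch, assuming the responses are stored as one vector per MEI with one entry per recorded cell of the chosen RGC type; the function name and the toy numbers are hypothetical.

```python
import math


def representational_dissimilarity_matrix(responses):
    """Pairwise Euclidean distances between population response vectors.

    responses[m][c] is the response of cell c (all of one RGC type)
    to the MEI of RGC type m. Entry [i][j] of the result is the distance
    between the population responses to MEI i and MEI j.
    """
    n = len(responses)
    return [[math.dist(responses[i], responses[j]) for j in range(n)]
            for i in range(n)]


# Toy example: 3 MEIs, 2 recorded cells (made-up numbers)
toy_responses = [
    [1.0, 0.0],  # responses to MEI A
    [0.0, 1.0],  # responses to MEI B
    [1.0, 0.0],  # responses to MEI C (same population response as A)
]
toy_rdm = representational_dissimilarity_matrix(toy_responses)
```

      A large off-diagonal entry means the cell population separates that pair of MEIs well, and an entry of zero means the two stimuli are indistinguishable from the responses of that cell type.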

      Centering of MEIs

      How important is the lack of precise centering of the MEIs when you present them? It would be helpful to have some idea about that - either from direct experiments or using a model.

      In the electrophysiological experiments, the MEIs were centered precisely (now Fig. 5 in revised manuscript) and these experiments yielded almost identical results to the 2P imaging experiments, where the MEIs were presented on a grid to approach the optimal position for the recorded cells. Additionally, all model simulations work with perfectly centered MEIs. We hence conclude that our grid approach to presenting stimuli provided sufficient precision in stimulus positioning.

      We added this information to the revised manuscript.

      Reviewer #2 (Public Review):

      This paper uses two-photon imaging of mouse ganglion cells responding to chromatic natural scenes along with convolutional neural network (CNN) models fit to the responses of a large set of ganglion cells. The authors analyze CNN models to find the most effective input (MEI) for each ganglion cell as a novel approach to identifying ethological function. From these MEIs they identify chromatic opponent ganglion cells, and then further perform experiments with natural stimuli to interpret the ethological function of those cells. They conclude that a type of chromatic opponent ganglion cell is useful for the detection of the transition from the ground to the sky across the horizon. The experimental techniques, data, and fitting of CNN models are all high quality. However, there are conceptual difficulties with both the use of MEIs to draw conclusions about neural function and the ethological interpretations of experiments and data analyses, as well as a lack of comparison with standard approaches. These bear directly both on the primary conclusions of the paper and on the utility of the new approaches.

      We thank the reviewer for the detailed comments.

      1) Claim of feature detection.

      The color opponent cells are cast as a "feature detector" and the term 'detector' is in the title. However insufficient evidence is given for this, and it seems likely a mischaracterization. An example of a ganglion cell that might qualify as a feature detector is the W3 ganglion cell (Zhang et al., 2012). These cells are mostly silent and only fire if there is differential motion on a mostly featureless background. Although this previous work does not conduct a ROC analysis, the combination of strong nonlinearity and strong selectivity are important here, giving good qualitative support for these cells as participating in the function of detecting differential motion against the sky. In the present case, the color opponent cells respond to many stimuli, not just transitions across the horizon. In addition, for the receiver operator characteristic (ROC) analysis as to whether these cells can discriminate transitions across the horizon, the area under the curve (AUC) is on average 0.68. Although there is not a particular AUC threshold for a detector or diagnostic test to have good discrimination, a value of 0.5 is chance, and values between 0.5 and 0.7 are considered poor discrimination, 'not much better than a coin toss' (Applied Logistic Regression, Hosmer et al., 2013, p. 177). The data in Fig. 6F is also more consistent with a general chromatic opponent cell that is not highly selective. These cells may contribute information to the problem of discriminating sky from ground, but also to many other ethologically relevant visual determinations. Characterizing them as feature detectors seems inappropriate and may distract from other functional roles, although they may participate in feature detection performed at a higher level in the brain.

      The reviewer apparently uses a rather narrow definition of a feature detector. We, however, argue for a broader definition, which, in our view, better captures the selectivities described for RGCs in the literature. For example, while W3 cells have been quite extensively studied, one can probably agree that so far only a fraction of the possible stimulus space has been explored. Therefore, it cannot be excluded that W3 cells also respond to features other than small dark moving dots, but we (like the reviewer) still refer to them as feature detectors. Or, for instance, direction-selective (DS) RGCs are commonly considered feature detectors (i.e., responsive to a specific motion direction), although they also respond to flashes and spike when null-direction motion is paused (Barlow & Levick J Physiol 1965).

      The G28/tSbC cells’ selectivity for full-field changes in chromatic contrast enables them to encode ground-sky horizon transitions reliably across stimulus parameters (e.g., see new Fig. 7i panel). This cell type is thus well-suited to contribute to detecting context changes, as elicited by ground-sky transitions.

      Therefore, we think that the G28/tSbC RGC can be considered a feature detector and as such, could be used at a higher level in the brain to quickly detect changes in visual context (see also Kerschensteiner Annu Rev Vis Sci 2022). Still, their signals may also be useful for other computations (e.g., defocus, as discussed in our manuscript).

      Regarding the ROC analysis, we acknowledge that an average AUC of 0.68 may seem comparatively low; however, this is based on the temporally downsampled information (i.e., by way of Ca2+ imaging) gathered from the activity of a single cell. A downstream area would have access to the activity of a local population of cells. This AUC value should therefore be considered a lower bound on the discrimination performance of a downstream area. We now comment on this in the manuscript.
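      To make the AUC values in this exchange concrete, the sketch below computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen response to a target stimulus exceeds a randomly chosen response to a non-target stimulus, with ties counted as one half. The function name and the example responses are hypothetical and are not taken from the manuscript's analysis code.

```python
import numpy as np


def roc_auc(target_responses, other_responses):
    """AUC as P(target > other) + 0.5 * P(target == other).

    0.5 corresponds to chance, 1.0 to perfect discrimination.
    """
    pos = np.asarray(target_responses, dtype=float)
    neg = np.asarray(other_responses, dtype=float)
    # Compare every target response with every non-target response
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Made-up single-cell responses to transition vs. non-transition frames
auc_example = roc_auc([0.9, 0.4, 0.7], [0.1, 0.5, 0.3])
```

      In this toy case, 8 of the 9 target/non-target pairs are ordered correctly, so the AUC is 8/9.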

      2) Appropriateness of MEI analysis for interpretations of the neural code.

      There is a fundamental incompatibility between the need to characterize a system with a complex nonlinear CNN and then characterizing cells with a single MEI. MEIs represent the peak in a complex landscape of a nonlinear function, and that peak may or may not occur under natural conditions. For example, MEIs do not account for On-Off cells, On-Off direction selectivity, nonlinear subunits, object motion sensitivity, and many other nonlinear cell properties where multiple visual features are combined. MEIs may be a useful tool for clustering and distinguishing cells, but there is not a compelling reason to think that they are representative of cell function. This is an open question, and thus it should not be assumed as a foundation for the study. This paper potentially speaks to this issue, but there is more work to support the usefulness of the approach. Neural networks enable a large set of analyses to understand complex nonlinear effects in a neural code, and it is well understood that the single-feature approach is inadequate for a full understanding of sensory coding. A great concern is that the message that the MEI is the most important representative statistic directs the field away from the primary promise of the analysis of neural networks and takes us back to the days when only a single sensory feature is appreciated, now the MEI instead of the linear receptive field. It is appropriate to use MEI analyses to create hypotheses for further experimental testing, and the paper does this (and states as much) but it further takes the point of view that the MEI is generally informative as the single best summary of the neural code. The representation similarity analysis (Fig. 5) acts on the unfounded assumption that MEIs are generally representative and conveys this point of view, but it is not clear whether anything useful can be drawn from this analysis, and therefore this analysis does not support the conclusions about changes in the representational space. Overall this figure detracts from the paper and can safely be removed. In addition, in going from MEI analysis to testing ethological function, it should be made much more clear that MEIs may not generally be representative of the neural code, especially when nonlinearities are present that require the use of more complex models such as CNNs, and thus testing with other stimuli are required.

      The reviewer correctly characterizes MEIs as representing the peak in a nonlinear loss landscape that, in this case, describes the neurons’ tuning. As such, the MEI approach is indeed capable of characterizing nonlinear neuronal feature selectivities that are captured by a nonlinear model, such as the CNN we used here. We therefore disagree with the suggestion that MEIs should not be used “when nonlinearities are present that require the use of more complex models such as CNNs”. It is unclear what other “analysis of neural networks” the reviewer refers to. MEIs are one approach to analyzing the predictive neural network.

      We also want to clarify that, while the reviewer is correct in stating that the MEI approach as used here only identifies a single peak, this does not mean that it cannot capture neuronal selectivities for a combination of features, as long as this combination of features can be described as a point in high-dimensional stimulus space. In fact, this is demonstrated in our manuscript for the case of G28/tSbC cell’s selectivity for large or full-field, sustained changes in chromatic contrast (a combination of spatial, temporal, and chromatic features). While approaches similar to the one used here generate several diverse exciting inputs (Ding et al. bioRxiv 2023) and could therefore also fully capture On-Off selectivities, we pointed out the limitation of MEIs when describing On-Off cells in the manuscript (both original and revised).

      Regarding the reviewer’s concern that “[...] the message that the MEI is the most important representative statistic [...] takes us back to the days when only a single sensory feature is appreciated”. It was certainly not our intention to proclaim MEIs as the ultimate representation of a cell’s response features and we have clarified this in the revised manuscript. However, we also think that (i) in applying a nonlinear method to extract chromatic, temporal, and spatial response properties from natural movie responses, we go beyond many characterizations that use linear methods to extract spatial or temporal only, achromatic response properties from static, white-noise stimuli. This said, we agree that (ii) expanding around the peak is desirable, and we do that in an additional analysis (new Fig. 6); but that reducing complexity to a manageable degree (at least, at first) is useful and even necessary when discovering novel response properties.

      Concerning the representational similarity analysis (RSA): the point we were trying to make with this analysis is that the transformation implemented by G28 “warps” stimulus space and increases the discriminability of stimuli with characteristics similar to the cell’s MEI. We now make this point in a more accessible fashion through the above-mentioned analysis, where we extended the estimate around the peak. We therefore agree to remove the RSA from the paper.

      In the revised manuscript, we (a) discuss the advantages and limitations of the MEI approach in more detail (in Results and Discussion; see also our reply #1) and (b) replaced the RSA analysis.

      3) Usefulness of MEI approach over alternatives. It is claimed that analyzing the MEI is a useful approach to discovering novel neural coding properties, but to show the usefulness of a new tool, it is important to compare results to the traditional technique. The more standard approach would be to analyze the linear receptive field, which would usually come from the STA of white noise measurement, but here this could come from the linear (or linear-nonlinear) model fit to the natural scene response, or by computing an average linear filter from the natural scene model. It is important to assess whether the same conclusion about color opponency can come from this standard approach using the linear feature (average effective input), and whether the MEIs are qualitatively different from the linear feature. The linear feature should thus be compared to MEIs for Fig. 3 and 4, and the linear feature should be compared with the effects of natural stimuli in terms of chromatic contrast (Fig. 6b). With respect to the representation analysis (Fig. 5), although I don't believe this is meaningful for MEIs, if this analysis remains it should also be compared to a representation analysis using the linear feature. In fact, a representation analysis would be more meaningful when performed using the average linear feature as it summarizes a wider range of stimuli, although the most meaningful analysis would be directly on a broader range of responses, which is what is usually done.

      We agree that the comparison with a linear model is an important validation. Therefore, we performed an additional analysis (see also reply #1, as well as Fig. 6 and corresponding section in the manuscript) which demonstrates that an LN model does not recover the chromatic feature selectivity. This finding supports our claims about the usefulness of the MEI approach over linear approaches.

      Regarding the comment on the representation analysis, as mentioned above, we consider it replaced by the analysis comparing results from an LN model and a nonlinear CNN.

      4) Definition of ethological problem. The ethological problem posed here is the detection of the horizon. The stimuli used do not appear to relate to this problem as they do not include the horizon and only include transitions across the horizon. It is not clear whether these stimuli would ever occur with reasonable frequency, as they would only occur with large vertical saccades, which are less common in mice. More common would be smooth transitions across the horizon, or smaller movements with the horizon present in the image. In this case, cells which have a spatial chromatic opponency (which the authors claim are distinct from the ones studied here) would likely be more important for use in chromatic edge detection or discrimination. Therefore the ethological relevance of any of these analyses remains in question.

      It is further not clear if detection is even the correct problem to consider. The horizon is always present, but the problem is to determine its location, a conclusion that will likely come from a population of cells. This is a distinct problem from detecting a small object, such as a small object against the background of the sky, which may be a more relevant problem to consider.

      Thank you for giving us the opportunity to clear these things up. First, we would like to clarify that we propose that G28/tSbC cells contribute to detecting context changes, such as transitions across the horizon from ground to sky, not to detecting the horizon itself. We acknowledge that we were not clear enough about this in the manuscript and have corrected this. To back up our hypothesis that G28 RGCs contribute to detecting context changes, we performed an additional simulation analysis, which is described in our reply #3 (see above).

      5) Difference in cell type from those previously described. It is claimed that the chromatic opponent cells are different from those previously described based on the MEI analysis, but we cannot conclude this because previous work did not perform an MEI analysis. An analysis should be used that is comparable to previous work, the linear spatiotemporal receptive field should be sufficient. However, there is a concern that because linear features can change with stimulus statistics (Hosoya et al., 2005), a linear feature fit to natural scenes may be different than those from previous studies even for the same cell type. The best approach would likely be presenting a white noise stimulus to the natural scenes model to compute a linear feature, which still carries the assumption that this linear feature from the model fit to a natural stimulus would be comparable to previous studies. If the previous cells have spatial chromatic opponency and the current cells only have chromatic opponency in the center, there should be both types of cells in the current data set. One technical aspect relating to this is that MEIs were space-time separable. Because the center and surround have a different time course, enforcing this separability may suppress sensitivity in the surround. Therefore, it would likely be better if this separability were not enforced in determining whether the current cells are different than previously described cells. As to whether these cells are actually different than those previously described, the authors should consider the following uncited work: (Ekesten & Gouras, 2005), which identified chromatic opponent cells in mice in approximate numbers to those here (~ 2%). In addition, (Yin et al., 2009) in guinea pigs and (Michael, 1968) in ground squirrels found color-opponent ganglion cells without effects of a spatial surround as described in the current study.

      First of all, we did not intend to claim to have discovered a completely new type of color-opponent tuning in general; what we were trying to say is that tSbC cells display spatially co-extensive color opponency, a feature selectivity previously not described in this mouse RGC type, and which may be used to signal context changes as elicited by ground-sky transitions.

      Concerning the reviewer’s first argument about a lack of comparability of our results to results previously obtained with a different approach: We think that this is now addressed by the new analysis (new Fig. 6), where we show why linear methods are limited in their capability to recover the type of color opponency that we discovered with the MEI approach.

      Regarding the argument about center-surround opponency, we agree that “if the previous cells have spatial chromatic opponency and the current cells only have chromatic opponency in the center, there should be both types of cells in the current data set”. We did not focus on analyzing center-surround opponency in the present study, but from the MEIs, it is visible that many cells have a stronger antagonistic surround in the green channel compared to the UV channel (see Fig. 4a, example RGCs of G21, G23, G24; Figure 3-supplement 1 example RGCs of G21, G23, G24, G31, G32). Importantly, the MEIs shown in Fig. 4a were also shown in the verification experiment, and had G28 RGCs preferred this kind of stimulus, they would have responded preferentially to these MEIs, which was not the case (Fig. 4f).

      It should also be noted here that, while the model’s filters were space-time separable, we did not impose a restriction on the MEIs to be space-time separable during optimization. However, we analyzed only the rank 1 components of the MEIs (see Methods section Validating MEIs experimentally), since our analysis focused on aspects of retinal processing not contingent on spatiotemporal interactions in the stimulus.
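      As an illustration of what a rank 1 (space-time separable) component is, the sketch below extracts one from a space-time matrix using a singular value decomposition. The use of SVD, the function name, and the array layout (time along rows, pixels along columns) are assumptions for this example; the exact procedure used in the study is the one described in its Methods.

```python
import numpy as np


def rank1_component(stimulus):
    """Best rank 1 (separable) approximation of a space-time matrix.

    stimulus: array of shape (n_time, n_pixels) for one color channel
    (hypothetical layout). Returns the temporal kernel, the spatial
    kernel, and their outer product.
    """
    u, s, vt = np.linalg.svd(np.asarray(stimulus, dtype=float),
                             full_matrices=False)
    temporal = u[:, 0] * s[0]   # time course, scaled by the top singular value
    spatial = vt[0]             # unit-norm spatial profile
    return temporal, spatial, np.outer(temporal, spatial)


# A perfectly separable toy stimulus is recovered exactly
time_course = np.array([1.0, 2.0, 3.0])
spatial_profile = np.array([1.0, 0.0, -1.0, 2.0])
separable = np.outer(time_course, spatial_profile)
_, _, approx = rank1_component(separable)
```

      For a stimulus with genuine space-time interactions, the difference between the stimulus and this approximation quantifies what the rank 1 component leaves out.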

      In summary, we are convinced that our finding of center-opponency in G28 is not an artifact of the methodology.

      We discuss this in the manuscript and add the references mentioned by the reviewer to the respective part of the Discussion.

      Reviewer #3 (Public Review):

      This study aims to discover ethologically relevant feature selectivity of mouse retinal ganglion cells. The authors took an innovative approach that uses large-scale calcium imaging data from retinal ganglion cells stimulated with both artificial and natural visual stimuli to train a convolutional neural network (CNN) model. The resulting CNN model is able to predict stimuli that maximally excite individual ganglion cell types.

      The authors discovered that modeling suggests that the "transient suppressed-by-contrast" ganglion cells are selectively responsive to Green-Off, UV-On contrasts, a feature that signals the transition from the ground to the sky when the animal explores the visual environment. They tested this hypothesis by measuring the responses of these suppressed-by-contrast cells to natural movies, and showed that these cells are preferentially activated by frames containing ground-to-sky transitions and exhibit the highest selectivity of this feature among all ganglion cell types. They further verified this novel feature selectivity by single-cell patch clamp recording.

      This work is of high impact because it establishes a new paradigm for studying feature selectivity in visual neurons. The data and analysis are of high quality and rigor, and the results are convincing. Overall, this is a timely study that leverages rapidly developing AI tools to tackle the complexity of both natural stimuli and neuronal responses and provides new insights into sensory processing.

      We thank the reviewer for appreciating our study.

    1. Author Response

      Reviewer #3 (Public Review):

      This manuscript uses ASO to inhibit the self-cleaving ribozyme within CPEB intron 3 and test its effect on CPEB3 expression and memory consolidation. The authors conclude that the intronic ribozyme negatively affects CPEB3 mRNA splicing and expression, and suggests its implications for experience-induced gene expression underlying learning and memory.

      The strength of the manuscript is in its exploration of a potentially novel mechanism of regulating CPEB3 expression in learning and memory, a combination of both biochemical and behavioral approaches to gain a wide perspective of this regulatory mechanism, and the application of ASO in this context. The introduction is sufficiently detailed. Statistics are thorough and appropriate. If the results could be more robust, the mechanism would provide a novel target and venue to modify learning and memory paradigm.

      The weakness of the manuscript is that the magnitude of the activity-dependent regulation of ribozyme, the effects of ASOs on CPEB3 expression (mRNA and protein) and downstream target gene expression, in vitro and in vivo, are generally weak, raising concerns about the robustness of the result. This may have caused some of the inconsistencies between the data presentation (see below). Also unclear is whether the ribozyme activity is physiologically regulated by experience without ASO interference.

      While the statistical tests support the corresponding figure panels and their conclusions, the manuscript can be significantly strengthened by additional evidence, clarification of some methodologies, and reconciling some inconsistent results.

      The premise of a comparable timescale between transcription and ribozyme activity as the foundation of the whole thesis was based on in vitro measurement of self-scission half-life and a broadly generalized transcription rate (which actually varies significantly between genes). This premise is weak and needs direct experimental support.

      The physiological relevance of the proposed mechanism has yet to be demonstrated without ASO interference.

      Fig2b: how were total and uncleaved Ribozymes measured by qRT-PCR? Where are the primers' locations? If the two products were amplified using different primers, their subtraction to derive % cleavage would not be appropriate.

      We thank the reviewer for the thoughtful review. We measured the levels of the total ribozyme by measuring a 220-bp amplicon that starts 18 nts downstream from the ribozyme cleavage site. The uncleaved ribozyme levels were measured using oligos that amplify a region of the intron that starts 45 nts upstream and ends 238 nts downstream of the ribozyme cleavage site. We added this information to the Table of primers in the manuscript. For all PCR oligos we established independent standard curves and calculated RNA levels independently of other amplicons, as noted in the Methods section and now specified in the Results section as well (Page 15). The measurements were thus appropriate for the calculation of the cleaved ribozyme fractions in the various experiments. The fraction ribozyme cleaved was calculated from the uncleaved fraction as the difference between uncleaved fraction and unity (1 – fraction uncleaved), now specified on page 16 of the manuscript. Fraction uncleaved was calculated as [uncleaved ribozyme]/[total ribozyme], as was done previously (see Salehi-Ashtiani et al. Science 313:1788-1792 or Webb et al. Science 326:953).
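      The arithmetic described above can be written out as a short sketch. The function name and the example quantities are hypothetical; the inputs stand for the uncleaved and total ribozyme RNA levels, each already converted to an amount via its own standard curve.

```python
def fraction_cleaved(uncleaved_level, total_level):
    """Fraction of ribozyme cleaved = 1 - (uncleaved / total).

    Both levels must be in the same units and derived from independent
    standard curves for their respective amplicons.
    """
    if total_level <= 0:
        raise ValueError("total ribozyme level must be positive")
    fraction_uncleaved = uncleaved_level / total_level
    return 1.0 - fraction_uncleaved


# Made-up example: 25 units uncleaved out of 100 units total
example_fraction = fraction_cleaved(25.0, 100.0)
```

      With these illustrative numbers, one quarter of the ribozyme is uncleaved and three quarters is cleaved.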

      Line 400-403: shouldn't ribozyme-blocking ASO prevent ribozyme self-cleavage, and as a result should further increase ribozyme levels? This would contradict the result in fig3a.

      We showed that the ribozyme is inhibited in vitro (Fig. 1F and 1G) and all our data are consistent with ASO inhibition of the ribozyme in cellulo and in vivo. However, we do not have direct evidence for this ribozyme inhibition in vivo, because such an experiment would require a single-molecule FRET-type sensitivity in cells and this assay has not been developed for ribozyme cleavage in cellulo or in vivo. We measured the ribozyme levels by RT-qPCR and observed lower ribozyme levels in the presence of ASO in cultured neurons (Fig. 3A) as well as in vivo (Fig. 5B), which is nominally in contrast to the observations in vitro. However, in these situations we do not measure the co-transcriptional fate of the intron or the ribozyme; rather, we measure the levels of the intron after splicing (evidenced by the increased levels of spliced exons 2–3) when the intron is likely already being degraded. We also do not know what effect the ribozyme ASO has on the intron stability once splicing occurs. We acknowledge that this is a weakness of the study, and we are fully open about this result. However, given the abundance of evidence that the ribozyme ASO leads to an increase in CPEB3 mRNA under all conditions tested, we feel that there is strong, if indirect, evidence that our model for the ribozyme function is correct. Future studies will examine this issue more closely, but a definitive experimental investigation of the mechanism and timing of ribozyme inhibition and intron degradation is beyond the scope of this study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Weakness: Although the cross-links stimulate ATP hydrolysis, further controls are needed to convince me that the TM1 conformations observed in the structures are physiologically relevant, since they have been trapped by "large" substrates covalently-tethered by crosslinks.

      Our response: Reviewer 1 raised concerns about the relatively large size of our covalently attached AAC substrate that would potentially distort TM1 in Pgp. We would like to clarify that AAC has a molecular weight of 462 Da, which, in comparison to many known Pgp substrates ranging from 250 to over 1,000 Da, is not a large compound. For instance, the few other Pgp substrates mentioned in our manuscript all have a comparable or larger size: verapamil, 455 Da; doxorubicin, 544 Da; FK506, 804 Da; valinomycin, 1,111 Da; cyclosporin A, 1,203 Da.

      Furthermore, AAC was strategically attached to a site distant from TM1 in the inward-facing Pgp conformation. After it was exported to the outward-facing state, several TM helices accommodate the compound. The observation that only TM1 exhibited significant conformational changes suggests its potential role in the transport mechanism. This hypothesis is supported by our findings, where a conservative substitution (G72A) in TM1 resulted in a dramatic loss of transport function for various drug substrates and impaired verapamil-stimulated ATPase activity.

      Reviewer 1 (Recommendations for the Authors):

      I understand the need for an unconventional approach to understanding the translocation pathway. What would help to support this model is to cross-link a much smaller substrate, as the one used is quite large and could potentially distort TM1 in the outward-state when cross-linked.

      Our response: We thank the reviewer for this recommendation, and we have outlined plans for future experiments involving other substrates, including smaller ones, to further investigate our proposed model. However, it is important to acknowledge that conducting these studies will require a significant amount of effort and resources, which we believe extend beyond the scope of our current manuscript.

      In unbiased MD simulations starting from the IF state are there any simulations where the substrate follows the same path as proposed here?

      Our response: All our MD simulations were performed in the outward-facing state to focus on potential substrate release pathways. Starting MD simulations from the inward-facing state would introduce complexities in capturing the necessary domain motions and nucleotide binding and hydrolysis required for substrate translocations. Therefore, we opted not to perform MD studies starting from the inward-facing state.

      Reviewer 2 (Public Review):

      Weakness: There is much to like about the experimental work here but I am less sanguine on the interpretation. The main idea is to covalently link via disulfide bonds a model tripeptide substrate under different conditions that mimic transport and then image the resulting conformations. The choice of the Pgp cysteine mutants here is critical but also poses questions regarding the interpretation. What seems to be missing, or not reported, is a series of control experiments for further cysteine mutations.

      Our response: Reviewer 2 raised concerns about the interpretation of our results and suggested the need for additional mutant designs to validate our proposed TM1 mechanism. Firstly, we believe that the observed TM1 conformational changes are valid in our cryoEM structures, despite the use of different conditions and several mutants to capture Pgp in the outward-facing state.

      Regarding the G72A mutant, we consider it conclusive that this single point mutation in the TM1 has a profound effect. Importantly, the G72A mutant was readily expressed and purifiable as a stable protein. We were able to resolve a high-resolution structure of the G72A mutant (without the substrate), confirming that the protein is not generally destabilized but properly folded.

      Above all, we appreciate the Reviewer’s suggestion to explore additional mutations and intend to do so in future studies.

      Reviewer 2 (Recommendations for the Authors):

      I am sold on the results regarding TM1 conformational changes as they are evident in the cryoEM structures. However, the set of states compared between mutants are not biochemically equivalent: for 335 and 978 they used an ATP-impaired Pgp whereas for 971 they used what appears to be WT, and the conformation was imaged presumably subsequent to ATP hydrolysis and Vanadate trapping. This is significant if the authors were unable to trap the OF in the impaired mutant background and should be highlighted. I have to believe that they tried that condition but I could be wrong.

      Our response: We acknowledge the point made by the Reviewer about the biochemical equivalence of mutant states and the potential significance of using an ATP-impaired mutant for trapping the outward-facing conformation of 971. We have not yet attempted to use the ATPase-deficient 971C mutant for crosslinking and intend to address this question in future studies.

      In our current approach, we used the ATPase-active 971C for two specific reasons:

      1) Our biochemistry data, as shown in Fig 1C, indicates that 971C only crosslinks in the presence of ATP hydrolysis conditions. Vanadate trapping was employed to stabilize the outward-facing conformation.

      2) Based on our experience, the conformations of ATP-bound (mutant) and vanadate-trapped states of an ABC transporter are structurally equivalent at the resolution of our study (see ref. 21: Hoffmann et al. Nature 2019).

      The authors propose a new model for substrate translocation. It is based on three mutants and a number of structures. If the authors were not challenging the current dogma I would not have written the next comment. Considering the impact of the findings, I would have designed a couple more cysteine mutants based on their model. For instance, this pathway has a number of stabilizing interactions, can't they make a mutant that preserves conformational switching but eliminates substrate translocation? I like the G97A mutant result but I am worried that the effect could just be a general destabilization or misfolding as part of the cryoEM particles seem to suggest. The authors advance one interpretation of the disorder observed in this mutant but it could easily be my interpretation.

      Our response: We thank the reviewer for the suggestion to design additional mutants to further validate our proposed model for substrate translocation. We agree that this would be highly valuable, considering the potential impact of our findings. However, given the time-intensive nature of our approach, we believe that presenting these additional designs in a future study is a reasonable course of action.

      Regarding the G72A mutation, we believe that our current data fully supports our model and the role of TM1 in regulating the Pgp activity. Importantly, we would like to emphasize that the G72A mutant was readily expressed and purifiable as a stable protein. Additionally, our cryoEM structural determination of the G72A mutant at high resolution confirmed that the protein is not generally destabilized but properly folded.

      There are a couple of troubling methodological questions that I want the authors to address or clarify:

      1. In the methods they report that the final sample for cryoEM was prepared on a SEC devoid of detergent. It is obvious that the sample was folded but I was wondering why the detergent was removed? Was that critical for observing these structures with multiple ligands? Did they observe any lipids in their cryoEM?

      Our response: We avoid detergent in the buffer on final SEC purification. This step is to remove free detergent from the background which helps during cryoEM imaging. Of course, this cannot be done with every detergent but due to the very low CMC of LMNG it is possible. By now, we have verified this method for several other transporters with the same success. While this procedure helps us to obtain better images it is not necessary to obtain specific conformations or ligand bound states, nor does it affect these states or conformations.

      In our cryoEM structures, we did observe multiple cholesterol hemisuccinate (CHS) molecules on the outer transmembrane surface of Pgp.

      1. Can the authors comment on why labeling was carried out in the presence of ATP? Does it matter if the substrate was added prior to ATP and incubated for a few minutes?

      Our response: For every dataset, we first added the substrate to be cross-linked and afterwards added the ATP. In the cases of 335C and 978C, labeling was successful before ATP was added, as evidenced by the inward-facing structures with cross-linked substrate. However, for 971C, cross-linking only occurred after the addition of ATP. We interpret this to mean that the 971 site is inaccessible to the substrate in the inward-facing state, and that cross-linking can only occur after the transporter transitions to the outward-facing state. This is in line with our inward-facing structure, which does not show a cross-linked substrate, and with our biochemical data shown in Fig 1C, where 971C only crosslinked in the presence of ATP.

      1. I am not an expert on MD simulations and I understand that carrying out simulations at higher temperatures used to be a trick to accelerate the process. Is this still necessary? Why didn't the author use approaches such as WESTPA?

      Our response: Most so-called enhanced sampling methods, including WESTPA, explicitly define a reaction coordinate for the process of interest, usually based on intuition or prior studies. If this coordinate is chosen poorly, enhanced sampling usually fails, either because the sampling becomes inefficient or because the sampling biases the transition pathway (or both). Lacking reliable intuition or prior knowledge on which motions would result in substrate release, we chose temperature to speed up the process. High temperature largely avoids introducing any bias through the definition of a progress coordinate. By contrast, the weighted ensemble method underlying WESTPA is a great method to simulate unbiased dynamics of a process with a known progress coordinate, but it requires choosing a progress coordinate prior to the simulation and will then mostly sample the process along this coordinate, because this is the only direction in which sampling is improved. High-temperature MD, on the other hand, accelerates all processes in the system under study. Indeed, we have now confirmed that the pathway found at high temperature is also feasible at near-ambient conditions.

      In new simulations, we have now observed a similar release pathway at T=330 K. The only difference is that the substrate has not fully dissociated from the protein after 2.5 µs, with weak interactions persisting at the top part of TM1 on the extracellular side. Importantly, this configuration is also observed in the higher-temperature simulations, but with a much shorter lifetime.

      In response, we have now included these new findings and a new Extended Data Fig. 15 in the revised manuscript.

      1. One way to show that the two substrates binding mode is biochemically relevant is to measure Vmax at different substrate concentrations. One would expect a cooperative transition if that interaction is mechanistically important.

      Our response: We have measured Vmax as a function of QZ-Ala concentration in a previous report (ref. 24), supporting positive cooperativity for binding to two sites.

      Reviewer 3 (Public Review):

      We thank Reviewer 3 for recommending the acceptance of our manuscript as is.

      Reviewer 3 (Recommendations for the Authors):

      Page 4, last line: Pgp302 should be Pgp1302. In addition, I can only encourage the authors to add an additional table to the manuscript. Here, the mutation, the obtained structure(s), IF or OF, the resolution, and the main message should be summarized.

      Our response: Following the reviewer’s suggestion, we have added Extended Data Table 2 summarizing the Pgp mutants and respective structural data in the revised manuscript.

      We verified that Pgp302 is the correct term on Page 4, last line.

      Pg. 5, section 'Covalent ligand design for Pgp labeling', it is mentioned that even in the presence of Mg2+ATP, Pgp302 could not react with AAC-DNPT. Maybe it would be worthwhile to add the data either in Supplementary Information or state 'data not shown'.

      Our response: We stated ‘data not shown’ in the text.

      Pg. 47, last line: A space is missing between M68, and M74.

      Our response: Space was added.

      Pg. 7, line 2: The authors mention that a single dataset of ATP-bound Pgp335 revealed three different OF conformations: ligand-free, single-ligand-bound, and double-ligand-bound. However, the percentage fraction of each dataset sums up to be more than 100%. Would request the authors to recalculate the fraction size of each conformation.

      Our response: We have corrected the error in our calculation, based on the particle distribution in our dataset (OF335-nolig: 1,437,110 particles, 40.4%; OF335-1lig: 1,184,253 particles, 33.3%; and OF335-2lig: 939,924 particles, 26.4%).
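
      The corrected fractions can be re-derived directly from the particle counts quoted above. The following is a minimal Python sketch of that arithmetic; the dictionary name and layout are illustrative only and are not taken from the authors' processing pipeline. Note that percentages rounded to one decimal place need not sum to exactly 100%.

```python
# Particle counts per outward-facing class, as quoted in the response above.
particle_counts = {
    "OF335-nolig": 1_437_110,
    "OF335-1lig": 1_184_253,
    "OF335-2lig": 939_924,
}

total = sum(particle_counts.values())

# Each class as a percentage of all classified particles.
fractions = {name: 100.0 * n / total for name, n in particle_counts.items()}

for name, pct in fractions.items():
    print(f"{name}: {pct:.1f}%")
```

      The unrounded fractions sum to 100%; the one-decimal values reported in the response (40.4, 33.3, 26.4) add up to 100.1 only because of rounding.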

      Pg 53, Figure legend of Extended Data Fig. 11: Please include the color coding for the helix TM1 and also the residues colored plum.

      Our response: We added the color coding for TM1 and other residues in the figure legend.

      Pg. 8, line 3: While referring to the structure of OF971-1lig, the authors nicely point towards the conserved residues M74 and F78 which coordinate the ligand. However, in Fig. 3b, residues M74 and F78 should also be indicated.

      Our response: We updated Fig. 3b by adding arrows pointing towards the residues M74 and F78.

      Pg. 54, Extended data Fig. 12: The authors should adopt a single writing style. In some places, Pgp is referred to as P-gp while in others as Pgp.

      Our response: We updated the protein labels in Extended Data Fig. 12.

      Pg. 54, Extended data Fig. 12: The authors should clearly mention which OF335 structure (1st panel) was used for visualizing the interactions.

      Our response: To clarify, we added the following sentences in the figure legend: “Pgp335 OF in the top panel refers to OF335-1lig. In the bottom panel describing OF335-2lig, the left and right diagrams refer to the binding positions of non-covalent and covalent ligand, respectively”.

      Pg. 18, section 'synthesis of dipeptide 8': In the text it is mentioned that for the synthesis of thiazole acid 6, compound 3 was dissolved in a mixture of THF/MeOH/H2O (3:1:1), while in the corresponding figure (Extended Data Fig. 1), the ratio is stated as 5:1:2.

      Our response: 3:1:1 ratio is correct. We made the correction in Extended Data Fig. 1.

      Pg. 19, section 'synthesis of linear tripeptide 10': Same as above for compounds 10 and 4, respectively.

      Our response: We corrected the conditions in the Extended Data Fig. 1 accordingly.

      Pg. 20, section 'Synthesis of cyclic peptide 11': There seems to be a discrepancy in the synthesis protocol between the text and the extended figure 1, especially regarding the use of THF/MeOH/H2O, followed by NaOH and TFA or only NaOH and TFA.

      Our response: We further clarified the conditions of using NaOH in THF/MeOH/H2O (3:1:1) and TFA in DCM in the synthesis text and in Extended Data Fig. 1.

      Pg. 40, Extended Data Fig. 1: In the bottom last panel showing the synthesis of peptide 11, the authors have missed showing peptide 10 as the starting material for the reaction.

      Our response: Label for the peptide 10 was added following the suggestion.

      Pg. 26, third last line: 'o' is missing from the last word cry'o'

      Our response: We corrected the typo.

      Pg. 63 and 64, Extended Data Table 1: The Cryo-EM data collection, refinement, and validation statistics for OF971-1lig, IF971-1lig, OF978-1lig, and IF978-2lig are mentioned twice in the table.

      Our response: This was now corrected in the revision.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      The authors have addressed my recommendations in the previous review round in a satisfactory way. I only have one additional comment to the authors:

      In the manuscript abstract lines 31-32, the authors state that: "Using NIH data for the period 2006-2022, we report that ~230 K99 awards were made every year, representing ~$25 million annually." The "~$25 million" understates the actual funds spent, because this sum covers only the first year of newly made K99 awards, while in any given year the NIH is also paying years 2, 3, 4, etc. of K99 awards made in previous years (~90% conversion rate to R00). The NIH is actually spending ~$230-$250 million a year on the K99 award mechanism, so the authors need to amend the stated amount in the manuscript.

      Thank you for pointing this out. The reviewer is correct that we had calculated only the investment in new K99 awards. We have corrected this in the revised manuscript. We appreciate your careful reading of our manuscript, and the edits made based on your comments have improved the final version.

      Reviewer #2 (Recommendations for the Authors):

      Thank you for taking the time to revise this important work. I learned a lot reading this paper a second time, and appreciate the improvements you have made.

      My only major thought while re-reading this is that I wish you all had written two papers! I see two themes in this work: one looking at faculty hiring networks from the Wapman et al. dataset, and another at K99/R00 conversions by institution, gender, and researcher mobility and its impact on subsequent funding success. After reading, I felt like I had many follow-up questions about both analyses, but it would be impractical for me to suggest all these follow-up analyses without making your paper unreasonably long.

      Thank you for these comments. We agree that there are 2 general themes in this paper, and we feel that significantly expanding on both themes will be important in future research. Our hope is that this work continues to inspire others to critically examine funding practices and inequity in the same way that the work of Wapman, Pickett, etc. inspired the present work.

      For example, regarding the results that more R00 are activated at different institutions, and that moving institutions improves subsequent funding success, I wonder: Do proportionally more women or men move institutions? Do proportionally more K99 awardees at less-funded places move for their R00, or less? The Cox proportional hazard models illustrate the impact of various characteristics on subsequent funding success, but they do not illustrate disparate impacts of mobility on different groups (if I am understanding them correctly). (You sort of dive into these questions in the very interesting subsection, "K99/R00 awardee self-hires are more common at institutions with top NIH funding." I wanted to read more!)

      Thank you for these kind comments. These are fantastic follow-up questions. We do not feel that we can adequately address them within the present manuscript without potentially splitting it into 2 separate manuscripts. However, we may examine these in future analyses. We are particularly interested in examining additional aspects such as how the K99 MOSAIC funding mechanism may differ from the traditional K99 mechanism. Since the K99 MOSAIC mechanism is newer, there may not be enough K99 MOSAIC awards made for a thorough exploration.

      As another example, for your analysis on faculty hiring networks, the prevalence of self-hiring amongst institutions and regions was one finding. However, this finding seems somewhat at odds with the previous takeaway about how researcher mobility improves subsequent funding success. Are institutions doing themselves a disfavor by hiring their own, then? I suspect there is more to say here about this pattern... maybe there are important differences between PhD institution and postdoc institution and its impact on hiring/subsequent funding success? Or is this a story about upward mobility into the top 25 well-funded NIH institutions?

      Again, these are very insightful comments and follow-up questions. We hope to address these in potential future manuscripts. We also hope that others may become interested in finding answers to these questions by exploring our dataset as well as other publicly available datasets such as the Wapman et al. dataset.

      I can completely understand how combining the faculty hiring network analysis with the K99/R00 conversions would seem like a natural fit, but I personally feel - emphasis on this being a personal opinion - that there would have been benefits to giving more space to the details of both analyses separately. Perhaps this is a "hindsight is 20/20" issue. Or an issue with the current times in which one's brain can only hold so many main takeaways from a single body of work. (For example, I struggled to summarize your paper in my public review because I find so many takeaways important.)

      I suppose this is all to say that I find your work important enough to warrant additional follow-up work! :)

      Thank you for these very kind remarks. This work evolved over 8-10 months, as evidenced by the updates to the bioRxiv preprint. With unlimited time and foresight, it would probably have been best to separate the 2 themes into separate manuscripts and expand both. Given current constraints, we plan to make some changes/updates to the present manuscript and hopefully include more in-depth analyses on each theme in future works. Thank you again for the thoughtful reading and critique of both our original manuscript and the revised version.

      Minor comments/questions:

      "K99 to R00 conversions are increasing in time"

      • Assuming I am interpreting the figures correctly, in my opinion, the most important takeaway is that the number of R00 awards has increased, but only for awardees moving to another institution. This key result, best illustrated by panels A and C of Figure 1, is buried in the long paragraph in this section. The organization of content in this section could be improved and more focused. Consider renaming this subsection to be more declarative: "K99 to R00 conversions have increased, but only for awardees moving to another institution."

      This is a very concise interpretation of this data. We have edited the paragraph referenced by the reviewer, split it into 2 paragraphs, and changed the title to “K99 awardees increasingly move to other institutions for R00 awards from 2008 to 2022” and the final sentence to “Thus, the number of K99 to R00 conversions is consistent over time, but increasingly more R00 awardees have moved to other institutions since 2013”

      • Similarly, I personally found the current title of the subsection, "K99 to R00 conversions are increasing with time" is mildly confusing. An R00 award indicates a successful conversion, so why not simply call this an R00 award instead of saying K99-to-R00 conversion? Also, when I look at Figure 1B and exclude the conversion rates for 2007 and 2008 (because this is a 3 year rolling average), I see that conversion rates (or R00 awards) have remained stagnant. This comment is very much in-the-weeds and is mainly to do with clarity of language.

      Thank you for these comments. We had used “K99 to R00 conversion” to highlight the unique nature of this award mechanism: a person can only receive an R00 if they previously had a K99 award. Nevertheless, we have edited the text to “R00 awards” and “R00 awardees” to simplify things. We also want to note that we did not compute a 3-year rolling average. The function we used was: (X/(Y -1)) x 100, where X is the number of R00 awards made in a year and Y is the number of K99 awards made in a year. We did note an error in our calculation in the previous version of the manuscript. Previously, we included all R00 awards and K99 awards for each year from the NIH Reporter dataset; however, this is a flawed methodology. NIH Reporter includes only extramural K99 and extramural R00 award data, but intramural K99 awardees can receive extramural R00 awards and thus are only included in the R00 dataset. There were 141 R00 awardees in our dataset from NIH Reporter that did not have K99 data, so we assume these are intramural K99 awards, since a K99 is required to be eligible for the R00 award. Since we do not know the awarding year for intramural K99 awardees or have data on intramural K99 awardees that fail to activate the R00 award (or stay internal at NIH), we have excluded these 141 R00 awardees. In the previous version, this miscalculation exaggerated the rolling conversion rate (we had correctly calculated the 78% total conversion rate). We re-analyzed our rolling conversion rate and found the average is 81.8% (excluding the first 2 years of the K99 program and the last 2 years).

      This is a long explanation, but essentially, we overestimated the number of R00 awards which inadvertently increased the rolling conversion rate. We have corrected this and simplified the first 2 paragraphs of the Results section.
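
      The exclusion step described above can be illustrated with a short sketch. It assumes, purely for illustration, that each awardee carries a unique identifier shared between the K99 and R00 records; the function name, the identifiers, and the toy data are hypothetical and are not drawn from the NIH Reporter dataset or from our analysis code.

```python
def conversion_rate(k99_ids, r00_ids):
    """Return (percent of K99 awardees with an R00, number of R00 records excluded).

    R00 records with no matching K99 record are dropped before the rate is
    computed, mirroring the exclusion of presumed intramural K99 awardees.
    """
    k99 = set(k99_ids)
    r00 = set(r00_ids)
    if not k99:
        raise ValueError("no K99 awardees supplied")
    matched = r00 & k99    # R00 awardees that also have K99 data
    excluded = r00 - k99   # R00 awardees without K99 data
    return 100.0 * len(matched) / len(k99), len(excluded)


# Hypothetical toy data: five K99 awardees; "x" holds an R00 but has no K99 record.
k99_ids = ["a", "b", "c", "d", "e"]
r00_ids = ["a", "b", "c", "d", "x"]

rate, n_excluded = conversion_rate(k99_ids, r00_ids)
print(f"{rate:.1f}% converted, {n_excluded} R00 record(s) excluded")
```

      In this toy case, counting the unmatched record "x" in the numerator would have given 5 of 5 rather than 4 of 5, which is the kind of overestimate described above.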

      • I was also mildly confused looking at Figure 1c. The caption says that the percentages represent the K99 awardees that stayed at the same institution for the R00 activation, but the percentages are next to the solid circles which the legend labels as "different institution." Perhaps another or different way to show this is a stacked bar chart, where one bar represents the percentage of R00 awards activated at the same institution and another bar represents the percentage of R00 awards activated at a different institution. The bars always add to 100% but the change in proportions illustrates that proportionally fewer awards are being made to those remaining at the same institution.

      Great idea. We have included a stacked bar chart here. Since the stacked bar chart is percentages, we felt it was important to also show the total numbers so we still included the previous chart also but removed the percentage numbers from it. We also changed the departmental analysis to stacked bar charts. This shows the stark difference between 2008-2012 and 2013 onward. These changes were made in the revised Fig. 1.

      • Minor question: I would love to see Table 3 and Table 4 as a time-series. Has the proportion of recipients at various institution types changed with time?

      This is a great suggestion and we felt it fit best in Figure 5, so we’ve added it there.

      • Table 3 is useful but only indirectly addresses my first "Recommendation to the Authors" from my previous review. I did some number crunching myself from the data provided. Assuming I did this correctly: If you're a K99 awardee at a private institute, you had a 76.3% chance of getting an R00 compared to 80.4% for a K99 awardee at a public institution. If you're a K99 awardee at a top-funded institution, you had a 76.8% chance of R00 compared to 78.6% for a lower-funded institution. I would have liked to see more figures and tables to illustrate conversion rates by institution type in this way. Interestingly, to me, these data suggest that there are not enormous conversion rate differences by institution type (though looking at these now, I am confused at the 89% statistic in line 174 and where that comes from, since it is much higher than what I've calculated).

      Thank you for this suggestion and these comments. Please see above where we describe how we incorrectly overestimated the 89% statistic. This has been corrected. As the reviewer suggested, we now show yearly percent of grants to specific institution types in the revised Figure 5. We agree with the reviewer that showing the conversion rate by institution type is interesting; however, it is fairly obvious from the new panels in Figure 5 that there is not much difference in conversion rate. Thus, to avoid crowding too many panels into the figure, we opted to keep the stacked bar plot.

      Reviewer #3 (Recommendations for the Authors):

      -One minor change to Figure 1C would be to switch the color coding for the lines so that they match with 1D whereby "same institution" would be white circles, or whatever the authors decide would be best for consistency since they are similar comparisons.

      Thank you for this suggestion. We have corrected this to be consistent.

      -Minor note for lines 459-461: I would suggest changing the wording to "intersectional inequalities" as it is not that a scientist's identities impact their careers as much as how those identities are positioned within an unequal opportunity structure and differentially treated that produce varying career trajectories and experiences of marginalization and cumulative (dis)advantages.

      Thank you and we agree with you. We have made this correction.

      -To carry forward a suggestion for the authors in my previous review, future research that more fully explores the research infrastructure of institutions for how top NIH funded institutions continue to be top funded institutions year after year could help clarify some of the career mobility and same/similar institution hiring found in the data. Rather than hand coding institutions for some of the infrastructure, the National Center for Education Statistics' Integrated Postsecondary Education Data System (IPEDS) has data on colleges and universities including whether they operate a hospital, have a medical degree, and many other interesting data about student and faculty demographics, institutional expenditures (including research budgets), and degrees awarded in different fields of study (undergrad and grad) that may be helpful to the authors as they continue their research stream in this area.

      Thank you very much. We will look into this data set as we continue our investigations in this area.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The discussion seems to imply that the ball-and-chain peptide is or is related to the common gate. (Although it isn't stated explicitly, it is implied based on the presentation of the gating model in Figure 8 immediately after the discussion of common gating, and the simultaneous opening of both pores in Figure 8). What does the asymmetric structure say about the relationship between the N-term peptide and common gating in ClC-2? It seems like this structure suggests that the CTDs can independently rotate, and independently bind N-terminal peptide, which might not be expected to impact both pores. Some additional clarification and/or discussion of these ideas could be helpful here.

      We thank the reviewer for raising these very important points. We agree we should have been more explicit and have now expanded our discussion on this topic, highlighting the independent movement of the N-term peptide and CTDs and clarifying that it is currently unknown whether CLC-2 has a common gate (lines 431-484).

      Discussion of "Revised Framework for CLC-2 gating": I think this would be a little easier to follow if most of the legend from Figure 8 was in the main text at the end of that section. Also, additional labels in Figure 8 (of the glutamates, the N-terminal peptide, and what the CTD arrows represent).

      We have revised this section of the text and added labels to the (revised) Figure as suggested.

      Line 261: typo, misspelling of "hydrogen"

      Fixed. (Now line 279.)

      Figure 6 - supplement 2B: Looks like an error in numbering y-axis - should be 90/120/150, I think. Can you show the three data points for the WT initial current rectification? Can you clarify whether the 3 that you are analyzing are the ones where the AK42 "zero current" level is not more than the initial positive current?

      We apologize for this error, which arose from the Y-axis label overlapping the tick labels, so 90/120/150 showed as 90/20/50. We have fixed this error and have added a new panel (C) to show three data points for the WT initial current rectification. In the Figure legend to panel C, we clarify that the 3 experiments we analyzed are the ones where the AK-42 current level is not more than the initial current at 80 mV.

      Reviewer #2 (Recommendations For The Authors):

      1. It appears from a close inspection of Figure 2 that the TM dimer is not quite symmetric, but I couldn't tell for sure from the figures as presented. No comment is made in the methods about symmetry imposed, and the authors explicitly comment on asymmetry in the cytoplasmic domain. It would be useful to have an explicit discussion of the TM dimer symmetry.

      We have now explicitly stated that the TM dimer is symmetric, and we have clarified the wording in the Methods:

      Main text, line 81: "The TM region of CLC-2 displays a typical CLC family symmetric homodimeric structure, with each subunit containing an independent Cl– pathway (Figure 2A, B)."

      Methods (lines 557-558): "The following ab initio reconstruction and 3D refinement (for all structures presented in this paper) were performed with C1 symmetry (no symmetry imposed)."

      1. For the simulations in Figure 5 Supplement 2, the N terminus flexibility is shown, but this of course can't be compared to a control. However, given the structural results, one might expect the JK helix to show changes in flexibility/mobility in the apo vs inactivated structures. Is this observed?

      We agree that the structures strongly suggest the JK-helix is not as stable without the N-terminus bound. We did not perform comparative simulations on the JK helix in the apo vs inactivated structures. While we agree this could be of interest, we don’t think it is essential to our conclusions, and the simulations might need to be quite long to adequately capture dynamics of the JK helix. [In the simulation results shown in Figure 5 Supplement 2, our aim was to test the validity of the structure by determining whether the N-terminus remains bound to the channel in simulations. The plot shows that the N-terminus stays in the same binding pose with an average RMSD (to the initial structure) of less than 2 angstroms, which is generally considered to be relatively stable.]
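
      For readers unfamiliar with the metric, the RMSD to the initial structure can be sketched as follows. This is a simplified illustration, not the protocol used for Figure 5 Supplement 2: it assumes the frames are already superimposed on the reference, and the three-atom coordinates are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords))


# Hypothetical three-atom reference structure and two frames (angstroms).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],  # identical to reference
    [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)],  # every atom shifted 1 angstrom in y
]

per_frame = [rmsd(frame, reference) for frame in frames]
average = sum(per_frame) / len(per_frame)
print(per_frame, average)
```

      In practice, trajectory frames are usually aligned to the reference before the deviation is computed, typically with a dedicated trajectory-analysis package; that step is omitted here.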

      1. I find the section "revised framework for ClC-2 gating" to be wanting. The ideas are illustrated in the cartoon, but should also be laid out in the text. In what ways are you revising the framework, and in what aspects are you carrying through ideas already proposed?

      Thank you for raising this point, which was also raised by Reviewer 1. We have revised this section and the accompanying Figure (Figure 8 and Lines 431-484).

      1. The authors mention in passing the idea that the hairpin could contribute to inward rectification (lines 227/8), but also suggest a role for the gating glutamate in this process. They also mention the idea of a common gate, but don't flesh out its function very much. These possibilities are very interesting and should be substantially fleshed out in the "framework" section, even if they cannot be fully answered yet.

      We have expanded on these points in the “framework” section.

      1. Figure 6E. points representing individual experiments should be shown.

      We added points representing individual experiments for Delta N (normalized to WT) in the surface-expression experiments in Figure 6E. Individual data points for the electrophysiology experiments are in panel C; we did not replot these in panel E because some of the points would have been off scale.

      1. The density in Figure 2A is hard to see, is there a better way to display it? Also, the orientation of the rightmost panel in Figure 2C is difficult to interpret.

      We revised 2A to make the density easier to see. We revised Figure 2C so that the middle and rightmost panels have the same orientation.

      1. P6. Line 87. This sentence is a little confusing, and perhaps could be a little clearer-the density is consistent with a Cl- ion, but no experiments have been done to support this, no?

      We have clarified the wording as suggested (now line 89) and added references supporting Clˉ binding to the Sext site in CLCs (line 90).

      1. P6 lines 89-98. Two lines of evidence, the conformation of the gate and the pinch point, both point to the structure representing a closed state. The wording as presented is a little hard to follow.

      We have revised the wording in this paragraph (lines 92-111).

      1. It's hard to distinguish water protons and oxygens in the lower right panel (QQQ).

      We revised this panel (in Figure 3 – figure supplement 2) to better distinguish the water protons and oxygens.

      Reviewer #3 (Recommendations For The Authors):

      A few points to consider for improving the manuscript

      1. It is intriguing that in the AK-42 structure, there is no density for the hairpin loop even though the CTD is in a symmetrical conformation as the apo. The authors could perhaps comment on whether there is any difference in the rectification properties of currents (or run-up) upon unblocking of AK-42 which may suggest that the hairpin binding is prevented by AK-42.

      We have not yet performed the suggested experiment nor any experiments to examine state-dependence, though we agree such experiments would be informative. We have added a note on this point in the discussion, lines 334-337.

      1. Although the conformation-dependent placement of the hairpin loop is convincing based on the density, the sequence assigned to this region is not conclusive.

      To strengthen our conclusion concerning the hairpin assignment, we investigated fits of peptide segments from the disordered sections of the C-terminal cytoplasmic domain to the hairpin density. We found that these fits are not as good as that with the N-terminal peptide. This analysis is described in lines 179-181 and a new figure (Figure 5 – figure supplement 1). We appreciate the reviewer’s point that it is extremely difficult to conclusively assign residues that are not contiguous with the rest of the structure. Nevertheless, given the wide variety of evidence all pointing to the conclusion that the hairpin loop corresponds to residues 14-28, we think the assignment is on strong footing. We respectfully ask that you consider removing this criticism from the public review, as we think it will hinder the casual reader from recognizing the strength of the evidence: (1) of the unresolved regions in CLC-2, residues 14-28 fit best; (2) residues 14-28 were previously identified as part of the ball blocking region (lines 158-161); (3) MD simulations support that the N-terminal residues stay stably bound (Figure 5 – figure supplement 4); (4) gain-of-function disease-causing mutations map onto either the N-terminal residues or interacting residues on the TM domain (Figure 5 – figure supplement 6). Thank you for considering this request.

      1. The authors should comment on the physiological relevance of the CBS domain rearrangements during gating.

      We have added this sentence (lines 131-133): “The physiological relevance of C-terminal domain rearrangements is suggested by disease-causing mutations that alter channel gating (Estevez et al., 2004; Brenes et al., 2023).”

      1. For the figures with cryo-EM maps, indicate the contour levels.

      Contour levels are now indicated in the Figure legends.

      1. It will be useful to show the electrostatic map of the N-terminal peptide and the docking site.

      This is now shown in Figure 5 – figure supplement 3 and Video 5.

      1. Include a comment on the recent CLC-2 /AK-42 structure and if there are any differences in the structural features.

      We added this text to lines 273-274: “The RMSD between our CLC2-TM-AK42 structure and that of Ma et al. is 0.655 Å, and the RMSD between the apo TM structures is 0.756 Å.”

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The paper contains some useful analysis of existing data but there are concerns regarding the conclusion that there might be alternative mechanisms for determining the location of origins of DNA replication in human cells compared to the well known mechanism known from many eukaryotic systems, including yeast, Xenopus, C. elegans and Drosophila. The lack of overlap between binding sites for ORC1 and ORC2, which are known to form a complex in human cells, is a particular concern and points to the evidence for the accurate localization of their binding sites in the genome being incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the best genetically and biochemically understood model of eukaryotic DNA replication, the budding yeast, Saccharomyces cerevisiae, the genomic locations at which DNA replication initiates are determined by a specific sequence motif. These motifs, or ARS elements, are bound by the origin recognition complex (ORC). ORC is required for loading of the initially inactive MCM helicase during origin licensing in G1. In human cells, ORC does not have a specific sequence binding domain and origin specification is not specified by a defined motif. There have thus been great efforts over many years to try to understand the determinants of DNA replication initiation in human cells using a variety of approaches, which have gradually become more refined over time.

      In this manuscript Tian et al. combine data from multiple previous studies using a range of techniques for identifying sites of replication initiation to identify conserved features of replication origins and to examine the relationship between origins and sites of ORC binding in the human genome. The authors identify a) conserved features of replication origins e.g. association with GC-rich sequences, open chromatin, promoters and CTCF binding sites. These associations have already been described in multiple earlier studies. They also examine the relationship of their determined origins and ORC binding sites and conclude that there is no relationship between sites of ORC binding and DNA replication initiation. While the conclusions concerning genomic features of origins are not novel, if true, a clear lack of colocalization of ORC and origins would be a striking finding. However, the majority of the datasets used do not report replication origins, but rather broad zones in which replication origins fire. Rather than refining the localisation of origins, the approach of combining diverse methods that monitor different objects related to DNA replication leads to a base dataset that is highly flawed and cannot support the conclusions that are drawn, as explained in more detail below.

      Response: We are using the narrowly defined SNS-seq peaks as the gold standard origins and making sure to focus in on those that fall within the initiation zones defined by other methods. The objective is to make a list of the most reproducible origins. Unlike what the reviewer states, this actually refines the dataset to focus on the SNS origins that have also been reproduced by the other methods in multiple cell lines. We have changed the last box of Fig. 1A to make this clearer: Shared origins = reproducible SNS-seq origins that are contained in initiation zones defined by Repli-seq, OK-seq and Bubble-seq. This and the Fig. 2B (as it is) will make our strategy clearer.

      Methods to determine sites at which DNA replication is initiated can be divided into two groups based on the genomic resolution at which they operate. Techniques such as bubble-seq, ok-seq can localise zones of replication initiation in the range ~50kb. Such zones may contain many replication origins. Conversely, techniques such as SNS-seq and ini-seq can localise replication origins down to less than 1kb. Indeed, the application of these different approaches has led to a degree of controversy in the field about whether human replication does indeed initiate at discrete sites (origins), or whether it initiates randomly in large zones with no recurrent sites being used. However, more recent work has shown that elements of both models are correct i.e. there are recurrent and efficient sites of replication initiation in the human genome, but these tend to be clustered and correspond to the demonstrated initiation zones (Guilbaud et al., 2022).

      These different scales and methodologies are important when considering the approach of Tian et al. The premise that combining all available data from five techniques will increase accuracy and confidence in identifying the most important origins is flawed for two principal reasons. First, as noted above, of the different techniques combined in this manuscript, only SNS-seq can actually identify origins rather than initiation zones. It is the former that matters when comparing sites of ORC binding with replication origin sites, if a conclusion is to be drawn that the two do not co-localise.

      Response: We agree. So the reviewer should agree that our method of finding SNS-seq peaks that fall within initiation zones actually refines the origins to find the most reproducible origins. We are not losing the spatial precision of the SNS-seq peaks.

      Second, the authors give equal weight to all datasets. Certainly, in the case of SNS-seq, this is not appropriate. The technique has evolved over the years and some earlier versions have significantly different technical designs that may impact the reliability and/or resolution of the results (e.g. in Foulk et al., 2015, lambda exonuclease was added to single-stranded DNA from a total genomic preparation rather than to purified nascent strands), which may lead to significantly different digestion patterns (i.e. underdigestion). Curiously, the authors do not make the best use of the largest SNS-seq dataset (Akerman et al., 2020), ignoring these authors' separation of core and stochastic origins. By blending all data together, any separation of signal and noise is lost. Further, I am surprised that the authors have chosen not to use data and analysis from a recent study that provides subsets of the most highly used and efficient origins in the human genome, at high resolution (Guilbaud et al., 2022).

      Response: 1) We are using the data from Akerman et al., 2020: Dataset GSE128477 in Supplemental Table 1. We have now separately examined the core origins defined by the authors to check their overlap with ORC binding (Supplementary Fig. S8b).

      2) To take into account the refinement of the SNS-seq methods through the years, we actually included in our study only those SNS-seq studies after 2018, well after the lambda exonuclease method was introduced. Indeed, all 66 of SNS-seq datasets we used were obtained after the lambda exonuclease digestion step. To reiterate, we recognize that there may be many false positives in the individual origin mapping datasets. Our focus is on the True positives, the SNS-seq peaks that have some support from multiple SNS-seq studies AND fall within the initiation zones defined by the independent means of origin mapping (described in Fig. 1A and 2B). These True positives are most likely to be real and reproducible origins and should be expected to be near ORC binding sites.

      We have changed the last box of Fig. 1A to make this clearer: Shared origins = reproducible SNS-seq origins that are contained in initiation zones defined by Repli-seq, OK-seq or Bubble-seq.
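
      The definition of shared origins above amounts to an interval-containment filter, which can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the single-chromosome half-open `(start, end)` coordinates, and the `require_all` switch (whether a peak must lie in zones from every method or from any one method) are all assumptions made for the example.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def peaks_in_zones(peaks, zones):
    """Return the peaks that lie entirely inside one (merged) zone."""
    merged = merge_intervals(zones)
    starts = [s for s, _ in merged]
    kept = []
    for start, end in peaks:
        # Last merged zone whose start is <= the peak start.
        i = bisect_right(starts, start) - 1
        if i >= 0 and end <= merged[i][1]:
            kept.append((start, end))
    return kept


def shared_origins(peaks, zones_by_method, require_all=False):
    """Keep narrow peaks contained in initiation zones.

    zones_by_method maps a (hypothetical) method label to its zone list.
    require_all=True demands containment in every method's zones;
    require_all=False accepts containment in any one method's zones.
    """
    hits = {m: set(peaks_in_zones(peaks, z)) for m, z in zones_by_method.items()}
    combine = all if require_all else any
    return [p for p in peaks if combine(p in h for h in hits.values())]


# Toy data, invented for illustration only.
peaks = [(100, 200), (5000, 5100), (9000, 9100)]
zones = {"OK-seq": [(0, 1000), (8000, 10000)], "Bubble-seq": [(50, 300)]}
print(shared_origins(peaks, zones))
print(shared_origins(peaks, zones, require_all=True))
```

      Because each retained interval is an original narrow peak, the filter keeps the spatial precision of the peak calls while using the broader zones only as supporting evidence.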

      Ini-seq by Torsten Krude and co-workers (Guilbaud, 2022) does NOT use Lambda exonuclease digestion. So using Ini-seq defined origins is at odds with the suggestion above that we focus only on SNS-seq datasets that use Lambda exonuclease. However, Ini-seq identifies a much smaller subset of SNS-seq origins, so, as requested, we have also done the analysis with just that smaller set of origins, and it does show a better proximity to ORC binding sites, though even then the ORC proximate origins account for only 30% of the Ini-seq2 origins (Supplementary Fig. S8d). Note Ini-seq2 identifies DNA replication initiation sites seen in vitro on isolated nuclei.

      Update in response to authors' comments on the original review:

      While the authors have clarified their approach to some aspects of their analysis, I believe they and I are just going to have to disagree about the methodology and conclusions of this work. I do not find the authors' responses sufficiently compelling to change my mind about the significance of the study or veracity of the conclusions. In my opinion, the method for identification of strong origins is not robust and of insufficient resolution. In addition, the resolution and the overlap of the MCM ChIP-seq datasets is poor. While the conclusion of the paper would indeed be striking and surprising if true, I am not at all persuaded that it is based on the presented data.

      Reviewer #2 (Public Review):

      Tian et al. performed a meta-analysis of 113 genome-wide origin profile datasets in humans to assess the reproducibility of experimental techniques and shared genomics features of origins. Techniques to map DNA replication sites have quickly evolved over the last decade, yet little is known about how these methods fare against each other (pros and cons), nor how consistent their maps are. The authors show that high-confidence origins recapitulate several known features of origins (e.g., correspondence with open chromatin, overlap with transcriptional promoters, CTCF binding sites). However, surprisingly, they find little overlap between ORC/MCM binding sites and origin locations.

      Overall, this meta-analysis provides the field with a good assessment of the current state of experimental techniques and their reproducibility, but I am worried about: (a) whether we've learned any new biology from this analysis; (b) how binding sites and origin locations can be so mismatched, in light of numerous studies that suggest otherwise; and (c) some methodological details described below.

      • I understand better the inclusion/exclusion logic for the samples. But I'm still not sure about the fragments. As the authors wrote, there is both noise and stochasticity; the former is not important but the latter is essential to include. How can these two be differentiated, and what may be the expected overlap as a function of different stochasticity rates?

      It is difficult to separate the effect of noise from the effect of stochastic firing of origins. We therefore took the simplest approach: focus only on the most reproducible origins (shared origins) and ignore the non-reproducible origins. At least the most reproducible origins can be used to test the hypotheses regarding origin firing.

      • Many of the major genomic features analyzed have already been found to be associated with origin sites. For example, the correspondence with TSS has been reported before:

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6320713/

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547456/

      • Line 250: The most surprising finding is that there is little overlap between ORC/MCM binding sites and origin locations. The authors speculate that the overlap between ORC1 and ORC2 could be low because they come from different cell types. Equally concerning is the lack of overlap with MCM. If true, these are potentially major discoveries that butt heads with numerous other studies that have suggested otherwise.

      The key missing dataset is ORC1 and ORC2 ChIP-seq from the same cell type. This shouldn't be too expensive to perform, and I hope someone performs this test soon. Without this, I remain on the fence about how much existing datasets are "junk" vs how much the prevailing hypothesis about replication needs to be revisited. Nonetheless, the authors do perform a nice analysis showing that existing techniques should be carefully used and interpreted.

      We agree that a thorough set of ChIP-seq data (with multiple antibodies or with equivalent techniques that do not use antibodies) for all six subunits of ORC in mammalian cells will be very useful for the field. Note, though, that just by simple cell lysis, it is very easy to divide human ORC into at least three different parts: ORC1, ORC2-5, and ORC6. The subunits do not form as robust a complex as seen in the yeasts and in flies.

      Reviewer #3 (Public Review):

      Summary: The authors present a thought-provoking and comprehensive re-analysis of previously published human cell genomics data that seeks to understand the relationship between the sites where the Origin Recognition Complex (ORC) binds chromatin, where the replicative helicase (Mcm2-7) is loaded, and where DNA replication actually begins (origins). The view that these should coincide is influenced by studies in yeast where ORC binds site-specifically to dedicated nucleosome-free origins where Mcm2-7 can be loaded and remains stably positioned for subsequent replication initiation. However, this is most certainly not the case in metazoans where it has already been reported that chromatin binding sites of ORC and Mcm2-7 do not necessarily overlap, nor do they always overlap with origins. This is likely due to Mcm2-7 possessing linear mobility on DNA (i.e., it can slide) such that other chromatin-contextualized processes can displace it from the site in which it was originally loaded. Additionally, Mcm2-7 is loaded in excess and thus only a fraction of Mcm2-7 would be predicted to coincide with replication start sites. This study reaches a very similar conclusion to these previous studies: they find a high degree of discordance between ORC, Mcm2-7, and origin positions in human cells.

      Strengths: The strength of this work is its comprehensive and unbiased analysis of all relevant genomics datasets. To my knowledge, this is the first attempt to integrate these observations. It also is an important cautionary tale to not confuse replication factor binding sites with the genomic loci where replication actually begins, although this point is already widely appreciated in the field.

      Response: Thank you for recognizing the comprehensive and unbiased nature of our analysis. Our findings will prevent the unwise adoption of ORC or MCM binding sites as surrogate markers of origins and will stimulate the field to try and improve methods of identifying ORC or MCM binding until the binding sites are found to be proximal to the most reproducible origins. The last possibility is that there are ORC- or MCM-independent modes of defining origins, but we have no evidence of that.

      Weaknesses: The major weakness of this paper is the lack of novel biological insight and that the comprehensive approach taken failed to provide any additional mechanistic insight regarding how and why ORC, Mcm2-7, and origin sites are selected or why they may not coincide.

      Response: We agree that we cannot provide a novel biological insight from this kind of meta-analysis. The importance of this study is in highlighting that either there are significant problems with the data collected till now (preventing the co-localization of ORC or MCM binding sites with the most reproducible origins) or ORC and MCM binding sites are often far away from where the most reproducible origins fire, which should make us consider ways in which origins could be activated kilobases away from ORC and MCM binding sites.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      All suggestions and recommendations were described in a previous review.

      Reviewer #3 (Recommendations For The Authors):

      The most significant omission is a contextualization of the results in the discussion and an explanation of why these results matter for the biology of replication, disease, and/or our confidence in the genomic techniques reported on in this study. As written, the discussion simply restates the results without any interpretation towards novel insight. I suggest that the authors revise their discussion to fill this important gap.

      A second important, unresolved point is whether replication origins identified by the various methods differ due to technical reasons or because different cell types were analyzed. Given the correlation between TSS and origins (reported in this study but many others too), it is somewhat expected that origins will differ between cell types as each will have a distinct transcriptional program. This critique is partly addressed in Figure S1C. However, given the conclusion that the techniques are only rarely in agreement (only 0.27% origins reproducibly detected by the four techniques), a more in-depth analysis of cell type specific data is warranted. Specifically, I would suggest that cell type-specific data be reported wherever origins have been defined by at least two methods in the same cell type, specifically reporting the percent of shared origins amongst the datasets. This type of analysis may also inform on whether one or more techniques produces the highest (or lowest) quality list of true origins.

      We have done what has been suggested: used K562 cell type-specific data because here the origins have been defined by at least two methods in the same cell type and reported the percent of shared origins amongst the datasets (Supp. Fig. S4).

      Other MINOR comments include:

      • Line 215: the authors show that shared origins overlap with TF binding hotspots more often than union origins, which they claim suggests "that they are more likely to interact with transcription factors." As written, it sounds like the authors are proposing that ORC may have some direct physical interaction with transcription factors. Is this intended? If so, what support is there for this claim?

      The reviewer is correct. We have rephrased because we have no experimental support for this claim.

      • In the text, Figure 3G is discussed before Figure 3F. I suggest switching the order of these panels in Figure 3.

      Done.

      • It's not clear what Figure 5H to Figure 6 accomplishes. What specifically is added to the story by including these data? Is there something unique about the high confidence origins? If there is nothing noteworthy, I would suggest removing these data.

      We want to keep them to highlight the small number of origins that meet the hypothesis that ORC and MCM must bind at or near reproducible origins. These would be the origins that the field can focus in on for testing the hypothesis rigorously. They also show the danger of evaluating proximity between ORC or MCM binding sites with origins based on a few browser shots. If we only showed this figure, we could conclude that ORC and MCM binding sites are very close to reproducible origins.

      • Line 394: "Since ORC is an early factor for initiating DNA replication, we expected that shared human origins will be proximate to the reproducible ORC binding sites." This is only expected if one disbelieves the prior literature that shows that ORC and origins are not, in many cases, proximal. This statement should be revised, or the previous literature should be cited, and an explanation provided about why this prior work may have missed the mark.

      We do not know of any genome-wide study in mammalian cell lines where ORC binding sites and MCM binding sites have been compared to highly reproducible origins, or that shows that these binding sites and highly reproducible origins are mostly not proximal to each other. Most studies cherry-pick a few origins and show by ChIP-PCR that ORC and/or MCM bind near those sites. Alternatively, studies sometimes show a selected browser shot, without a quantitative measure of the overlap genome-wide and without doing a permutation test to determine if the observed overlap or proximity is higher than what would be expected at random with similar numbers of sites of similar lengths. In the revised manuscript we have discussed Dellino, 2013; Kirstein, 2021; Wang, 2017; Mas, 2023. None of them have addressed the question we are addressing: is the small subset of the most reproducible origins proximal to ORC or MCM binding sites?
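
      The kind of permutation test referred to above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis used in the manuscript: it treats a single chromosome of length `chrom_len`, uses half-open `(start, end)` intervals, places shuffled sites uniformly at random while preserving their number and lengths, and uses a brute-force distance check that would be too slow for genome-scale data. All function and parameter names are hypothetical.

```python
import random


def gap(a, b):
    """Distance in bp between two half-open intervals (0 if they overlap or abut)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def count_proximal(origins, sites, max_dist=0):
    """Number of origins lying within max_dist bp of at least one site."""
    return sum(any(gap(o, s) <= max_dist for s in sites) for o in origins)


def shuffle_sites(sites, chrom_len, rng):
    """Re-place each site uniformly at random, keeping its length."""
    shuffled = []
    for start, end in sites:
        length = end - start
        new_start = rng.randrange(0, chrom_len - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_pvalue(origins, sites, chrom_len, max_dist=0, n_perm=1000, seed=0):
    """One-sided empirical p-value for 'more proximal origins than expected at random'."""
    rng = random.Random(seed)
    observed = count_proximal(origins, sites, max_dist)
    at_least = sum(
        count_proximal(origins, shuffle_sites(sites, chrom_len, rng), max_dist) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy data, invented for illustration only.
origins = [(100, 200), (1000, 1100)]
sites = [(150, 160), (1050, 1060)]
observed, p_value = permutation_pvalue(origins, sites, chrom_len=1_000_000)
print(observed, p_value)
```

      A realistic version would shuffle within each chromosome, exclude unmappable regions from the null, and replace the brute-force search with a sorted or indexed lookup.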

      • Line 402-404: given the lack of agreement between ORC binding sites and origins the authors suggest as an explanation that "MCM2-7 loaded at the ORC binding sites move much further away to initiate origins far from the ORC binding sites, or that there are as yet unexplored mechanisms of origin specification in human cancer cells". The first part of this statement has been shown to be true (Mcm2-7 movement) and should be cited. But what do the authors mean by the second suggestion of "unexplored mechanisms"? Please expand.

      We have addressed this point in the revised manuscript.

      • The authors should better reference and discuss the previous literature that relates to their work, some of these include Gros et al., 2015 Mol Cell, Powell et al., 2015 EMBO J, Miotto et al., 2016 PNAS, but likely there are many others.

      We have addressed this point in the revised manuscript.

      Note for authors:

      Line 107: The introduction discusses the mechanism by which yeast ORC recognizes specific origins and discusses the Orc4 contribution, but it is known that Orc2 also binds DNA in a base-specific manner (see PMID 33056978). Thus Lee et al. did not "humanize ORC" as stated.

      Done

      Lines 117-119: Two of the cited papers are on endo-reduplication and not on initiation in a normal cell cycle, and this should be pointed out. Second, there is contradictory evidence that ORC is essential in human cells, and this should be cited (PMID 33522487).

      Done

    1. Author Response

      The following is the authors’ response to the original reviews.

      Based on the reviewer comments (see below) and subsequent discussion between the reviewers and the Reviewing Editor, I would like to invite the authors to make major revisions, including new experiments. However, if major new experiments are not feasible, as may be the case, then at a minimum, I would urge the authors to:

      1. Tone down the language regarding a causative role for changes in GH/IGF-I signaling in mediating the effects of Tmem263 on the skeleton, and also be very open in acknowledging the lack of mechanistic insight into how Tmem263 regulates GH signaling.

      Response: We toned down the language as suggested and also acknowledged the lack of mechanistic insights into how Tmem263 regulates GH signaling.

      1. Revise/redo or if not possible, then delete the problematic experiment in Fig. 5E.

      Response: We have included additional Western blot data in Figure 5 from control WT and KO male mice without exogenous GH injection. In the absence of GH injection, we could not detect Jak2 and Stat5 phosphorylation in the liver of male WT and KO mice.

      1. Address the comments about liver feminization.

      Response: We have performed additional analysis as suggested by reviewer # 3. We have now included additional data to address the issue of liver feminization (new Fig. 6G-I and Figure 6-figure supplement 1). We plan to expand on this very topic in future studies as this is an interesting transcriptional phenomenon.

      1. Revise the manuscript to address as many of the recommendations for the authors as possible, many of which can be addressed by textual edits.

      Response: We have addressed as many of the suggested textual changes as possible in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      TMEM263 has been suggested to be associated with bone mineral density and growth in humans and mice, but the functional role of this transmembrane protein in the regulation of bone metabolism is unknown. With the knockout mouse approach, this manuscript demonstrates that Tmem263 is essential for longitudinal bone growth in the mouse, as deletion of Tmem263 in knockout (KO) mice led to severe postnatal growth impairment and proportional dwarfism. It is determined that the dwarfism was caused by a substantial reduction in liver expression of growth hormone receptor (GHR), a slight increase in serum GH, and a reduction in serum IGF-I, which resulted in disruption of the GH/IGF-I regulatory axis of endochondral bone formation.

      The study was relatively well designed, and the results in general are supportive of the conclusions. While this study discloses new and intriguing functional information about a novel cytoplasmic membrane gene, there are a few minor issues that the authors may wish to address. These issues are listed in the following:

      1. One of the intriguing findings of this manuscript is that deletion of a gene encoding a small cytoplasmic membrane protein could cause a substantial reduction in the expression and protein levels of GHR. Although a couple of potential explanations were offered in the Discussion section (first complete paragraph of page 10), there has been no attempt to test any of the suggested causes, even though many of these potential mechanisms can readily be tested experimentally. Accordingly, the lack of mechanistic investigation into this intriguing effect renders the manuscript largely descriptive in nature.

      Response: The point made by the reviewer is well taken. We do plan to have follow up studies to establish which among the mechanisms we highlighted in the discussion is contributing to the reduction in GHR transcript and protein level. Our present study is the first functional characterization of this enigmatic novel membrane protein. We anticipate that multiple follow-up studies are needed to gain a deeper understanding of the biology of Tmem263. We believe that our present study represents an important first step.

      1. Because a major conclusion is that the bone phenotype of Tmem263 KO mice was caused by deficient hepatic expression and/or action of GHR, it would help (or strengthen) the conclusion if a brief comparison of the bone phenotype between GHR KO mice and Tmem263 KO mice were included in the Discussion section.

      Response: We have now included this information in the revised manuscript.

      1. In Figure 3, the cortical bone parameters (i.e., Tt.Ar, Ct.Ar, and Ct.Th), but none of the trabecular bone parameters (i.e., BV/TV, Tb.N, Tb.Th), were normalized against femur length. The authors did not provide a rationale for this differential treatment with the cortical bone parameters from the trabecular bone parameters. If the reason to normalize the cortical bone parameters against bone length was to demonstrate that the reduced cortical bone mass in mutants was related to the impaired longitudinal bone growth, then why did the authors not also assess whether the observed reduction in these trabecular bone parameters in KO mutants was proportional to reduced longitudinal bone growth?

      Response: We actually made the exact adjustments that the reviewer refers to, as stated in the methods section. Please see page 14. The regions of interest (ROIs) of both the trabecular bone analysis and the cortical analysis in the mutants were reduced in proportion to the length of the bone (40% smaller). The normalization of Tt.Ar to femur length in Figure 3I was only meant to show that the reduction in Tt.Ar in the mutants was proportional. We have modified the text in our result section for clarity.

      1. Elements described in Fig. 5A have been well documented. Therefore, Fig. 5A is unnecessary and can be deleted.

      Response: We felt that Figure 5A should remain. It helps orient readers that are not familiar with the literature to be aware that both liver- and bone-derived IGF-1 contribute to longitudinal bone growth.

      1. Figure 6 was performed with male KO mice. Were the altered gene expression profiles in female KO mice any different from male KO mice?

      Response: We plan to perform RNA-seq in female mouse liver in our follow-up studies. We do not know, at present, whether and to what extent the liver transcriptomic profile would be different between male and female KO mice. As far as dwarfism and deficiency in skeletal acquisition, both male and female KO mice showed the same phenotypes.

      1. The number of animals (or samples) per group in some of the Figures (i.e., Fig. 2G, 2I, 2J, 3A to J, the entire Fig. 4, 5D, 5F, and Suppl Fig. 1) needs to be provided in the legends.

      Response: We have included this information in the figure legends.

      Reviewer #3 (Recommendations for The Authors):

      1. Explain the discrepancy between the impact of KO on serum Igfbp3 (= decreased) vs. hepatic Igfbp3 (= unchanged).

      Response: We do not have a plausible mechanism, at present, that can explain the reduction in circulating serum Igfbp3 level without an apparent reduction in Igfbp3 transcript level in the liver. In human studies, typically only serum IGFBP3 levels are measured but not the hepatic IGFBP3 transcript level. Therefore, it is unclear whether the circulating levels of IGFBP3 are being regulated at the posttranscriptional level, an issue that can be explored in future studies.

      1. Line 215, 221, and elsewhere - Foxa1 does not show significant male-biased expression in mouse liver.

      Response: We have removed Foxa1 from the text.

      1. Line 225- According to the abstract of Ref. #45, Cux2 regulates a subset of sex-biased genes in the liver. The authors should compare the genes dysregulated by TMEM263-KO (Fig. 6) to those altered by Cux2 loss (Ref. #45) to ascertain whether the results of Fig. 6 are partially or entirely explained by Cux2 overexpression.

      Response: We agree that this is a great area for future study. We feel, however, that it would be better explored in a more in-depth follow-up article. Given the current direction of the paper, we felt it made more sense to include differential expression comparisons of male vs female, hypophysectomized vs sham control, and Stat5b-KO vs WT mouse liver gene expression data. Our future work will explore the liver transcriptomes of male and female WT and Tmem263-KO mice in the context of the observed physiology.

      1. Line 262- "lower transcription of Ghr gene". A decrease in mRNA levels does NOT equate with a decrease in transcription per se. Altered mRNA splicing, poly A, export, cytoplasmic stability, etc. are all potential contributors.

      Response: We have included the possibilities highlighted by the reviewer in our revised Discussion section.

      1. Line 273, "TMEM263... most highly expressed in liver" Not correct - see Fig. 1C for TMEM263 RNA levels in mouse tissues.

      Response: We have corrected the text on page 11.

      1. Line 425 - Include GEO accession number.

      Response: We have already uploaded our RNA-seq data to the NCBI Sequence Read Archive (SRA), and the data can be accessed under accession number PRJNA938158.

      1. Fig. 6 - Line 796 - Specify the age and sex of mice analyzed.

      Response: We have included this information in the revised Figure 6 legend.

      1. Fig.2 - Suppl 1- Specify age of mice.

      Response: We have included the information in the revised Figure 2-figure supplement 2.

      1. Fig.2G -Specify the sex of the mice.

      Response: For the P1 to P21 pups’ data, we did not separate by sex, as sex determination of pups at P1 and P7 can be challenging. We have now indicated this in the figure legend.

      1. Fig. 6A and 6C-6F: Which of these genes shows sex-dependent expression in wild-type liver? Use color to highlight gene names for genes that show male-biased or female-biased expression.

      Response: We agree with the reviewer that additional labels on Figure 6A and 6C-F would be helpful to indicate sex-biased genes. However, this is not the primary point of the paper. This topic deserves a much more in-depth analysis in follow-up studies focused on defining the exact type and degree of transcript feminization in the liver of Tmem263-KO mice, as well as its physiologic consequences. For readers interested in this topic, we have included subfigures G-I in Figure 6 and, for greater transcript-level detail, Figure 6-figure supplement 1.